+ Reply to Thread
Results 1 to 13 of 13

Extract text strings

  1. #1
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Extract text strings

    I have a very large spreadsheet with html in each cell in column 1. There are multiple text strings (potentially 3 to 20) within the html in each cell. Each string is preceded with HREF= and the string has a " before and after the string.

    I want to have a macro that searches each cell in column 1 and finds the HREF= (multiple instances...as few as 3 to as many as 20) and extracts the string between the double quotes (i.e. "string") and writes them to the first column in worksheet 2 in the workbook.

    I am not a programmer...but have spent a few days trying to find a macro or use the MID or FIND function to do this with minimal success. Can someone please help?

    Thanks!

  2. #2
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Macro/VBS to extract text strings...please help!

    Post some example strings (better yet a file with strings within).

    The key will be whether or not other text appears within the string(s) which is to be ignored / irrelevant

    In the most general of terms you can use the Split Function to separate the string (populate a Variant array) based on the hyperlink tag.
    However, if there is surplus text to be accounted for then the Split approach will most likely require further adaptation.

    Without examples it's hard to give anything more specific I'm afraid.

  3. #3
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,885

    Re: Macro/VBS to extract text strings...please help!

    Hi Wazing, welcome to the forum.

    Perhaps this code will help you out. It loops through the cells in column A on Sheet1 and creates a list of the values following the href tags into column A on Sheet2.

    Please Login or Register  to view this content.
    May need some tweaking, but I don't have any sample data to work with...

  4. #4
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Extract text strings

    Please Login or Register  to view this content.

  5. #5
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    Paul....Thanks for the code. It did it flawlessly! I really appreciate the help!

  6. #6
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    Hey Paul, quick followup question: One variable I had not included.....in my source worksheet1 column (b) has the sourcepage url for the html in column (A) (could be different source url for each cell). Is there a quick addition to the code that will write this value to the column next to each href link from that cell in worksheet2? This would be a huge help in extracting the data....

    Thanks!

  7. #7
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,885

    Re: Macro/VBS to extract text strings...please help!

    Can you provide a worksheet with some sample data that I can test on? Doesn't sound difficult, but want to make sure I get it right the first time.

  8. #8
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    Here is some "sample" data in the same format to test on......Thanks!
    Attached Files Attached Files

  9. #9
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    You may notice that in each raw html cell there are typically duplicates of each href link....and in cells from the same source URL in column 2, I have to manually remove duplicates. So typically we have to take the output file (which may be thousands of lines long) and sort by column 2, then take that html and remove the href links/remove duplicates for all the href links from the same source url, then copy the source url to the final list from that link and put it in the final worksheet for analysis. It would save a multiple steps if it removed duplicates where column 2 is the same as well....not sure if that is possible but thought I would ask. Maybe it is a separate script I run on the results where it looks for duplicates where they have the same source url in column 2......Thanks!

  10. #10
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Extract text strings

    I can't detect a connection between herefs in column A and URL in column B.

    Based on your example: all the unique hreff in column A
    from cell A20 downward.
    Please Login or Register  to view this content.

  11. #11
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,885

    Re: Extract text strings

    Hi Wazing, I think this amended code does what you're after.

    Please Login or Register  to view this content.

  12. #12
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Extract text strings

    Quote Originally Posted by snb View Post
    I can't detect a connection between herefs in column A and URL in column B.
    Well, the data has been obviously altered to keep it confidential....but a great way to get an example of what I am referring to is to take the sample text and repeat it several times (so you have triple the rows...with the data repeating). I'll give this code a run...thanks for the help! I'll post a response shortly....
    Last edited by Paul; 07-22-2010 at 12:49 PM.

  13. #13
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Extract text strings

    Hey Paul, tried your revised code....worked perfectly the first time.....Thanks again!

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1