Regex to Parse Data from Between HTML Source Tags

  1. #1
    Registered User
    Join Date
    07-04-2012
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    5

    Regex to Parse Data from Between HTML Source Tags

    Good morning all!

    I am stuck writing some VBA code. Thus far my experience with writing VBA code in Excel has been simple copying, pasting, sorting, etc. - stuff that I can record a macro for and just copy the code over.

    My current need way exceeds my experience and expertise and I need some help.



    I have a VBA string variable which contains multiple blocks of HTML code from a website source. From this VBA string, I need to parse specific data that is located between tags in the HTML source code. Here is an example of the HTML source code from which I need to parse data:


    <p class="row">
    <span class="ih" id="images:ABC123.jpg">&nbsp;</span>
    <span class="itemdate"> Jul 2</span>
    <span class="itemsep"> - </span>
    <a href="http://www.abc.com">ABC Heating and Cooling</a>
    <span class="itemsep"> - </span>
    <span class="itemph"></span>
    <span class="itempp"> $800</span>
    <span class="itempn"><font size="-1"> (ABC)</font></span>
    <span class="itemcg" title="gms"> <small class="gc"><a href="/gms/">ABC Get Your Free Quote</a></small></span>
    <span class="itempx"> <span class="p"> pic</span></span>
    <br class="c">
    </p>

    If you consider the above HTML source code as a "block" of code, then the VBA string variable contains multiple "blocks" of this code. For the example code above, I need to parse out the following data only (keeping in mind that this process will have to be repeated until all of the "blocks" have been processed through):


    images:ABC123.jpg
    Jul 2
    -
    http://www.abc.com
    ABC Heating and Cooling
    -
    $800
    (ABC)
    ABC Get Your Free Quote


    Everything that is highlighted in this picture is what I need parsed out:

    Parsed.jpg


    Then I would like to pass the parsed data out to a variant array which could later be used to paste into an Excel worksheet. I am thinking the array would be in the following format with the top row of the array being headers (using the same example above):

    SampleArray.jpg


    After finishing one of these "blocks", the program has to go to the next "block" and repeat the parsing process (and making a new row in the array for that block of data) until it has gone through all of the "blocks" in the VBA string. I am guessing that some kind of loop would be needed for this.
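    The array layout described above could be sketched roughly as follows. The header names and the helper BuildListingArray are hypothetical, chosen only to match the nine values listed in the example, so treat this as a sketch under those assumptions rather than the actual layout.

```vba
Option Explicit

' Builds a 2-D Variant array: row 1 holds headers, rows 2..n+1 hold one listing each.
' The header names are illustrative assumptions, not taken from the original post.
Function BuildListingArray(ByVal listingCount As Long) As Variant
    Dim headers As Variant
    Dim result() As Variant
    Dim c As Long

    headers = Array("Image", "Date", "Sep1", "URL", "Title", _
                    "Sep2", "Price", "Location", "Category")

    ReDim result(1 To listingCount + 1, 1 To UBound(headers) - LBound(headers) + 1)
    For c = LBound(headers) To UBound(headers)
        result(1, c - LBound(headers) + 1) = headers(c)
    Next c

    BuildListingArray = result
End Function

Sub DemoWriteArray()
    Dim data As Variant
    data = BuildListingArray(1)

    ' Fill the single data row with the first few values from the example block.
    data(2, 1) = "images:ABC123.jpg"
    data(2, 2) = "Jul 2"
    data(2, 4) = "http://www.abc.com"
    data(2, 5) = "ABC Heating and Cooling"

    ' Resize the target range to the array's size and write it in one assignment.
    With ThisWorkbook.Worksheets(1).Range("A1")
        .Resize(UBound(data, 1), UBound(data, 2)).Value = data
    End With
End Sub
```

    Writing the whole array in one assignment is usually much faster than filling cells one at a time inside the loop.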

    My best guess for the parsing is to use Regex???
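    For reference, a regex-based attempt might look like the minimal sketch below. It uses late-bound VBScript.RegExp; the function name SpanTexts and the pattern are illustrative assumptions, and it only handles spans that contain plain text and have class as their first attribute.

```vba
Option Explicit

' Returns the trimmed text of every <span> whose class attribute equals className.
' Late binding via CreateObject avoids needing a library reference.
Function SpanTexts(ByVal html As String, ByVal className As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim found As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    ' Assumption: className contains no regex metacharacters and the span
    ' holds plain text only (no nested tags).
    re.Pattern = "<span class=""" & className & """[^>]*>([^<]*)</span>"

    For Each m In re.Execute(html)
        found.Add Trim$(m.SubMatches(0))
    Next m

    Set SpanTexts = found
End Function

Sub DemoRegex()
    Dim sample As String
    sample = "<span class=""itemdate""> Jul 2</span>" & _
             "<span class=""itempp""> $800</span>"

    Debug.Print SpanTexts(sample, "itemdate").Item(1)
    Debug.Print SpanTexts(sample, "itempp").Item(1)
End Sub
```

    Spans with nested markup, such as the itempn span with its inner font tag, are not matched by this pattern and would each need their own expression.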


    Thank you all for taking a look at this, I appreciate the help!

  2. #2
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Regex to Parse Data from Between HTML Source Tags

    Can you post the url?

  3. #3
    Registered User
    Join Date
    07-04-2012
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Regex to Parse Data from Between HTML Source Tags

    Thanks for the response - I already have a VBA string with all of the data I want from the website, so all I need to do is parse that string as described above.

    Hope that helps, thanks for looking and for taking the time to respond!

  4. #4
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Regex to Parse Data from Between HTML Source Tags

    Not really. Since this is only a chunk of the webpage, I'd need to see the whole source; otherwise you would probably end up with some odd results.

  5. #5
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Regex to Parse Data from Between HTML Source Tags

    Perhaps instead of a regex, see http://www.vbaexpress.com/forum/showthread.php?t=31831 for an example of using the Microsoft HTML Object Library
    Entia non sunt multiplicanda sine necessitate
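    A minimal sketch of that idea, assuming late binding through the "htmlfile" ProgID so that no library reference is needed. The sample markup is a shortened piece of the block from post #1, and the function name is made up for illustration. getElementsByTagName is used because newer methods such as getElementsByClassName may not be available on a late-bound document.

```vba
Option Explicit

' Loads an HTML string into an in-memory MSHTML document (no browser window)
' and returns the collection of its <span> elements.
Function SpansFromHtml(ByVal html As String) As Object
    Dim doc As Object

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    Set SpansFromHtml = doc.getElementsByTagName("span")
End Function

Sub DemoHtmlFile()
    Dim spans As Object
    Dim i As Long

    Set spans = SpansFromHtml("<p class=""row"">" & _
                              "<span class=""itemdate""> Jul 2</span>" & _
                              "<span class=""itempp""> $800</span></p>")

    ' The collection is zero-based.
    For i = 0 To spans.Length - 1
        Debug.Print spans.Item(i).className, Trim$(spans.Item(i).innerText)
    Next i
End Sub
```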

  6. #6
    Registered User
    Join Date
    07-04-2012
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Regex to Parse Data from Between HTML Source Tags

    Totally understand, in my mind it was a slam dunk. Here is a sample URL: http://fortcollins.craigslist.org/se...cel&catAbb=sss

    I am interested in the Craigslist listing data. I already have code that copies the webpage source, loads it into a string, then cuts out everything except each unique listing, which is represented by the lines of HTML above; the next unique listing just repeats the same pattern, and so on.

    Does that clarify it better for you? The pattern (represented by the HTML tags) for each listing is going to be the same for each unique listing.

  7. #7
    Registered User
    Join Date
    07-04-2012
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Regex to Parse Data from Between HTML Source Tags

    shg - thanks for the post!

    Pardon my lack of experience - what would be the advantage of using this method vs a regex? Also, how would the code look for my specific circumstance? I do not think I need to open an instance of IE as I already have the webpage source stored in a VBA string.

    Thank you for the suggestion, good link!

  8. #8
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Regex to Parse Data from Between HTML Source Tags

    I have never used it, so I dunno -- but it seems made to order for the topic. I didn't look closely at what you're trying to do, but creating your own regex expressions when there is a ready-made HTML parser available (I think) would be self-flagellation.

  9. #9
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Regex to Parse Data from Between HTML Source Tags

    This is how I'd do it. Trying to parse HTML with regex is like trying to eat soup with a fork; it just can't cope with the complexities of HTML (and XML), so I'd use the IE object library. Note that the code below requires IE 9, as I don't believe .getElementsByClassName is available in earlier versions. You'll also see there is no need to open IE; we just use its object library.

    You say that you already have the source in a string, but I don't, so I've included getting the string in the code below. You should be able to adapt it: essentially we create an HTMLDocument directly from a string (in my case the result of the XML call, in yours probably your existing string). We then loop through its "rows", a collection of elements with the same class name. Since all your data is nested in these rows, we can then loop through each row's children to extract the data.

    This data is written to an array and then dumped into sheet 2 starting at "A1"

    Note: You will need to set a reference to the Microsoft HTML Object Library


    Let me know if it does what you are looking for
    Last edited by Kyle123; 07-06-2012 at 05:37 AM.
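
    The approach described in post #9 can be sketched roughly as follows. This is not Kyle123's original code: the function and helper names, the class names being skipped, and the column handling are assumptions based on the sample block in post #1, and the sketch starts from an existing HTML string instead of fetching the page. It needs a reference to the Microsoft HTML Object Library.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
Private Const MAX_COLS As Long = 10

' Parses every element with class "row" into one row of a 2-D Variant array.
' Returns Empty when no such elements are found.
Function ParseListings(ByVal html As String) As Variant
    Dim doc As MSHTML.HTMLDocument
    Dim listings As MSHTML.IHTMLElementCollection
    Dim listing As MSHTML.IHTMLElement
    Dim child As MSHTML.IHTMLElement
    Dim data() As Variant
    Dim rowCount As Long, r As Long, c As Long

    ' Build the document straight from the string; no browser window is opened.
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = html

    Set listings = doc.getElementsByClassName("row")
    rowCount = listings.Length
    If rowCount = 0 Then Exit Function   ' ReDim with a zero row count would fail

    ReDim data(1 To rowCount, 1 To MAX_COLS)
    For r = 1 To rowCount
        Set listing = listings.Item(r - 1)   ' the collection is zero-based
        c = 0
        For Each child In listing.Children
            Select Case LCase$(child.tagName)
                Case "a"
                    ' Links contribute two values: the target and the visible text.
                    AddValue data, r, c, child.getAttribute("href")
                    AddValue data, r, c, Trim$(child.innerText)
                Case "span"
                    If child.className = "ih" Then
                        AddValue data, r, c, child.ID
                    ElseIf child.className <> "itemph" And child.className <> "itempx" Then
                        AddValue data, r, c, Trim$(child.innerText)
                    End If
            End Select
        Next child
    Next r

    ParseListings = data
End Function

' Stores a value in the next free column, ignoring anything past MAX_COLS.
Private Sub AddValue(ByRef data() As Variant, ByVal r As Long, _
                     ByRef c As Long, ByVal v As Variant)
    If c < MAX_COLS Then
        c = c + 1
        data(r, c) = v
    End If
End Sub

Sub DemoParse()
    Dim result As Variant
    result = ParseListings("<p class=""row""><span class=""itemdate""> Jul 2</span>" & _
                           "<a href=""http://www.abc.com"">ABC Heating and Cooling</a></p>")
    If Not IsEmpty(result) Then
        ' Assumes the workbook has a second worksheet.
        ThisWorkbook.Worksheets(2).Range("A1") _
            .Resize(UBound(result, 1), UBound(result, 2)).Value = result
    End If
End Sub
```

    Checking the row count before the ReDim keeps the function from failing when the string contains no matching elements.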

  10. #10
    Registered User
    Join Date
    07-04-2012
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Regex to Parse Data from Between HTML Source Tags

    Kyle -

    That my friend is perfection, you cannot teach that!

    Good work! It worked amazing, exactly what I was looking for. I was working on the Regex expressions, but this is way more simple.

    Thank you!

  11. #11
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Regex to Parse Data from Between HTML Source Tags

    Yep, I'm good at simple

    Glad it worked, thanks for the feedback

  12. #12
    Registered User
    Join Date
    01-22-2016
    Location
    las vegas, nv
    MS-Off Ver
    2013
    Posts
    2

    Re: Regex to Parse Data from Between HTML Source Tags

    Kyle -

    I am getting an error with this line:

    ReDim data(1 To rowCount, 1 To 10)

    Any ideas of what could cause this?

  13. #13
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Regex to Parse Data from Between HTML Source Tags

    dharrison85 welcome to the forum

    Unfortunately your post does not comply with Rule 2 of our Forum RULES. Do not post a question in the thread of another member -- start your own thread.

    If you feel an existing thread is particularly relevant to your need, provide a link to the other thread in your new thread.

    Old threads are often only monitored by the original participants. New threads not only open you up to all possible participants again, they typically get faster response, too.
    1. Use code tags for VBA. [code] Your Code [/code] (or use the # button)
    2. If your question is resolved, mark it SOLVED using the thread tools
    3. Click on the star if you think someone helped you

    Regards
    Ford

  14. #14
    Registered User
    Join Date
    01-22-2016
    Location
    las vegas, nv
    MS-Off Ver
    2013
    Posts
    2

    Re: Regex to Parse Data from Between HTML Source Tags

    Kyle -

    I am getting an error with this line:

    ReDim data(1 To rowCount, 1 To 10)

    Any ideas of what could cause this?

  15. #15
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Regex to Parse Data from Between HTML Source Tags

    dharrison85 Your post does not comply with Rule 7 of our Forum RULES. Please do not ignore requests by Administrators, Moderators and senior forum members regarding forum rules.

    If you are unclear about the request or instruction then send a private message to them asking for clarification.

    All participants:
    Please do not post a reply in a thread where a moderator has requested an action that has not yet been complied with, e.g. a title change or code tags. Thanks.
