+ Reply to Thread
Results 1 to 6 of 6

Thread: Pulling source data from a list of URLs

  1. #1
    Registered User
    Join Date
    08-31-2010
    Location
    Connecticut
    MS-Off Ver
    Excel 2007
    Posts
    11

    Pulling source data from a list of URLs

    Hi All,
    I'm new to VBA and not sure if this is even an appropriate task for such.

    Per Excel 2007, I have a list of 25 urls in cells A1:A25. I'm looking for three pieces of information from the source code of each url. Place ID, Accept Policy, and Total.

    Each of the three pieces of data is in the same part of the source code in all 25 of the urls.

    Place ID is always located here after place_id and value in the source code:
    <input type="hidden" name="place_id" value="10450" />

    Accept Policy is always here in between the <p>:
    />Accept Policy</span></h3>
    <p>Always Accepted.
    </p>


    Summary is always here in between the <p>:
    <p class="summary_content_medium" style="color: #422; font-size: 90%; font-style: italic;">
    Total: 264,
    Summarized
    </p>


    Is it possible to extract these three data points from the source code of all 25 urls and place it in either a new sheet or the same sheet??

    Thanks in advance for your help and suggestions.
    Last edited by Bootkie2; 02-02-2012 at 09:44 PM.

  2. #2
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & read 2007
    Posts
    15,979

    Re: Pulling source data from a list of URLs

    Hello Bootkie2,

    It is very possible. You should post the page source as a text file in its entirety. This will ensure the data is extracted correctly.
    Sincerely,
    Leith Ross

    Remember To Do the Following....

    1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.
    2. Thank those who have helped you by clicking the Star below the post.
    3. Please mark your post [SOLVED] if it has been answered satisfactorily.


    Old Scottish Proverb...
    Luathaid gu deanamh maille! (Rushing causes delays!)

  3. #3
    Registered User
    Join Date
    08-31-2010
    Location
    Connecticut
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Pulling source data from a list of URLs

    Hi Leith,
    Thanks for your help. Attached is a sample of the source code from one of the pages.

    Best
    -Bootkie
    Attached Files Attached Files

  4. #4
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & read 2007
    Posts
    15,979

    Re: Pulling source data from a list of URLs

    Hello Bootkie2,

    I downloaded the file. This is will be a big help. Are all the URLs in this same format?
    Sincerely,
    Leith Ross

    Remember To Do the Following....

    1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.
    2. Thank those who have helped you by clicking the Star below the post.
    3. Please mark your post [SOLVED] if it has been answered satisfactorily.


    Old Scottish Proverb...
    Luathaid gu deanamh maille! (Rushing causes delays!)

  5. #5
    Registered User
    Join Date
    08-31-2010
    Location
    Connecticut
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Pulling source data from a list of URLs

    Hi Leith,
    Yes, all url source code is in the same format. The 3 desired data points are in the same place in each urls source. Thanks.

    -Bootkie

  6. #6
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & read 2007
    Posts
    15,979

    Re: Pulling source data from a list of URLs

    Hello Bootkie2,

    The file you provided contained only the first of the three parameters you are seeking. The macro was written and tested on the file source file you provided. The code worked for the first parameter but the other two are untested. The attached workbook contains the macro below.

    The macro checks the URL list on "Sheet1" starting with cell "A2" down to the last entry in column "A". Columns "B:D" hold the Place Id, Policy, and Summary. If an item is not found then the cell will be empty.
    
    ' Thread: http://www.excelforum.com/excel-programming/811150-pulling-source-data-from-a-list-of-urls.html
    ' Poster: Bootkie2
    ' Written: January 24, 2012
    ' Author:  Leith Ross
    
    Option Explicit
    
    Sub ScrapeData()
    
        Dim Cell As Range
        Dim I As Long
        Dim N As Long
        Dim objReq As Object
        Dim PageSource As String
        Dim Rng As Range
        Dim RngEnd As Range
        Dim strFindStart As String
        Dim strFindStop As String
        Dim strFound As String
        Dim Text As String
        Dim URL As String
        Dim URLStatus As String
        Dim Wks As Worksheet
        
            Set Wks = Worksheets("Sheet1")
            
            Set Rng = Wks.Range("A2")
            Set RngEnd = Wks.Cells(Rows.Count, Rng.Column).End(xlUp)
            If RngEnd.Row < Rng.Row Then Exit Sub Else Set Rng = Wks.Range(Rng, RngEnd)
            
                For Each Cell In Rng
                
                    URL = Cell.Value
                    
                    On Error Resume Next
                        Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")
                        If objReq Is Nothing Then
                           Set objReq = CreateObject("WinHttp.WinHttpRequest.5")
                        End If
                    Err.Clear
                    On Error GoTo 0
            
                    ' Add protocol if missing
                      If InStr(1, URL, "://") = 0 Then
                         URL = "http://" & URL
                      End If
    
                    ' Launch the httpRequest synchronously
                      On Error Resume Next
                         objReq.Open "GET", URL, False
                         If Err.Number <> 0 Then
                          ' Handle connection errors
                            URLStatus = Err.Description
                            Err.Clear
                            Exit Sub
                         End If
                     On Error GoTo 0
               
                  ' Send the http httpRequest for server status
                    On Error Resume Next
                       objReq.Send
                       objReq.WaitForResponse
                       If Err.Number <> 0 Then
                        ' Handle server errors
                          PageSource = "Error"
                          URLStatus = Err.Description
                          Err.Clear
                       Else
                        ' Show HTTP response info
                          URLStatus = objReq.Status & " - " & objReq.StatusText
                        ' Save the web page text
                          PageSource = objReq.ResponseText
                       End If
                    On Error GoTo 0
                    
                      ' Remove Carraige Return and Line Feed characters
                        Text = Replace(PageSource, vbCr, "")
                        Text = Replace(Text, vbLf, "")
                        
                        strFindStart = "name=""place_id"" value="""
                        strFindStop = Chr(34)
                        GoSub ExtractString
                        
                        Cell.Offset(0, 1) = strFound
                        
                        strFindStart = "Accept Policy</span></h3><p>"
                        strFindStop = "</p>"
                        GoSub ExtractString
                        
                        Cell.Offset(0, 2) = strFound
                        
                        strFindStart = "<p class=""summary_content_medium"""
                        strFindStop = ">"
                        GoSub ExtractString
                        
                        If strFound <> "" Then
                         ' Start search back 1 character from last strStop
                           I = I - 1
                           strFindStart = ">"
                           strFindStop = "</p>"
                           GoSub ExtractString
                        
                          Cell.Offset(0, 3) = strFound
                       End If
                       
                Next Cell
                
                
    ExitSub:
        Exit Sub
        
        
    ExtractString:
            strFound = ""
            
            I = InStr(I + 1, Text, strFindStart, vbTextCompare)
            If I = 0 Then Return Else I = I + Len(strFindStart)
                        
            N = InStr(I, Text, strFindStop, vbTextCompare)
            If N = 0 Then Return Else N = N + Len(strFindStop)
                            
            strFound = Mid(Text, I, N - I - Len(strFindStop))
            
            I = N
        Return
    
    End Sub
    Attached Files Attached Files
    Sincerely,
    Leith Ross

    Remember To Do the Following....

    1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.
    2. Thank those who have helped you by clicking the Star below the post.
    3. Please mark your post [SOLVED] if it has been answered satisfactorily.


    Old Scottish Proverb...
    Luathaid gu deanamh maille! (Rushing causes delays!)

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.2.0