Hi All,
I'm new to VBA and not sure if this is even an appropriate task for such.
Per Excel 2007, I have a list of 25 urls in cells A1:A25. I'm looking for three pieces of information from the source code of each url. Place ID, Accept Policy, and Total.
Each of the three pieces of data is in the same part of the source code in all 25 of the urls.
Place ID is always located here after place_id and value in the source code:
<input type="hidden" name="place_id" value="10450" />
Accept Policy is always here in between the <p>:
/>Accept Policy</span></h3>
<p>Always Accepted.
</p>
Summary is always here in between the <p>:
<p class="summary_content_medium" style="color: #422; font-size: 90%; font-style: italic;">
Total: 264,
Summarized
</p>
Is it possible to extract these three data points from the source code of all 25 urls and place it in either a new sheet or the same sheet??
Thanks in advance for your help and suggestions.
Last edited by Bootkie2; 02-02-2012 at 09:44 PM.
Hello Bootkie2,
It is very possible. You should post the page source as a text file in its entirety. This will ensure the data is extracted correctly.
Sincerely,
Leith Ross
Remember To Do the Following....
1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.2. Thank those who have helped you by clicking the Starbelow the post.
3. Please mark your post [SOLVED] if it has been answered satisfactorily.
Old Scottish Proverb...
Luathaid gu deanamh maille! (Rushing causes delays!)
Hi Leith,
Thanks for your help. Attached is a sample of the source code from one of the pages.
Best
-Bootkie
Hello Bootkie2,
I downloaded the file. This is will be a big help. Are all the URLs in this same format?
Sincerely,
Leith Ross
Remember To Do the Following....
1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.2. Thank those who have helped you by clicking the Starbelow the post.
3. Please mark your post [SOLVED] if it has been answered satisfactorily.
Old Scottish Proverb...
Luathaid gu deanamh maille! (Rushing causes delays!)
Hi Leith,
Yes, all url source code is in the same format. The 3 desired data points are in the same place in each urls source. Thanks.
-Bootkie
Hello Bootkie2,
The file you provided contained only the first of the three parameters you are seeking. The macro was written and tested on the file source file you provided. The code worked for the first parameter but the other two are untested. The attached workbook contains the macro below.
The macro checks the URL list on "Sheet1" starting with cell "A2" down to the last entry in column "A". Columns "B:D" hold the Place Id, Policy, and Summary. If an item is not found then the cell will be empty.
' Thread: http://www.excelforum.com/excel-programming/811150-pulling-source-data-from-a-list-of-urls.html ' Poster: Bootkie2 ' Written: January 24, 2012 ' Author: Leith Ross Option Explicit Sub ScrapeData() Dim Cell As Range Dim I As Long Dim N As Long Dim objReq As Object Dim PageSource As String Dim Rng As Range Dim RngEnd As Range Dim strFindStart As String Dim strFindStop As String Dim strFound As String Dim Text As String Dim URL As String Dim URLStatus As String Dim Wks As Worksheet Set Wks = Worksheets("Sheet1") Set Rng = Wks.Range("A2") Set RngEnd = Wks.Cells(Rows.Count, Rng.Column).End(xlUp) If RngEnd.Row < Rng.Row Then Exit Sub Else Set Rng = Wks.Range(Rng, RngEnd) For Each Cell In Rng URL = Cell.Value On Error Resume Next Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1") If objReq Is Nothing Then Set objReq = CreateObject("WinHttp.WinHttpRequest.5") End If Err.Clear On Error GoTo 0 ' Add protocol if missing If InStr(1, URL, "://") = 0 Then URL = "http://" & URL End If ' Launch the httpRequest synchronously On Error Resume Next objReq.Open "GET", URL, False If Err.Number <> 0 Then ' Handle connection errors URLStatus = Err.Description Err.Clear Exit Sub End If On Error GoTo 0 ' Send the http httpRequest for server status On Error Resume Next objReq.Send objReq.WaitForResponse If Err.Number <> 0 Then ' Handle server errors PageSource = "Error" URLStatus = Err.Description Err.Clear Else ' Show HTTP response info URLStatus = objReq.Status & " - " & objReq.StatusText ' Save the web page text PageSource = objReq.ResponseText End If On Error GoTo 0 ' Remove Carraige Return and Line Feed characters Text = Replace(PageSource, vbCr, "") Text = Replace(Text, vbLf, "") strFindStart = "name=""place_id"" value=""" strFindStop = Chr(34) GoSub ExtractString Cell.Offset(0, 1) = strFound strFindStart = "Accept Policy</span></h3><p>" strFindStop = "</p>" GoSub ExtractString Cell.Offset(0, 2) = strFound strFindStart = "<p class=""summary_content_medium""" strFindStop = ">" GoSub ExtractString If strFound <> "" Then ' Start search back 1 character from last strStop I = I - 1 strFindStart = ">" strFindStop = "</p>" GoSub ExtractString Cell.Offset(0, 3) = strFound End If Next Cell ExitSub: Exit Sub ExtractString: strFound = "" I = InStr(I + 1, Text, strFindStart, vbTextCompare) If I = 0 Then Return Else I = I + Len(strFindStart) N = InStr(I, Text, strFindStop, vbTextCompare) If N = 0 Then Return Else N = N + Len(strFindStop) strFound = Mid(Text, I, N - I - Len(strFindStop)) I = N Return End Sub
Sincerely,
Leith Ross
Remember To Do the Following....
1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.2. Thank those who have helped you by clicking the Starbelow the post.
3. Please mark your post [SOLVED] if it has been answered satisfactorily.
Old Scottish Proverb...
Luathaid gu deanamh maille! (Rushing causes delays!)
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks