Beginner question about scraping web pages with Excel VBA

  1. #1
    Registered User
    Join Date
    04-13-2012
    Location
    New Jersey
    MS-Off Ver
    Excel 2007
    Posts
    3

    Hello everyone,

    I'm trying to do the following:

    1. Visit a website
    2. Fill out an input box
    3. Click Submit
    4. Scrape text from the resulting webpage

    I've pasted my current code at the end of this post. For some reason, the last htmlDoc.body.innerHTML returns the body of the first webpage, not the new webpage that loads after the submit button is clicked. I assumed that running Set htmlDoc = objIE.Document after clicking submit would point htmlDoc at the new active webpage, so grabbing the body text should give me the text from the second page, no?

    I'm really new to all this, so I'd also greatly appreciate a link to a primer on scraping data from web pages with Excel.

        Dim objIE As InternetExplorer 'Microsoft Internet Controls
        Dim htmlDoc As HTMLDocument 'Microsoft HTML Object Library
        Dim htmlInput As HTMLInputElement
        Dim htmlColl As IHTMLElementCollection

        Set objIE = New InternetExplorer

        With objIE
            .Navigate "https://website.com" ' Main page
            .Visible = True
            Do While .ReadyState <> 4: DoEvents: Loop ' 4 = READYSTATE_COMPLETE
            Application.Wait (Now + TimeValue("0:00:02"))

            ' Fill in the email field
            Set htmlDoc = .Document
            Set htmlColl = htmlDoc.getElementsByTagName("INPUT")
            Do While htmlDoc.readyState <> "complete": DoEvents: Loop
            For Each htmlInput In htmlColl
                If htmlInput.Name = "email" Then
                    htmlInput.Value = "[email protected]"
                End If
            Next htmlInput

            ' Click the submit button
            Set htmlDoc = .Document
            Set htmlColl = htmlDoc.getElementsByTagName("input")
            Do While htmlDoc.readyState <> "complete": DoEvents: Loop
            For Each htmlInput In htmlColl
                If Trim(htmlInput.Type) = "submit" Then
                    htmlInput.Click
                    Exit For
                End If
            Next htmlInput
        End With

        ' This is the line that still returns the first page
        Set htmlDoc = objIE.Document

        Sheets("Sheet1").Range("A1").Value = htmlDoc.body.innerHTML
    Last edited by sheilnaik; 03-28-2013 at 11:44 AM.
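
    A possible cause, offered as a sketch rather than a confirmed diagnosis: htmlInput.Click returns immediately, so objIE.Document is read before Internet Explorer has finished loading the second page. Assuming the submit button triggers a full page navigation (and not an in-page script update), waiting for the browser to finish before re-reading .Document may help. WaitForPage and its timeoutSeconds parameter are hypothetical names introduced here for illustration; they are not part of the original code.

```vba
' Sketch only: WaitForPage is a hypothetical helper, not from the original post.
' Requires a reference to Microsoft Internet Controls (for InternetExplorer).
Private Sub WaitForPage(ByVal ie As InternetExplorer, _
                        Optional ByVal timeoutSeconds As Long = 30)
    Const READY_COMPLETE As Long = 4   ' same value the original code compares against
    Dim startTime As Date
    startTime = Now

    ' Give IE a moment to start the navigation triggered by the click;
    ' until it starts, ReadyState can still report the old page as complete.
    Application.Wait Now + TimeValue("0:00:01")

    ' Poll until the browser is idle and the new page has fully loaded.
    Do While ie.Busy Or ie.ReadyState <> READY_COMPLETE
        DoEvents
        If DateDiff("s", startTime, Now) > timeoutSeconds Then
            Err.Raise vbObjectError + 513, "WaitForPage", _
                      "Timed out waiting for the page to load."
        End If
    Loop
End Sub

' Assumed usage, placed between End With and the final Set htmlDoc line:
'     WaitForPage objIE
'     Set htmlDoc = objIE.Document
'     Sheets("Sheet1").Range("A1").Value = htmlDoc.body.innerHTML
```

    The one-second pause is a guess at how long the navigation takes to begin, and the 30-second timeout is an arbitrary default; both would need tuning for a real site.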