Web Crawler to download PDFs

  1. #1
    Forum Guru
    Join Date
    03-12-2010
    Location
    Canada
    MS-Off Ver
    2010 and 2013
    Posts
    4,418

    Web Crawler to download PDFs

    Hi,

    Does anyone know of any VBA macros that can crawl from a starting URL through x levels of linked URLs and save all available PDFs to a specified folder?

    I used to do this a long time ago with professional software to recreate a website or just download any specific file types I wanted.

    Does anyone know of something similar in Excel VBA?

    The steps would be:

    1) User inputs a starting URL (e.g. https://www.google.com/search?btnG=1...&q=excel%2Bpdf)

    2) The web crawler would find every URL on that page, record each one on a sheet, and download all files of a given type (e.g. PDFs) linked from that page

    3) Each retrieved URL would then become a new starting URL, and step 2 would repeat

    4) The process would continue for x number of cycles or until there are no more URLs to scrape.
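Step 2 is the core of the steps above: pull every link out of a page, then decide which links are files to download and which are pages to visit next. Here is a minimal sketch of that logic, written in Python rather than VBA purely as an illustration. The names `LinkCollector`, `extract_links`, and `split_links` are hypothetical, and the page HTML is assumed to have been downloaded into a string already.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs found in html, de-duplicated, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)
    return links


def split_links(urls, extension=".pdf"):
    """Split URLs into (files to download, pages to crawl next) by path extension."""
    files, pages = [], []
    for url in urls:
        # Compare only the path, so a query string such as ?x=1 is ignored.
        if urlparse(url).path.lower().endswith(extension):
            files.append(url)
        else:
            pages.append(url)
    return files, pages
```

Matching on the file extension is only a heuristic: a server can deliver a PDF from a URL that does not end in `.pdf`, so a more careful crawler would also check the Content-Type of the response.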

    So for example, if I asked it to download only PDFs from abousetta.com (which isn't my site, just to be clear), the program would eventually run out of web pages and stop.

    Any thoughts?

    abousetta

    P.S. Even if you only know how to solve parts of this problem, that would be helpful. Thanks.

  2. #2
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    Re: Web Crawler to download PDFs

    Originally posted by abousetta:

    P.S. Even if you only know how to solve parts of this problem, that would be helpful. Thanks.
    A week or so ago I wrote a little VBA code that visited 300 or so different college football web pages (all from the same domain) and downloaded college football scores. It was only 30 or so lines of code, as I recall.

    http://www.excelforum.com/excel-prog...82#post3254682

    It might be a very basic starting point for your project.

    Your project sounds like a good candidate for a recursive subroutine (one that calls itself). Those have always been tricky for me to write, so whenever I can code something differently and avoid recursion, I do.
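For what it's worth, a depth-limited crawl does not have to be recursive. One common alternative is a queue of (URL, level) pairs processed in a loop. The sketch below shows that idea in Python rather than VBA, only to illustrate the control flow. `crawl` and `get_links` are hypothetical names: `get_links` stands in for whatever code downloads a page and returns the URLs it links to, and the `site` dictionary is made-up test data.

```python
from collections import deque


def crawl(start_url, max_depth, get_links):
    """Visit pages breadth-first, following links at most max_depth levels deep.

    get_links(url) is a caller-supplied function returning the URLs linked
    from a page. Returns the visited URLs in the order they were processed.
    """
    visited = []
    seen = {start_url}                # URLs already queued, to avoid loops
    queue = deque([(start_url, 0)])   # (url, level) pairs still to process
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue                  # level limit reached: do not follow links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy "website": each key is a page, each value the pages it links to.
site = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["e"]}
print(crawl("a", 1, lambda url: site.get(url, [])))  # ['a', 'b', 'c']
```

The `seen` set is what makes the crawl stop once a site runs out of new pages, and `max_depth` covers the "x number of cycles" limit. In VBA, a Collection or a worksheet column could play the role of the queue.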


    Edit: Ah, with 3,190 posts, I'm sure I don't have to tell you about VBA code that visits websites, or recursion!
