+ Reply to Thread
Results 1 to 16 of 16

Get table from webpage

  1. #1
    Forum Contributor
    Join Date
    11-04-2012
    Location
    Australia
    MS-Off Ver
    Excel 2016
    Posts
    334

    Get table from webpage

    I would like to get a particular table from an internal webpage.
    The webpage can have any number of tables.
    The table I am after has no ID or any other unique identifier.
    However, there is always a table with innertext "Course:" the table BEFORE the one I am trying to get.
    How can I write this code to achieve what I need:
    I loop through all the tables in the HTML document.
    If the innertext of the Table = "Course:" then get the next table.

    Please Login or Register  to view this content.
    It would also be helpful if I could get some guidance on how to get columns 1 to 5 for each row of the table, as well as columns 12,14 and 15.


    Any assistance appreciated.
    Thanks.

  2. #2
    Forum Contributor kalbasiatka's Avatar
    Join Date
    02-27-2016
    Location
    Brest, Belarus
    MS-Off Ver
    2021
    Posts
    224

    Re: Get table from webpage

    Please Login or Register  to view this content.
    To do it for me and help me it is 2 different things!
    Sorry for my english, blame Google translator

  3. #3
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,168

    Re: Get table from webpage

    Hi anrichards,

    There is a new Add-In for Excel called Power Query that grabs tables from web pages. After learning more about it, it is easier than VBA. See an example at:

    https://bennyaustin.wordpress.com/20...erqueryimport/
    One test is worth a thousand opinions.
    Click the * Add Reputation below to say thanks.

  4. #4
    Forum Expert
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454

    Re: Get table from webpage

    Hi anrichards,
    Table have numbers. The default value is 0, that is table 0. If you have more than one tables, you need to specify the table number, like table(1)

  5. #5
    Forum Contributor
    Join Date
    11-04-2012
    Location
    Australia
    MS-Off Ver
    Excel 2016
    Posts
    334

    Re: Get table from webpage

    Marvin P,

    I looked at Power Query but the way the webpage is setup means that the table I want to extract could be anywhere on it and I have to iterate through thousands of webpages. I know enough VBA to get me started, hopefully with a bit more coding practice I will get more efficient at doing this kind of thing.


    Thanks Kalbasiatka and AB33 your code put me on the right track to what I require.
    Please Login or Register  to view this content.
    I have 2 issues;
    Issue 1. How do I exit the loop if it finds what I require before it gets to the end?
    Is it just:
    Please Login or Register  to view this content.
    I have tried this and it errors out because the loop is set to 1-25 but there is a different number of tables every time the webpage is generated. I get a runtime error 91.

    Issue 2:
    Is there a more efficient way of looping through the columns of a table than this?

    Please Login or Register  to view this content.
    I would like to parse some of the returned data because Excel invariably converts strings like 12-5 to a date and that is a pain to fix;
    For example:
    Please Login or Register  to view this content.

    Any suggestions?

  6. #6
    Forum Expert
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454

    Re: Get table from webpage

    I do not know why are you getting an error on Exit for. It is the correct syntax for exiting a For loop.
    I do not think it makes much difference when you loop through the DOM object. If you want to extract rows and columns, you need to loop through each of them. Yes, you can go straight in to each data with out looping That is, you still need to loop through cells ) using nodes and Childs, but if the table has small data, it is not worth the hassle finding nodes and Childs.
    Last edited by AB33; 07-31-2016 at 06:42 AM.

  7. #7
    Forum Guru Norie's Avatar
    Join Date
    02-02-2005
    Location
    Stirling, Scotland
    MS-Off Ver
    Microsoft Office 365
    Posts
    19,643

    Re: Get table from webpage

    Either use length to get the number of tables on the page or use a For Each loop.
    If posting code please use code tags, see here.

  8. #8
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Get table from webpage

    Hi anrichards22

    I have just been playing around getting at tables by different methods, both by “getting” by Table name and other “getting” ways.
    http://www.excelforum.com/excel-prog...te-change.html
    I had an original way and some members offered me other solutions which found the table I wanted by other Methods.

    Are you sure you cannot distinguish the table you want? -
    If you look carefully at the HTML source code you might find something distinguishing in the structure such that you do a ( pseudo ) code lines of this form ( noting what AB33 said about specifying table number, and Norie about the .Length property )

    Assuming you have your collection of tables form tbles= doc.getElementsByTagName("TABLE")

    __For n = 0 To tbles.Length - 1
    ____If tbles(n) .ParentNode.NextSibling.ChildNodes(2).className=”kjahfkjsgh” Then
    Or
    ____If tbles(n) .NextSibling.NextSibling.ChildNodes.Length = 4 Then

    Etc... etc....
    ____Else: End If

    _____Set tbl= tbles(n) .ParentNode.NextSibling.ChildNodes(2)
    _... etc etc....

    If you are sure you cannot identify the table you want by any other method then looks like you must loop through them all, and do a instr( type test for your text “Course:” , then simply use the current Loop ( or current + 1 Loop, or current + 2 Loop ) etc. etc.. count ( depending on if you are starting at 0, or 1 etc etc.. – you need to be a bit careful with the conventions there )
    I guess you could do that in the Loop then Exit For, ( which as AB 33 said should work OK ) or Exit For then work further on the Table you want.

    Pseudo code:
    __For n = 0 To tbles.Length - 1
    ____If instr( tbles(n).innertext__ , "Course:" Then
    _____Set tbl= tbles(n)
    or
    _____Set tbl= tbles(n + 1) etc.. etc...

    The .Length Property as Norie suggested will give you the actual number of things generally in DOM ( your doc ) stuff, so you can usually find the maximum to number to Loop, so no need to set a big end number and stop when you run out of things.

    If you want to loop , for example just specific columns****, then one way is to make a simple 1 Dimensional Array using the VBA Array Function of the indices ( Cells ( “columns” ) )
    https://msdn.microsoft.com/en-us/lib.../gg264844.aspx
    http://www.excelforum.com/excel-prog...ml#post4441280
    Then , pseudo code like
    cls = Array( 1, 2, 3, 4, 5, 12, 14, 15 )

    Then like kalbasiatka said

    ______ For Each rw in tbl.Rows
    ___________ For Each cl in cls

    ____________Do stuff ( rw, cl ) or Do stuff tbl.Rows(rw).Cells(cl)

    ___________ Next cl
    _____ Next rw

    Or an example of the .Length property use

    ______ For rw = 0 To tbl.Rows.Length – 1
    ___________ For Each cl in cls

    ____________Do stuff ( rw, cl ) or Do stuff tbl.Rows(rw).Cells(cl)

    ___________ Next cl
    _____ Next rw


    ****Note generally we talk about Cells listed in Rows in HTML stuff, rather than Rows and Columns – if you take a look at HTML source code it will typically look, pseudo like,

    This Table
    A Row
    cl 1
    cl 2
    cl 3

    A Next Row
    cl 1
    cl 2
    cl 3



    We would “see” that as two rows x 3 “columns”

    r1cl1 __r1cl2__ r1cl3

    r2cl1 __r2cl2__ r2cl2
    _........................

    _.......................................
    Most of what you want to do seems fairly straight forward. Maybe you have some simple syntax problems.

    If you can give more specific details, like a Web site, and example of the table stuff you want then someone may be able to help further.

    Alan
    Last edited by Doc.AElstein; 08-01-2016 at 12:27 PM.
    '_- Google first, like this _ site:ExcelForum.com Gamut
    Use Code Tags: Highlight code; click on the # icon above,
    Post screenshots COPYABLE to a Spredsheet; NOT IMAGES PLEASE
    http://www.excelforum.com/the-water-...ml#post4109080
    https://app.box.com/s/gjpa8mk8ko4vkwcke3ig2w8z2wkfvrtv
    http://excelmatters.com/excel-forums/ ( Scrolll down to bottom )

  9. #9
    Forum Contributor
    Join Date
    11-04-2012
    Location
    Australia
    MS-Off Ver
    Excel 2016
    Posts
    334

    Re: Get table from webpage

    Hi Alan,

    Thanks for your detailed post, it explained a lot of things for me.
    Unfortunately the website that I need to access is not publicly available, I was granted permission to data mine in order to collate some data for a research paper.
    The documentation for Internet controls and HTML objects is like hens teeth. I couldn't even find examples of code to get all the tables from a webpage. Eventually I cobbled together code that outputs each table (with tag,id, outertext,outerHTML) and from there I tried to find identifiable features of each table. I have searched high and low and taken bits and pieces from here, there and everywhere just to get the most basic code together.
    The code I ended up with has no trouble getting the tables IF they are actually there. (post #5)
    If the tables are NOT there then I ended up with object errors. Usually where the code was getting Table (p+1), which did not exist.
    I ended up with this code which doesn't throw an error and does the job.
    Please Login or Register  to view this content.
    I can see now that I don't really need the first loop so that will be the first thing I update.
    As for identifying the tables, it is bizarre because all the pre-learning I did before attempting my own code was with references to identifying factors (tag, id, innertext etc), however the genius that designed the output of the webpage I need to access placed no distinguishing features for nearly all the data, there aren't even table headers and each column of each table is formatted identically. Not sure if that was deliberate to stop web scraping or the quickest way. Given that the website can only be accessed by specific computers I suspect the design of each webpage was probably decided by the quickest and easiest option.
    Basically the tables consist of a list of students with columns of marks for different assessment tasks for different courses. Occasionally there are assessment tasks that are completed in groups, where this happens the webpage displays a "summary table" (no table name, no other identfying element) after a table with innertext Course:. So far I have only encountered these 2 tables as the last 2 on the page but I am informed that for some of the historical data the tables appear in the middle of the page, somewhere.

    I will do some experimenting and learning with some of the other info that you have provided because just a cursory read has given me an insight into where I can make my code more efficient. Thanks for sharing your knowledge, it is much appreciated.

  10. #10
    Forum Contributor
    Join Date
    11-04-2012
    Location
    Australia
    MS-Off Ver
    Excel 2016
    Posts
    334

    Re: Get table from webpage

    Norie, I was having trouble getting the length property to work. I kept getting a error 91. I fixed the problem by implementing the code as my post above, but it appears that the first loop in my code is not needed. Thanks for steering me in the right direction.

    AB33, the data is very basic, mostly numbers (1-5 digits) or text (50 characters max) and usually 12 columns wide. If I can work out which columns I don't need then it would be good if I could skip them. Thanks for your steering me in the right direction.

  11. #11
    Forum Expert
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454

    Re: Get table from webpage

    To avoid an error, if a table does not exist.

    If Not Elements Is Nothing Then

    Table tag is one of the element in HTML.
    Different people use different ways of identifying a table. For instance, lots of sites, use name, ID and class name.
    ID and class name are usually used as selectors in CSS. It all depends on the designer of the website. There is no standard how tables should be laid out or designed.

  12. #12
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Get table from webpage

    @ anrichards22
    Thanks for the Rep, glad we are helping

    Here is another Link I got recently for Retrieving Web Tables
    http://www.ozgrid.com/forum/showthread.php?t=188021

    It was given to me from one of the members that helped me a lot in the other couple of Threads where I learnt the little bit I know. .. his codes distinguish them from those others that I was looking at in that they get all tables...
    http://www.mrexcel.com/forum/excel-q...ml#post4026613
    _.........................


    The first time around all the HTML DOM stuff etc.. it did not quite sink in with me, unfortunately...... then the web page I was scrapping changed ARRGGGHHHH!!!!!!.. so I revisited it all here a year and a half later
    http://www.excelforum.com/excel-prog...te-change.html
    http://www.excelforum.com/showthread.php?t=1148621
    and it finally just sunk in.. but took me a while, and a lot of headache!!... Lol

    Good luck!
    Alan


    P.s. 1. I pretty well know to death my codes just now, if you have any questions on them..( but BTW I am not a Profi, this scrapping was my first ever HTML stuff, - When I do anything with this sort of code I feel like I am doing brain Surgery while reading instructions in Latin or Urdu. I got some great help here from some people who “spelt it out for me” in plain English )

    P.s.2 I got horrendous problems when scrapping with Excel changing Number formats. ( In particular decimal and Thousands separators getting mixed up ). In those codes of mine I
    a) Have a function that tries to overcome that – it basically splits numbers assuming the first separator is the decimal point, regardless of what it is ( comer or Point ) into the whole and fractional part-. Then it joins the Whole and fraction part with Maths before doing any pasting into the Worksheet. Up until now that has worked well.
    It was the Pasting directly into the Worksheet that often caused the annoying change of Format
    b) I convert all quantities into the same Units.

  13. #13
    Forum Guru Norie's Avatar
    Join Date
    02-02-2005
    Location
    Stirling, Scotland
    MS-Off Ver
    Microsoft Office 365
    Posts
    19,643

    Re: Get table from webpage

    I've got code that will grab all the tables, and their data, from a page.

    It's pretty basic but I can post it later if you want.

  14. #14
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    VBA Codes to Get tables from webpage. Scrapping Web Site Tables. Get Scrap internet Site

    Hi Norie
    Quote Originally Posted by Norie View Post
    I've got code that will grab all the tables, and their data, from a page.
    It's pretty basic but I can post it later if you want.
    I would certainly welcome it to add to my collection_....::..
    http://www.excelforum.com/showthread.php?t=1148621

    _.. I got really naffed off last week when a Web site changed and my Scrapping code did not work anymore_.......
    http://www.excelforum.com/excel-prog...te-change.html

    _..... so I would like to get as many different codes working so as to give me the best chance of getting one working again if that web site or others I am considering Scrapping change again

    _..... Also I find it a great way to learn seeing different solutions to the same problem..

    _ ... And this Thread Title is likely to lead a lot of people here with such a problem I expect.

    Alan

  15. #15
    Forum Expert
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454

    Re: Get table from webpage

    Hi Alan,
    No matter how complicated code you can come up, if an element has changed or removed from the DOM, you have the pleasure doing it all over again. I have many codes which work well over a year ago, but when I run them now, they do not work. When I look at the source view, the names and ID have changed, I then need to change my code too. The DOM in excel has limited properties and methods ,which mostly are copied from JavaScript-unless you are going to use some third party applications.

  16. #16
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Get table from webpage

    Hi AB33
    Yeah, I guess I just need to understand more and get better at writing codes more quickly.
    Maybe having an arsenal of codes might help a little bit.... - I think Albert Einstein was a messy bugger, storing notes and stuff as much as possible in a chaotic way. – He said he tried to remember as little as possible to keep his mind free to think.._.....
    _.....He was crap at Exams at school I think. ....
    _.............But I guess the similarity to Einstein ends there with me. Lol

    _..................
    The more “direct table” codes as I loosely call them,(_.... Codes like from Pike that i was looking at recently:
    http://www.excelforum.com/showthread...t=#post4445231 ....)
    are less effected to changes... but then they are less selective, so it is “Swings and Roundabouts” / “six of one and half a dozen of the other”..

    Alan
    Last edited by Doc.AElstein; 08-01-2016 at 12:45 PM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. get all row of paginated table from webpage
    By adbasanta in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 05-26-2016, 01:02 AM
  2. get the table from webpage to excel
    By mrkhchan in forum Excel Programming / VBA / Macros
    Replies: 13
    Last Post: 04-16-2015, 06:22 AM
  3. Extracting table from webpage
    By amber_skanpur in forum Excel Programming / VBA / Macros
    Replies: 3
    Last Post: 05-04-2014, 10:12 AM
  4. Get table data from webpage
    By nikanuka in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 04-17-2014, 10:57 AM
  5. Table Data (TD) on webpage and VBA
    By Jimbo77 in forum Excel Programming / VBA / Macros
    Replies: 0
    Last Post: 03-22-2014, 04:52 AM
  6. [SOLVED] Importing table from live webpage
    By Jimmyjazz in forum Excel Programming / VBA / Macros
    Replies: 33
    Last Post: 10-30-2012, 11:42 AM
  7. Copying table from webpage to worksheet
    By gerok in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 10-18-2005, 10:05 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1