
Help Modifying Scrape Code after Web Site Change.

  1. #1
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618


    Hi
    Some time ago I developed a program I needed for an important personal project.
    This was only possible thanks to a lot of help from Leith Ross due to my very limited computer knowledge (I have some VBA novice knowledge, but zero HTML and Web Code knowledge !! ) .
    (_..... here the link to the original Thread:
    http://www.mrexcel.com/forum/excel-q...ml#post4031122
    _.
    ....)
    The Web site has changed. F__k. !!!
    The code doesn't work now. I need help making the necessary modifications to my code.

    Code Summary:
    The purpose of this code was to scrape the “sub parts” of a Web site such as this
    http://www.ernaehrung.de/lebensmitte...eischkaese.php
    The last part of such a URL is a variable, shown here in red:
    _.......ung.de/lebensmittel/de/W233000/Fleischkaese.php
    The red bit refers to a particular “food type”. ( here a thing that looks like a loaf of bread but is a very Fatty thing made up from ground ingredients like corned beef, pork, bacon and onions. Taste great but is very unhealthy, leading to the reason for my main personal project, .. Lol..!! ).

    So that bit in red changes. The code picks out which “food type”, based on selection in column G in a Worksheet ( Worksheet “FoodsLookUpTable” ) . Here a small part of that Worksheet, where here the URL to Fleischkaese is selected in preparation of a run of the code.
    Using Excel 2007 32 bit, Worksheet "FoodsLookUpTable":
    Row\Col | A | B | C | D | E | F | G | H
    8271 | Fleischbrühen (0) | | | | | | http://www.ernaehrung.de/lebensmittel/de/X411003/Fleischbruehen-%280%29.php |
    8272 | Fleischkäse | | | | | | http://www.ernaehrung.de/lebensmittel/de/W233000/Fleischkaese.php |
    8273 | Fleischkäse energiereduziert | | | | | | http://www.ernaehrung.de/lebensmittel/de/W237000/Fleischkaese-energiereduziert.php |

    _.......
    A run of the main code following this selection “gets” ( or rather “got” before the Web site changed ), 7 tables of Nutritional information for the food product, and these are outputted into the first column in Worksheet “Leith2” , starting from row 21
    Code, Sub ScrapeDataLeith2DebiNetDeutsch(),
    Starts here:
    http://www.excelforum.com/showthread...t=#post4438353
    and the code is also given here in the supplied File: ( “NeuProAktuelleMakrosScrapeProblem2016.xlsm” )
    https://app.box.com/s/b8duyzaw2bmokbezv1kwrbhngfuq9th3
    _.......

    Here is the typical Output that my code gave me before the Web Site changed: ( It shows the first 3 tables complete, and the start of the fourth )
    http://www.excelforum.com/showthread...=1#post4438289
    _......
    ( The seven tables I require are the one that is seen opened when you access a link such as
    http://www.ernaehrung.de/lebensmitte...eischkaese.php
    and in addition the other 6 which you have to click on to get opened )

    _........................................................

    Main Problem 1)
    The website has changed just now! The code does not now work!! It appears not to be finding the 7 tables now.
    I have looked at the HTML page ( I get this displayed if I right click when in the site when using Google Chrome and select something like show source text. ).
    I feel like I am following instruction written in Urdu whilst attempting to perform Brain Surgery on an Alien.
    I did notice that a “table reference thing”, previously I think, h2, may have now changed to h3.
    Dim Headers As Object: Set Headers = HTMLDoc.getElementsByTagName("h2") 'This object is a massive thing again with loads in, but this time it would appear to be the things "tagged" with <h2> </h2> which look like the headings of each table i am interested in except the last which Does not have table characteristics.......
    Replacing h2 with h3, ( in that above code snippet from my main Code), appears to get me a bit further.

    However I crash then at _.....
    If Headers(n).NextSibling.ID Like "container#" Then '..##...This object is made or refreshed each time in the loop, or it is made and deleted 7 times
    _..........With “run time error 91 object variable or with block not set”..
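Run-time error 91 on that line usually means that NextSibling returned Nothing for one of the headings, so there is no object left to read .ID from. A minimal diagnostic sketch, assuming HTMLDoc is the already loaded MSHTML.HTMLDocument from the main code; the Sub name and the Debug.Print reporting are made up for illustration and are not part of Leith's code:

```vba
' Sketch only. Needs a reference to "Microsoft HTML Object Library".
' ReportHeadingSiblings is a hypothetical helper name.
Sub ReportHeadingSiblings(ByVal HTMLDoc As MSHTML.HTMLDocument)
    Dim Headers As Object
    Set Headers = HTMLDoc.getElementsByTagName("h3")

    Dim n As Long
    For n = 0 To Headers.Length - 1
        Dim Sib As Object
        Set Sib = Headers.Item(n).NextSibling

        If Sib Is Nothing Then
            ' No following node at all: reading Sib.ID here is what raises error 91
            Debug.Print n, "no next sibling"
        ElseIf Sib.NodeType <> 1 Then
            ' NodeType 1 is an element; a text node (type 3) has no ID property
            Debug.Print n, "next sibling is not an element"
        Else
            Debug.Print n, Sib.tagName, "id=[" & Sib.ID & "]", "class=[" & Sib.className & "]"
        End If
    Next n
End Sub
```

Running this against the new page and reading the Immediate window should show whether the thing after each h3 still has an ID at all, or only a class name.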

    I did make a .txt File of a shortened version of the HTML source text from the original Web site ( whilst I was trying to understand the thing last year...) here is a .txt File of it
    https://app.box.com/s/7unx9bgyi688dxxhsp8sydjqso46ua8u

    That is where I got the guess of changing h2 to h3.

    Looking further, I see I have in the old file, a “ div id="container1" “ in pointed brackets which seems not to be in the new HTML source code.
    I do see
    div class="table-responsive"
    And
    div class="table-responsive hidden"
    and I guess I may try some change based on this. .. But I am getting a bit nervous as I have no idea what I am doing...
    _... Can anyone do the appropriate changes and explain to me in layman terms, what is going on. ( I will keep trying myself, and follow up if I get there..... )



    _.......................................................

    Secondary problem 2)
    Background:
    If you have time, please view this:
    http://www.ernaehrung.de/lebensmittel/
    You will see that you may enter a “food type”, such as my “Fleischkäse” ( note the German ä is required ), in the first empty box ( “Lebensmittelname:” ). You are then given a few suggestions (_.... on clicking for example the second suggestion, Fleischkäse, your browser is sent to the URL example I gave above,
    http://www.ernaehrung.de/lebensmitte...eischkaese.php
    _.....)

    Previously, in the earlier version of the Web Site, the List of suggestions was a simple list of hyperlinks ( as shown in column A in the above screenshot. ( Column A in Worksheet “FoodsLookUpTable” in the uploaded File “NeuProAktuelleMakrosScrapeProblem2016.xlsm” ).
    Now to the point/problem
    I noticed a neat trick ! ( when using the previous version of the Web site ): If I entered nothing in that first window, but still hit ENTER, then I got a very long list of all the URLs. I then manually copied that list into column A of Worksheet “FoodsLookUpTable” in my File, “NeuProAktuelleMakrosScrapeProblem2016.xlsm” .
    The problem now with the new Web site is: Using the above “trick” I still get a full list of all Food products. But as you will see ( if you hit ENTER with no entry in that first Box ), it is no longer a list of hyperlinks. I have to click each suggestion ( of around 34539 !!! ) to get each URL in my Browser URL Box. Manually copying each one is of course not a preferred solution!!!
    I will try to do this myself. But it is really like attempting brain surgery for me. Possibly someone familiar with Web scraping can give me a simple code that might copy all these Hyperlinks, preferably as Hyperlinks, but alternatively as the simple Text URL which I have in column G.


    _.................
    Thanks for reading this and thanks in advance for any help
    Alan

    P.s. You will see that in the ‘Green comments, I have tried to understand as much as I can from this code. If anyone helping has the time to add any more, explaining what is going on, this would be a great help such that I may in future be able to sort out any problems myself...
    Last edited by Doc.AElstein; 07-21-2016 at 12:11 PM.

  2. #2
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,164


    Hi Alan,

    Power Query is a tool that is available as an Add-In for Excel 2010 and 2013 and is standard in 2016. It is a pre-loader for many different types of data. You can find and install the add-in on the net. In 2016 they renamed it to Get & Transform. It is located on the Data Tab and has the ability to scrape Web pages. See some articles at:

    http://www.concurrency.com/blog/w/us...aping-any-webs
    https://www.youtube.com/watch?v=KrnRwyXZqNk or
    https://www.youtube.com/watch?v=ZJ30U0qw850 to show how to dig down into a php file.

    In the past we needed to write VBA based on what the page looked like exactly. Using Power Query you suck in the page and filter it based on what is in the cells.

    I'd suggest for your problem you take a few days and try to understand why Microsoft added Power Query to a great product. I believe it is for problems exactly like yours.
    Think of Power Query like "Advanced Filter" or CSE functions. It is simply a new tool built into Excel to pre process different types of data so they can be loaded into Excel to do analysis.
    Last edited by MarvinP; 07-21-2016 at 01:13 PM.

  3. #3
    Forum Expert Doc.AElstein's Avatar

    @ Marvin
    Thanks Marvin,
    I appreciate the reply and advice.
    You have already given me ( and others ) this advice to look at Power Query ( me already a few times now! ). It is obviously a fav of yours! And why not, if it is a good thing. Thanks for taking the time to spread the word about it.
    As I mentioned before, I will, as a result of your advice, take the time to look at this when I have the time and an Excel version supporting it. ( As I also previously mentioned, I do not have it available to me presently. )

    Nevertheless, I did indeed do a form of “query thing” ( to keep me open minded, and against me getting “stuck in my ways” ) as an alternative in a Thread answer I did..
    Rem 5) in code discussed here:
    http://www.excelforum.com/excel-prog...ml#post4321000
    http://www.excelforum.com/showthread...t=#post4320946
    I appreciate “query stuff” in general is very powerful. And as the name suggest “power query” must be the “Dogs bollox” ( very good , the best etc... )
    But that “query” I did, for example, I did with a macro recording. I had little idea after what was going on.

    Also regarding the current code I am using from Leith Ross, : I had another code version from Pike, which I am very grateful for, and I could probably still “fall back” on that now. One difference in Pike’s code compared to Leith’s is that it looked for tables rather than the specific name of the table etc....So was more versatile and may therefore still work with the modified Web page. ( Edit i checked and it does! )

    But:
    I prefer to be as “explicit” as possible, even if it does make a code a lot longer and more prone “not to work” when something changes, as now in this case”. I like to see as much as possible. I “feel” then more in control of what actually goes on. Just a personal preference
    ( And i have many enhancements in the current code which i would need to now add to the Pike code, which would take me extra time, and time is a bit short for me just now )

    But thanks again for the input. Appreciate the response. It encourages me to keep open minded ( and help get me “unstuck in my ways “.... But initially I will still get stuck in to getting my existing code to work...

    Alan

    Edit: P.s I am not sure of how a Power Query would help me in my final full requirement?

    At the end of the day, the "Leith" code i have is one of two which I Call

    The second code takes the results from the first and sorts by Nutritional value into a Daily Diet Protokol.

    So basically, based on that very long list of food products in the FoodsLookUpTable Worksheet, I select a Food type in column G of that Worksheet, run a code that calls both codes, and Bingo. - All the many detailed Nutritional values appear exactly where I want them.

    I guess the only use of Power Query here would be, if I had it, I could run a macro recording whilst I got it to do the same as the "Leith" code. Possibly that code would then work if the Web site changes? But I am not sure about that.

    I do not feel like doing a "power query" manually every time in place of my current codes??
    Last edited by Doc.AElstein; 07-21-2016 at 04:16 PM.

  4. #4
    Forum Expert Doc.AElstein's Avatar

    So I tried the simple changing of "container#" to "table-responsive#"
    [code not shown]
    _..................

    But still same error at
    If Headers(n).NextSibling.ID Like "table-responsive#" Then

    _................................

    I wonder if the key to me solving this problem is understanding, in as simple but concise terms as possible, exactly what the original code did here,
    [code not shown]
    In relation to this original source code like :

    debinetOldSource.JPG

    https://app.box.com/files/0/f/896989.../f_75254767798

    _........

    Maybe then I can understand how to modify the code to work with the following new source code like:

    debinetNewSource.JPG

    view-source:http://www.ernaehrung.de/lebensmitte...eischkaese.php

    _.......

    Still thinking
    Last edited by Doc.AElstein; 07-21-2016 at 03:48 PM.

  5. #5
    Forum Expert AB33
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454


    Alan,
    table-responsive is the class name of a div, not of an H2. An H2 is just a heading with no children.
    If this https://app.box.com/files/0/f/896989.../f_75254767798
    is the site you are looking at, there is not even an H2 with a table.
    You need to loop through a table or tables. If there are over 2 tables, you need to use table 0 or 1 and loop through table.
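As a rough illustration of "looping through the tables", here is a small sketch that only counts the rows of every table on the page. It assumes HTMLDoc is an already loaded MSHTML.HTMLDocument; ListTables is a made-up name and nothing here is taken from Leith's or Pike's code:

```vba
' Sketch only. Needs a reference to "Microsoft HTML Object Library".
' ListTables is a hypothetical helper name.
Sub ListTables(ByVal HTMLDoc As MSHTML.HTMLDocument)
    Dim Tables As Object
    Set Tables = HTMLDoc.getElementsByTagName("table")

    Dim t As Long
    ' The collection is zero-based, so the first table is table 0
    For t = 0 To Tables.Length - 1
        Debug.Print "Table " & t & ": " & Tables.Item(t).Rows.Length & " rows"
    Next t
End Sub
```

The output in the Immediate window gives a quick way to see which table index holds the data of interest before writing the real loop.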

  6. #6
    Forum Expert Doc.AElstein's Avatar

    Hi AB33,
    Quote Originally Posted by AB33 View Post
    ......
    You need to loop through a table or tables. If there are over 2 tables, you need to use table 0 or 1 and loop through table.
    Thanks,
    I think that roughly ties up with the alternative code version that I have from Pike, as this does look for "tables". But I would like to modify the Leith code I have

    _......
    One of the problems I face is that this sort of thing:
    Quote Originally Posted by AB33 View Post
    table-responsive is the class name of a div, not H2. H2 is just a header with no child........
    is a foreign language to me. I have no idea what that means. Six months ago I spent days googling to understand all that, but just went mad!!

    Thanks anyway, for the response. Appreciate the effort.
    Alan

    _..............
    P.s.
    Quote Originally Posted by AB33 View Post
    ....
    is the site you looking at, there is no even H2 with a table.
    .....
    I know I have no h2. As I mentioned I replaced h2 with h3 and that did seem to get me a bit further.

    Thanks again.

  7. #7
    Forum Expert AB33

    Some people use the class name to identify the table instead of the table number. If there is a single table, the default is table 0 and you do not need to be explicit.
    <div class="table-responsive">
    A div is just a container and does not do much on its own. You use a div as a block, usually together with CSS. By giving it the class name table-responsive, you are telling the CSS: I want to apply my style to that particular class. The table sits within the div block, and table-responsive is a way of identifying that div, since a page could have many divs.
    Whether you use the class name or the table number, in order to loop through a tag, the tag must have children. In this case you have rows and columns.
    I am also learning the web languages. These languages are not difficult to understand within the web platform. I think MS uses JavaScript-style calls, such as getting elements by ID, name and so on. JavaScript objects do not translate directly into VBA, and that is where I need to learn more.
    I hope this helps.
    Last edited by AB33; 07-21-2016 at 04:47 PM.

  8. #8
    Forum Expert Doc.AElstein's Avatar

    @ AB33
    Thanks ,
    I will take a good look at that and see if I can understand it
    Alan

  9. #9
    Forum Guru MarvinP's Avatar

    Hi Alan,

    I thought you could install it as an Add-In for your PC version of Excel 2010. I wonder if it is only available in the USA??

    After reading more of this thread it looks like knowing HTTP/Web programming would help. That topic isn't near to me, so I'll need to study more also to understand the answers AB33 is suggesting.

  10. #10
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238


    You'll need to add a reference to the "Microsoft HTML Object Library". Late binding won't work with this, and you'll also need IE9 or greater installed.

    Problem 1.

    This replicates the output. I can't read your comment-riddled code, so I haven't attempted the post processing:
    [code not shown]
    Problem 2.

    This may take a little while since the resulting web page is over 5MB!
    [code not shown]
    Last edited by Kyle123; 07-22-2016 at 06:03 AM.

  11. #11
    Forum Expert Doc.AElstein's Avatar

    @AB33
    Quote Originally Posted by AB33 View Post
    Some people use class name to identify the table instead of table no. If there is a single table, the default value is table 0 and you do not need to be explicit.
    <div class="table-responsive">
    div is just a container and does not do much. You use div as block, usually with CSS. You are telling the CSS, using an identification of class name table-responsive, I want to apply my style to that particular class. The table is with in the block of a div and table-responsive is a way of identifying that div as CSS could have many divs.
    Whether you use class name or table no, in order to loop through a tag, the tag name must have children. In this case you have rows and columns
    ...
    ..

    Hi AB33
    I read your Last post here ( #7 ) a few times now. And dreamt it a few times !! - Such that I know it Parrot fashion. But unfortunately, This is about all I can understand

    _1 ) “div”
    So div is like a container, box etc. Something that has something in it. I can understand that.
    _2 ) class="table-responsive“
    I know from real life and VBA/Object Orientated Programming that a Class is like a Blueprint, or a questionnaire not filled in yet. A Car say from a particular manufacturer from a particular series in its most basic form but not quite finished yet. For example but not yet painted, not yet given the bits that distinguish it from all other cars from this series or “class”. So a basic similar Format from which an actual version or “an instance” will be created, from which it will be Set up initially then various properties can be added or changed. Eventually we have then an actual car. Or in this case a Table of Nutritional values for an Actual Food Product.
    _2 (i) So Identifying the details for similar formatted tables using a word like
    class="table-responsive" is as good as any I suppose way of identifying the sort or Table or class of Object I have in my „div“ container.
    I am afraid the rest of what you wrote makes no sense whatsoever. Like most things when you know it is obvious. If I read it again in a year or two it will appear obvious to me maybe, and seeing it for the first time in an actual context will help it maybe sink in later. So thanks again.
    (_...._2(ii) i also have class="table-responsive hidden" so I was hoping somehow a simple mod as I noted Post #4 "container#" to "table-responsive#" might get me there. Clearly there is a lot more to it and I am lacking the knowledge ) _..)

    Thanks again for the input. I will keep at it. Currently I am trying to revamp the Pike code alternative I have,
    http://www.excelforum.com/showthread...12#post4439012
    http://www.excelforum.com/showthread...21#post4439014
    which uses something like get Elements By Tag Name("table"). This seems to return a similar object to that I seem to get by doing the h3 for h2 change in Leith’s code. But Pike’s code seems to go on to work to get me my final output still. It probably is sensible to have a full working back up anyway. But I would still like to get the Leith code up and running again also.
    Alan

  12. #12
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change. Thanks Kyle :)

    @ Kyle.
    Hi Kyle,
    Thanks so much for the surprise alternative, ... and extra new code for Problem 2!!!
    Problem 1)
    At first glance there may be some similarities in the First code with a Pike code alternative I am re vamping just now...
    http://www.excelforum.com/showthread...12#post4439012
    http://www.excelforum.com/showthread...21#post4439014
    you “get” stuff by its name, thing, sort of.....
    _.. “get Elements By Class Name ("center-block") “ ..( but I see no "center-block" in the source code I see in Google Chrome ?? )

    ( I just about managed to understand the Pike Code back then, and am trying to get clued up on that again.. )
    I will try to also go through this code of yours, , and try and understand it, and let you know how I get on. ( Could take me a while to understand it.. lol.. ! )
    I guess the similarity to Pike’s code is you are getting a thing by its name somehow. Leith’s code somehow delved more explicitly into things I think..

    ( The post processing gives me an extra column with unified Units, ( g ). It is not as easy as it sounds, as Excel is a nightmare, randomly changing the comma and point separators )





    _................................................................
    Problem 2)
    Thanks so much for that.
    I will let you know how I get on with that.

    Thanks once again for the reply. Much appreciated!
    Alan

  13. #13
    Forum Guru Kyle123's Avatar

    You can't be any more explicit, since none of the elements have IDs like the ones Leith's code relied on.

    An ID is unique per page, classes aren't.

    You posted a picture further up that has the center-block class in it

  14. #14
    Forum Expert Doc.AElstein's Avatar

    Hi Kyle,
    Quote Originally Posted by Kyle123 View Post
    You cant go any more explicitly since none of the elements have IDs like they did in Leith's code.
    An ID is unique per page, classes aren't...
    Does that mean Leith's code may not be possible to adapt to the new site?

    Quote Originally Posted by Kyle123 View Post
    ....
    You posted a picture further up that has the center-block class in it
    I saw that... but it was part of a longer word..
    "panel panel-default center-block"


    does that not matter?

    Thanks
    Alan

    EDIT Clearly it doesn't matter - the code works great !!
    Last edited by Doc.AElstein; 07-22-2016 at 06:38 AM.

  15. #15
    Forum Expert AB33

    Alan,
    Ah, it is funny!
    The class I was referring to is not the class in OOP. It is a CSS (Cascading Style Sheets) class, which is part of the lingo of web development.
    Its main purpose is just styling: changing the presentation of the web page, such as layout, colour and so on.
    CSS is not a programming language, but it has its own rules on how to apply a style to a page. We use IDs, classes and tag names to apply the rules. Class and ID are selectors. We use selectors together with a property name and a value for that property.
    To scrape data from a web page, you need to locate IDs, tag names and class names.
    I hope Kyle will correct or add some beef to my comment, as my knowledge of web development is very rudimentary.

  16. #16
    Forum Guru Kyle123's Avatar

    Quote Originally Posted by Doc.AElstein View Post
    Hi Kyle,

    Does that mean Leith's code may not be possible to adapt to the new site?

    I saw that... but it was part of a longer word..
    "panel panel-default center-block"


    does that not matter?

    Thanks
    Alan

    EDIT Clearly it doesn't matter - the code works great !!
    No it doesn't, it just means that the element has multiple classes, you can select by any of them.
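To illustrate that point, here is a small self-contained sketch. It builds a throwaway document in memory from a made-up HTML fragment that only borrows the class string quoted above, and then selects the div by one of its three classes. ClassNameDemo and the fragment are assumptions for illustration, not part of the code posted earlier:

```vba
' Sketch only. Needs a reference to "Microsoft HTML Object Library"
' and IE9 or greater for getElementsByClassName.
Sub ClassNameDemo()
    Dim doc As New MSHTML.HTMLDocument
    ' One div carrying three classes, separated by spaces
    doc.body.innerHTML = _
        "<div class=""panel panel-default center-block""><h3>Demo</h3></div>"

    ' Selecting by any single one of the classes finds the element
    Dim matches As Object
    Set matches = doc.getElementsByClassName("center-block")

    Debug.Print "Matches: " & matches.Length
    If matches.Length > 0 Then
        ' className returns the full space-separated list
        Debug.Print "Full class attribute: " & matches.Item(0).className
    End If
End Sub
```

Swapping "center-block" for "panel" or "panel-default" in the call should find the same div, which is why the longer class string does not matter.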

    Having read Leith's code, I don't think it's very explicit at all, no more so than mine at any rate.

  17. #17
    Forum Expert Doc.AElstein's Avatar

    Hi Kyle, thanks again
    _1 ) So I guess the convention is that the classes are separated by a space?
    _..............
    _2 a) It is very difficult for me to understand the codes, but are you saying, .. Your and Leith’s codes are similar. They are looking for something specific which will be then likely sensitive to a Web site change._..........
    _2b) Pikes code is the more “robust” , that is to say it worked for both sites as it went for getting Tables, which would be a good thing to go for as < table > will always likely be there – for the Tables !
    _ 2c) Then again some of your code is very similar to Pike’s

    _.................
    _3) What, in your opinion, are the pros and cons of the 3 Codes ?
    _(i) Pikes
    http://www.excelforum.com/showthread...12#post4439012
    http://www.excelforum.com/showthread...21#post4439014
    Desiccated version:
    http://www.excelforum.com/showthread...t=#post4439339


    _(ii) Yours
    ( desecrated version, Lol.....
    http://www.excelforum.com/showthread...t=#post4439349
    http://www.excelforum.com/showthread...t=#post4439357
    _..)

    And
    _(iii) Leith’s ( If modified to work with the new Web Site )


    _......................


    Your Sub GetLinks() works great, ... and a lot quicker than it took me to copy and paste the list manually. !!!
    _ I am not looking forward to trying to understand that either. ( from and including ReDim data(1 To linkItems.Length) I understand )
    _ you could probably save me a week of brain ache if you had time to explain your codes a bit, but I probably deserve the punishment

    _ The main thing is you have given me some great working codes that would have taken me a year to do myself. I am really grateful.

    Thanks so much !

    Alan


    P.s. Sorry about all my annoying 'Green comments. - I cannot live without them, especially when I have to come back to a code sometime later
    Last edited by Doc.AElstein; 07-22-2016 at 02:05 PM.

  18. #18
    Forum Guru Kyle123's Avatar

    Ok, here goes:


    [code not shown]
    Create an object that does the web call. This is a POST request (there are different types of request, typically, but not exclusively, GET, POST, PUT, DELETE and PATCH), so we need to tell the object which one to use and the URL to send the data to.

    POST requests send the data in the body of the request (that's why the parameters are passed to the Send method), so get this and send it to the web page.

    Next check the status of the HTTP request, 200 is all ok, however it's not strictly required here - it's possible that a 200 code is returned even if the request fails.

    Next get the returned html and stuff it into a HTML document, this allows us to use DOM (Document Object Model) methods to interact with it. Think of HTML like a class, it needs parsing and instantiating, the HTML Document object does this.
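The steps just described could look roughly like the sketch below. It is not the code that was posted: PostAndParse, url and formData are illustrative names, the real address and form field names for the site are not given here and would have to be taken from the page itself, and the form-encoded Content-Type header is an assumption:

```vba
' Sketch only. Needs a reference to "Microsoft HTML Object Library".
' PostAndParse, url and formData are hypothetical names.
Function PostAndParse(ByVal url As String, ByVal formData As String) As MSHTML.HTMLDocument
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")

    ' Tell the object the request type and the target; False = wait for the reply
    http.Open "POST", url, False
    ' Assumption: the site expects ordinary form-encoded data
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    ' For a POST the data travels in the request body, so it is passed to Send
    http.send formData

    ' 200 means OK; anything else is reported as an error
    If http.Status <> 200 Then
        Err.Raise vbObjectError + 513, "PostAndParse", "HTTP status " & http.Status
    End If

    ' Parse the returned text so that DOM methods can be used on it
    Dim doc As New MSHTML.HTMLDocument
    doc.body.innerHTML = http.responseText
    Set PostAndParse = doc
End Function
```

Returning the parsed document from a Function keeps the web call separate from whatever is done with the page afterwards.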
    [code not shown]
    Get all the elements in the page that have a class of list-group-item
    [code not shown]
    Loop through all the matched items, read the href property of each object (in this case an anchor (hyperlink) object) and stuff it into an array (for faster writing to the sheet).

    Finally we then simply write the array to the sheet
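A hedged sketch of that collect-and-write step, assuming doc is an already parsed MSHTML.HTMLDocument and target is the top cell of the output column. WriteLinks and its parameters are made-up names, and the array is dimensioned in two dimensions here so that it can be assigned to a range in one go:

```vba
' Sketch only. Needs a reference to "Microsoft HTML Object Library".
' WriteLinks, doc and target are hypothetical names.
Sub WriteLinks(ByVal doc As MSHTML.HTMLDocument, ByVal target As Range)
    Dim linkItems As Object
    Set linkItems = doc.getElementsByClassName("list-group-item")
    If linkItems.Length = 0 Then Exit Sub

    ' One row per link, one column
    Dim data() As Variant
    ReDim data(1 To linkItems.Length, 1 To 1)

    Dim i As Long
    For i = 1 To linkItems.Length
        ' The collection is zero-based, the array is one-based
        ' Note: relative links may come back prefixed with "about:" and need tidying
        data(i, 1) = linkItems.Item(i - 1).href
    Next i

    ' A single write to the sheet is much faster than one write per cell
    target.Resize(linkItems.Length, 1).Value = data
End Sub
```

It could be called as WriteLinks doc, Worksheets("FoodsLookUpTable").Range("G1"), where the starting cell is only an example.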

  19. #19
    Forum Guru Kyle123's Avatar

    [code not shown]
    As above, but since this is a GET request, there's no data to send in the body of the request
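For comparison, a GET version might look like this sketch. GetAndParse is a made-up name and the code is an illustration under the same assumptions as before (early binding to the HTML library, MSXML available), not the code that was posted:

```vba
' Sketch only. Needs a reference to "Microsoft HTML Object Library".
' GetAndParse is a hypothetical helper name.
Function GetAndParse(ByVal url As String) As MSHTML.HTMLDocument
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")

    http.Open "GET", url, False
    ' A GET carries no data in the body, so Send is called without an argument
    http.send

    If http.Status <> 200 Then
        Err.Raise vbObjectError + 514, "GetAndParse", "HTTP status " & http.Status
    End If

    ' Parse the returned text so that DOM methods can be used on it
    Dim doc As New MSHTML.HTMLDocument
    doc.body.innerHTML = http.responseText
    Set GetAndParse = doc
End Function
```

A call could then be Set doc = GetAndParse("http://www.ernaehrung.de/lebensmittel/de/W233000/Fleischkaese.php"), using the food page address from the first post.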
    [code not shown]
    Get all the elements with a class of center-block (all the tables that we are interested in are contained in a div with the class center-block). We can't just get the tables since the heading for the table is outside the table.
    Now we have a collection of div objects that contain the required data, we now simply loop through them and get the information that's required:
    Please Login or Register  to view this content.
    This takes the div object and returns the first h3 tag (heading 3) that is contained within. We then simply read the innerText property of the returned object (the table heading) and write it to the worksheet.
    Now we need to get the table data:
    Please Login or Register  to view this content.
    Get the first table within the div object that has the class table, now we have this, it's just a case of looping through the rows and columns of the table.

    HTML tables aren't like cells on a spreadsheet, the table column cells are contained within a row object and a table is made up of rows (amongst other things)
    Please Login or Register  to view this content.
    Loop through each row (contained within the rows collection of the table), then loop through the cells collection (columns) of the row and write it to the worksheet.
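
Put together, the walkthrough above might be sketched like this. Again this is an illustration under assumptions rather than the code that was posted: `url` and `ws` are hypothetical parameters, the output simply starts in row 1, and the variable declarations follow the early-bound types quoted elsewhere in this thread (HTMLDivElement, HTMLTable, HTMLTableRow, HTMLTableCell).

```vba
' Sketch only. Requires a reference to "Microsoft HTML Object Library".
' url and ws are hypothetical parameters (page address and output sheet).
Sub ScrapeTablesSketch(ByVal url As String, ByVal ws As Worksheet)
    Dim dom As HTMLDocument
    Dim divs As Object, headings As Object, tables As Object
    Dim div As HTMLDivElement
    Dim tbl As HTMLTable
    Dim rw As HTMLTableRow
    Dim cl As HTMLTableCell
    Dim r As Long, c As Long

    Set dom = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False      ' GET: nothing to send in the body
        .send
        dom.body.innerHTML = .responseText
    End With

    ' The wrappers that hold both the heading and the table
    Set divs = dom.getElementsByClassName("center-block")

    For Each div In divs
        Set headings = div.getElementsByTagName("h3")
        Set tables = div.getElementsByClassName("table")
        If headings.Length > 0 And tables.Length > 0 Then
            r = r + 1
            ws.Cells(r, 1).Value = headings.Item(0).innerText   ' table heading
            Set tbl = tables.Item(0)
            For Each rw In tbl.Rows          ' a table is made up of rows...
                r = r + 1
                c = 0
                For Each cl In rw.Cells      ' ...and each row holds its own cells
                    c = c + 1
                    ws.Cells(r, c).Value = cl.innerText
                Next cl
            Next rw
        End If
    Next div
End Sub
```

The Length checks are an addition for safety: they skip any center-block wrapper that happens to lack a heading or a table.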

  20. #20
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    Thanks Kyle.

    _ - I can see some similarities from what I have learnt from Leith’s and Pike’s codes. ( And the small bit I figured out myself – The HTMLCell Object having Cells in its “rows” rather than ”columns” was about as much as I got until now .. Lol.. ) . This info really helps me to get these things straight in my ( small ) head.

    I do Google, but just cannot find this sort of info.
    I guess I do not know where to look. ( I just did my head in trying to get a clear idea about a “Child" and a "Sibling" in my endeavour to modify Leith’s code to work... but I cannot find that info, or cannot understand what I find.
    So instead, just now, _..... http://www.excelforum.com/showthread...t=#post4440149
    _.......I am going over his Sub GetElemText and then for the masochistic “fun” of it I will use that in your and Pike’s codes ( It is an explicit alternative to .innerText , - I like to be explicit! – and so does the wife.. )

    I can follow these clear explanations of yours. – thanks ! – worth a thousand “Googles”

    I am trying very hard just now to totally ruin all your codes with ‘Green Comment Graffiti, and this extra info will help me further in my endeavours to that end.
    Thanks very much. Appreciate it

    Alan

  21. #21
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Help Modifying Scrape Code after Web Site Change.

    _.......I am going over his Sub GetElemText and then for the masochistic “fun” of it I will use that in your and Pike’s codes ( It is an explicit alternative to .innerText , - I like to be explicit! – and so does the wife.. )
    No it's not. How can it be more explicit than calling the property?

  22. #22
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    No it's not. How can it be more explicit than calling the property?
    Over in that test Forum Thread, I mucked up a version of the original source code and included a table ( one "row" with three Cells ( within a header Cell ) ).
    My guess was that going for a table type code like Pike's might give me in that case an extra Table I do not want, but I had not got around to testing that yet. Just now I am getting the recursion idea finally clear in my head...


  23. #23
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Help Modifying Scrape Code after Web Site Change.

    Yes, there are more tables than you want on the page, that's why I filtered using center-block

  24. #24
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    You've completely lost me, how is mine not a table type code?


    I am probably losing myself! Lol

    I need to go through your code again now with all the new info you gave me.... and that may take me a while...
    but basically I thought Pike's code "short cuts" a bit, missing out a bit at the start of each
    ‘=======table main loop
    that yours and Leith’s do to pick out the table within that big "DispHTMLElementCollection" that you and Leith have.
    He does not have that ( I think ** ), “getting” the Table by table tag name, sort of. **- Actually he does.. BUT:

    It is very difficult for me to explain here quickly what I am looking at in detail over in that other Thread.
    I think the point is:
    You and Leith have a "DispHTMLElementCollection" that has the table I want in each of the “items” ( 7 or 8 I think) in that "DispHTMLElementCollection" collection thing.
    Pike’s "DispHTMLElementCollection" thing is a collection of tables. Hence he skips a bit you and Leith do at the start of each_.....
    ‘=======table main loop
    _.....to make sure you are getting at the Table Object within your "DispHTMLElementCollection"

    Pike’s "DispHTMLElementCollection" is a collection of ( only ) the tables. Yours and Leith’s "DispHTMLElementCollection" ‘s have other stuff in also


    I used the phrase “ not a table type code” loosely. Sorry, that was probably incorrect. - I know that in end effect you all loop through each table object. Pike is just a bit less selective. You and Leith sort of get that table indirectly.

    You will not get the point of what I am trying to say from my quick replies here. You would need to look carefully at my detailed ramblings over at the Test Thread I am babbling in just now.
    But I can’t say I recommend it, I confess I do not look forward to reading it myself... lol

    I am just trying to get as much clear understanding so I do not get caught out if the Web site changes again. ( And I am a sucker for wanting to understand any code I use, which I think is healthy )


    Thanks Kyle.
    Last edited by Doc.AElstein; 07-25-2016 at 11:13 AM.

  25. #25
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    Yes, there are more tables than you want on the page, that's why I filtered using center-block
    I think that is part of what I was trying to say, - yes you filter them, - so does Leith... I think Pike does not

    may be
    Last edited by Doc.AElstein; 07-25-2016 at 10:22 AM.

  26. #26
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    Yes, there are more tables than you want on the page, that's why I filtered using center-block
    In the original source code I had to Loop one less than the tables found by Pike's code when using Pike's code. ( The last table was something I did not want )
    With the new web site I did not need to do that.
    But that is just a by the way comment. I am just trying to get as good an understanding and make my code as bullet proof as possible. Or, thanks to you, I have a few alternatives, so if one stops working I can fall back on the other as a quick fix before, as in this case, I look in more detail later ( and litter the Test Forum with my 'ramblings lol.. )

    Thanks again
    Last edited by Doc.AElstein; 07-25-2016 at 12:20 PM.

  27. #27
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Help Understanding Scrape Code..... Hello Kyle... :)

    @Kyle, ( or anyone else that can help me get a few things clear here )
    Some follow up question if you have time......
    Your Problem 1 Code, Post #10
    http://www.excelforum.com/excel-prog...ml#post4439005

    I am a bit confused with
    Quote Originally Posted by Kyle123 View Post
    You'll need to add a reference to the "Microsoft HTML Object Library". Late binding won't work with this, .......
    _..You do not appear to be using your variable request
    ( Dim request As Object )
    _......

    I think, / thought these were the basic forms of Late and Early Binding here
    Please Login or Register  to view this content.
    _....................
    Questions:
    _1a ) Is it that Your_...
    Please Login or Register  to view this content.
    _.... is doing Late Early binding for the “request stuff”_..
    .Open
    .Send

    _....
    But
    _1b)
    You need Early Binding for all this:
    'Dom
    Dim dom As HTMLDocument
    Dim div As HTMLDivElement


    Dim table As HTMLTable
    Dim rw As HTMLTableRow
    Dim cl As HTMLTableCell


    ( It is not too easy for me to check this last bit, as often VBA does not let me take off my reference for Microsoft HTML Object Library once I turn it on !! )

    ( BTW, Leith’s code uses late binding the HTML document, and his code will not work in Early Binding for that )***




    _2a) Regarding
    .Open
    And
    .Send
    Is .Open sort of preparing the path: telling who or whatever that a particular sort of request is coming, and/or checking that that “path” exists, is valid, will respond if asked etc.. etc...
    And
    .Send actually then does “send” the “request”?
    _.....

    _2b) with no argument in, is .Send just like me typing my URL in the address bar in my browser. In which case a .HTML or .txt File comes back. My browser would do all sorts of stuff including showing me the Web site and spying on me etc.. etc..
    But in these sort of codes,
    a code line like_...
    Dim PageSource As String:Let PageSource = request.responseText
    _.. will just give me just a HTML code like in a simple text form

    _....
    Then there are various code lines to pass that text to a DOM thingy.
    _2c) Is it that You sort of “short cut” 2b) by doing a code line like
    dom.body.innerHTML = .responseText

    (To elaborate: I think this last point is similar to the way Pike’s code does it. I believe Leith’s is slightly different : Leith is using the .Write Method to convert the simple text of the page to a DOM thingy. I believe Leith’s way may be somewhat more explicit here, but please do not ask me what I mean as I have no idea what I may be talking about on this last point .. Lol.. ( I believe however, this last point may be a reason why Leith’s code requires late binding_...
    Dim HTMLdoc As Object: Set HTMLdoc = CreateObject("htmlfile")
    _..... for his DOM thingy.. but I am stretching my understanding a bit here.. Lol ) )

    Thanks Kyle

    Alan
    Last edited by Doc.AElstein; 07-28-2016 at 10:50 AM.

  28. #28
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Help Modifying Scrape Code after Web Site Change.

    1. Early binding in this case refers to the HTMLDocument, not the request.
    Please Login or Register  to view this content.
    and
    Please Login or Register  to view this content.
    Are the same thing, the former uses late binding and the latter uses early (this isn't strictly accurate, but good enough here)
    The reason that my code will not work with late binding is that late binding the htmldocument/file returns a different interface than early binding it. This is somewhat of an oddity, most objects will work either way.
    Specifically GetElementsByClassName is not available with late binding, but is with early binding.

    1b. Yes
    2a. Yes, close enough
    2b. Yes (though typing into your address bar will always be a GET, you can't use any other verb by typing it into your address bar)
    2c. There's no shortcutting.

    When you (using the xml request) or the browser makes the request, it retrieves exactly the same thing, a text (usually, but this is set by the mime type of the response) template of the web page.

    The browser parses the text, and builds the page (forget about JavaScript and CSS for now, they're not helpful in explaining the point). This built page is referred to as the document and one interacts with it using the Document Object Model.

    The HTMLDocument/File that is created in the code above does exactly the same thing, but is a lighter version of doing it in the browser (it just builds the page - without any of the other stuff, JavaScript execution, CSS, fetching other dependencies etc..) that's why it's so much faster than automating internet explorer. Think of it as a browser that you can't see.

    There's bugger all difference whether one uses write or assigns to the innerHTML property for this application - neither is more explicit than the other, in fact, I suspect that writing to the innerHTML is more efficient
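
A small sketch may help make the two routes concrete. It parses a made-up snippet of HTML (the string is purely illustrative) once with a late-bound "htmlfile" and .write, and once with an early-bound HTMLDocument and innerHTML. Both end up with a document that can be queried through the DOM; the second half needs the "Microsoft HTML Object Library" reference.

```vba
' Sketch only: two ways of turning HTML text into a document object.
Sub ParseTwoWaysSketch()
    ' Illustrative HTML, not taken from the real site
    Const PAGE_BODY As String = "<p class=""demo"">Hello</p>"

    ' 1) Late binding: create the document by ProgID and use .write
    Dim lateDoc As Object
    Set lateDoc = CreateObject("htmlfile")
    lateDoc.Open
    lateDoc.write "<html><body>" & PAGE_BODY & "</body></html>"
    lateDoc.Close
    Debug.Print lateDoc.getElementsByTagName("p")(0).innerText

    ' 2) Early binding: needs the "Microsoft HTML Object Library" reference
    Dim earlyDoc As HTMLDocument
    Set earlyDoc = New HTMLDocument
    earlyDoc.body.innerHTML = PAGE_BODY
    ' getElementsByClassName is the call that needs the early-bound document
    Debug.Print earlyDoc.getElementsByClassName("demo")(0).innerText
End Sub
```

Either way the text is parsed into the same kind of tree; what differs is which interface VBA sees afterwards, and therefore which methods are available on the document.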

  29. #29
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    I think the Penny is slowly beginning to drop. That will relieve a few people maybe, or at

    I think the Penny is slowly beginning to drop. That will relieve a few people maybe, or at least me


    Hi Kyle,
    Thanks very much for that. Amazingly ( probably thanks to your help , ( and the original Thread from Leith that I have read so many times in the past week that I dream about it ) ), I think I am really beginning to understand all this now...


    _.....

    On the last point, .. I thought may be from the original Leith Posts_.....
    http://www.mrexcel.com/forum/excel-q...ml#post4031122
    _..... the .Write Method was somehow “designed” to convert the page HTML code text ( or HTML Code in a .txt file or .HTML file ) into the DOM Object thingy. I think I got the point that
    .innerHTML
    Or
    .body.innerHTML
    Does the same, ( possibly exactly the same ?? ) but it is just as ... “the .Write Method was somehow “designed””.. it was more “explicit” – Maybe “explicit” was the wrong word – maybe I just meant... ..” “proper”, sort of “..
    And again, the point that this only works in late Binding ( I mean Leith's ) ( I think ) was suggesting to me that it works differently. Possibly it does work differently, but you are telling me the difference makes no difference, or even that assigning to .innerHTML might be a bit better?

    At one point I may have thought that the .Write Method also got something else. This is because Leith inferred that his code got some extra stuff in his DOM Model that Pike did not. **
    **
    I have a feeling that in this case yours and Leith’s codes are similar and have some extra stuff that Pike’s does not, but that is not to do with .innerHTML or .Write,..... Rather it is to do with the point I tried to get across in my Post # 24
    http://www.excelforum.com/showthread...58#post4440737


    _......
    Thanks again ...
    Alan


    EDIT: Thinking again
    **Leith Ross: http://www.mrexcel.com/forum/excel-q...ml#post4031122
    Pike uses only the Body of the Page Source code when converting it to an HTML DOM document. This excludes the Meta data, Java scripts, and Class information from being converted. Generally speaking, this information is not used when retrieving only text data from a web page.

    You are correct that my code does convert all the code into a complete HTML DOM document. There is nothing wrong with Pike's method of converting the page source into an HTML DOM document. I simply prefer the Write method of an HTML file, since this method is designed to convert the page source text into an HTML DOM document. Both methods achieve the same results.
    Last edited by Doc.AElstein; 07-29-2016 at 04:11 AM. Reason: ** Thinking again **

  30. #30
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Help Modifying Scrape Code after Web Site Change.

    It makes no difference, it only matters when you're building a page for the web and even then using .write is heavily discouraged:

    http://stackoverflow.com/questions/8...a-bad-practice

    For parsing in Excel, forget about it, it's no different, it's no more explicit and it doesn't matter

    The only difference between mine, Pike's and Leith's code is how the elements are selected, this happens after parsing the dom (using .write or innerHTML).

    Pike only rips the tables from the page, my code gets the elements that wrap the tables so the headers (that aren't in the tables) could be read. It's possible to do this via Pike's method by accessing the parent node of the table, but it isn't as efficient.

    I find Leith's code somewhat overly complex for the task in hand, I can see why he does it, but the recursion is overkill
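
For what it's worth, the "parent node" route mentioned above could be sketched as follows. This is an illustration only: it assumes the heading sits inside the table's immediate parent element, which may not hold for every page, and `dom` is a hypothetical, already parsed document.

```vba
' Sketch only. Requires a reference to "Microsoft HTML Object Library".
' Assumes each table's heading (h3) lives in the table's immediate parent.
Sub HeadingsViaParentSketch(ByVal dom As HTMLDocument)
    Dim tables As Object, parentEl As Object, heads As Object
    Dim i As Long

    Set tables = dom.getElementsByTagName("table")   ' every table on the page
    For i = 0 To tables.Length - 1
        Set parentEl = tables.Item(i).parentElement  ' step up to the wrapper
        If Not parentEl Is Nothing Then
            Set heads = parentEl.getElementsByTagName("h3")
            If heads.Length > 0 Then Debug.Print heads.Item(0).innerText
        End If
    Next i
End Sub
```

Selecting the wrappers first does the same job with one lookup per table fewer, and it also leaves out tables that have no such wrapper.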
    Last edited by Kyle123; 07-28-2016 at 12:41 PM.

  31. #31
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi
    Quote Originally Posted by Kyle123 View Post
    ......
    The only difference between mine, Pike's and Leith's code is how the elements are selected, this happens after parsing the dom (using .write or innerHTML)....
    Pike only rips the tables from the page, my code gets the elements that wrap the tables so the headers (that aren't in the tables) could be read. It's possible to do this via Pike's method by accessing the parent node of the table, but it isn't as efficient.
    That was what i meant by the table stuff, but i did not quite explain what I meant too well, - again all coming back to my Post # 24
    http://www.excelforum.com/showthread...58#post4440737

    _..................................

    Quote Originally Posted by Kyle123 View Post
    ..... but the recursion is overkill
    Oh no !!,.. - after 3 days of intense headache, I finally managed to ( almost 100 % ) understand that recursion code..
    http://www.excelforum.com/showthread...t=#post4443309
    ( The only snag was a node being read twice as something different - ?? – but that was probably me not writing the HTML modification I did properly.. ? )
    and later I am going to apply it to yours and Pikes Codes. ( I know I shouldn’t Lol.. )


    After that I will see if I can finally get the original Leith Code to work with the new Web Site.

    Then I think the Thread will be well Finished, ( and me too probably .....)

    Thanks again for all the input.
    Alan
    Last edited by Doc.AElstein; 07-28-2016 at 01:12 PM.

  32. #32
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    @ Kyle, ( or anyone following the Thread )

    This Post is just adding a bit to the solution, or rather clarifying a bit a possible misunderstanding in the Question / answer discussions... ( the “table code misunderstanding“ )

    In this Thread
    _....I have rewritten now codes from you( Kyle ) Pike and Leith in more messed up versions than I can remember, but the basic structure is the same, sections, (Rem’s as I call them like) :

    Rem 1 Rem 2, not important here - Rem 1) Just Worksheet info, and – Rem 2) does all the business to get you the main Document Object Model, DOM..... .....die DOM DOM

    A quick résumé here of Rem 3 and Rem 4 will help clear up the "table Point".. I think
    ( or totally confuse the situation )

    Rem 3) These sections in all Three Codes, Leith, Pike, and Kyle getElements by a big "DispHTMLElementCollection" thing.
    Speaking generally now in codes like mine ( but not specifically my current example necessarily ), one can say this big thing is selected such that it contains a number of items, their total number ( returned by a .Length Property ) being equal to or possibly a bit greater than the Number of tables I want. Generally the table data will be in there somewhere....

    Pike’s "DispHTMLElementCollection" thing, objTables, is a collection of all Tables ( The Items in this thing are tables ) . This is the Point that Pike’s is a simple “direct Table” Code, but note I am using that term “direct Table” a bit loosely .
    The main Loop
    Rem 4 ) loops “directly” all Tables. It applies the various HTML Table Properties ( Rows, Cells etc.. ) to pseudo
    objTables( x ) ( In the previous web site this gave me an extra table I did not want at the end, so I just looped 1 less than the DispHTMLElementCollection length )
    _......

    Leith’s DispHTMLElementCollection is a collection of HTMLHeaderElements (The Items in this thing are tables “wrapped” inside HeaderElements. )

    The code in each main Loop
    Rem 4 ) , dribbles down Siblings and Childs to get each table, a single HTML Table, puts that in an Object variable and applies the various HTML Table Properties to that variable, reusing that variable each time in the Loop. Part of the Sibling dribble checked ( with Like ) for the container with the correct ID, so it was a bit selective and, for example, did not catch the unwanted last table on the previous web site..

    Kyle’s first "DispHTMLElementCollection" thing, divs, is diverse stuff, the Items are HTMLdivElements ( I refer to it in the code by _..'KDisp1 )

    Then in the main Loop
    Rem 4 ) two further big ( but only one item in them ) "DispHTMLElementCollection" ‘s are gotElements by....._..
    _..KDisp2 Dim Header As Object : Set Header = div.getElementsByTagName("h3")
    This gives only 1 item, a HTMLHeaderElement. You ( Kyle ) did this for me to get a header, by accessing the first ( only ) item in this DispHTMLElementCollection, 'KDisp2, and the innerText thereof:
    = div.getElementsByTagName("h3")(0).innerText

    Then the next 'KDisp_.........
    _.. ( 'KDisp3 Dim objTables As Object: Set objTables = div.getElementsByClassName("table") 'Probably this the bit Kyle means need Early Binding----- Kyle: ...."Specifically GetElementsByClassName is not available with late binding, but is with early binding....." )
    _................is again a one item DispHTMLElementCollection so you assign that by its first ( only) item, to an Object variable and apply the various HTML Table Properties to that variable, reusing that variable each time in the Loop, as did Leith.
    KDisp3: Dim objTables As Object: Set objTables = div.getElementsByClassName("table") 'Probably this the bit Kyle means need Early Binding-----"Specifically GetElementsByClassName is not available with late binding, but is with early binding....."
    Set table = objTables(0) 'using a real table now 'KDisp3


    _....
    So..
    Quote Originally Posted by Kyle123 View Post
    You've completely lost me, how is mine not a table type code?
    Hope we are coming together, well not literally, but I mean hopefully you know what I mean, Meant – When you are at the Expert Level you understand these things, but only just.. Lol. : )

    So further every Cell is looked at .....
    http://www.excelforum.com/showthread...=2#post4440149
    http://www.excelforum.com/showthread...=4#post4444735

    Pseudo for all codes In the middle of each Inner Loop for
    _a) each HTML Table Cell the line getting the .innerText -........( Kyle, Pike )
    ’---Inner loop does at each row, ....
    __For each Row, r
    ’'--- .... 'go through each Cell( "column" ) in that row. ....
    ____For each Cell, c
    _____TableCellValueWeWant=HTMLTableCell.innerText
    Or for the case of filling an output Array of values for the whole Table we want
    _____Data(r, c) = HTMLTableCell.innerText

    Or
    _ b)
    Using the recursion procedure ( Leith ) (here just the case of filling an output Array of values for the whole Table we want )
    ’---Inner loop does at each row, ....
    __For each Row, r
    ’'--- .... 'go through each Cell( "column" ) in that row. ....
    ____For each Cell, c
    _____Call Sub GetTheTextBackToYou ( HTMLTableCell, Data(r, c) ) ' (¯`*¨*OVERKILL*¨*´¯)¸.' Dont Sweat It .. This procedure will Get The Text Back To You
    Or
    _____ Call Sub GetTheTextBackToYou ( objectHTMLTable.Rows(r).Cells(c) , Data(r, c) ) ) ' (¯`*¨*OVERKILL*¨*´¯)¸.' Dont Sweat It .. This procedure will Get The Text Back To You

    _....
    So you were right they are all “ Table “ codes -- I should have said Pike’s differed a bit, being a more “direct” Table Code.
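
The pseudo code for case _a) can be turned into a small runnable sketch along these lines. The function name and the 1-based output array are assumptions made for the illustration; only the `Rows(r).Cells(c).innerText` access pattern comes from the pseudo code above.

```vba
' Sketch only. Requires a reference to "Microsoft HTML Object Library".
' Returns a 1-based 2D array of the cell texts, or Empty if the table has no cells.
Function TableToArraySketch(ByVal tbl As HTMLTable) As Variant
    Dim data() As Variant
    Dim r As Long, c As Long, maxCols As Long

    ' Rows can hold different numbers of cells, so find the widest row first
    For r = 0 To tbl.Rows.Length - 1
        If tbl.Rows.Item(r).Cells.Length > maxCols Then
            maxCols = tbl.Rows.Item(r).Cells.Length
        End If
    Next r
    If maxCols = 0 Then Exit Function

    ReDim data(1 To tbl.Rows.Length, 1 To maxCols)

    For r = 0 To tbl.Rows.Length - 1                        ' each row...
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1      ' ...then each cell in that row
            data(r + 1, c + 1) = tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r

    TableToArraySketch = data
End Function
```

The DOM collections count from 0 while the output array counts from 1, hence the `+ 1` on both indices.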

    _

    Please feel free to comment on any of that, but I think I have it sussed

    _.. a few questions in the next post...

  33. #33
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Some Follow up questions for Kyle

    @Kyle
    Hi Kyle :-)

    _2)
    Your “xml request”...

    I mentioned in Post #27 that you are not using your
    Dim request As Object ‘ I ‘commented it out and have seen no ill effects yet..
    _.. but your further answers (and my current “Expert Status” Lol.. )..
    _.... suggests that.
    ___With CreateObject("msxml2.xmlhttp")
    Is an alternative for the typical Late Binding pair of
    Dim Request As Object: Set Request = CreateObject("MSXML2.XMLHTTP")

    Now, I note that often such a pair will sometimes be accompanied later in the code by a
    Set request = Nothing 'Free memory and resources used by this object.

    My Questions
    _2a)
    Does
    _2a(i): the End With effectively do ( pseudo code ) Set the “xml request” = Nothing
    Or
    _2a(ii): Are you going to say Set ____ = Nothing is a load of old bollox not relevant anymore in VBA as it is all taken care of when the code Ends

    2b) You have already explained that you had to Early Bind for the htmldocument/File _.....Interesting in passing here Leith had to do exactly the opposite and use Late binding -...Leith:- ..... “ ... The htmlfile is an ActiveX object that is a wrapper function for the IHTMLDocument2 interface in MSXML2. This gets into a lot of low level system operation.......".......
    _.........Early binding would have been fine here, I think:, like
    Dim request As MSXML2.XMLHTTP: Set request = New MSXML2.XMLHTTP '
    _.. but you probably did not do that as you also need then an additional reference to Microsoft XML, v6.0
    2b) _ Have I got all that approximately correct?
    ( VBA Geek: http://www.mrexcel.com/forum/excel-q...ttp-macro.html )

    3) is it possible to very briefly ( simply ) say what the Microsoft XML, v6.0 Library is. In some of my Demo Codes instead of a URL I use the full Path and name string to a text file with a HTML Code in I wrote and the “xml request” still “works”... So is that library just a lot of Methods and Properties saying how to handle HTML XML codes and the like. .. - .responseText just gives a long string of the code.
    ( I saw that .responseBody gave me something that maybe EF Management could understand ( Urdu, or similar language. Or something certainly way off of English ) )

    Thanks :-)

    _................................................................
    I think I see very clearly now, that it is at this point that yours and Leith’s code differ. ( The .Write versus .innerHTML Stuff ) . - The Intellectual discussions on that are out of my depth, but I can glean a bit from what you and the Link you gave me are saying. So good to bear in mind. ( _...That is exactly the stuff I like to link and put in my ‘Comments for later reference ) Thanks again there Alan
    _..........................

    _4) Just a quick one on your Problem 2 Code ( part of it, or most of it I understand ....
    BTW – you explained that great, I understood most of that, - Brilliant Thanks again...and with your follow up answers, I think I have it fairly clear now what the .Send and .Open is doing ( or as good as I need to: ..... – basically you are organising with .Open what and how something will be sent, like “doing the paperwork with .Open to order a T Shirt” , then Send is like making the “order” and in this case is the .Send is like not only sending the order, but also giving the Supplier’s number of the Picture you want put on the Ordered T Shirt, so he sends you the T-shirt you want)...

    The bit I am struggling on:_.............
    Please Login or Register  to view this content.
    _.. .............I have had a good look at this:
    view-source:http://www.ernaehrung.de/lebensmittel/
    ( Paste that in The Google URL Bar and you get the source code for the Main Home Site ( as if you did not know, Lol ... : ) ) )

    But I cannot fathom out yet how the red bits above put nothing in the first box and hit ENTER ( probably because it did not or maybe it did ... did it chuck me a DispHTMLElementCollection
    - no it did not - , the DispHTMLElementCollection comes out of the created DOM , like what it always does... with this
    Set linkItems = dom.getElementsByClassName("list-group-item")
    I think I follow that the red bits made it chuck back what i see for entering nothing as
    _..... But I do not see that ( pseudo ) “ DispHTMLElementCollection("list-group-item") “ in this
    view-source:http://www.ernaehrung.de/lebensmittel/

    _................but I do see ALMOST that in this
    view-source:http://www.ernaehrung.de/lebensmittel/suche/
    ( after about 10 minutes as it takes my computer that long to do that .. Lol )
    _..................I see this:
    < div class="list-group" > ( pseudo “ DispHTMLElementCollection("list-group-") “

    ( But I do see paired “Childs” like this ( An Element Node Type and a Text Node type ) < a href="http://www.ernaehrung.de/lebensmittel/de/BOFRO1287/Cordon-bleu-vom-Schwein,-bofrost.php" class="list-group-item" > "Cordon bleu" vom Schwein, bofrost< /a >
    _.. but I am really confused that you do not do a ( “pseudo Like” )
    (Node bit < div class="list-group" > ----- < / div > ) .Child(0) .Anchor
    _.............................

    So I suppose two questions here
    _4b)
    What is going on between what you have
    "list-group-item"
    And what I see
    "list-group"
    ??
    _ - what is wrong with my “ “pseudo Like” “ suggestion


    _4a) The main question... what did the bits in red do

    _4c) Minor Question.. why can’t you Dim your Links ( or any of your dom.getElementsBy____ for that matter ) As DispHTMLElementCollection ( I tried - it don’t work )

    Thanks Kyle, you have been very patient here. I think I am very close now, and thanks to the help, in the next Posts I will answer my original Thread question :-)
    _.......................................
    Alan

    P.s
    BTW. The point of the recursion routine is, as I understand it, that text in a Cell is looked at in a bit more detail. The code will help pick out things like a paragraph, a small table ( what I did in my demos over in the test forum http://www.excelforum.com/showthread...=2#post4440149 )
    etc.. etc.. ....”within a Cell in a main Table the different Text parts are separated by a tilde character. This character can be used later to parse the text string into the individual strings from each part…“… . my Tidley demo versions of the .HTML source Code have demonstrated that well. Sounds like a good idea. For me_....... _..........'•.¸(¯`'•.¸*¨¨*:•.•:*¨*OVERKILL*¨*:•.•:*¨¨*:• ¸.•'´¯)¸.•' Yeah!
    _.. I like my adaption of it, as I can see a bit clearer what is going on, and it did reveal that the DOM thing can hold some secrets that a visual inspection of the HTML code does not show, or maybe it makes a Model of its choosing which is only approximately like the sort of tree structure that we would write. My code picked up on the DOM splitting a Node into two different things: it took a table row ( < tr > ---- < / tr > ) and read it as a HTMLTableSection followed by a HTMLTableRow.
    **** http://www.excelforum.com/showthread...=3#post4443309
    _.............. <div class="list-group"> ..............


    So, The next Two posts will answer the original thread question
    _................................
    Last edited by Doc.AElstein; 08-02-2016 at 02:32 AM.

  34. #34
    Forum Expert Doc.AElstein's Avatar

    Answer is Dribble. Up me Arrobs ... '•.¸(¯`'•.¸*¨¨*:•.•:*¨*OVERKILL]*¨*:•.•:*¨¨*:• ¸

    Hi Alan,
    OK Here you go...This looks like a good couple of start points:
    Quote Originally Posted by Doc.AElstein View Post
    .....
    Leith’s DispHTMLElementCollection is a collection of HTMLHeaderElements ( the Items in this thing are tables wrapped inside HeaderElements ).

    The code in each main Loop
    Rem 4 ) , dribbles down Siblings and Childs to get each table, a single HTML Table , puts that in a Object variable and applies the various HTML Table Properties to that variable, reusing that variable each time in the Loop. Part of the Sibling dribble checked for the Like correct ID ied Container so was a bit selective and did not catch in the previous web site the last unwanted table...........................
    Quote Originally Posted by Kyle123 View Post
    You can’t go any more explicitly since none of the elements have IDs like they did in Leith's code.
    An ID is unique per page, classes aren't.....
    _.................................



    The source code does have “ID bits” but not near the table areas ..
    I like to dribble, and so does the Wife
    This DOM thing takes a code Text Like < HTML > stylio and somehow organises it into a tree like structure of Objects... apparently

    I guess you could think of it as some sort of organisation of how things should be done or turn out, official Properties, Method, Proof of existence , official Blue print etc.. etc.. . I guess applied to VBA it could be thought of something organising something else into a sort of pseudo document Object Model. ( Call it DOM for short )
    I guess VBA would have a Library to convert HTML code into such a Model – It seems to have a Library to do almost anything except maybe something disgusting I just thought about doing

    So maybe it can be safe to say I will be dribbling through objects with the aim being to end up at one a Table ( that is to say a HTMLTable Object ) , being as selective / explicit as possible.

    So how is this then:
    My Final aim here is to end up at my table Object (HTMLTable ). and i like to dribble..
    So
    Dribble through the old Leith Code ( after “requesting” and getting” a copy of the old source code I have as a .text like and .HTML file and chucking it “ body.innerHTML = PageSrc - stylio “ ( or .write PageSrc – stylio in Leith’s case ) in a DOM ) ....

    I will look at each step at the Object Type Name
    http://www.excelforum.com/excel-prog...-shown-in.html
    Then do the same for the new site, and see if I can get a good dribble “down” to the HTMLTable

    How to do that :_...
    (A)_...Assume I have a code that done a DOM
    (B)_...Make a Global Array of Objects, dynamic ( not yet and never fixed size )
    Dim myGlobyArrobs() as Object ‘ Dim will do, .. http://www.mrexcel.com/forum/excel-q...riables-2.html http://www.excelforum.com/tips-and-t...vs-public.html
    (C)_...Make a routine
    Sub UpmyArrobs(ByVal Obj As Object) 'Dribble down Siblings and Child Nodes etc. Hit Ctrl+G to get a copyable list of all (HTML) objects sent in a call
    to bring out my Arrobs, for example one that resizes the Array, adds a string of the Object type and Debug.Prints the list so far to the Immediate Window. ( That Print out in the Immediate Window is convenient both to look at and to make a ( Ctrl+C ) copy of. )

    In every Code that I will Call the routine, Sub UpmyArrobs( , from , - I will do an initial ReDim ( without Preserve ) to size (1) and put some identifying Header string into it. This is useful to keep the routine simple in not having to worry about if the Array() is empty, that can only be checked with Error handling which is bad
    http://www.excelforum.com/excel-new-...ml#post4164174 ....


    ( ....D) _.. Just for fun I will make two Arrays, one for the Object and one for the String of the Type Name of the Object, but I only really need the String.....)

    So at the start a couple of new Globies
    Dim myGlobyArrobs() As Object ' For Dribbles. Dim will do, .. http://www.mrexcel.com/forum/excel-q...riables-2.html http://www.excelforum.com/tips-and-t...vs-public.html
    Dim myGlobyTypName() As String ' For Dribbles


    ( At the “doing a DOM bit” of my code I will do the first Array Entry initialising bit ,
    '2d) Dribbles
    ReDim myGlobyArrobs(1 To 1): Set myGlobyArrobs(1) = HTMLdoc 'Probably will not need this, as I only want the String Name...
    ReDim myGlobyTypName(1 To 1): Let myGlobyTypName(1) = TypeName(HTMLdoc) '... Give some arbitrary value to the first String Entry. Type of main document will do

    Those last two lines were just for convenience and to simplify the ...... making of...... )

    A sub routine to paste out me dribbles in the Immediate Window
    Please Login or Register  to view this content.
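
    A minimal sketch of what such a routine could look like, going only by the description in (B) and (C) above. The body is an assumption, not the code originally posted here; the names myGlobyArrobs, myGlobyTypName and UpmyArrobs come from the text, and the calling code is assumed to have already done the initial ReDim to (1 To 1).

    ```vba
    Option Explicit

    Dim myGlobyArrobs() As Object    ' every object sent in so far
    Dim myGlobyTypName() As String   ' TypeName of each of those objects

    ' Add one object to the dribble list and print the list so far.
    ' Assumes the caller has already ReDim'd both arrays to (1 To 1).
    Sub UpmyArrobs(ByVal Obj As Object)
        Dim n As Long, i As Long
        n = UBound(myGlobyTypName) + 1
        ' Grow both arrays by one, keeping what is already in them
        ReDim Preserve myGlobyArrobs(1 To n)
        ReDim Preserve myGlobyTypName(1 To n)
        Set myGlobyArrobs(n) = Obj
        myGlobyTypName(n) = TypeName(Obj)
        ' List so far, copyable from the Immediate Window ( Ctrl+G )
        For i = 1 To n
            Debug.Print i & ": " & myGlobyTypName(i)
        Next i
    End Sub
    ```

    Because the arrays always have at least one entry before the first call, the routine never has to test for an empty array.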



    Then add dribble lines ( UpmyArrobs ) in the original code around the bit of interest.
    Please Login or Register  to view this content.
    Run the Code, and compare the results with the HTML code
    DribbleBitOld.jpg
    Attachment 473017





    _ .... that was painless !


    _ Next Post try again with the new Website
    Last edited by Doc.AElstein; 08-01-2016 at 10:17 AM.

  35. #35
    Forum Expert Doc.AElstein's Avatar

    Answer the Thread original Question Dribbles 2 ... Thankyou All :) :)



    Change the code Line from getting my copy of the original Old Web Site Source Code
    Let absURI = ThisWorkbook.Path & "\ShortenedseitenQuelText_HTML.html"
    To an example of a Web site Page in the new Web site
    Let absURI = "http://www.ernaehrung.de/lebensmitte...eischkaese.php"

    Change me header thing h2 to h3, as discussed:
    http://www.excelforum.com/excel-prog...ml#post4438508
    Bottom of this post also http://www.excelforum.com/excel-prog...ml#post4438726

    In other words, change what the first big "DispHTMLElementCollection" thing gets ( the getElementsBy... line ).

    So change:
    Dim Headers As Object: Set Headers = HTMLdoc.getElementsByTagName("h2")
    To
    _......_____....... HTMLdoc.getElementsByTagName("h3")

    Dribble a bit ( remembering one thing....
    Quote Originally Posted by Kyle123 View Post
    .... accessing the parent node .....
    recursion is OverKill
    Mostly Motorhead OverKill was the best bit there, but also .Parent gave me an idea whilst dribbling,
    and then to Cough up a .className instead of a .ID



    DribblesNewBingo.jpg
    Attachment 473018



    Bingo!



    _......................


    Here you go:::...

    Question : Help Modifying Scrape Code ( Getting Nutrition Tables ) after Web Site Change.
    Old Web Site towards the top of a Table is
    DribbleBitOld.jpg
    Attachment 473020


    Answer :
    Change the last two code lines here ( ignore the last ‘comment line ) :
    Please Login or Register  to view this content.
    _.............................

    To the last 2 lines here ( ignore the last ‘comment line ) :
    Please Login or Register  to view this content.




    I will leave the Thread unanswered a while, in case any more great help, follow up answers etc come, as that is what got me this far.
    I not only have the solution, but also
    Have extra back up solutions
    And most valuable:
    Thanks to the help I pretty well know what is going on and can probably sort it out myself for the next similar problem / Question...

    Thanks everyone again for the great help and Support
    Alan

    I want your babies
    Last edited by Doc.AElstein; 07-31-2016 at 08:59 AM.

  36. #36
    Forum Expert Doc.AElstein's Avatar

    Having an attempt at answering my last questions

    Hi ,
    Just trying to answer a few of my last questions here, to wrap the Thread up...

    Let me give you my take on the outstanding questions / remarks from Posts #32 and #33
    http://www.excelforum.com/showthread...t=#post4445231
    http://www.excelforum.com/showthread...t=#post4445232

    Post # 32 Remark 1)Table Type Code
    Yeah I think that is cleared up. I can call Pike’s code a more “Direct table” code, I think; my elaborations in post # 32 http://www.excelforum.com/showthread...t=#post4445231 clear up nicely what I mean / was meant

    _....................................................

    Post # 33
    http://www.excelforum.com/showthread...t=#post4445232

    Questions:
    _2a)
    The Dim request As Object _.... this is obviously not needed, that is one reason the With / End With is often done, - to reduce need to “Dim” variables..

    _2a) (i) (ii)
    I doubt if anyone is sure on that, but does no harm to get in the habit of
    Set request = Nothing

    _3) Correct: EF Management cannot speak English. ( And Microsoft XML Libraries do stuff with XML / HTML things. ) The "msxml2.xmlhttp", as its name suggests, “does stuff” for http / HTML / XML etc.. I am using it in the scraping codes only very simply: it just organises how to get the long text string of the HTML Code. You showed that by
    _a) replacing the URL in .Open with a simple .txt or .HTML file,
    Or even
    _b) doing away with all the Microsoft XML Library / "msxml2.xmlhttp" when replacing your Page Source String variable with a text string of the HTML Code copied through the clipboard. ---
    http://www.excelforum.com/showthread...5#ppost4445825


    _...

    4) The difficult one – still doing Brain Surgery while reading instructions in Urdu or Chinese!!
    OK I have the .Open bit sussed and that “POST” option I also understood from Kyle, ..
    So you get the correct source document / or text thereof that you want with the .responseText
    The point is it was not from this,
    http://www.ernaehrung.de/lebensmittel/suche/
    The above is the Home Page. What you got back was that massive page with all the URL’s in, which is exactly what you wanted_...

    So the question is: How did that happen??_...
    _...the answer lies in that in conjunction with the red bits here:
    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    .send "nameInput=&origin=egal&language=de"


    So take a look at the new Home Page in Google Chrome source ( http://www.excelforum.com/showthread...t=#post4445825 )
    http://www.ernaehrung.de/lebensmittel/suche/
    or in this file ( “newdebinetHome.html” ) In Notepad ++
    https://app.box.com/s/u8a5v6wdpv1u5umvk2k05bsi553ipx4u

    _.....
    “Content-Type” seems to be the main Header ??
    This
    “application/x-www-form-urlencoded”
    is not in the Code, but googling says something about miming a big long String. I have no idea what that means, but sounds like just “something you always do and only God and a few computer experts know why....”.

    So maybe you just always do that” ??

    I think I can suss this...
    "nameInput=&origin=egal&language=de"

    I am .sending with me POSTed order

    nameInput= __......_ ( text value of no string value text ( looking for all text matches of null string which is the trick to get everything ) )
    &
    origin=egal _......_( option egal )
    &
    language=de _....._(option de )

    _....
    So that last stuff gets the return of this big long web site, which I copied to a .HTML File like I explained here.. http://www.excelforum.com/showthread...t=#post4445825 .....
    Here is the File: ( “debinetURLs.HTML” )
    https://app.box.com/s/ja2666ti4l6qxz24smu00bun3yzslcky

    _... OK so I get almost to understand how i get the correct Web site ( HTML text thereof ) back

    Now how does
    ________ByClassName("list-group-item") get me the massive URL list collection...

    This for example is the first few of the URL's in that "Got" DispHTMLElementCollection thing
    HTML Code: 
    But the class="list-group-item" seems at the end when I expect it at the start ????

    Not sure what is going on there ???
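
    One way to check this, as a hedged sketch only: attribute order inside an HTML tag does not matter, so class="list-group-item" sitting at the end of the < a > tag still marks that anchor. The snippet below assumes the late-bound "htmlfile" object, uses one anchor of the kind quoted from the page source in Post # 33 ( with shortened link text ), and filters the anchors by .className rather than relying on getElementsByClassName being available.

    ```vba
    Option Explicit

    Sub ClassOnAnchors()
        Dim dom As Object, a As Object
        Set dom = CreateObject("htmlfile")
        ' One wrapper div ( class "list-group" ) holding one anchor ( class "list-group-item" )
        dom.body.innerHTML = "<div class=""list-group"">" & _
            "<a href=""http://www.ernaehrung.de/lebensmittel/de/BOFRO1287/Cordon-bleu-vom-Schwein,-bofrost.php"" " & _
            "class=""list-group-item"">Cordon bleu vom Schwein, bofrost</a>" & _
            "</div>"
        ' The class sits on each anchor, not on the wrapper, so it is the anchors that get collected
        For Each a In dom.getElementsByTagName("a")
            If a.className = "list-group-item" Then Debug.Print a.innerText, a.href
        Next a
    End Sub
    ```

    The div with class "list-group" is only the container; the anchors inside it each carry their own class, which is why asking for "list-group-item" goes straight to the links.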

    _...

    4c) Not sure why you can not Dim something as a DispHTMLElementCollection - you can Dim something as most of the other HTML DOM things? - Maybe just an oddity there?? ( like the one that some things are only available in Late or Early Binding ?? )

    _..............................................
    Still not quite there on this one.. but getting close.

    _.............................

    Most The rest of the code all makes sense to me
    Please Login or Register  to view this content.
    _......

    It is really just these bits I need help in Understanding in relation to the actual HTML source code

    The first bit ( red bit )
    Please Login or Register  to view this content.
    And from the rest of it the red bit also

  37. #37
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Help Modifying Scrape Code after Web Site Change.

    Sorry, you're just rambling

    From what I can glean from the above:
    _2a) (i) (ii) I doubt if anyone is sure on that, but does no harm to get in the habit of
    Set request = Nothing
    This is a waste of time, not only that it probably does more harm than good as putting it everywhere not only adds unnecessary code, but also masks where it is actually necessary (it actually is in certain obscure scenarios). If you only write it when it's required then you know it's actually doing something. The CreateObject is irrelevant, if you want to do it for everything, you should do it for all objects, not just late bound ones - though it's probably better not doing it at all.
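
    A minimal sketch of the pattern being discussed, assuming a plain GET of the search page named earlier in the thread. With a With / End With block around CreateObject there is no request variable to Dim and nothing to set to Nothing; the routine name GetPageSource is made up for illustration.

    ```vba
    Option Explicit

    Sub GetPageSource()
        Dim PageSrc As String
        ' No "Dim request As Object" needed: the object lives only inside the With block
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "http://www.ernaehrung.de/lebensmittel/suche/", False
            .send
            PageSrc = .responseText
        End With   ' nothing left over to Set = Nothing
        Debug.Print Len(PageSrc)
    End Sub
    ```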

    I think you seem to have answered your other questions, if not, concisely (without your little grey side comments, fruity formatting or green comments) ask your questions and I will try and answer them, I find your posting style incredibly hard to read at the best of times but am happy to help if you try to condense it somewhat

  38. #38
    Forum Expert Doc.AElstein's Avatar

    Hi Kyle,
    Thanks for the reply
    Quote Originally Posted by Kyle123 View Post
    .....I find your posting style incredibly hard to read at the best of times but am happy to help if you try to condense it somewhat
    Sorry about that. I just thought it might help to suggest an answer.

    Tell you what, I will split my remaining questions, try and be as “monochrome and short” as much as i can, then I will only ask further if and when you have time to answer a question. – I have already learned loads from you here.
    Thanks once again
    Alan
    _................................................
    So

    Question 1)
    I confused everything a bit with a “Pike’s Code is a table Code” Comment.

    So I am just asking if how I explained myself makes sense ( is correct ) here
    http://www.excelforum.com/excel-prog...ml#post4445231
    And here:
    http://www.excelforum.com/excel-prog...ml#post4440737


    _......
    Basically Pike’s collects ( all ) Tables. Yours and Leith’s have them wrapped in things and your codes are a bit more selective ( Your collections are not just tables )

    That make sense ?
    ...” All a question of what your "DispHTMLElementCollection" thing is “.... ( Pike's Elements are HTMLTables , - yours are not so you can do a bit of selecting to get the tables i want and a header, etc.
    Thanks
    Last edited by Doc.AElstein; 08-02-2016 at 06:53 AM.

  39. #39
    Forum Guru Kyle123's Avatar

    You really can't help it can you, what's with the rows of full stops, and grey text grrrrr
    So I am just asking if how I explained myself makes sense ( is correct ) here
    http://www.excelforum.com/excel-prog...ml#post4445231
    Yes.
    Yes.


  40. #40
    Forum Expert Doc.AElstein's Avatar

    Hi Kyle
    _..Sorry about that row of dots – That was an error – I changed the ... to _... some time ago to help your constipation ( you told me you had difficulty passing them:
    http://www.excelforum.com/the-water-...ml#post4192657

    _.. Yes I have a problem, - my last death throes against the chity chatty Twity short B/W message Smart Phone mentality. I think it will do us all in if we are not careful.
    ( But seriously I will do all post to you B/W if you prefer.. the colours help me to navigate a bit, but I am happy to refrain if it helps ! : ) )
    _............................................................



    Question 2)
    The Set request = Nothing stuff.

    You cleared that up really well. Just the very minor point which threw me off slightly initially.

    Your first ( Problem 1 ) Code:...
    http://www.excelforum.com/excel-prog...ml#post4439005
    _...
    It had a
    Dim request As Object

    That line is not needed when “you do it” by a With End With bit

    Correct?

  41. #41
    Forum Guru Kyle123's Avatar

    Yes, should have deleted it

  42. #42
    Forum Expert Doc.AElstein's Avatar

    Quote Originally Posted by Kyle123 View Post
    Yes, should have deleted it
    Thanks_......


    Question 3) MSXML2.XMLHTTP / Microsoft XML, v6.0
    _a) Do you agree that EF Management appear to have a problem communicating to us in English ( or any language for that matter )

    _b) Is it possible to say, in a line not longer than a long ‘Green Comment Line, what the Microsoft XML, v6.0 Library ( and the similar earlier versions, which all seem to work as well ) and / or MSXML2.XMLHTTP is, and what its main distinguishing characteristics over other Libraries are?

    _c) Is this approximately correct: The .responseText does nothing more than putting all I see in a HTML code ( Looking in a Notepad, Notepad ++, or “Google browser source code show thing”) into a simple single long string. In other words it does “pseudo”:

    Page Source, As String, is = “Code Line1” & vbCr & “Code Line2” & vbCr & “Code...... etc.. etc...

    ( __.._ ...... d) does anything obvious spring to mind in my scraping where this

    dom.body.innerHTML = .responseText

    might change to

    dom.body.innerHTML = Something else
    or
    dom.body.__Something Else__= .responseText
    or
    dom.body.__Something Else__ = __A Different Something Else__
    _....)
    Thanks

    ---o00o---`(_)`---o00o---

  43. #43
    Forum Guru Kyle123's Avatar

    3.b MSXML is a library for creating, reading and manipulating XML files. The bit we are interested in is the XMLHTTP which can make web requests, it could be swapped out with CreateObject("WinHTTP.WinHTTPRequest.5.1") which is a far more capable library and preferable for web scraping 99 times out of 100, I don't tend to use that object in forums since some corporate networks seem to have issues with the way it works.

    3.c Yes, it simply returns the HTML from the request as a string.
    dom.body.innerHTML = Something else
    or
    dom.body.__Something Else__= .responseText
    or
    dom.body.__Something Else__ = __A Different Something Else__
    No, you'd only use the code as posted
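
    A hedged sketch of the swap mentioned in 3.b, assuming the same POST request used elsewhere in this thread ( the URL and form body are taken from the earlier posts, the routine name is made up ). Only the ProgID changes; .Open, .setRequestHeader, .send and .responseText are used the same way on both objects.

    ```vba
    Option Explicit

    Sub ScrapeWithWinHttp()
        Dim dom As Object
        Set dom = CreateObject("htmlfile")
        ' Swapped in for CreateObject("MSXML2.XMLHTTP")
        With CreateObject("WinHttp.WinHttpRequest.5.1")
            .Open "POST", "http://www.ernaehrung.de/lebensmittel/suche/", False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
            .send "nameInput=&origin=egal&language=de"
            ' responseText is just the returned HTML as one long string
            dom.body.innerHTML = .responseText
        End With
        Debug.Print dom.getElementsByTagName("a").Length
    End Sub
    ```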

  44. #44
    Forum Expert Doc.AElstein's Avatar

    This is great help, Kyle, thanks a lot

    _.......

    This is the big one
    Question 4)
    But, Maybe do the simple Part first:

    _4c) why can’t you Dim your Links ( or any of your dom.getElementsBy____ for that matter ) As DispHTMLElementCollection ( I tried - it don’t work )

    ( maybe a Disp is not a HTML thing.. oops --- Sorry , thinking aloud again ! )


    Thanks

  45. #45
    Forum Guru Kyle123's Avatar

    Just dim them as the interface they implement:
    Please Login or Register  to view this content.

  46. #46
    Forum Expert Doc.AElstein's Avatar

    Quote Originally Posted by Kyle123 View Post
    Just dim them as the interface they implement:
    Please Login or Register  to view this content.
    Ah, is that telling me it is not a HTML thing or just a "quirk" that
    TypeName(divs) gives DispHTMLElementCollection
    but somehow it is a IHTMLElementCollection
    or is it more subtle ?

  47. #47
    Forum Guru Kyle123's Avatar

    getElementsByClassName returns a NodeList which is an interface to the DispHTMLElementCollection which is an internal class that you're not supposed to see/use. It does implement the IHTMLElementCollection though so you can use that
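
    A minimal early-bound sketch of that idea. It assumes a reference to the Microsoft HTML Object Library is set ( Tools, References ), and the variable and routine names are made up for illustration; it is not the code from the hidden block above.

    ```vba
    Option Explicit

    ' Requires a reference to: Microsoft HTML Object Library
    Sub DimAsInterface()
        Dim dom As MSHTML.HTMLDocument
        Dim links As MSHTML.IHTMLElementCollection   ' the interface, not the Disp... class
        Set dom = New MSHTML.HTMLDocument
        dom.body.innerHTML = "<a class=""list-group-item"" href=""http://example.com/"">x</a>"
        Set links = dom.getElementsByTagName("a")
        Debug.Print TypeName(links)   ' the thread reports DispHTMLElementCollection here
        Debug.Print links.Length
    End Sub
    ```

    The variable is declared as the interface while TypeName still reports the underlying class, which is the mismatch asked about in 4c.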

  48. #48
    Forum Expert Doc.AElstein's Avatar

    Kyle
    Quote Originally Posted by Kyle123 View Post
    getElementsByClassName returns a NodeList which is an interface to the DispHTMLElementCollection which is an internal class that you're not supposed to see/use. It does implement the IHTMLElementCollection though so you can use that
    I can just about follow that.. at least it satisfies my lust for a concise 'Green Comment that in 10 years I will probably understand a little bit better.
    I get the feeling there is a lot of wide knowledge stuck behind getting that answer... ( you would not happen to know a bit about "computer stuff" would you.. Lol .. )

    _..................................................

    OK almost there.... Penultimate question ( and the last one I may be getting almost there myself just now )


    Questions 4) continued_.....
    Here it comes part one of last two Questions.

    Thanks to you, I have got the point about most of this stuff:
    .Open ( also with POST )
    And
    .send ( also I think I almost have it with “Extra stuff here “ )

    BUT: this

    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

    ??????????
    Googling is telling me stuff that is totally incomprehensible to me.
    Again a single long ‘Comment Line explanation would be great.
    Maybe, if possible, with emphasis on if / how this might change if I try to do similar scraping ( mostly of tables or lists that come up after I manually enter stuff )

  49. #49
    Forum Guru Kyle123's Avatar

    When you POST data to a server, you need to tell it what format you are sending it in.

    So the Type of Content sent in the body of the request (the send bit) is application/x-www-form-urlencoded

    the url encoded data is: "nameInput=&origin=egal&language=de" which is the data from the html form, url encoded.

    Quote Originally Posted by https://gist.github.com/joyrexus/524c7e811e4abf9afe56
    For application/x-www-form-urlencoded, the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be:

    MyVariableOne=ValueOne&MyVariableTwo=ValueTwo

    According to the specification:

    non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character
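
    A rough sketch of how such a body could be built in VBA. The function names UrlEncodeAscii and BuildSearchBody are made up for illustration, only plain ASCII input is handled, and a space is written as "+" ( the usual form convention ) rather than %20. The field names nameInput, origin and language are the ones from the .send line in this thread.

    ```vba
    Option Explicit

    ' Percent-encode a plain ASCII string for an application/x-www-form-urlencoded body.
    Function UrlEncodeAscii(ByVal s As String) As String
        Dim i As Long, code As Long
        Dim ch As String, result As String
        For i = 1 To Len(s)
            ch = Mid$(s, i, 1)
            code = AscW(ch)
            Select Case code
                Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95
                    result = result & ch            ' 0-9, A-Z, a-z, "-", ".", "_" stay as they are
                Case 32
                    result = result & "+"           ' space
                Case 0 To 127
                    result = result & "%" & Right$("0" & Hex$(code), 2)   ' %HH
                Case Else
                    Err.Raise 5, "UrlEncodeAscii", "Non-ASCII input is not handled by this sketch"
            End Select
        Next i
        UrlEncodeAscii = result
    End Function

    ' Name=value pairs joined by "&", as in the .send line of this thread.
    Function BuildSearchBody(ByVal searchText As String) As String
        BuildSearchBody = "nameInput=" & UrlEncodeAscii(searchText) & "&origin=egal&language=de"
    End Function
    ```

    BuildSearchBody("") gives back exactly the string sent in the code, nameInput=&origin=egal&language=de, and BuildSearchBody("Fleisch & Wurst") gives nameInput=Fleisch+%26+Wurst&origin=egal&language=de.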

  50. #50
    Forum Expert
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi Kyle,
    Is it right you can include log-in details with the Get line using XMLHTTP object?
    I normally use IE, if the site requires a username and password, but I have seen people use XMLHTTP as well.

  51. #51
    Forum Expert Doc.AElstein's Avatar

    Quote Originally Posted by Kyle123 View Post
    When you POST data to a server, you need to tell it what format you are sending it in.

    So the Type of Content sent in the body of the request (the send bit) is application/x-www-form-urlencoded

    the url encoded data is: "nameInput=&origin=egal&language=de" which is the data from the html form, url encoded.
    Great i follow that ,

    where does the content-type fit in to all that... I see it here:
    contenttype.jpg

  52. #52
    Forum Guru Kyle123's Avatar

    It just tells the browser what's being sent back from the request, but it's usually set by the server response headers (which override meta tags) so it's usually rather useless in the HTML

  53. #53
    Forum Guru Kyle123's Avatar

    Quote Originally Posted by AB33 View Post
    Hi Kyle,
    Is it right you can include log-in details with the Get line using XMLHTTP object?
    I normally use IE, if the site requires a username and password, but I have seen people use XMLHTTP as well.
    Yes, if using "Basic Authentication" (most sites don't), but if you need to deal with authentication, you're usually much better using WinHTTP since sites usually issue a session cookie when fully authenticated. This cookie needs sending for all further requests, WinHTTP will do this for you automatically - MSXML requires you to parse the cookie from the response header and manually set it in the request headers for each subsequent call.
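
    A hedged sketch of the Basic Authentication case only. The URL, user name and password below are placeholders, not a real site; the 0 passed to SetCredentials means the credentials are for the server rather than a proxy.

    ```vba
    Option Explicit

    Sub BasicAuthSketch()
        Const URL As String = "http://example.com/protected"   ' hypothetical address
        Const USER_NAME As String = "me"                        ' placeholder
        Const PASS_WORD As String = "secret"                    ' placeholder

        ' MSXML: user name and password go in as extra arguments of .Open
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", URL, False, USER_NAME, PASS_WORD
            .send
            Debug.Print .Status
        End With

        ' WinHTTP: credentials are set after .Open
        With CreateObject("WinHttp.WinHttpRequest.5.1")
            .Open "GET", URL, False
            .SetCredentials USER_NAME, PASS_WORD, 0
            .send
            Debug.Print .Status
        End With
    End Sub
    ```

    For sites that log in through a form and then issue a session cookie, this sketch does not apply as it stands; that is the case where reusing one WinHTTP object for the follow-up requests is the easier route.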

  54. #54
    Forum Expert Doc.AElstein's Avatar

    Quote Originally Posted by Kyle123 View Post
    It just tells the browser what's being sent back from the request, but it's usually set by the server response headers (which override meta tags) so it's usually rather useless in the HTML
    Thanks think I get that, almost.....
    So I just always have to look for the thing after http-equiv= , and put that in a code ?

    Or are You saying it will almost always be that anyway?
    If i change it a bit the code errors.
    Is it just some sort of section identifier ?

    This

    https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx

    says it is a header name, which I suppose makes sense , as it almost the first thing there – but all the HTML Codes I have seen so far have the same name ??
    – so maybe there is more to it than that ?

  55. #55
    Forum Guru Kyle123's Avatar

    No, what's returned has nothing to do with the request. In this case, you ignore it.

    To get the correct headers and data for a web request, you look at the network tab of the browser developer tools, not the HTML document.

    Last edited by Kyle123; 08-02-2016 at 11:37 AM.

  56. #56
    Forum Expert Doc.AElstein's Avatar

    Quote Originally Posted by Kyle123 View Post
    No, what's returned has nothing to do with the request. In this case, you ignore it.....
    I think I follow that_..
    somehow with
    [code not shown]
    you are organising how stuff is being given - your post # 49 I think I understood.

    I still cannot quite see how the first argument comes in. The second argument is talking about the type of content. I understand that.

    Are the content-type in my image of the HTML code and the content-type in the code line not the same ? ( Or the same thing but applied to something else ? )

    Or putting it another way, the first argument in that code line
    [code not shown]
    could be something other than content-type, in which case the second argument would be talking about something different ? ( And that something different probably does not come up very often, so I can ignore it ? )

  57. #57
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Think of it as property name, value.

    Content-Type is the property name, x-www-form-urlencoded is the value

    Your content type in the html is "text/html" not "x-www-form-urlencoded".

    So, yes, you can set different request header properties and pass different values. This isn't unusual, just not required in this case
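
    The name / value idea above can be sketched in VBA roughly like this. It is only a minimal sketch under assumptions: the URL and the form field "q" are made up for illustration, and the site discussed in this thread does not need the header at all. The full value normally written for form posts is "application/x-www-form-urlencoded".

```vba
Sub SketchRequestHeader()
    ' Hypothetical URL and form data, for illustration only
    Const URL As String = "http://www.example.com/search.php"
    Dim Request As Object
    Set Request = CreateObject("MSXML2.XMLHTTP")
    With Request
        .Open "POST", URL, False    ' False = wait for the response
        ' First argument: the header (property) name. Second argument: its value.
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send "q=Fleischkaese"      ' the form data that the header describes
        Debug.Print .Status
        ' The response carries its own, separate Content-Type header
        Debug.Print .getResponseHeader("Content-Type")
    End With
End Sub
```

    The Content-Type set with setRequestHeader describes what is being sent; the one read back with getResponseHeader describes what the server returned (for a normal page, something like text/html).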

  58. #58
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    Think of it as property name, value.
    Content-Type is the property name, x-www-form-urlencoded is the value
    Your content type in the html is "text/html" not "x-www-form-urlencoded".
    So, yes, you can set different request header properties and pass different values. This isn't unusual, just not required in this case
    Great so I think I have that one well understood...
    _

    Last Question ( I think!! )

    _.... the last one is a bit tricky to explain: ( But in trying a last time I think it has twigged!!!! ( almost ) )

    This bit ( or rather the bit in red ) :
    [code not shown]

    OK, I found this: ( just one example of the massive list )
    < a href="http://www.ernaehrung.de/lebensmittel/de/VOGEL15805/Peperonata-Paprikazubereitung-Vogeley-GV.php" class="list-group-item" >"Peperonata" Paprikazubereitung Vogeley GV< /a >

    I was expecting something like, pseudo, a child count of two ( or 0 and 1 lol ) , an Element Node and a Text Node

    < div class="list-group-item" stylio = something >
    < href="http://www.ernaehrung.de/lebensmittel/de/VOGEL15805/Peperonata-Paprikazubereitung-Vogeley-GV.php" > "Peperonata" Paprikazubereitung Vogeley GV< / href >
    < /div >

    So I suppose the question now is

    _(i) Is my alternative also OK ?

    _(ii) Can you explain the original ( actual ) HTML bit ?
    Thanks, I have a feeling I am almost with it.. all !!
    Alan

  59. #59
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    It's not valid HTML: it's an a tag, not an href tag (href is an attribute of the a tag). But yes, theoretically there's nothing wrong with your suggestion; you could have a link inside a div. You don't need to write a separate "text node" though, since the text inside an a tag is already its text node.

    I don't understand what you're asking for in ii.

  60. #60
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    It's not valid HTML: it's an a tag, not an href tag (href is an attribute of the a tag). But yes, theoretically there's nothing wrong with your suggestion; you could have a link inside a div. You don't need to write a separate "text node" though, since the text inside an a tag is already its text node.
    I don't understand what you're asking for in ii.
    I was just asking if you could explain this, ( one of the many similar bits in the massive list.. )..

    < a href="http://www.ernaehrung.de/lebensmittel/de/VOGEL15805/Peperonata-Paprikazubereitung-Vogeley-GV.php" class="list-group-item" >"Peperonata" Paprikazubereitung Vogeley GV< /a >


    But I think I almost have it, I should have Googled the a tag _....


    All that was confusing me still now is ( was ) that I was expecting this

    < class="list-group-item" a href="http://www.ernaehrung.de/lebensmittel/de/VOGEL15805/Peperonata-Paprikazubereitung-Vogeley-GV.php >"Peperonata" Paprikazubereitung Vogeley GV< /a >

    So I did a bit of brain surgery.. I changed the source code at that line to the above. I ran the code and in the output this URL was missing !

    Was this telling me that the syntax is the a bit first, then the class=”” bit ?

    In all the codes I have seen up until now the class came first...

    Hang on ..... how about

    < a class="list-group-item" href="http://www.ernaehrung.de/lebensmittel/de/BOFRO1287/Cordon-bleu-vom-Schwein,-bofrostPoo.php" > "Cordon bleu" vom Schwein, Poo< /a >

    Hey it worked,

    OK Sorry I am rambling, but I think I have it
    I am not in a < div
    I am in a < a
    But where the class=”__” goes is not important.

    Here are the first few lines of the URL list in my modified source code:

    HTML Code: [code not shown]
    I changed the order around in the first URL . But it still works. So the order does not matter. ...

    So I think I finally have it..

    What do you think ( I mean about if I “have it sussed” ) ?
    Alan

  61. #61
    Forum Expert
    Join Date
    03-28-2012
    Location
    TBA
    MS-Off Ver
    Office 365
    Posts
    12,454

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi Alan,
    Let me butt in before Kyle gives you a more accurate answer.

    < div class="list-group" >
    is the parent of all the HTML tags under it, that is, all the anchors ("a").
    The div is found by its tag name, div, but to separate this div from the other divs in the HTML document, we use its class name, "list-group", to identify it as the container. Each anchor ("a") is a child of that div and carries its own class attribute, "list-group-item", which gives us additional information about the anchor. Since there are many anchors in the HTML, we use that class attribute to pick out the ones we want: when you loop through all the anchor tags, you check for the class name "list-group-item" to identify these particular anchors.
    I hope this helps.
    Last edited by AB33; 08-02-2016 at 05:03 PM.

  62. #62
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by AB33 View Post
    I hope this helps.
    Thanks AB33
    That does help. I think at first glance it matches up with the conclusions that are slowly developing in my brain just now.
    So it is good to get a Summary, and an independent brain coming in with that
    Alan
    ( The last bit that threw me off was the class "list-group-item" at the "wrong end", as I thought. But my experiments just showed that which “end” it was at did not matter. It was just a coincidence that up until now it was at the "start" )

    _...
    Quote Originally Posted by Doc.AElstein View Post
    ........
    Here is the first few lines of the URL list in my modified source code:

    HTML Code: 
    I changed the order around in the first URL . But it still works. So the order does not matter. ....
    Last edited by Doc.AElstein; 08-02-2016 at 01:58 PM.

  63. #63
    Forum Expert

    Re: Help Modifying Scrape Code after Web Site Change.

    Alan,
    My suggestion to you is, instead of cracking your head going over minor details, you should spend some time reading up on basic HTML and CSS. I was in your position a year ago and spent hours trying to understand what the heck was going on. You do not need to go into details, just skim them. HTML and CSS go hand in glove in web development. Unless you want to go into web development, you can get the basics of HTML and CSS by watching a couple of videos on YouTube.
    All these ways of finding elements, by name, ID, tag name, class name and attribute, are derived from JavaScript or the other way round, that is, JavaScript nicked them from MS.

  64. #64
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by AB33 View Post
    Alan,
    My suggestion to you is, instead of cracking your head and going over minor details, you should .......
    As the only reason for doing any computer stuff at all is to help speed up a personal project, I did not want to go too far into another area ( HTML, CSS, JavaScript and such... ). But when I take a break from my VBA obsession soon, maybe what you suggest will be something to think about. I collected a lot of HTML, CSS and such books a while back along with VBA, VB etc., before I went with VBA. So maybe I will do a bit of bedtime reading or fall asleep to some YouTube videos on the same... Being able to scrape Web sites quickly and confidently for my data collection could be useful.
    Thanks
    Alan

  65. #65
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    I think so. The order of attributes doesn't matter, but the tag must immediately open the element.

    Just look up an intro to HTML if you're confused
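
    The point about attribute order can be tried out without touching the web site at all. A minimal sketch, assuming two made-up links (the example.com URLs are not from the real page), one with class before href and one with href before class:

```vba
Sub SketchAttributeOrder()
    Dim Src As String
    ' Two made-up links: class first in one, href first in the other
    Src = "<div class=""list-group"">" & _
          "<a class=""list-group-item"" href=""http://www.example.com/one.php"">One</a>" & _
          "<a href=""http://www.example.com/two.php"" class=""list-group-item"">Two</a>" & _
          "</div>"
    Dim HTMLdoc As Object
    Set HTMLdoc = CreateObject("htmlfile")   ' late binding
    HTMLdoc.write Src
    HTMLdoc.Close
    Dim Lnk As Object
    ' The tag name (a) must come first inside < >, the attributes can follow in any order
    For Each Lnk In HTMLdoc.getElementsByTagName("a")
        If Lnk.className = "list-group-item" Then
            Debug.Print Lnk.href, Lnk.innerText
        End If
    Next Lnk
End Sub
```

    Both links should show up in the Immediate window, since the parser stores class and href as attributes of the a element regardless of which was written first.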

  66. #66
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    I think so. The order of attributes doesn't matter, but the tag must immediately open the element.
    ..


    Hi Kyle,

    Thanks, you’ve been brilliant here. You and AB33 have managed to satisfy my perverse desire to understand everything in a code I use. That is quite rare for a code way outside my knowledge base: I am not usually able to get that far in a code I don’t write myself fully from scratch.
    This thread is well solved ( _.. and the other::.. http://www.excelforum.com/showthread.php?t=1148621 )
    __....:

    Thanks again Marvin and AB33 also ( and Leith and Pike whose code versions I am also still using )
    Alan






    P.s. another Excel Forum Software anomaly..
    When I hit Reply With Quote in the last post.... I get Kyle’s quote and the one from AB33 from post # 61 !! Always fun here !!
    TwoQoutes.JPG
    Last edited by Doc.AElstein; 08-02-2016 at 05:16 PM.

  67. #67
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Glad you got it working

  68. #68
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Oh dear, sorry, while tidying up ( simplifying all the codes and stuff learnt here ) I noticed a last thing I did not quite follow.
    Before asking this last main question ( question 3) below ), here is the related stuff I do understand... ( I will try to keep it as concise and to the point as I can )

    _1) Kyle explained reasons why
    Set ___ = Nothing is mostly not a good idea. I get all that.

    _2a) I know what the whole ..... ( ‘2a) section in all my codes )
    _..... Request___ MSXML2.XMLHTTP___ .Open URL ___stuff does. And although someone once said to me that it is not similar to a code line like

    Open AFullPathAndFileName For Input As 1

    _2b)(i) I expect it probably is ? – my experiments showed that I could replace URL above with the Full File Path and Name of a File ( .txt or .HTML ) which contained a HTML code. So now it sounds very similar to me???? So short questions here.. is it similar?

    _2b)(ii) To follow on from the above does the
    .Send
    Effectively when it is finished do something similar to a code line like
    Close 1


    _...................

    _3) Typically codes using these bits
    _..... Request___ MSXML2.XMLHTTP___ .Open URL ___stuff
    _... have a complementary section ( ‘2b) section in all my codes ) which does the “making of the DOM stuff”
    _... typically this complementary stuff looks like

    Dim HTMLdoc As HTMLDocument: Set HTMLdoc = New HTMLDocument 'Early binding
    Let HTMLdoc.body.innerHTML=Request.responseText
    Or
    Dim HTMLdoc As Object: Set HTMLdoc = CreateObject("htmlfile") 'Late Binding
    HTMLdoc.write Request.responseText
    ( In those code bits the .responseText comes from the MSXML2.XMLHTTP stuff. I hope this code snippet clarifies rather than confuses, it is just some ways of implementing all the above:
    [code not shown] )
    _...
    So
    I have just noticed this sort of code line for some codes with the 'Late Binding case:

    dom.Close ___ ‘ ????????

    Googling is telling me things for that last line like “ close the output stream ”

    What is confusing me is that I can follow possibly that

    Open AFullPathAndFileName For Input As 1
    Is often followed later by a
    Close 1
    That makes some sense to me. I always explain it as ‘Open AFullPathAndFileName For Input as 1 sort of opening a data Highway, and complementary, Close 1 shuts it. That makes some sense and I have seen actual problems occasionally if I do not do that Close bit.

    But my experiments have shown that Request.responseText returns nothing more than a String variable of the entire HTML code. So I am doing nothing more than “chucking a String” at something which in the “DOM thingy case”, based on the contents makes a sort of Object Orientated Model of the HTML file so as to get .Method like and .Property like things from it ....

    So what is this telling me ???
    Is .write not just for DOM stuff, and therefore does all sorts of extra crazy system things ( that are not needed for the DOM making case ) ? And does some of this other crazy system stuff need .Close ing..???
    I have tried adding
    HTMLdoc.Close
    in some other Early Binding codes and it seems to do no harm. I was also brave and took it out of the ‘Late Binding codes I found it in. No ill effects yet.....

    Summary of Question:
    So I suppose what I am really asking is
    Question 3) What is the
    .Close
    in the context of the codes discussed in this Thread, and if and when is it necessary ?

    ( Is there a difference to something like a Set dom = Nothing ? )

    2b)(i) Is something like
    [code not shown]
    similar to
    [code not shown]
    2b)(ii) Correspondingly, is part of what
    [code not shown]
    does ( finally maybe )
    similar to
    [code not shown]

    Thanks
    Alan.


    Sorry if I am rambling, - just trying to suggest some answers and explain a bit more explicitly the questions
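
    The "chucking a String at the DOM" step can be looked at on its own, with no request involved. A minimal sketch, assuming a made-up page source standing in for Request.responseText. The comment on .Close is an assumption based on how document.write and document.close are described for browsers (write starts writing into the document, close ends it), not something confirmed in this thread:

```vba
Sub SketchDomFromString()
    Dim PageSrc As String
    ' Stand-in for Request.responseText: just an ordinary String (made up here)
    PageSrc = "<html><body><p>Hello</p>" & _
              "<a href=""http://www.example.com/"">Link</a></body></html>"
    Dim HTMLdoc As Object
    Set HTMLdoc = CreateObject("htmlfile")   ' late binding
    HTMLdoc.write PageSrc    ' parses the String into a document
    HTMLdoc.Close            ' assumed: ends the writing that .write started
    Debug.Print HTMLdoc.getElementsByTagName("a").Length
    Debug.Print HTMLdoc.body.innerText
End Sub
```

    Nothing here touches a file or a server, so this .Close is not the same kind of thing as Close #1 for a file; it only concerns the document being written into.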

  69. #69
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    2b i) No, it's not, in fact I'm a bit surprised that it works

    Based on the fact that your above premise is wrong, your actual question makes very little sense in context.

    When you Send an HTTP Request, the request is closed once the response is received, this isn't optional, you can't choose to close it.

    "Closing" the dom makes no sense, you haven't opened anything

  70. #70
    Forum Guru Kyle123's Avatar

    Re: I think the Penny is slowly beginning to drop. That will relieve a few people maybe, o

    Wrong page - deleted

  71. #71
    Forum Expert

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi Alan,
    Here is my two cents' worth!
    I think the easiest approach to understanding the DOM object is that, like all objects in VBA, it has PROPERTIES AND METHODS, and only a few of them. You do not even need to worry about creating your own objects, as Excel has already done it for you. All you need to know is what these properties and methods are.
    Open is a method. It takes a few parameters.
    Open(Get or Post,URL, False or True, UserName, Password)

    The open method opens a connection to the server. I do not think this server is the same as the server used in a database.
    Yes, you need to open and close a database. If you do not close it, it can lead to all sorts of corruption.
    Another method is the send method. It actually sends the message to the server.
    Last edited by AB33; 08-05-2016 at 04:32 AM.

  72. #72
    Forum Expert

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi Kyle,
    This site has something against your name. Unless I give rep to another person first, I will not be able to give you any.

  73. #73
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    2b i) No, it's not, in fact I'm a bit surprised that it works
    ......
    Hi Kyle
    Thanks for coming back here again, sorry to be a nuisance, I really thought I had it all.
    _.... ( by this
    [code not shown]
    I was referring to this
    https://msdn.microsoft.com/en-us/lib.../gg264163.aspx
    _.. sorry, I see I missed the # out, should have been
    [code not shown] )
    Does my comparison then make any sense?



    Alan

  74. #74
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    No, it doesn't. One is for opening and reading/writing local files. The other is for sending http requests, they are not the same thing at all. It's pointless to try and draw a comparison.

    You really seem to be struggling with basic concepts. I suggest you look at the following in isolation (forget about VBA for the time being); neither should take that much time to process, and neither is a programming language:

    HTTP
    HTML

    If you try to understand them without complicating things with VBA, you'll have a much better understanding of what's going on.
    Last edited by Kyle123; 08-05-2016 at 05:31 AM.

  75. #75
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi AB33, thanks for the Reply,
    Quote Originally Posted by AB33 View Post
    .
    I think the easiest approach to understand the DOM object is, like all objects in VBA, it has PROPERTIES AND METHODS and only few of them. You do not even about creating your own objects as excel has already done it for you. All you need to know what are these properties and methods
    Open is a method- it takes few parameters.
    Open(Get or Post,URL, False or True, UserName, Password)....
    Thanks, I think that ties up with what I have learnt from this Thread
    _......

    Quote Originally Posted by AB33 View Post
    ...
    The open method opens a connection to the server. I do not think this server is the same as the server used in a database......
    Are you saying this
    [code not shown]
    has to do with “ ...the server used in a database....” which we all seem to be agreeing needs closing ?
    Are you further saying the DOM .Open Method opens a connection to a different server that does not need closing ( or, as I suggested, it may be closed automatically when the DOM .Send Method is done ) ?

  76. #76
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    No this is all complete bunkum, I really thought you had it. You're just trying to make it more complicated than it is by drawing comparisons with unrelated things which is confusing you.

    Have a look at HTML and HTTP to the exclusion of VBA and see if it makes any more sense

  77. #77
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Just Quick Edit for AB 33 and me!!
    Hi AB 33
    Quote Originally Posted by Doc.AElstein View Post
    ..... DOM .Open Method opens a connection to a different server that does not need closing, ( or as i suggested it may be closed automatically when the DOM .Send Method is done?
    Quote Originally Posted by AB33 View Post
    .... the DOM object is, like all objects in VBA, it has PROPERTIES AND METHODS and only few of them. You do not even about creating your own objects as excel has already done it for you. All you need to know what are these properties and methods
    Open is a method- it takes few parameters.
    Open(Get or Post,URL, False or True, UserName, Password)
    ....
    I think we are both getting a bit mixed up here... The .Open and .Send are Methods, I think, of the Created Object "msxml2.xmlhttp" stuff ( my ‘2a code sections ), not of the DOM, which typically comes just after in the next bit ( my ‘2b code sections ): the Create Object "htmlfile" or New HTMLDocument stuff

    Wow my brain is hurting


    @ Kyle... I am thinking....

  78. #78
    Forum Expert

    Re: Help Modifying Scrape Code after Web Site Change.

    Alan,
    Maybe you are right! We have got mixed up. The DOM has nothing to do with Excel. To access this DOM from Excel, we need an object with properties and methods. If you go the XML route ( as opposed to a browser like IE ), you need these open and send methods.
    Last edited by AB33; 08-05-2016 at 06:48 AM.

  79. #79
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by AB33 View Post
    Alan,
    May be you right! We have got mixed up. .....
    Thanks for the confirmation ! ... I think I may be getting the other stuff.....slowly

  80. #80
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Hi Kyle ( and AB 33 )
    I think I follow where I may have gone a bit wrong.. I was googling, for example, stuff like this
    http://www.w3schools.com/jsref/met_doc_close.asp
    This seems totally irrelevant to my code, which I think is what you are suggesting

    It is very difficult, I have no Computer background but like to understand as much as I can from a code I use. Usually I do. This was one case I did not. –And I got caught out.

    I understand the very basics of HTML code – from this Thread you see I played around modifying such codes. I think somehow HTTP is a standardised way of moving these things like HTML about. The browser is a thing that amongst other things reads ( based on the HTTP Protocols etc) and presents what we “see”

    I think I do / did have it, thanks in a great part to you .
    My msxml2.xmlhttp .Open as “...'sort of preparing the Path, telling who or whatever that a particular sort of requests coming, and or checking that that “path” exists is valid, will respond if asked etc.. etc.....” you said was a reasonable description.

    You also agreed that
    My msxml2.xmlhttp .Send as “...'actually then does “send” the “request”....” was OK as well.

    Somehow then I “have” an msxml2.xmlhttp object. But I proved that in my
    .Open "GET", URL
    I can replace URL with a Full File path and Name to a .HTML or .txt File . So I am doing then I think no HTTP stuff in that case. And it still works.
    “...MSXML is a library for creating, reading and manipulating XML files. The bit we are interested in is the XMLHTTP which can make web requests...” .. I am not in this case making a web request

    Anyway, from that msxml2.xmlhttp object, however I “fill it”, amongst other things I can get a simple text using the
    .responseText property.

    _......

    OK with the msxml2.xmlhttp sections of my code I make and fill an Object. “Data” of some description “moves somehow” , but for a reason that cannot be easily explained#### this is not like “data transfer” in this
    [code not shown]
    _.......
    #### Maybe I can explain.... let me make a coffee, think a bit and post further.

  81. #81
    Forum Guru Kyle123's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    telling who or whatever that a particular sort of requests coming, and or checking that that “path” exists is valid, will respond if asked etc.. etc
    It prepares the request, it doesn't do any of this stuff. You won't find out if a path is valid until you send the request.
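
    A rough way to see this for yourself, as a sketch only: the URL below is hypothetical and its path is simply assumed not to exist on the server. The .Open line goes through without complaint either way; whether the path is any good only shows up in .Status after .send.

```vba
Sub SketchOpenThenSend()
    ' Hypothetical URL whose path is assumed not to exist on the server
    Const URL As String = "http://www.example.com/no-such-page.php"
    Dim Request As Object
    Set Request = CreateObject("MSXML2.XMLHTTP")
    Request.Open "GET", URL, False   ' only prepares the request, nothing is sent yet
    Request.send                     ' the request actually goes out here
    If Request.Status = 200 Then
        Debug.Print "OK, characters received: " & Len(Request.responseText)
    Else
        ' For example 404 when the server does not know the path
        Debug.Print "Request failed: " & Request.Status & " " & Request.statusText
    End If
End Sub
```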

  82. #82
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    Quote Originally Posted by Kyle123 View Post
    It prepares the request, it doesn't do any of this stuff. You won't find out if a path is valid until you send the request.
    Thanks for the update, appreciate it, it helps to get it clear
    So regarding a Created Object "msxml2.xmlhttp" stylio thing...

    .Open ' sort of preparing the request, ....
    .Open "XXX", URL, ____
    Depending on XXX the Object will know a bit more about how to do it. It may for example then need additional stuff like Request Header information

    then

    .Send "YYYYYY" ' actually then does “send” the “request”. Depending on "XXX" there may or may not be some additional information "YYYYY"

    Getting there, I think I already was, just got knocked a bit out of whack ......
    _.....
    Quote Originally Posted by Kyle123 View Post
    .... this is all complete bunkum, .......You're just trying to make it more complicated than it is by drawing comparisons with unrelated things which is confusing you......
    _.. I almost agree with that, at least the getting confused bit. Lol..... The stuff is related.. it is code stuff I come across in codes I am using. Just trying to get a bit of understanding of those bits

  83. #83
    Forum Expert Doc.AElstein's Avatar

    Re: Help Modifying Scrape Code after Web Site Change.

    #### Maybe I can explain
    Clearly things were being mixed up ( by me mostly, - sorry )_....
    _....

    [code not shown]
    In the above code snippet, this bit
    ' other stuff with File reading
    Might include , for example, reading a data line from a text file, one line after the other. So during this process, it sounds reasonable that some data path is enabled / opened / available etc. In the real world in the actual inner workings this may mean nothing more than things are set that would otherwise have to be redone every time each Line is read. It makes sense to turn these processes off ( Close somehow ) when you are finished as they may interfere with other computer things.
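
    The kind of line by line file reading described here could look roughly like the following. This is a sketch of the general pattern, not the snippet from the post: the file path is hypothetical, and FreeFile is used instead of a fixed file number 1.

```vba
Sub SketchLineByLineRead()
    ' Hypothetical path, for illustration only
    Const AFullPathAndFileName As String = "C:\Temp\Example.txt"
    Dim FileNum As Integer
    Dim TextLine As String
    FileNum = FreeFile                      ' next free file number
    Open AFullPathAndFileName For Input As #FileNum
    Do While Not EOF(FileNum)
        Line Input #FileNum, TextLine       ' one line per pass, the file stays open between reads
        Debug.Print TextLine
    Loop
    Close #FileNum                          ' release the file when finished
End Sub
```

    The file stays open across many separate reads, which is why the explicit Close matters here, whereas the XMLHTTP request is over as soon as .send returns.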

    Got that one out of the way, I think...
    _......................................................................

    The other stuff including the mysterious dom.Close ?????

    I think that at the outset ( my '2a code sections ) we are talking "xml HTTP HTML" stuff as opposed to just HTTP stuff. That is a subtle difference, but it does allow me later in these explanations to talk about a subset of that, namely HTML stuff.....


    Quote Originally Posted by Kyle123 View Post
    ....
    When you (using the xml request) .or the browser makes the request, it retrieves exactly the same thing, a text (usually, but this is set by the mime type of the response) template of the web page.
    The browser parses the text, and builds the page (forget about JavaScript and CSS for now, they're not helpful in explaining the point). This built page is referred to as the document and one interacts with it using the Document Object Model.

    The HTMLDocument/File ........ does exactly the same thing, but is a lighter version of doing it in the browser (it just builds the page - without any of the other stuff, JavaScript execution, CSS, fetching other dependencies etc..) that's why it's so much faster than automating internet explorer. Think of it as a browser that you can't see.....
    Quote Originally Posted by AB33 View Post
    ....you go through XML route ( As opposed to browse like IE), you need these open and send methods.
    OK....
    The MSXML(xmlhttp )

    .Open Method in the simple
    .Open "GET" version is in a “mode” / “condition”, whatever, where the HTTP in xmlhttp is not really needed: no protocol extras like a RequestHeader. If you like, it is just going to expect something it can get HTML code from. In this form it copes with the difference ( if there is one ) between what the .send gets from a Website, a .HTML File or a .txt File. Somehow it knows how to pick out the Code.
    Considering I can get it to work with a simple .txt File, I cannot imagine it gets more than the simple HTML code String. So maybe in such a case it does not have a lot more to offer than the .responseText. ( I note in passing here: as mentioned, I can replace my URL with a Full File Path and Name to a text File. I find, however, that it will not take a String variable holding the Code itself ( the String it gives out using PageSrc = __.responseText ). Clearly the MSXML(xml___ ) thing likes to dig it out of somewhere, which it does with what is .sended. )
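
    For reference, the simple "GET" use described above can be sketched roughly as follows. This is late bound ( CreateObject ) so no Library reference is needed. The function name GetPageSource is my own for illustration, and the address in the usage line is a placeholder, not the real site.

    ```vba
    ' Returns whatever text the request hands back: the HTML code as one long String.
    ' Assumption: a plain synchronous request is enough, as in the codes of this Thread.
    Function GetPageSource(ByVal address As String) As String
        Dim request As Object
        Set request = CreateObject("MSXML2.XMLHTTP")
        request.Open "GET", address, False   ' False = wait until the response has arrived
        request.send                         ' the request is finished once this returns
        GetPageSource = request.responseText
    End Function

    Sub DemoGetPageSource()
        Dim PageSrc As String
        PageSrc = GetPageSource("http://www.example.com/")   ' placeholder address
        Debug.Print Len(PageSrc)
    End Sub
    ```

    Note that there is no closing step anywhere in it: after .send returns there is nothing left open.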

    In any case, if any “data” moving is done ( if .send is successful ) then that happens in one go. If any paths are opened and closed ( only God and maybe Kyle and a few others know ) then that is done pretty well instantly. Actually Kyle said that
    Quote Originally Posted by Kyle123 View Post
    ....
    When you Send an HTTP Request, the request is closed once the response is received, this isn't optional, you can't choose to close it.
    "Closing" the MSXML(xmlhttp ) makes no sense, you haven't opened anything
    Maybe I might be so bold ( please correct me if I am wrong ): in this case I could say I Send an xmlHTTP Request, which in this particular case I could perhaps call an xml Request or an xmlHTML Request. If there happens to be any HTTP ( protocol etc ) stuff, it is clearly not needed when I replace the URL with the Full Path and File Name of a .HTML or .txt File

    _.................

    As far as the DOM is concerned ( my '2b code sections ): as well explained to me by everyone including myself, the DOM is just a convenient thing to use. It makes an Object Orientated type, tree structure organised Model of the HTML Code. As discussed in this Thread, you create this through pseudo code like:
    dom.body.innerHTML = PageSource As String of the whole HTML code
    or
    dom.write PageSource As String of the whole HTML code

    Clearly there is no opening or closing of anything going on here.... Somehow a simple text String is passed to the DOM, which is enough Information for making the DOM. ( We can use either the innerHTML property or the .Write Method, but in layman’s terms we “chuck a long String of the full HTML code at it” )
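
    A small sketch of the first of those two ways, assuming the late bound "htmlfile" Object and a tiny made-up HTML String standing in for the real page source:

    ```vba
    Sub FillDomFromString()
        Dim PageSrc As String
        Dim dom As Object

        ' Made-up stand-in for the long String that .responseText would give us
        PageSrc = "<h2>Title</h2><table><tr><td>A</td><td>B</td></tr></table>"

        Set dom = CreateObject("htmlfile")   ' an empty HTML document Object
        dom.body.innerHTML = PageSrc         ' "chuck the String at it"

        ' The String has now been organised into a tree we can ask questions of
        Debug.Print dom.getElementsByTagName("table").Length
        Debug.Print dom.getElementsByTagName("td").Length
    End Sub
    ```

    Nothing is opened or closed here: one assignment, and the model is there to be used.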

    At the end of the Day I can accept with these explanations and clarifications above that I need no

    dom.Close

    It really was the presence of this code line that threw me off
    _....................................................
    So
    _................................

    @ Kyle, ( or anyone else )
    Questions:
    _A)
    How does this sound. The
    dom.Close
    was totally out of place, probably a mistake

    _B) Can you explain what that .Close Method of a DOM is, and where it might be used? ( If there is one - I cannot find it anywhere, only the close() stuff, which is totally something else , I think, like HTML Code )
    Thanks so much for keeping with me , I have a feeling I must be pretty close if not finally as far as I need here..
    Alan

    _.........

    ( P.s Just a note for fun. Since I started playing with all this .. almost everything I open tries to open with Internet Explorer.. Lol.. I must have chucked a Spanner in the works or forgot to turn off or Close something.. Lol.. But that don’t bother me . – that will probably sort itself.. )

  84. #84
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    .Open .Write .Close ... ;-) ....Adding Explicit Pedanticisms to .Open the Mind

    Solution to Initial Thread Problem 1) incl. Explicit Pedanticisms

    Hi
    Just wrapping the Thread up, following some last supporting Input from about here:_...
    http://www.excelforum.com/showthread...=5#post4449898
    _... which covered some of the remaining issues from the last post above.
    Important characteristics of this code version are various Explicit Pedantries. This makes the code a very useful basis in anticipation of further code development for other, slightly different requirements. But noting good advice_......
    http://www.excelforum.com/excel-prog...ml#post4445900
    -..... this will only go some way, and an all encompassing flexible code is unlikely in practice.

    This summary goes right back to Post 1, and answers the main Problem 1) , which is about the only thing not covered explicitly in the Thread Lol..
    Post 1, ( Problem 1) _..
    http://www.excelforum.com/excel-prog...te-change.html
    _....( which I actually mostly solved along the way in the “Appendix” Thread http://www.excelforum.com/showthread.php?t=1148621 ) and mentioned somewhere above
    http://www.excelforum.com/excel-prog...ml#post4445234 _.
    .)_..
    _...................................

    _.... Just now I am wrapping up and doing a fairly simplified version to actually use in practice, so maybe sharing that as I do it will make a nice final Post ( Posts ) for the Thread .

    _ So referring back to that Post #1 we had a code that got me a table of Nutrition values from a Web site.

    Now I will re-write the original code based on all I have got from the Threads. ( This re-write was necessary due to the Web Site changing )
    I will try to cut down a bit on the explanations and ‘Comments this time for clarity. In this thread and in the Appendix_....
    http://www.excelforum.com/showthread.php?t=1148621
    _.........all the codes and endless variations are included there and in uploaded Files and the associated issues have been done to death!!



    _............................................


    Code explanation of this new Code:___....
    Sub ScrapedebiNetLeithAlanAug2016ExplicitPedantry()
    which is given here:_.......
    http://www.excelforum.com/showthread...17#post4449914
    http://www.excelforum.com/showthread...17#post4449917
    _...:::::::..... is in next Post:

  85. #85
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    .Open .Write .Close ... ;-) ....Adding Explicit Pedanticisms to .Open the Mind

    _.............................Continuing the concluding Posts to Thread started in last Post

    Code explanation of this Code:
    http://www.excelforum.com/showthread...17#post4449914
    http://www.excelforum.com/showthread...17#post4449917


    Rem 1)
    This is less relevant here. In my actual application this goes off selecting the particular web site Page of Nutrition Values for a food product from a generated URL List, currently using the Problem 2) code Kyle did for me. Here that part is commented out, and a further line just gives a path to such a Web site Page, to compare better with the code given to me by Kyle

    Rem 2)
    '2a xmlHTTP stuff

    An Object is used, MSXML2.XMLHTTP, which is part of the MSXML Library. This Library is used for creating, reading and manipulating XML / HTML Files and the such, including manipulating them over the Internet. In this Code it does not do a lot more than return us a long String of the full HTML code of a Web page. It is like browsing blind, that is to say not feeding the code to the browser to do all sorts of stuff giving you the “seen” screen version. ( This is the fairly simple use of this Object, where it "GET"s us that String. Note: you can just as well give it the Full Path and File Name of a HTML file, .HTML or .txt. Often this is referred to as an xml request, which I guess is what it is doing, asking for " < pointed bracket paired Elements type stuff > " )
    The final result from this section is a simple String Variable , PageSrc As String
    This section is finished. We no longer need the Library Object. Optionally we can therefore Set request = Nothing, a step only appropriate if required for some reason. Previous arguments that this is good practice to prevent memory leaks and data corruption appear outdated. The current advice is to use it only when there is a good reason, so that it does not mask the cases where it really is needed.
    _.............
    '_.. '_..EP2ab Explicit Pedantry. ‘_.. This line, splitting the two sections, is a very important Explicit Pedantry, EP:
    http://www.excelforum.com/showthread...t=#post4452110
    ’ We intend using PageSrc through a method to produce an Object Orientated style model, for later use through its Methods and Properties. This model is frequently referred to as a Document Object Model, DOM. Some steps in this creation of the “DOM” are frequently confused with the processes in ‘2a, which are in fact now finished. Part of the .Send in ‘2a “finishes” all processes. We move on to ‘2b. Only PageSrc is required to be “taken over” as it were. Hence an Explicit Pedantry is appropriate here to emphasise the separation.
    _............
    '2b DOM stuff' Make OOP type model of HTML code
    Instead of “feeding” the code to a browser, we use an Object from the Microsoft HTML Object Library going by the name of htmlfile or HTMLDocument. We “feed” our HTML code into that. What this Object does is attempt to make a sort of Object Orientated Model of the Code. It becomes a Tree like structure. The majority of the things at a Node or branch point of the made model will be the paired pointy bracket stuff. But there will be other “Nodes”, such as the Text nodes of interest to us.
    If you were to take the HTML code and carefully indent at each level of the code tag pairs ( code tag pairs within code tag pairs, called "Child", children or similar ), then you would get something close to, but not necessarily exactly, this Model.
    ( In the accompanying Appendix Thread http://www.excelforum.com/showthread.php?t=1148621 there are lots of diagrams of these code versions that show this sort of thing up , for example:
    http://www.excelforum.com/showthread...=3#post4443252 )
    In particular here, the red stuff, showed an interesting difference, with the DOM organising itself a bit different to how the code looked.
    http://www.excelforum.com/showthread...1148621&page=3
    http://www.excelforum.com/showthread...=3#post4443252

    There are some important aspects of this “filling” of the DOM to consider.
    Using this external Library, the Microsoft HTML Object Library, within VBA, for our and most applications a simple assignment of .body.innerHTML = PageSrc is satisfactory. The alternative .write method, being “fed” PageSrc in VBA, does exactly the same in a more complicated way. When used in other languages the .write, written in their syntax as write(), may be capable of doing more. We therefore use it, for the good of extending our full encompassment of life, as this Explicit Pedantry.
    Further, in usage outside VBA, Methods will often be needed which clear an instance before “using” it. Approximately, in VBA this can be considered as putting the DOM back to how it was at the point just before it is “loaded” with the PageSrc String. Effectively in VBA it is like doing a pair of
    Set = Nothing , with either a Dim and Create Dom or Set = New type code line
    It usually serves no purpose in VBA. Effectively we reset a situation back to as it is. It can however be used / "accessed" through .Open, that is to say
    HTMLdoc.Open
    This can sometimes be confused with the .Open from ‘2a.
    When used outside VBA, some processes started by open() can, and / or should, be finished after the corresponding write(). This is done using close(). Once again this can be used in VBA through HTMLdoc.Close. It has no conceivable merit, or as yet known reason, for use in VBA. I include it therefore as an ultimate mind broadening Explicit Pedantry. I am quite proud of that. I once counted things. There was no apparent reason that I could think of. I thought someone should do it. As I could, I did. Some more details on the development of the Explicit Pedantries, EP’s, can be reviewed here:
    http://www.excelforum.com/showthread...t=#post4452110
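
    To make the trio concrete, here is a minimal sketch of the .Open / .write / .Close form, again assuming the late bound "htmlfile" Object and a made-up HTML String. It is only meant to show where each line sits, not to suggest the extra lines are needed in VBA.

    ```vba
    Sub OpenWriteCloseDom()
        Dim HTMLdoc As Object
        Dim PageSrc As String

        ' Made-up stand-in for the real page source
        PageSrc = "<html><body><h3>Title</h3><p>Some text</p></body></html>"

        Set HTMLdoc = CreateObject("htmlfile")
        HTMLdoc.Open             ' clears the document ready for writing (a DOM thing, not the '2a request .Open)
        HTMLdoc.write PageSrc    ' feed the HTML code String in
        HTMLdoc.Close            ' tells the document that the writing is finished

        Debug.Print HTMLdoc.getElementsByTagName("h3").Length
    End Sub
    ```

    Taking out the .Open and .Close lines, or swapping all three for the single .body.innerHTML assignment, should leave you with a model you can use in the same way for our purposes.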

    Rem 3) based on the comparing of the different codes here, for example: _....
    http://www.excelforum.com/excel-prog...ml#post4445231
    http://www.excelforum.com/excel-prog...ml#post4440737
    _....A general technique in such code where a number of Tables are required ( if there is some similarity and, importantly, some similar HTML code sections in the region of each table ) is to use one of the DOM getElementsBy____ Methods, which returns us a large Object which is a collection of Objects ( generally these smaller Objects will be similar in at least some ways ). In general we would choose this main large collection Object such that each, or most, of the Objects within it either is, or contains, one of the tables of interest.
    More specifically the returned large Object is a NodeList ( as can be seen in the Watch Window ). This is an internal class ( In the HTML code ).
    http://www.excelforum.com/excel-prog...ml#Post4446628
    We are interested in accessing the Items within the large Object. The large Object is Dimensioned as the Interface it implements, IHTMLElementCollection, which has an (item) method to get at each Item.
    https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx
    Unusually, this large main Object is Dim ed as an Object, as you find you cannot Dim it as what its TypeName( ) returns ( or as displayed in the Watch Window ), “DispHTMLElementCollection“. ( This is because no one thought to ask someone like Kyle why. )
    These items will be HTML Elements from our DOM

    The ____ can be chosen to select various things. The Objects within the returned collection will then be HTML Elements of the type selected by ___ whatever.
    The original code got by the header tag h2
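
    As a rough sketch of getting such a collection and stepping through its Items ( the HTML String is made up, and the collection variable is simply Dim ed As Object as described above ):

    ```vba
    Sub ListHeaderElements()
        Dim dom As Object
        Dim Headers As Object   ' the large collection Object
        Dim n As Long

        Set dom = CreateObject("htmlfile")
        ' Made-up HTML with two h2 header Elements
        dom.body.innerHTML = "<h2>First</h2><p>text</p><h2>Second</h2><p>text</p>"

        Set Headers = dom.getElementsByTagName("h2")
        Debug.Print TypeName(Headers), Headers.Length

        ' Items are numbered from 0 to Length - 1
        For n = 0 To Headers.Length - 1
            Debug.Print n, Headers.Item(n).innerText
        Next n
    End Sub
    ```

    Swapping "h2" for another tag name changes which kind of Element the collection holds.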

    Rem 4) Looping, for each table , making output
    In this code, like Kyle’s and unlike Pike’s, the collection Object is not a collection of the tables. This allows some selection criteria to be applied whilst getting to the actual table of interest.
    So each of the Objects can be accessed through Headers(ItemNumber), and to this returned DOM HTML Element the various Methods and Properties of the DOM Model can be applied

    '4b)=== main Outer loop
    Using various Properties of the DOM Model, the original code checked the next along ( Sibling ) Element ( code Tag ) for an id= " ____" _....
    __If Headers(n).NextSibling.ID Like "container#" Then
    _.....and then, if found, the first one down of the first one down was selected, which based on observation of the HTML code gave us our Table of interest_.....
    _____ Set oTable = Headers(n).NextSibling.ChildNodes(0).ChildNodes(0)
    http://www.excelforum.com/showthread...=2#post4440149
    http://www.excelforum.com/showthread...=2#post4440257
    http://www.excelforum.com/showthread...=4#post4445214
    _....

    The new Site changed a bit
    http://www.excelforum.com/showthread...=4#post4445221
    I felt like keeping this code similar to how it was, as I had a couple of nice alternatives from Pike and Kyle. So the first thing I came up with was again to get by tag Name ( h3 this time ), then go a level back up and check for the className
    __ If Headers(n).ParentNode.className = "panel-heading" Then
    And then, if found, starting from there go one along and then one level down, to the first Element there, to get at the Table.
    ____ Set oTable = Headers(n).ParentNode.NextSibling.ChildNodes(0)
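
    Here is a self-contained sketch of that navigation. The HTML String is my own made-up imitation of the new page layout ( only the h3 tag and the "panel-heading" className come from the real code lines above, the second div and the cell contents are assumptions ). It is written without spaces between the tags so that no stray Text nodes get in the way of NextSibling and ChildNodes(0).

    ```vba
    Sub FindTableFromHeader()
        Dim dom As Object, Headers As Object, oTable As Object
        Dim n As Long

        Set dom = CreateObject("htmlfile")
        ' Hypothetical markup imitating the changed site
        dom.body.innerHTML = _
            "<div class=""panel-heading""><h3>Some heading</h3></div>" & _
            "<div><table><tr><td>Name</td><td>Value</td></tr></table></div>"

        Set Headers = dom.getElementsByTagName("h3")
        For n = 0 To Headers.Length - 1
            ' one level back up from the h3, then check the className
            If Headers(n).ParentNode.className = "panel-heading" Then
                ' one along ( next Sibling ), then first one level down
                Set oTable = Headers(n).ParentNode.NextSibling.ChildNodes(0)
                Debug.Print oTable.tagName, oTable.Rows.Length
            End If
        Next n
    End Sub
    ```

    On the real page it is worth checking in the Watch Window what NextSibling and ChildNodes(0) actually land on before trusting oTable.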

    So we have used the current Main Loop Bound variable Count (item number), n, to Obtain a table Object. ( The oTable variable will be reused for the next Loop )

    '4b(i) We have a DOM HTMLTable Object now, and can further apply various Methods and the Properties of the DOM Model to get at our required Table content

    Initially we can determine the table dimensions and then Loop appropriately. Then
    Briefly in words,
    '---Inner loop does at each row, ....
    _--each HTML TableRow Element is Looped through
    '--- .... 'go to next "Cell"
    _----each of its Cells ( what we “see” as columns ).
    There we
    '4b(ii)a Fill an element, r, c, of the Output Array
    At each of these cells the Cell Text will be added to an Output Array, which we were able to dimension based on the table Dimensions.
    Most codes apply the simple .innerText Property to each HTML Cell Element. This code uses a routine.
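
    For comparison, the simple .innerText way that most codes use can be sketched like this. It is not the routine used in my code; the function name TableToArray is made up for illustration, and it assumes oTable is a DOM table Element as obtained above.

    ```vba
    ' Returns a 1-based 2 dimensional String Array of the cell texts, or Empty for a table with no cells
    Function TableToArray(ByVal oTable As Object) As Variant
        Dim r As Long, c As Long
        Dim rowCount As Long, maxCols As Long
        Dim arrOut() As String

        ' Determine the table dimensions first
        rowCount = oTable.Rows.Length
        For r = 0 To rowCount - 1
            If oTable.Rows.Item(r).Cells.Length > maxCols Then
                maxCols = oTable.Rows.Item(r).Cells.Length
            End If
        Next r
        If rowCount = 0 Or maxCols = 0 Then Exit Function   ' nothing to output

        ReDim arrOut(1 To rowCount, 1 To maxCols)
        For r = 0 To rowCount - 1                                ' outer: each table row
            For c = 0 To oTable.Rows.Item(r).Cells.Length - 1    ' inner: each cell in that row
                arrOut(r + 1, c + 1) = oTable.Rows.Item(r).Cells.Item(c).innerText
            Next c
        Next r
        TableToArray = arrOut
    End Function
    ```

    The result can then be pasted out in one go, for example with Range("A1").Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr.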

    This Inner Loop and the routine have been discussed more times than I can remember in this and the Appendix Thread..
    http://www.excelforum.com/excel-prog...ml#post4445231
    http://www.excelforum.com/showthread...=2#post4440149
    http://www.excelforum.com/showthread...=3#post4443309
    I have simplified the routine a bit for use in the practice and that will be one of the last codes I give. The ‘Comments on it are consolidated a bit.

    '4b(ii)b "
    Post processing last column of unified units.
    This bit is squeezed in at the end of looping our last determined Table column ( not necessarily the last Row.Cell ), and just before going to the next Table Row. This section is not as complicated as it looks. It simply fills an extra “column” ( which was included in the output Array when it was dimensioned ) with the Nutritional values in unified units.
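
    Just to illustrate the idea of a unified units column, here is a hypothetical helper. The choice of grams as the unified unit and the list of unit texts are assumptions for the sketch, not taken from the actual code, which should be consulted for the real conversion rules.

    ```vba
    ' Hypothetical helper: converts an amount given in kg, g, mg or micrograms into grams.
    ' Returns an #N/A error value for any unit text it does not recognise.
    Function ToGrams(ByVal amount As Double, ByVal unitText As String) As Variant
        Select Case LCase$(Trim$(unitText))
            Case "kg"
                ToGrams = amount * 1000
            Case "g"
                ToGrams = amount
            Case "mg"
                ToGrams = amount / 1000
            Case ChrW$(181) & "g", "ug"      ' ChrW$(181) is the micro sign
                ToGrams = amount / 1000000
            Case Else
                ToGrams = CVErr(xlErrNA)
        End Select
    End Function
    ```

    In the loop, something like arrOut(r, extraCol) = ToGrams(valueInRow, unitInRow) would then fill the extra column, where extraCol, valueInRow and unitInRow are again only illustrative names.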

    '4b(ii)c Output from Array for current loop table

    'Go to the next table====
    The above is then repeated with a new item, n, in the large collection Object to get the next Object within and start checking that one out.

    Thank you for all the help everyone..
    Alan

    Last edited by Doc.AElstein; 08-11-2016 at 12:51 PM.

  86. #86
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    This is just adding a quick additional solution to this Thread.
    Thanks to MarcL for the code
    http://www.excelforum.com/excel-prog...ml#post4453613
    and thanks AB33 for the heads-up.

    It appears to follow the “Direct table” code version of Pike’s.
    It differs however in that it uses some “frame” Clipboard technique to paste out each table in one go, rather than for each table looping through the rows and cells ( columns ) of that table. ( So it misses out the Inner loop for each table in all the previous codes considered here.)
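
    A rough sketch of how I understand that Clipboard technique, built around the frames.clipboardData.setData call from MarcL's sample. The HTML String is made up, and the rest ( pasting into the active Worksheet and leaving a gap row between tables ) is my own assumption, so treat it as a sketch and run it on an empty Worksheet.

    ```vba
    Sub PasteTablesViaClipboard()
        Dim oDoc As Object, objTables As Object
        Dim T As Long, nextRow As Long

        Set oDoc = CreateObject("htmlfile")
        ' Made-up HTML with two small tables
        oDoc.body.innerHTML = _
            "<table><tr><td>A</td><td>1</td></tr></table>" & _
            "<table><tr><td>B</td><td>2</td></tr><tr><td>C</td><td>3</td></tr></table>"

        Set objTables = oDoc.getElementsByTagName("table")
        nextRow = 1
        For T = 0 To objTables.Length - 1
            ' Put the HTML code of the whole table in the Clipboard as text
            oDoc.frames.clipboardData.setData "Text", objTables.Item(T).outerHTML
            ' Excel interprets the pasted HTML table and spreads it over cells
            ActiveSheet.Paste Destination:=ActiveSheet.Cells(nextRow, 1)
            ' Next table starts two rows below the last used row in column A
            nextRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row + 2
        Next T
    End Sub
    ```

    There is no loop over rows and cells anywhere: each table goes out in one paste.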

    Here is just a very quick adaption to my Problem 1) codes, as a quick additional solution to the Thread.

    Code here:
    http://www.excelforum.com/showthread...66#post4454066


    Alan

  87. #87
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Hi !

    Quote Originally Posted by Doc.AElstein
    I wish I could understand this:
    oDoc.frames.clipboardData.setData("Text", objTables.Item(T).outerHTML)
    it seems to "put stuff in the clipboard ?????
    Yes ‼ And like Excel can directly understand an HTML table,

    as you can see in older threads either using IE (Web query returned no data into Excel)

    or like through a request (Copy webpage hyperlinks) …

    And thanks for the rep' in the thread where you found my sample !
    Last edited by Marc L; 08-11-2016 at 07:55 PM.

  88. #88
    Forum Expert Doc.AElstein's Avatar
    Join Date
    05-23-2014
    Location
    '_- Germany >Outside Building things.... Mostly
    MS-Off Ver
    Office 2003 2007 2010 PC but Not mac. XP and Vista mostly, sometimes Win 7
    Posts
    3,618

    Re: Help Modifying Scrape Code after Web Site Change.

    @MarcL
    Thanks MarcL !

    I had a look, _....
    http://www.excelforum.com/showthread...54#post4454654
    _....it is making some sense to me

    Alan
