+ Reply to Thread
Results 1 to 23 of 23

Two Excel Sheet Comparison Scraping

  1. #1
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Two Excel Sheet Comparison Scraping

    Hi,

    This is a tough one - I have two excel sheets which can be seen in the attached file.

    Both sheets have items which are not the same but mean the same. You need to use scraping techniques to extract matches from both and then identify the cheapest price for it to be populated into "Final Sheet"

    Sheet 1: Has products and prices (gets updated on a weekly basis)
    Sheet 2: Has products and prices (gets updated on a weekly basis)
    Final Sheet: Will be populated with products and prices

    Methodology:

    Use "Sheet 1" and "Sheet 2" to exact type of products and whichever has the lowest price should be populated in "Final Sheet"

    Example:

    "Sheet 1"; item 165
    Apple iPhone 5 64GB Mobile Phones
    Price is $907.07

    "Sheet 2"; item 202
    Apple iPhone 5 A1429 GSM 64GB White
    Price is $917.0

    In the above example both items are exactly the same; but I require the lowest priced IPhone to be populated in "Final Sheet". Similarly there are many more items such as this.

    My requirement is that the data is to extracted and populated in Excel. The excel sheets do not have the same number of items, where one item exists and the other list does not have the item; then the item gets populated into the "Final Sheet"


    This is not a simple job where you look for exact matches and copy them in to the "Final Sheet" this is complicated as the description and the price is the key that will standardise them and the compare it.

    Omer
    Attached Files Attached Files

  2. #2
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Cool challenge.

    I am thinking about matching on brand and for matched rowsisolate individual words from description string and calculate the percentage of words matching between two descriptions. Once the match percentage is higher than x% the match is considered proposed and if the price difference is also < tolerance% than the rows are considered a match and the cheapest row will be copied to the result worksheet.

    How does that sound?
    If you like my contribution click the star icon!

  3. #3
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    OllieB,

    I have a fair idea on what you are referring to and as it seems there is human intervention involved in this (which is fine)

    Getting someone to do it free is a good idea but can someone do it for me for some $$$ - once off and then minor tweaking every now and then.

    Omer

  4. #4
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Omer,

    The solution I have started building would only require tweaking the matching and price difference parameters so that the result can be relied on. I expect this tweaking to be a fairly one off activity, just have to find the desired confidence level.

    Not sure what you mean with the statement about getting someone to do it for free.

    I will only be able to work on it a little bit longer today and then continue tomorrow. Are you in a hurry?

  5. #5
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    By this week.

    With the free bit I'm stating that no one will do it free but I'm ready to spend a little money on the solution as its been going for 2-3 months.

    Once its done and the returned values in a final sheet can be used via vlookup to see the items source - so when you place an order you would know which supplier to chase.

    Thanks a million!

    Omer

  6. #6
    Forum Contributor
    Join Date
    01-18-2013
    Location
    Prague, Czech rep.
    MS-Off Ver
    Excel 2003 - 2016
    Posts
    138

    Re: Two Excel Sheet Comparison Scraping

    Hi omer81,

    as OllieB suggested I´d also go with separating the whole name and than compare the parts of it, just, wouldnt go with % since that one gonna be highly influence with the # of parts you get, So I´d rather go with setting 2-3 columns as prior and if those get the match I´d mark it as success and than continue with the function to compare prices

    Best regads

    Soul

    PS -- you´re really going for a tough one

  7. #7
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    Hi SoulP,

    You're on the mark there - its a problem and problems need to be solved - I was trying initially but then got a bit frustrated as there are only so much I know about Excel.

    Omer

  8. #8
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Hi Omer,

    I will give it a go tomorrow (for free).

  9. #9
    Forum Contributor
    Join Date
    01-18-2013
    Location
    Prague, Czech rep.
    MS-Off Ver
    Excel 2003 - 2016
    Posts
    138

    Re: Two Excel Sheet Comparison Scraping

    Hm,

    have been looking in your file for a moment now and I can tell only that there are ways of doing this, just you (or rather the guy whos gonna write the actual code) needs to find the system in the data input from each source and than from that you can separe it.

    I´ve done something like that, handed it just today to my colleague it was tax/custom related and I had to spent few hours to actualy get the system at which the data were filled in the forms etc.

    Also a little hint, I´d go by the maker(producer) grouping (which allows you to separate the actualy name from the part of the string you already know will match on many occasions).

    The only problem to me (as for now) is that im not sure whether certain things match or not - I mean I m not into cameras and that stuff so i dunno whether to compare em according to mm values or whatever, like due to the pricing it seems like those two first canons should be about the same (match) but the name is completely different, in one case one is missing the mm value the other is off the rocket with the value, so that would be no match for the code in my eyes and this brings me back to the point mentioned above, if you sort the stuff by producer and mby category, u can set than the comparing parametrs (such as those MM of cameras) which seems like the way to go you create one big loop and you should be ok, just to find the exact paramters for each category

    Hope it helps, this little analysis.

    I´m sorry i wont try it by myself, I´m just not fit for it today (braindead), mby tomorrow will see

    Soul

  10. #10
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    Thanks both SoulP and OllieB,

    The maker / price is the key.

    For example in both sheets you will find Iphone 4s 64gb (or 32gb..or whatever) - similarly in the camera's everything is important upto the point where "Brand New" is written or the words "Digital Camera" / "Camera" the numbering convention is there to define the model and type of camera (in camera world speak)

    I'm really wanting to see the end result of this puzzle.

    Omer

  11. #11
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Hi Omer,

    I have a first version which you could have a look at. Please look at the worksheet Result which is generated from scratch when the code is executed. The Match column indicates the confidence level of the match expressed in a % and the arrows indicate which vendor is cheaper. The minimum match and maximum price differencd are set as constants in the code
    Attached Files Attached Files

  12. #12
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    OllieB,

    I'm like a little kid in a candy store - I'm absolutely going to love my weekend going through this finished excel sheet which has torn many weekends apart.

    I'm currently going through the sheet and normally check a 10% sample for personal confidence, and looking for ways to find a fault BUT cannot :D

    Most probably I'll be posting later this afternoon and hopefully I can get both OllieB and SoulP details.

    Thank you a lot - this is just amazing and indeed appreciate your work.

    Omer

  13. #13
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Hi Omer,

    Glad to hear you like the solution so far. I am sure that will be some snags that we will need to work out. I think you should increase the minimum match rate to 51% as I found a couple of false positives yesterday afternoon. I would also like to ask you to have a look at the item descriptions for vendor 1. There are a lot of duplicate item descriptions and I am missing colors etc. Any way of enhancing / extending those descriptions?

  14. #14
    Forum Contributor
    Join Date
    01-18-2013
    Location
    Prague, Czech rep.
    MS-Off Ver
    Excel 2003 - 2016
    Posts
    138

    Re: Two Excel Sheet Comparison Scraping

    Hi both,

    sorry i took my time, kinda busy lately. I have check on ollieb´s work and it seems promising so i believe you willl get what you´re looking for. I wanted to try it myself but sadly i got flooded with new requests for some tools at work so i kinda dont have that much time to look at this and I admit, i wouldnt be able to make such a thing ollieb did in one go, (would need like 2-3 days to sit on it to produce something i´m happy with), so I´m sorry if i disappointed you and you´ve waited to see my version of solution but there wont be any prolly.

    best regards both

    Soul

  15. #15
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    SoulP,

    Noooo don't give up on me!

    I'm now thinking along the lines of fuzzy logic that could also yield accurate results.
    OllieB had an amazing stab at it and as he mentions that there is substance in tolerance values of 51% and anything below that is questionable.

    I attempted to use the "LEFT" function, including spaces and about 20 characters on both vendors. This was an attempt to first standardize the information. If you look at the data; the manufacturer and model are normally the 1st two words of any line.

    Once done, I used the Match function on both columns to either give me TRUE or FALSE. Unfortunately this methodology did not work

    There isn't a way around the an items colors - then again anything that shows up as no match will need human intervention.

    And the work continues...

    Omer

  16. #16
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    Though I might share a new finding...I stumbled across PowerUps add in for Excel.

    There is a feature in there that finds similarities between two texts in terms of percentages; the problem there is that you need to standardize the text first.

    For example; IPhone 4s 32gb MUST be lined up to IPhones 4s 32gb White - to give you 80% or so...if its lined up to LG Washing machine - then the match is 0.08%.

    Well I can live with that as well, but how do I standardise the info in both sheets?

    Omer

  17. #17
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Hi Omer,

    I am slightly puzzled by your recent comments. It seems that your are not completely happy with the solution provided and are still exploring alternative approaches??

    Quite honestly I do not think that there is more room for improvement given the source data available. If for example vendor one is not able to provide the item color and vendor 2 does then that is something no tool is going to solve for you.

    If you are not satisfied with the result my solution gives you then can you give me example where a match is not made while it should be? Please also note that you can tweak the matching and price difference parameters and run the routine again and again to check the results.

  18. #18
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    OllieB,

    Sorry mate - don't agree with you

    To go from having NO solution to having THE solution overnight; its an absolute miracle and have no one but to thank you for it.

    What I had been saying previously is that there maybe other solutions but don't work as effectively, the example is that one that I attempted but the results didn't turn out too well.

    So in a nutshell the one that you've provided is the way forward.

    Could you let me know how is it done?

    Omer

  19. #19
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Omer,

    Ok, I must have misunderstood :-)

    The approach is actually quite. For each item listen on worksheet 1 we check for each individual word (separated by a space) in the description whether the word also occurs in the item description soled by vendor 2. We do this test for every item sold by vendor 2 of the same brand.

    The item of vendor 2 with the highest score is the winner provided that
    a) the match percentage > the minimum match percentage required (number of arguments matched / number of arguments * 100)
    b) the match percentage is higher than a previous match result for the item (an item of vendor 2 can only be linked to an item of vendor 1 once - the highest match prevails)
    c) the price difference between the item from vendor 1 and vendor 2 does not exceed the specified threshold

    As the items from vendor 2 for a given brand need to be analysed for every item of the same brand sold by vendor 1 I have decided to do some pre-processing during routine initialisation and store the results in a three level array structure of the following format: dictionary(brand).dictionary(item).Collection(description elements). Using this I only need to split the description into separate words once per item sold by vendor 2 as opposed to for each item of the same brand sold by vendor 1 (= better performance)

    The are certain words in the description of items by vendor 1 which do not exist in the description provided by vendor 2, these words would never be found and would distort the match result calculated. A list of such words (for example '(UNLOCKED)') are added to a dictionary called Exclude. When checking the occurrence of a word in the item description of an item from vendor 2, the check is forced to positive if the word occurs in the dictionary.

    If an item of vendor 2 is matched to an item of vendor 1 with a match percentage that is higher than a previous match of the vendor 2 item, the previous match is removed (indicated by an "R" in the left arrow column) and the new match is recorded. A second matching run is attempted to match all R(etry) entries.

    You can run the routine as many times as you want while tweaking the below parameters:

    Please Login or Register  to view this content.
    The result worksheet will be complety rebuild from scratch every time you run routine isolateCheapestOffer

    That is it in a nutshell

  20. #20
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Hi Omer,

    herewith a new version. I found some mistakes in the matching process. success rate has improved
    Attached Files Attached Files

  21. #21
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    Dear OllieB,

    I have to say - I'm lost on how to do this on my own. The new sheet has improved results - like really the results have improved.

    The code that you provided (please ignore the simplicity of my question); does that go next to each item?

    Column A
    Apple iPhone 5 16GB Mobile Phones

    Column B
    Your given parameters?

    I guess what I'm asking is how do I do it? (not a spoon feed) but where does everything go.

    Apologies for asking a very very basic question.

    Omer

  22. #22
    Forum Expert OllieB's Avatar
    Join Date
    12-20-2012
    Location
    Netherlands
    MS-Off Ver
    Excel 2007 (home) & 2010 (office)
    Posts
    1,542

    Re: Two Excel Sheet Comparison Scraping

    Hi Omer,

    From your post I conclude that you are under the impression the solution is something like a formula, it is not...it is a macro!

    When you open the VBA-editor (Alt-F11) and look in module 1 you can see the code provided.

    To use the solution:
    1. copy & paste data into sheet 1 and sheet 2 (same data and columns as used in your example);
    2. on the Developer tab click Macros;
    3. in the window shown select "isolateCheapestOffer" and click on the Run button

    that is it....

    I suggest you leave the parameters as they are now. I you do want to try, look for the following lines in the code and change the values shown (25 and 51 in the version delivered) to the value you desire and re-run the macro

    Please Login or Register  to view this content.
    Let me know if I have misunderstood your question and you were looking for a different answer

  23. #23
    Registered User
    Join Date
    09-25-2013
    Location
    UAE
    MS-Off Ver
    Excel 2003
    Posts
    11

    Re: Two Excel Sheet Comparison Scraping

    OllieB,

    Apologies I couldn't back to you - I'm yet to try the solution in VBA (as you had prescribed) - but only when I recover from the seasonal cold & flu

    Also it will give me a chance to try on the newly updated sheet.

    Omer

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Scraping Data from Web Into Excel Using VBA
    By gurs in forum Excel Programming / VBA / Macros
    Replies: 16
    Last Post: 01-31-2015, 06:26 PM
  2. Two excel sheet comparison and show the result in a new sheet
    By Chito in forum Excel Programming / VBA / Macros
    Replies: 14
    Last Post: 08-30-2012, 06:31 AM
  3. Comparison of 2 versions of Excel sheet
    By NM123 in forum Excel Programming / VBA / Macros
    Replies: 0
    Last Post: 11-15-2011, 06:34 AM
  4. Excel Testing Sheet Comparison
    By ryanb4614 in forum Excel General
    Replies: 4
    Last Post: 08-12-2010, 08:55 PM
  5. Comparison of 2 excel sheets and put ouput in a 3rd sheet
    By Hanson in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 07-27-2006, 12:35 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1