+ Reply to Thread
Results 1 to 4 of 4

Detect matching values in very large lists

  1. #1
    Forum Contributor
    Join Date
    04-03-2007
    Location
    Auckland, New Zealand
    MS-Off Ver
    2007
    Posts
    137

    Detect matching values in very large lists

    Hi,

    I'm trying match values (and set a Yes / No result) from values in two very large lists.

    List 1 (approx 170,000 rows) contains the 'Find What' values
    List 2 (approx 980,000 rows) contains the 'In What' values

    Values in list 1 will be unique in list 2, but not all values in list 1 will appear in list 2.

    The values in each list are all 16 character stings

    This is a one-off otherwise I'd probably import into a AccessDB

    Using formulas is taking an absolute age to calculate using MATCH function, so I'm wondering if coding is the better angle.

    Any ideas/insight/methods welcome.

    (Using XL07, on Core2Duo)
    Last edited by Jbentley; 12-09-2009 at 04:21 AM. Reason: Additional info to aid reviewers

  2. #2
    Forum Moderator Leith Ross's Avatar
    Join Date
    01-15-2005
    Location
    San Francisco, Ca
    MS-Off Ver
    2000, 2003, & 2010
    Posts
    23,258

    Re: Detect matching values in very large lists

    Hello JBentley,

    You should provide a representative sample of the data. That can then be used to create a macro solution. Remember to remove any confidential or sensitive information before posting the data. You should also include a copy in Excel 2003. The more people who can view your code, the better.

  3. #3
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Essex, UK
    MS-Off Ver
    O365
    Posts
    21,528

    Re: Detect matching values in very large lists

    Sorting your "look in" data allows you to use much more efficient approaches.

    If you sort List 2 in Ascending order you will find that using:

    List1
    =IF(INDEX(Sheet2!A:A,MATCH(A1,Sheet2!A:A))=A1,"yes","no")

    will work much much faster than any unsorted approach.

    Why ?

    Well, with data sorted you can use MATCH with return_type of 1 (default) which employs the Binary Search algorithm - when the return_type is 0 the MATCH has to conduct an exact match which means it has to look in each cell in the range - with Binary Search given the data is sorted it need only check a much smaller subset of records - thus on large datasets the differences are immense.

    Here even though you're conducting an exact match we can do that by using INDEX to return the value generated by the MATCH and compare that to the original - if the same then "yes" else "no" - this is still much quicker than using

    =MATCH(A1,Sheet2!A:A,0)

    I should add I tested the above approach on data sets of your size and the sheet calculated in a few seconds.
    Last edited by DonkeyOte; 12-09-2009 at 03:51 AM. Reason: removed guff on Binary Search as explained better by others on the net if OP wants to investigate further

  4. #4
    Forum Contributor
    Join Date
    04-03-2007
    Location
    Auckland, New Zealand
    MS-Off Ver
    2007
    Posts
    137

    Re: Detect matching values in very large lists

    Hi,

    @Leith - Good advice, thankyou, I normally would upload a representative desensitised sample, but I wasn't confident that a sample (when tested for calculation speed) would reflect what I might see IRL.

    @DonkeyOte - Thanks, your solution was perfect. Difference in calculation is incredible, from over 1/2 hour down to 3 seconds (guessing - the hourglass barely flickered).

    Justin
    Last edited by Jbentley; 12-09-2009 at 04:20 AM. Reason: atrocious spelling

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1