+ Reply to Thread
Results 1 to 11 of 11

Increase speed for match formula on over 130k rows

  1. #1
    Forum Moderator jeffreybrown's Avatar
    Join Date
    02-19-2009
    Location
    Cibolo, TX
    MS-Off Ver
    Office 365
    Posts
    10,316

    Increase speed for match formula on over 130k rows

    I have a dataset which is a little over 130k rows. Right now I run a match on the dataset against a master file (Column A) to return a true/false. All of this is in the same workbook.

    Please Login or Register  to view this content.
    If I run this formula on all recordd it takes about 20 minutes to run so I have been splittling the dataset up into smaller more manageble pieces.

    Just found out today I need to add another match and compare 2 columns (A & B) to return true/false.

    Please Login or Register  to view this content.
    Is there some way to speed this process up so I can run it on all 130k at a time?

    Also, been trying but can't seem to figure it out, but is there a way to incorporate the method were you search for where the data starts, then find the length and use all this information in an Offset construct.

    So, instead of searching all 130k records you narrow the search and now only query against maybe 10k. Also, with Offset being volatile will this just add to the processing time.

    Sample attached, but just contains a small amount of data.
    Attached Files Attached Files
    Last edited by jeffreybrown; 09-24-2010 at 08:13 PM.

  2. #2
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Increase speed for match formula on over 130k rows

    Sort the data by col A then col B, and remove the ,0 from the match function. That results in a binary search instead of a linear search, which is exponentially faster
    Entia non sunt multiplicanda sine necessitate

  3. #3
    Forum Moderator jeffreybrown's Avatar
    Join Date
    02-19-2009
    Location
    Cibolo, TX
    MS-Off Ver
    Office 365
    Posts
    10,316

    Re: Increase speed for match formula on over 130k rows

    I sorted A and then by B, removed the ,0 from the match function, but now I get true all the way down.

    Please Login or Register  to view this content.

  4. #4
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Increase speed for match formula on over 130k rows

    Ah. When you catenate A and B, the resulting strings are not in alphabetical order.

    This worked:

    =INDEX( TEXT(Data!$A$2:$A$30, "00") & Data!$B$2:$B$30, MATCH(TEXT(A2, "00") &B2, TEXT(Data!$A$2:$A$30, "00") & Data!$B$2:$B$30)) = TEXT(A2, "00") & B2

    I would not use full-column references; instead, use a dynamic named range.

    If you add a column on sheet Data and do the catenation there (=A2 & B2) , then sort ascending by that column, you get, in part,

    Please Login or Register  to view this content.
    Then you can revert the formula to,

    =INDEX( Data!$C$2:$C$30, MATCH(A2 & B2, Data!$C$2:$C$30) ) = A2 & B2

    Which returns, in part,

    Please Login or Register  to view this content.

  5. #5
    Forum Moderator jeffreybrown's Avatar
    Join Date
    02-19-2009
    Location
    Cibolo, TX
    MS-Off Ver
    Office 365
    Posts
    10,316

    Re: Increase speed for match formula on over 130k rows

    Thank you very much this seems to be working far better.

  6. #6
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Increase speed for match formula on over 130k rows

    Would you compare the performance of that to this?

    =LOOKUP(A2 & B2, Data!$C$2:$C$30) = A2 & B2
    Last edited by shg; 09-22-2010 at 10:57 AM.

  7. #7
    Forum Moderator jeffreybrown's Avatar
    Join Date
    02-19-2009
    Location
    Cibolo, TX
    MS-Off Ver
    Office 365
    Posts
    10,316

    Re: Increase speed for match formula on over 130k rows

    I sure will give it a try.

    Using the formula you provided yesterday

    =INDEX( Data!$C$2:$C$30, MATCH(A2 & B2, Data!$C$2:$C$30) ) = A2 & B2

    I was receiving some incorrect results, but only at certain times. At one point I realized I was forgetting to sort the match column so even with doing it over again I was still not coming up with a true match when there should have been one.

    In that particular example I put the ,0 back into the formula and it matched. Kind of confused me.

    With this new lookup formula will that negate the need to sort? Excel help says the lookup_vector must be placed in ascending order so I would assume this would refer to C2:C30

  8. #8
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Increase speed for match formula on over 130k rows

    I was receiving some incorrect results,
    Can you post a minimal example that illustrates?

    With this new lookup formula will that negate the need to sort?
    No! A binary search REQUIRES sorted data.

    Excel help says the lookup_vector must be placed in ascending order so I would assume this would refer to C2:C30
    Correct.

  9. #9
    Forum Moderator jeffreybrown's Avatar
    Join Date
    02-19-2009
    Location
    Cibolo, TX
    MS-Off Ver
    Office 365
    Posts
    10,316

    Re: Increase speed for match formula on over 130k rows

    I will try to post a sample of the data, but the problem is the data is on my work laptop and I don't have any internet access.

    If I can replicate the problem I will post something.

    Thanks

  10. #10
    Forum Moderator jeffreybrown's Avatar
    Join Date
    02-19-2009
    Location
    Cibolo, TX
    MS-Off Ver
    Office 365
    Posts
    10,316

    Re: Increase speed for match formula on over 130k rows

    Hi shg,

    I can't seem to duplicate any errors in the matching, but I did put together a sample to run the formulas against. If I run into something tomorrow I will try to post the particular error.

    64999 rows

    Method 1 - 4.454
    =INDEX(Data!$C$2:$C$65000,MATCH(A2&B2,Data!$C$2:$C$65000))=A2&B2

    Method 2 - 4.467
    =LOOKUP(A2&B2,Data!$C$2:$C$65000)=A2&B2

    I used the the timer from http://msdn.microsoft.com (Calculating Workbooks, Worksheets, and Ranges), but do you know of another place where I can get a time that will calculate two methods and then compares them against each other?

  11. #11
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Increase speed for match formula on over 130k rows

    Thanks for posting results. Given that they are negligibly different, I'd opt for the shorter.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1