+ Reply to Thread
Results 1 to 22 of 22

Compare Two String Arrays Faster (Efficiency Challenge)

  1. #1
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Lightbulb Compare Two String Arrays Faster (Efficiency Challenge)

    Hello all--

    I need help speeding up the process of comparing two string arrays in order to find matches.

    The below code works perfectly, however it is too slow. If the first array contains 5,000 strings and the second array contains 10,000 strings then the process could take well over an hour to complete (which is not good enough). My understanding of VBA programming is limited, however I do understand looping through each array by using nested loops is a common approach that works, but in this instance at least, is inefficient (maybe because it is comparing strings, or maybe an obvious rookie mistake i made?).

    Would anyone be able to take this challenge of making this "Find Matches" section faster. Thanks for taking a look.

    Note: This part of the code is the comparing/find matches section only, and is the part of my code that takes forever to complete (the declaration and array loading section of my code is not posted)

    Please Login or Register  to view this content.

  2. #2
    Forum Expert
    Join Date
    12-14-2012
    Location
    London England
    MS-Off Ver
    MS 365 Office Suite.
    Posts
    8,448

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    So you are comparing the data in two columns to find matches.

    You can do that with a simple formulas.

    So why use a vba loop?
    Last edited by mehmetcik; 03-21-2018 at 10:34 AM.
    My General Rules if you want my help. Not aimed at any person in particular:

    1. Please Make Requests not demands, none of us get paid here.

    2. Check back on your post regularly. I will not return to a post after 4 days.
    If it is not important to you then it definitely is not important to me.

  3. #3
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    I agree with the above, a formula would be your best bet.

    The issue with your code is that you aren't looping through arrays, you're looping through tens of thousands of worksheet calls, these are incredibly slow. You should be reading the lists into arrays and looping through those.

  4. #4
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    My original solution actually did use an excel formula but I ran into duplicate matches and missed matches so I abandoned the formula approach in favor of the VBA array comparison approach since I could tell it to remove any found matches from the replacement/donor array, and honestly, I had no idea it would take so long to process. Everything i read about VBA arrays led me to believe it would be extremely fast.

    I don't mind at all to try another formula approach if you believe that is the faster approach.

  5. #5
    Forum Expert
    Join Date
    12-14-2012
    Location
    London England
    MS-Off Ver
    MS 365 Office Suite.
    Posts
    8,448

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    VBA can be fast.

    Unless: 1. Your are looping around numerous times ie 5000 times
    2. You are passing data between excel and vba 5 times per loop in your case.



    Attach a sample workbook. Make sure there is just enough data to demonstrate your need. Include a BEFORE sheet and an AFTER sheet in the workbook if needed to show the process you're trying to complete or automate. Make sure your desired results are shown, mock them up manually if necessary.

    Remember to desensitize the data.

    Click on GO ADVANCED and then scroll down to Manage Attachments to open the upload window.
    Last edited by mehmetcik; 03-21-2018 at 10:42 AM.

  6. #6
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Kyle--

    Thanks for your feedback. I am looping through the arrays though (not the worksheet calls as you specify--i know that would be incredibly slow). I only reference the worksheet when a match is found, but not during my search. Unless I've made a mistake? I am for sure open to being corrected on this... I'm still learning.

    ScrapeString and ScrapeStringDONOR are my string arrays
    Please Login or Register  to view this content.

  7. #7
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    How many matches are there?

  8. #8
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Kyle-- Potentially thousands. But even when few matches are found it is taking longer than expected. I will build a mock sheet to upload hopefully within the next few hours. (i don't have one made without sensitive info yet)

  9. #9
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    If you sort one array, you could do a binary search with the other.

    If you sort both arrays, you could do something similar to how a merge sort works, only comparing the top elements of each.

    Either should be a couple orders of magnitude faster than what you have.
    Last edited by shg; 03-21-2018 at 10:53 AM.
    Entia non sunt multiplicanda sine necessitate

  10. #10
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    shg-- the strings are different length ranging from about 10 characters to sometimes over 30 characters. As for the dictionary approach, i'm open to trying, however, i am not sure the difference in an array versus a dictionary when attempting to compare them. Is there not a way to "test for Exists" with an array?

  11. #11
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Sorry for editing while you were reading. Please reread

  12. #12
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    shg-- thanks for the ideas. In my google searching, i did come across a binary search, but failed to understand how it worked. And i stopped trying since several people mentioned that sorting an array of 10,000+ would take just as long as some other methods. Maybe i should look more into binary search afterall?

  13. #13
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    I think we need to see your data and what you are actually trying to do. Solutions are always context dependant and I'm afraid we're lacking any.

  14. #14
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    And i stopped trying since several people mentioned that sorting an array of 10,000+ would take just as long as some other methods
    Sorting is fast if you only have to do it once

  15. #15
    Forum Expert
    Join Date
    10-02-2014
    Location
    USA
    MS-Off Ver
    2016
    Posts
    1,222

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Sorry if I missed something just skimmed through.

    Dump the array into a dictionary, then simply use the .exists method to check for it.

    See for example https://excelmacromastery.com/vba-di...f_a_Key_Exists

    In this fashion you could just loop 1 array, pass the element of that array to the dictionary.exists and check if that element does exist very quickly.
    Ways to get help: Post clear questions, explain your overall goal, supply as much background as possible, respond to questions asked of you by those trying to help and post sample(s) files.

    "I am here to help, not do it for people" -Me

  16. #16
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    thanks Zero-- i will look more into the dictionary approach. i appreciate the link.

    currently i am building a mock file that i can share with everyone in order to get a hands-on fix. should have it up in an hour or so hopefully.

  17. #17
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Took me a bit longer to get freed up from work in order to build a mock file. I have it attached now. There are 1000 original IDs(column A) that need to be replaced with the Donor IDs(column C) based on their corresponding string information (the string information is what we are comparing). I have placed 900 matching strings in column D so there should be 100 non-matching ID's remaining.

    For those readers starting from here: this method currently works, however, in my actual file it runs much slower, so i've been looking at ways to increase performance of the matching process.

    Here's the thing: for some reason this mock file, even tho it is built exactly like my actual file, is extremely fast, usually completing the match in under 1 second. So there must be something up with my actual file, but i wanted to go ahead and post this mock file since i spent the time to build it, and perhaps this could help someone in the future with one of their projects.

    I am also still very much open to better/smarter/faster ways of handling this matching process. thanks. -Bishop
    Attached Files Attached Files

  18. #18
    Forum Guru
    Join Date
    08-15-2004
    Location
    Tokyo, Japan
    MS-Off Ver
    2013 O.365
    Posts
    22,516

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    See if this is fast enough,
    Please Login or Register  to view this content.

  19. #19
    Forum Expert
    Join Date
    10-02-2014
    Location
    USA
    MS-Off Ver
    2016
    Posts
    1,222

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Taking a peak at your samples code before I leave at the end of my day, ill try and go over it more tomorrow.

    First off check your actual file for excess content (cells which are empty but Excel thinks there is content within). This is checked per sheet and is simple. On each sheet select cell A1, press Ctrl+Shift+End. Whats selected is your used range, ie: all the cells down to the intersection of your last used row and last used column. Ideally this should end at a point were your data stops (you actually have valid data in the last row/column. If however, for ex your content stops in Column D and your last row of data is row 10 and this selectes A1:Z1000, well you have a bunch of excess cells outside the bounds of your actual data.

    This excess content can drastically slow down the file in numerous ways. Could be the difference between your sample and actual file other then the contents.

    Also your code, some observations:

    Please Login or Register  to view this content.
    The above is likely not what you think. Al the variables in red are actually variant data types. Declaring them explicitly can have a good impact on speed (later I can dig a link to another post in which the OP's macro was ~33% faster just by declaring variables).

    Also, .select and .activate can drastically slow down code and isnt needed to act upon objects.

    Please Login or Register  to view this content.
    The above is, no offense, lazy. Instead determine the end of your data instead of assuming it goes to the end of an xls sheet. This could slow things down when most of that area isnt used but gets fed as blanks into the array later.

    Your loops for loading data into the arrays will be slow, see this post that will shed some light on converting ranges to arrays. Any loop without criteria feeding 1 element at a time into an array will be vastly slower than loading the whole range into an array.

    Consider Application.ScreenUpdating = false/true which should speed things up.

    (turning on manual calc until the end may speed it up too)

    And as I mentioned prior, storing the array into a dictionary instead will make it much faster to find what you are looking for.

    DoEvents should not be required here and could have adverse effect. Instead, surround the status bar line in an If statement to execute it only every x iterations of the loop.

    Ill see what more I can do to help tomorrow.

  20. #20
    Forum Expert
    Join Date
    10-02-2014
    Location
    USA
    MS-Off Ver
    2016
    Posts
    1,222

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Would you mind laying out the logic of how you are making matches?

    After reading the prior posts, looking more closely at the file and code its unclear to me how you determine a match.

    From what I can see it looks as if you are trying to create a 1:1 relationship when what you really have is a many:1 relationship in your data.

    In other words:

    If Originals X123457 = CT2TX3DEE50, and you look for CT2TX3DEE50 in column D, you find 219 matches (and their corresponding ID's). You cant make any assertion beyond that with the data given unless I have missed something. From your columns F and G however it seems like you are trying to show that X123457 matches Y198765...but saying so would be arbitrary, as we could pick any of the 219 matches and justifiably claim the same thing.

    As far as I see it, you do not have whats termed a "unique identifier". Meaning we have no way to identify that 1 record uniquely and solely matches another record.

    If you could walk us through what you are trying to accomplish overall (disregarding what you currently have) it would make it easier to help you. Just lay out the logic for us like:

    ex (hypothetical):
    • I have column w and look for all the matches in column x according to a, b, c
    • I then list the matching values from x into column z
    • Once complete I do (something) with the data

    Include any "gotcha's" or considerations to make with the data (uniques, what do do when no match is found, exceptions to rules, etc).

    Usually unless matching is very complex formulas are the fastest way to do it. So before I start writing code (or formulas) to approach this I want to make sure I understand what you want to accomplish so I dont waste your time or my own.

  21. #21
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Thank you all for the feedback. (i appreciate the tidbits Zer0, very helpful going forward).

    I discovered my actual file had a rookie error in it that would access the sheet unnecessarily every single loop (it was built in to my if/then statement on a Else clause). Once i removed that, it fixed almost all my issues. Although i was looping through the two arrays, once a match was found, i then outputted to the sheet, which is another slow point (thanks to Kyle for bringing my attention back to this!). I have altered the actual file to output to another added column in my array (instead of the sheet) and then added another loop (after my matching loop finished) that took care of outputting the matches found to the sheet. These two changes had massive impact on the performance. I launched the updated macro to the team and everything is running above expectations.

    I will still play with the performance based on the suggestions provided, but i will go ahead and mark this thread Solved. thank you everyone for your assistance. this is an amazing community and i very much appreciate the insight provided.

  22. #22
    Registered User
    Join Date
    02-13-2015
    Location
    Greater Nashville
    MS-Off Ver
    2016
    Posts
    19

    Re: Compare Two String Arrays Faster (Efficiency Challenge)

    Zer0--

    i must have been typing my most recent reply when you sent this message through. Thanks for your help. (also, i missed an acknowledgement for jindon's post--thank you. your code is admittedly above my limited coding skills, but i will learn more based on what you provided--so thank you very much).

    So with my actual file and the mock file i provided, you are correct Zer0, the string columns that i am comparing will have duplicates. My unique ID's are the strings corresponding X or Y values (column A and C respectively). You are also correct about the matching being arbitrary to the extent that any of those 219 matches would work (and that's perfectly okay for my needs). I simply stop at the first match it finds, and return the Unique ID that corresponds with the string found. And, most importantly, in order to prevent duplicates on the next search, i erase the string in the Donor list.
    Please Login or Register  to view this content.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Get ID more faster using arrays
    By YasserKhalil in forum Excel Programming / VBA / Macros
    Replies: 17
    Last Post: 10-16-2015, 08:50 AM
  2. Excel Arrays x Name Manager: Calculation Efficiency
    By cmore in forum Excel Programming / VBA / Macros
    Replies: 12
    Last Post: 10-07-2015, 07:36 PM
  3. Macro efficiency - Execute process faster
    By Zoediak in forum Excel Programming / VBA / Macros
    Replies: 6
    Last Post: 12-08-2014, 02:37 PM
  4. What is faster VLOOKUP or Using 2 Arrays
    By Cerbera in forum Excel Programming / VBA / Macros
    Replies: 3
    Last Post: 07-22-2014, 11:23 PM
  5. vba compare arrays and remove exact matching arrays
    By jacojvv in forum Excel Programming / VBA / Macros
    Replies: 15
    Last Post: 10-25-2013, 07:30 AM
  6. VBA Arrays - Faster alternative than Vlookup?
    By danmcdon in forum Excel Programming / VBA / Macros
    Replies: 15
    Last Post: 08-22-2012, 10:28 AM
  7. Working with arrays is faster but... Problem in the sheets!
    By pierre08 in forum Excel Programming / VBA / Macros
    Replies: 12
    Last Post: 05-12-2010, 09:29 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1