+ Reply to Thread
Results 1 to 13 of 13

What is the fastest way to search an internal array?

  1. #1
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    What is the fastest way to search an internal array?

    I have an array that contains 267,751 items. (It's the sowpods word list.) I populate the array by reading it from a text file. I need to search this array to see if a user-supplied permutation of letters is or is not a valid word.

    I initially suspected Excel's own built-in functions would result in the fastest way to search it. And I came across this neat piece of code that's just one line long, using FILTER.

    Please Login or Register  to view this content.
    Very short and simple. And it works great. All search strings that are valid words are indicated as such, and all search strings that are not words return a value of 0.

    However, it's terribly, terribly slow. Searching for just 5,000 possible words took almost 30 seconds.


    I then tried the Excel VLOOKUP function. First question... does VLOOKUP work with internal arrays? I couldn't find a valid example and I couldn't get it to work myself. And so then I decided to simply list the contents of the array in Column A (267,751 rows) and use VLOOKUP with this column. (The normal way you might use VLOOKUP.)

    The results also proved terribly slow.


    MATCH wasn't much better. Using the MATCH function and using the list of words in Column A, MATCH was able to better the results of VLOOKUP by the tiniest of margins.


    Because the array is already sorted, this gave me the idea to try a binary search. And so I wrote a little algorithm to do just that.

    I remember a piece of code I wrote many years ago that sped up a binary search because you can pinpoint where to start looking.

    When I read in the array initially, from the word list, I take a moment and index the first few letters. I'll give a quick example of what I mean. The word "spectacular" is at position 219,425 in the sowpods word list (and in my array). Normally you would just start at the middle of the array (position 133,876) and check to see if the word your searching is equal to, greater than, or less than the word at this position. If it's greater than, as in this case, you then split the difference and search again. And so my next search would be at index 200,813 (133,876 + 267,751) / 2 = 200,813.

    And so on and so on. Eventually you either find the word (which doesn't take as many comparisons as you might think, but if you're familiar with binary searching you know this) or you've determined it doesn't exist in the word list at all.

    By indexing the first few letters, I'm able to start my search much closer to where the word will be. (Again, if it's there at all.)

    For example, the first "spe" word in the word list is the word "speak." I know from my index that the first "spe" word is at position 219,267.

    The next indexed word is the word "sphacelate." (Because it's the first word after "speak that doesn't begin with "spe.") This is at position 219,831. If "spectacular" is going to be in this list, it must be in this range, and I begin looking for it at position 219,549. (219,267 + 219,831) / 2 = 219,549.

    And I discovered that starting at position 219,549 and working from there, instead of starting at position 133,876 (the middle of the array) proved to be faster. (I don't recall how much faster. I do recall trying to improve the speed further and indexing the fourth letter of each word, instead of just the first three letters, but that gave me diminishing returns.)


    Anyway, the difference with my own binary sort algorithm and Excel's FILTER function and Excel's VLOOKUP function and Excel's MATCH function was like night and day.

    Night and day.

    Using my binary search algorithm, I can search for 267,751 different items in less than 2 seconds. (I test the algorithm by looking for every word in the array, one at a time, just to prove the algorithm does find it.)

    Compare this to the times mentioned above:

    FILTER = 5,000 different words searched and found in 29 seconds. (Did not want to test 50,000 words.)
    VLOOKUP = 50,000 different words search and found in 73 seconds.
    MATCH = 50,000 different words searched and found in 69 seconds.
    own binary sort = all 267,751 words searched and found in 1.77 seconds (1,770 milliseconds)

    (The binary sort was fast enough I decided to time it in milliseconds)

    Now, that's great but can I do better? Question #2: Is there an Excel trick or function that I'm not aware of that can give me even better results? It just seems to me Excel should have some kind of a built in function or trick that can search faster and out-perform a user algorithm.

    (By the way, be careful when looking for the words "false" and "true" with a VLOOKUP function. )

  2. #2
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: What is the fastest way to search an internal array?

    I came across this neat piece of code that's just one line long, using FILTER.
    Filter does an enormous amount of work that doesn't need doing.

    does VLOOKUP work with internal arrays
    Yes.

    You could instead put the words in a dictionary. Then ...

    Please Login or Register  to view this content.
    Last edited by shg; 06-06-2015 at 02:39 PM.
    Entia non sunt multiplicanda sine necessitate

  3. #3
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    Re: What is the fastest way to search an internal array?

    shg, thank you. I wasn't familiar with a "scripting dictionary."

    In fact, I had to research it a bit, to determine the proper syntax to add items to my scripting dictionary.

    Alas, it's still not as fast as the binary sort. Oh, it's much faster than everything else. But conducting 267,751 different searches takes between 2,593 and 2,603 milliseconds on my machine. (Based upon a a few different trials.)

    Note that I'm not counting load time... the time it takes to initially create the dictionary. That will occur once upon startup in my program and then will be irrelevant for the rest of the run. (It was only a second or so anyway.)

    The binary sort, by comparison, not counting the initial load time, is 1,411 and 1,413 milliseconds.

    HOWEVER... drum roll please... what IS faster is dimensioning my list as a "collection" and then searching this collection. The Google page I found had a tip from a user that this was another method of searching. (I had never heard of or done that before either, obviously.)

    This method proves to be the fastest yet. 1,240 and 1,241 milliseconds on my computer, for 267,751 different searches.

    However, I still may wish to use the scripting dictionary. This page

    https://sites.google.com/site/beyond...xdocumentation

    mentions several advantages dictionaries have over collections, which requires further study on my part.


    I still think the final chapter hasn't been written. I will now go back and determine how to use VLOOKUP with an internal array but I have a suspicion it won't beat this collection code method.

    Thanks again for responding.

  4. #4
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: What is the fastest way to search an internal array?

    I very much doubt it will beat a collection, using functions on in memory arrays is slow. It's also not surprising that a collection is faster than a dictionary, it's a more simple structure. You may want to take a look at the arraylist, it's essentially a collection with a sort and binary search methods baked in - I seem to think it's very quick.

    It's s .net data structure rather than a vb6/VBA one, but if you google system.collections.arraylist you should find some examples of it being used in vba

  5. #5
    Valued Forum Contributor
    Join Date
    03-21-2013
    Location
    cyberia
    MS-Off Ver
    Excel 2007
    Posts
    457

    Re: What is the fastest way to search an internal array?

    Maybe I'm just missing something in the opening post.

    But why don't you just search the internal array directly? I see no reason in your post why it should be included as dictionary keys or whatever. In my experience the dictionary approach is fast for smaller problems but can sometimes slow appreciably when dealing with around 200,000 or more elements in an array.

    Say something like
    Please Login or Register  to view this content.
    Took me about 0.2 secs to search a list of 270,000 words.

  6. #6
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: What is the fastest way to search an internal array?

    Because that's nowhere near as fast as a binary search. The op is looking for ways to improve on his solution, and the code you posted is much, much slower. 0.2 seconds is slow per word of you need to find thousands of words

  7. #7
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    Re: What is the fastest way to search an internal array?

    Hi kalak,

    Thanks for responding!!!!

    No you're not missing anything. As I re-read my post, I was not as clear as I could have been. My fault.

    To clarify, I'm not just searching an array once, of 267,751 items. I'm searching the entire array 267,751 different times.

    I first search for the word aa, and then after finding it, I search for the word aah, and then aahed, and then aahing, and then aahs, etc., etc. (These are the first five words in the sowpods word list.) I then continue in this manner and search the array for every word in the word list, all the way down to the last word, zzzs.

    I do this just to double-check and verify that my code is working and that every word will indeed be found. If not, then of course something would be terribly wrong with the binary search algorithm.

    In my defense, I did say

    Using my binary search algorithm, I can search for 267,751 different items in less than 2 seconds. (I test the algorithm by looking for every word in the array, one at a time, just to prove the algorithm does find it.)

    and

    my own binary sort = all 267,751 words searched and found in 1.77 seconds (1,770 milliseconds)

    But I can see how it would be easy to miss that. My post was a bit on the lengthy side!

    (fyi... as a further test to verify everything is okay, I then run another little test and generate a few dozen or so random letter permutations all beginning with the letters QZ. There ARE no words beginning with QZ, so of course NONE of these non words should be found.)

    You code is very elegant, and it's a heck of a lot shorter than my little binary search subroutine, but it appears that's a linear search, looking at each element one at a time. And if it does take a full .2 seconds to just search for the non word "hudunnit," I calculate it would take more than 14 hours to perform this search 267,751 different times! (0.2 * 267,751 = 53,550 seconds = 892 minutes = 14.87 hours)

    So far, the binary search is the way to go, until I find something faster. Searching for 500,000 random letter permutations, most of which won't be words at all, the binary search outperforms the scripting dictionary code and the collection code.

  8. #8
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: What is the fastest way to search an internal array?

    Did you try the array list with the baked in binary search?

  9. #9
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    Re: What is the fastest way to search an internal array?

    Kyle, please clarify a bit further. I don't understand what you mean when you say baked in binary search?

  10. #10
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: What is the fastest way to search an internal array?

    If you read my earlier post, I point to towards using a .net collection type, you can actually access this through vba, even though it's .net, try giving it a whirl.

  11. #11
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    Re: What is the fastest way to search an internal array?

    Ah, yes.

    No, I haven't yet had a chance to Google it and look for examples of what it is and how exactly it works. But I will do so, later this evening after I get home from work.

    Thanks!!

  12. #12
    Forum Contributor
    Join Date
    10-13-2012
    Location
    Southern California
    MS-Off Ver
    Excel 2007
    Posts
    401

    Re: What is the fastest way to search an internal array?

    This thread

    http://www.ozgrid.com/forum/showthread.php?t=167349

    (that you participated in back in 2012)

    appears to be what you're referring to, and where I will begin my research. The very bottom post appears to be a method to initially populate the array, and a post or two before appears to show an example of how I will search it.

    Thanks. I will keep you posted.

  13. #13
    Valued Forum Contributor
    Join Date
    03-21-2013
    Location
    cyberia
    MS-Off Ver
    Excel 2007
    Posts
    457

    Re: What is the fastest way to search an internal array?

    OK Ed.
    re your post #7
    Thanks for the clarification. I didn't have a grip on just what you wanted and my own post was more to get that sorted than to make a serious coding suggestion.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Fill cell from array value. Fastest way?
    By ChrisCom in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 01-13-2015, 04:06 PM
  2. [SOLVED] Problem with array's internal order
    By GIS2013 in forum Excel Programming / VBA / Macros
    Replies: 11
    Last Post: 06-20-2014, 08:03 PM
  3. [SOLVED] fastest ways to make a search
    By yberf in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 08-16-2012, 11:08 AM
  4. [SOLVED] Fastest way to find item in an array.
    By WhytheQ in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 05-24-2006, 06:20 PM
  5. [SOLVED] fastest sorting routine for 2-D array of long values
    By RB Smissaert in forum Excel Programming / VBA / Macros
    Replies: 8
    Last Post: 05-06-2006, 12:10 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1