
searching a large database with a long list of search terms

  1. #1

    searching a large database with a long list of search terms

    I have a worksheet with a column of about 3000 rows of info in it. I
    also have six other worksheets, completely full (65536 rows each), that
    I need to search through by one column; when I find a row that matches
    an entry in the 3000-row list, I want to paste that row next to the
    matching entry in the 3000 rows. I don't know how to write macros, only
    simple formulas. Is there an easy way to do this?

    This is confusing, so as an example, here's the one 3000-row worksheet:

    aaa bob 123
    bbb june 345
    ccc fred 876
    ddd mary 765

    and I want to find all the values in the first column (aaa, bbb, ccc,
    ddd) that show up in here (each of the 65536-row worksheets)...

    ddd toronto
    zzz chicago
    aaa new york
    mmm boise
    bbb portland
    ddd miami

    and end up with something that looks like this:

    aaa bob 123 new york
    bbb june 345 portland
    ccc fred 876 NO ENTRY
    ddd mary 765 miami

    Except that I need to do this with !hundreds of thousands! of rows so
    it can't take a super long time. Notice that when it couldn't find a
    matching entry it put "NO ENTRY" in there. That's important because
    there might be instances where the search term doesn't show up.

    Thanks so much folks, I really appreciate it.

    Joe


  2. #2
    David
    Guest

    RE: searching a large database with a long list of search terms

    Hi,
    Just a little clarification for myself and others that may look at this. The
    sheet with 3000 entries, these are unique entries? And this same sheet is
    where you want to write to? Just to add a term for clarification, this would
    be the "main" sheet. You would want to look up all entries from this "main"
    sheet and find them on the other 6 sheets, fetching the city back to the
    "main" sheet?

    The other 6 sheets, where the lookup is taking place, there are not
    duplicate lookup values, ie aaa, bbb, ccc would only have a single entry
    somewhere on the six other sheets and only one city associated with each
    lookup value?
    --
    David


    "[email protected]" wrote:

    > I have a worksheet with a column with about 3000 rows of info in it. I
    > also have six other worksheets completely full (65536 each) that I need
    > to search through a column and then when I find a row that matches an
    > entry in one of those rows paste that row next to the correct number in
    > the 3000 entries. I don't know how to write macros, only simple
    > formulas. Is there an easy way to do this?
    >
    > This is confusing so as an example here's the one 3000 row worksheet
    >
    > aaa bob 123
    > bbb june 345
    > ccc fred 876
    > ddd mary 765
    >
    > and I want to find all the values in the first colum (aaa, bbb, ccc,
    > ddd) that show up in here (each of the 65536 row worksheets)...
    >
    > ddd toronto
    > zzz chicago
    > aaa new york
    > mmm boise
    > bbb portland
    > ddd miami
    >
    > and end up with something that looks like this:
    >
    > aaa bob 123 new york
    > bbb june 345 portland
    > ccc fred 876 NO ENTRY
    > ddd mary 765 miami
    >
    > Except that I need to do this with !hundreds of thousands! of rows so
    > it can't take a super long time. Notice that when it couldn't find a
    > matching entry it put "NO ENTRY" in there. That's important because
    > there might be instances where the search term doesn't show up.
    >
    > Thanks so much folks, I really appreciate it.
    >
    > Joe
    >
    >


  3. #3

    Re: searching a large database with a long list of search terms

    Yes, these are unique. They appear (most of them anyway) in the other
    sheet just as you described.

    I messed around with using the advanced filter function and was able to
    filter the results but I wasn't able to copy over the data back to the
    "main" sheet. Plus I had to do each individual worksheet by itself and
    then copy all the results back to the main sheet (and then they weren't
    tied in with the original unique data).

    Joe


  4. #4
    Pete_UK
    Guest

    Re: searching a large database with a long list of search terms

    Joe,

    In your example you have the code "ddd" twice - I'm assuming this is a
    typo, and have changed it to "eee" below.

    Is the data on the 6 sheets sorted in some way, and if not, can it be
    sorted by the first column? Your example data would then look like this:

    aaa new york
    bbb portland
    ddd toronto
    eee miami
    mmm boise
    zzz chicago

    This will make searching through the data to find a match much quicker.

    Pete
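
    A minimal sketch of the kind of sorted lookup Pete is hinting at (not a
    formula Pete posted, just an illustration, using a hypothetical data
    sheet named Sheet2): with Sheet2 sorted ascending on column A, an
    approximate-match VLOOKUP uses a binary search, which is far quicker
    than an exact-match scan over 65536 rows; wrapping it in an exact-match
    check keeps the "NO ENTRY" behaviour:

    =IF(VLOOKUP($A1,Sheet2!$A:$A,1,TRUE)=$A1,VLOOKUP($A1,Sheet2!$A:$B,2,TRUE),"NO ENTRY")

    One caveat: if a key sorts below the very first entry on the sheet, the
    first VLOOKUP itself returns #N/A, so an ISNA() wrapper would be needed
    as well.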


  5. #5
    Max
    Guest

    Re: searching a large database with a long list of search terms

    Perhaps one play to try ..

    Sample construct available at:
    http://cjoint.com/?bfl6QazB5P
    VLookUp_6Sheets_joe_d_builder.xls

    Assume data in the 6 sheets are in cols A and B, from row1 down
    (key col = col A, "city" in col B)

    Rename the 6 sheets to be simply the numbers: 1,2,3,4,5,6
    (The renaming of the sheetnames to the numbers 1 - 6
    is to allow us to easily fill the extract formulas in Master)

    Then in sheet: Master
    where the data is in cols A to C, with the key col = col A
    > aaa bob 123
    > bbb june 345

    etc

    Put in D1, copy across 6 cols to I1, fill down as far as required:
    =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))

    (COLUMN(A1) evaluates to 1 in col D and increments to 6 by col I as the
    formula is copied across, so each column reads the correspondingly
    numbered sheet via INDIRECT.)

    Cols D to I will extract the "city" returns from each of the 6 sheets
    (1 - 6)
    [ Col D = returns from sheet: 1, .. col I = returns from sheet: 6 ]
    Unmatched cases will return "NO ENTRY"
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  6. #6

    Re: searching a large database with a long list of search terms

    Thanks so much. Let me give this a try.

    Joe


  7. #7
    Max
    Guest

    Re: searching a large database with a long list of search terms

    You're welcome, Joe.
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --
    <[email protected]> wrote in message
    news:[email protected]...
    > Thanks so much. Let me give this a try.
    >
    > Joe
    >




  8. #8
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    Max, I'm wondering if you did a full application test on this. (I didn't
    look at your sample file)

    6 sheets with 65536 rows of data, 3000 rows of lookup values and 6 columns
    of formulas.

    What kind of calc time did that take?

    Biff

    "Max" <[email protected]> wrote in message
    news:%[email protected]...
    > Perhaps one play to try ..
    >
    > Sample construct available at:
    > http://cjoint.com/?bfl6QazB5P
    > VLookUp_6Sheets_joe_d_builder.xls
    >
    > Assume data in the 6 sheets are in cols A and B, from row1 down
    > (key col = col A, "city" in col B)
    >
    > Rename the 6 sheets to be simply the numbers: 1,2,3,4,5,6
    > (The renaming of the sheetnames to the numbers 1 - 6
    > is to allow us to easily fill the extract formulas in Master)
    >
    > Then in sheet: Master
    > where the data is in cols A to C, with the key col = col A
    >> aaa bob 123
    >> bbb june 345

    > etc
    >
    > Put in D1, copy across 6 cols to I1, fill down as far as required:
    > =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
    > ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))
    >
    > Cols D to I will extract the "city" returns from each of the 6 sheets (1 -
    > 6)
    > [ Col D = returns from sheet: 1, .. col I = returns from sheet: 6 ]
    > Unmatched cases will return "NO ENTRY"
    > --
    > Rgds
    > Max
    > xl 97
    > ---
    > Singapore, GMT+8
    > xdemechanik
    > http://savefile.com/projects/236895
    > --
    >
    >




  9. #9
    Max
    Guest

    Re: searching a large database with a long list of search terms

    "Biff" wrote:
    > Max, I'm wondering if you did a full application test on this ..


    Obviously not <g> .

    Btw, do you do full testing on detailed posts
    like this one before you respond ?

    > 6 sheets with 65536 rows of data, 3000 rows of lookup values
    > and 6 columns of formulas.
    > What kind of calc time did that take?


    Ok, I just did that, on my laptop (3 year old IBM T30) Excel 97:
    Took about 3 mins to fill the formulas & complete calc.

    Perhaps you would like to indicate what other pre-emptive caveats
    should have been written in my suggestion ?

    Like always, it was only a suggestion for the OP to try out.

    And nothing is ever stated in my posts that precludes
    any others from posting other, possibly better suggestions
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  10. #10
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    Easy there, Max!

    I'm not "knocking" anything.

    It was just a question out of my own curiosity.

    > Btw, do you do full testing on detailed posts
    > like this one before you respond ?


    Yes, I do. That's one of the reasons I didn't want to "tackle" this one! I
    didn't want to fill 6 sheets with data! I have a custom toolbar button with
    macro that generates random numeric values that I use for testing in "huge"
    blocks of cells. I need to get something that does the same thing but
    generates random text strings.

    Biff
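
    Biff doesn't post the numeric-fill macro he mentions, so the following
    is only a rough guess at what such a test-data helper might look like
    (a sketch, not his code):

    Sub FillSelectionWithRandomNumbers()
        ' Hypothetical sketch: fills the currently selected range with
        ' random whole numbers from 1 to 1,000,000, then converts the
        ' formulas to plain values so they stop recalculating.
        Dim rng As Range
        Set rng = Selection                  ' assumes a range is selected
        Application.ScreenUpdating = False
        rng.Formula = "=INT(RAND()*1000000)+1"
        rng.Value = rng.Value                ' freeze the random values
        Application.ScreenUpdating = True
    End Sub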

    "Max" <[email protected]> wrote in message
    news:ODo6%[email protected]...
    > "Biff" wrote:
    >> Max, I'm wondering if you did a full application test on this ..

    >
    > Obviously not <g> .
    >
    > Btw, do you do full testing on detailed posts
    > like this one before you respond ?
    >
    >> 6 sheets with 65536 rows of data, 3000 rows of lookup values
    >> and 6 columns of formulas.
    >> What kind of calc time did that take?

    >
    > Ok, I just did that, on my laptop (3 year old IBM T30) Excel 97:
    > Took about 3 mins to fill the formulas & complete calc.
    >
    > Perhaps you would like to indicate what other pre-emptive caveats
    > should have been written in my suggestion ?
    >
    > Like always, it was only a suggestion for the OP to try out.
    >
    > And nothing is ever stated in my posts that precludes
    > any others from posting other, possibly better suggestions
    > --
    > Rgds
    > Max
    > xl 97
    > ---
    > Singapore, GMT+8
    > xdemechanik
    > http://savefile.com/projects/236895
    > --
    >
    >




  11. #11
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > Easy there, Max!
    > I'm not "knocking" anything.
    > It was just a question out of my own curiosity


    No prob, guess sometimes it's good to discuss things
    (and to clarify any possible "unwritten" underlyings)

    > > Btw, do you do full testing on detailed posts
    > > like this one before you respond ?

    > Yes, I do.


    My sincere compliments, Biff !
    I don't think I'd be able to measure up to this standard. <g>

    > .. That's one of the reasons I didn't want to "tackle" this one! ...


    Ok, here's where my thoughts may differ slightly from yours. If imo,
    there's a possible way (known to me) to suggest for the OP to try and get it
    done w/o too much trouble (read: a formula, or a couple of formulas easily
    propagated across/down) which hasn't yet been posted in responses to the OP
    at that time, then I'll probably plunge right-in & suggest (after some light
    testing, of course).

    I'm not sure, but if I'm the OP, I'd definitely appreciate/prefer to receive
    and try out any ideas/suggestions than not to receive any response to try.
    Whether the suggestion(s) ultimately work or not when plugged into the
    "real-world" application, of course, is another matter which could be
    followed-up in further feedback within the thread or as a fresh post. Just
    my views ..

    > .. I need to get something that does the same thing but
    > generates random text strings.


    Probably already known to you, since you were in this googled discussion
    back in 2003 <g>: http://tinyurl.com/d82tf

    Perhaps the Sub MakeRandPasswords() posted by Greg Wilson
    therein would be quite a neat way (imo) to get it done?

    I tinkered around a little with Greg's code:

    'Dim i As Integer, ii As Integer, PW As String < change to Long
    Dim i As Long, ii As Long, PW As String
    ....
    For i = 1 To 65000 'Change to select number of passwords to generate

    and ran the sub with say, B1 selected. Greg's sub generated the full 65K
    random strings in col B within 15 sec. Cheers.
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --
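
    Greg Wilson's full sub isn't reproduced in this thread, so here is a
    rough independent sketch of a random-string filler along the same lines
    (an illustration only, not Greg's actual code):

    Sub FillRandomTextStrings()
        ' Sketch: writes 65000 random 8-character strings into the column
        ' of the active cell, starting at the active cell. Builds the whole
        ' batch in an array first and dumps it in one write, which is much
        ' faster than writing cell by cell.
        Dim i As Long, ii As Long, PW As String
        Dim out() As String
        Const NumStrings As Long = 65000
        Const StrLen As Long = 8
        Const Chars As String = "abcdefghijklmnopqrstuvwxyz0123456789"

        ReDim out(1 To NumStrings, 1 To 1)
        Randomize
        For i = 1 To NumStrings
            PW = ""
            For ii = 1 To StrLen
                PW = PW & Mid$(Chars, Int(Rnd() * Len(Chars)) + 1, 1)
            Next ii
            out(i, 1) = PW
        Next i
        ActiveCell.Resize(NumStrings, 1).Value = out
    End Sub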



  12. #12
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    >I'm not sure, but if I'm the OP, I'd definitely appreciate/prefer to
    >receive
    >and try out any ideas/suggestions than not to receive any response to try.
    >Whether the suggestion(s) ultimately work or not when plugged into the
    >"real-world" application, of course, is another matter which could be
    >followed-up in further feedback within the thread or as a fresh post. Just
    >my views ..


    I agree completely.

    Thanks for the text string pointers. I'll definitely check those out!

    Biff

    "Max" <[email protected]> wrote in message
    news:%[email protected]...
    >> Easy there, Max!
    >> I'm not "knocking" anything.
    >> It was just a question out of my own curiosity

    >
    > No prob, guess sometimes it's good to discuss things
    > (and to clarify any possible "unwritten" underlyings)
    >
    >> > Btw, do you do full testing on detailed posts
    >> > like this one before you respond ?

    >> Yes, I do.

    >
    > My sincere compliments, Biff !
    > I don't think I can/be able to measure up to this standard. <g>
    >
    >> .. That's one of the reasons I didn't want to "tackle" this one! ...

    >
    > Ok, here's where my thoughts may differ slightly from yours. If imo,
    > there's a possible way (known to me) to suggest for the OP to try and get
    > it
    > done w/o too much trouble (read: a formula, or a couple of formulas easily
    > propagated across/down) which hasn't yet been posted in responses to the
    > OP
    > at that time, then I'll probably plunge right-in & suggest (after some
    > light
    > testing, of course).
    >
    > I'm not sure, but if I'm the OP, I'd definitely appreciate/prefer to
    > receive
    > and try out any ideas/suggestions than not to receive any response to try.
    > Whether the suggestion(s) ultimately work or not when plugged into the
    > "real-world" application, of course, is another matter which could be
    > followed-up in further feedback within the thread or as a fresh post. Just
    > my views ..
    >
    >> .. I need to get something that does the same thing but
    >> generates random text strings.

    >
    > Probably known to you? since you were in this googled discussion
    > back in 2003 <g>: http://tinyurl.com/d82tf
    >
    > Perhaps the Sub MakeRandPasswords() posted by Greg Wilson
    > therein seems quite a neat way (imo) to get it up ?
    >
    > I tinkered around a little with Greg's code:
    >
    > 'Dim i As Integer, ii As Integer, PW As String < change to Long
    > Dim i As Long, ii As Long, PW As String
    > ...
    > For i = 1 To 65000 'Change to select number of passwords to generate
    >
    > and ran the sub with say, B1 selected. Greg's sub generated the full 65K
    > random strings in col B within 15 sec. Cheers.
    > --
    > Rgds
    > Max
    > xl 97
    > ---
    > Singapore, GMT+8
    > xdemechanik
    > http://savefile.com/projects/236895
    > --
    >
    >




  13. #13
    Pete_UK
    Guest

    Re: searching a large database with a long list of search terms

    Max,

    The OP didn't get back to me when I asked about sorting the reference
    data beforehand. Can you sort your random data in the 6 sheets then
    re-apply your formula to take advantage of this to see if there is a
    big increase in speed? In theory, the binary search technique applied
    if the data is sorted should make a massive difference to 6 * 65536
    entries.

    Pete


  14. #14
    Max
    Guest

    Re: searching a large database with a long list of search terms

    It wasn't exactly a controlled experiment earlier, Pete <g>
    I didn't save the test file, and think I probably underestimated
    the timing a little. I set the calc mode to manual, filled the data to 65k,
    then filled the formulas to 3k, and left for an oxygen break. I ended up
    with a double. When I came back, the fills and calcs were done. Probably
    better to await the OP's feedback on whether it worked out ok for him over
    there.
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --
    "Pete_UK" <[email protected]> wrote in message
    news:[email protected]...
    > Max,
    >
    > The OP didn't get back to me when I asked about sorting the reference
    > data beforehand. Can you sort your random data in the 6 sheets then
    > re-apply your formula to take advantage of this to see if there is a
    > big increase in speed? In theory, the binary search technique applied
    > if the data is sorted should make a massive difference to 6 * 65536
    > entries.
    >
    > Pete
    >




  15. #15
    Max
    Guest

    Re: searching a large database with a long list of search terms

    oops, line:
    > ... filled the formulas to 3k, and left for an oxygen break ..


    should have read:
    > .. filled the formulas to 3k, pressed F9, and left for an oxygen break ..


    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  16. #16
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    If anyone is still following this thread I'll do some tests and post the
    results. Stay tuned!

    Biff

    "Pete_UK" <[email protected]> wrote in message
    news:[email protected]...
    > Max,
    >
    > The OP didn't get back to me when I asked about sorting the reference
    > data beforehand. Can you sort your random data in the 6 sheets then
    > re-apply your formula to take advantage of this to see if there is a
    > big increase in speed? In theory, the binary search technique applied
    > if the data is sorted should make a massive difference to 6 * 65536
    > entries.
    >
    > Pete
    >




  17. #17
    Max
    Guest

    Re: searching a large database with a long list of search terms

    Biff,

    What would have been your suggestion to the OP,
    had the "volume" not got in the way?
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  18. #18
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > .. Let me give this a try.

    Joe,

    Could you drop a line or two here
    on how the try went for you over there ?

    Did it work when you applied it on a copy of your actual file?
    Roughly how long did the calcs take to complete?

    Thanks
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  19. #19
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    Here are the results of 3 tests:

    Computer specs:

    Pentium 4, 2.0 GHz, 256 MB RAM, WinXP (all service packs, all patches),
    Excel 2002 (XP) (all service packs)
    Other than the operating system, Excel is the only app running.

    File configuration (based on the OP's description):

    7 sheets total, 1 summary, 6 data

    Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A, A1:A3000
    Data sheets(6): 2 columns x 65536 rows

    Test 1 (based on the reply from Max)

    File size (wo/formulas) - 26.6 Mb
    File size (w/formulas) - 27.5 Mb

    This formula was copied to 6 columns x 3000 rows:

    =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
    ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))

    I was unable to copy/drag in a single operation. When I tried, Excel
    "froze-up". I had to use Task Manager to regain control. Tried twice and
    Excel "froze" both times. I had to drag copy in increments of ~200 rows at a
    time. I didn't time this but to copy to all 3000 rows took at least an hour.
    (calculation was on automatic) After all formulas were copied:

    Calc time (data sheets unsorted) ~6:45 (m:ss)
    Calc time (data sheets sorted ascending) ~3:30 (m:ss)

    Test 2

    Deleted all the above formulas, reset the used range.

    Used this array formula copied to 1 column x 3000 rows:

    =IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH(TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)>0,0))&"'!A:B"),2,0),"")
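
    (The setup of WSlist isn't shown here; presumably it is a defined name
    holding the list of the six data sheet names. One way to create such a
    name -- an assumption on my part, not something taken from this thread --
    is Insert > Name > Define, name it WSlist, and set "Refers to" to an
    array constant such as:

    ={"Sheet2";"Sheet3";"Sheet4";"Sheet5";"Sheet6";"Sheet7"}

    or point it at a worksheet range containing those names. The formula
    above is then array-entered with Ctrl+Shift+Enter and filled down.)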

    After all formulas were copied:

    File size (w/formulas) - 27.2 Mb
    Calc time (data sheets sorted ascending) ~3:35 (m:ss)

    Test 3

    Deleted all the above formulas. Decided to try a monster nested IF formula
    but I hit the nested function limit, so I split the formula into 2 cells. One
    cell's formula did the lookup on sheets 2,3,4. The other cell's formula
    (which checks the first cell, D1) did the lookup on sheets 5,6,7.

    =IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOKUP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sheet4!A:B,2,0),"")))

    =IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOOKUP(A1,Sheet7!A:B,2,0),"No Entry"))),"")

    After all formulas were copied:

    File size (w/formulas) - 28.2 Mb
    Calc time (data sheets sorted ascending) ~1 second

    I did not test using unsorted data sheets in tests 2 and 3.

    Conclusion:

    Sorting the data can speed up calc time significantly in "large" files. The
    use of 1000's of volatile functions should be avoided at all costs! Monster
    formulas aren't all bad!

    Comments/suggestions welcome!

    Biff

    "Biff" <[email protected]> wrote in message
    news:%[email protected]...
    > If anyone is still following this thread I'll do some tests and post the
    > results. Stay tuned!
    >
    > Biff
    >
    > "Pete_UK" <[email protected]> wrote in message
    > news:[email protected]...
    >> Max,
    >>
    >> The OP didn't get back to me when I asked about sorting the reference
    >> data beforehand. Can you sort your random data in the 6 sheets then
    >> re-apply your formula to take advantage of this to see if there is a
    >> big increase in speed? In theory, the binary search technique applied
    >> if the data is sorted should make a massive difference to 6 * 65536
    >> entries.
    >>
    >> Pete
    >>

    >
    >




  20. #20
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    See my test results.

    I probably would've gone with what I did in my test 2.

    As you can see, the calc times were pretty much the same (what I would've
    done versus what you did). I don't like to use a lot of helper cells if I
    don't have to. The monster formula was the hands-down winner, but who
    "likes" monster formulas? They tend to scare people away!

    Biff

    "Max" <[email protected]> wrote in message
    news:[email protected]...
    > Biff,
    >
    > What would have been your suggestion to the OP,
    > had not the "volume" got in the way ?
    > --
    > Rgds
    > Max
    > xl 97
    > ---
    > Singapore, GMT+8
    > xdemechanik
    > http://savefile.com/projects/236895
    > --
    >
    >




  21. #21
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > Comments/suggestions welcome!

    My compliments on your effort, Biff. Amazing.

    > Sorting the data can speed up calc time significantly in "large" files.
    > The use of 1000's of volatile functions should be avoided at all costs!
    > Monster formulas aren't all bad!


    Looks like what I suggested should hence be trashed permanently <g>
    Better to have 6 similar formulas pointing to each of the 6 sheets

    > I was unable to copy/drag in a single operation.


    FWIW, my m/c was able to fill 6 cols x 3000 rows over here
    (no freezing).

    Btw, think there was a slight interp difference in that I presumed what the
    OP would like to have was to match & extract separate returns from each of
    the 6 sheets for all the 3000 lines. Then he could assess further what he
    wants done.
    I didn't assume any "precedence" order in the 6 data sheets.

    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  22. #22
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > I probably would've gone with what I did in my test 2.

    ... the super complex array <g> ?
    filled down only in 1 col x 3000 rows
    (and which also contains the same volatile INDIRECT ?)

    2 observations:

    1. The returns are different, ref explanation in my response to you earlier
    (pasted below), because of different interps of what the OP wanted. My
    suggestion simply lines up all the returns from the 6 sheets for each of the
    3000 lines in 6 cols. Yours return results in 1 col, with an implicit
    precedence order assumed in the 6 data sheets. Nothing wrong there, just
    different interps.

    > Btw, think there was a slight interp difference in that I presumed what

    the
    > OP would like to have was to match & extract separate returns from each of
    > the 6 sheets for all the 3000 lines. Then he could assess further what he
    > wants done.
    > I didn't assume any "precedence" order in the 6 data sheets.


    > As you can see, the calc times were pretty much the same ..


    2. Wondering whether the calc times would be any different if, say, your
    array formula were modified to return the results in 6 cols, similarly
    to mine?

    Just some thoughts, Biff <g>
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  23. #23
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > Better to have 6 similar formulas pointing to each of the 6 sheets

    What's meant is simply ..

    In D1:
    =IF($A1="","",IF(ISNA(MATCH($A1,'1'!$A:$A,0)),"NO ENTRY",VLOOKUP($A1,'1'!$A:$B,2,0)))

    In E1:
    =IF($A1="","",IF(ISNA(MATCH($A1,'2'!$A:$A,0)),"NO ENTRY",VLOOKUP($A1,'2'!$A:$B,2,0)))

    and so on in F1 till I1 (the same formula essentially but with the
    sheetnames changed accordingly to '3', '4', '5', and '6')

    D1:I1 is copied down 3000 rows
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  24. #24
    Max
    Guest

    Re: searching a large database with a long list of search terms

    In retrospect, I should have suggested the foregoing simpler formulas
    (w/o the INDIRECT). It takes less than a minute to manually edit the
    other 5 sheetnames! urgh ..
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  25. #25
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    >1. The returns are different, ref explanation in my response to you earlier
    >(pasted below), because of different interps of what the OP wanted. My
    >suggestion simply lines up all the returns from the 6 sheets for each of
    >the
    >3000 lines in 6 cols. Yours return results in 1 col, with an implicit
    >precedence order assumed in the 6 data sheets.


    I'm not following you on this????

    My interp is that on the summary are lookup values that may or may not be on
    any one of 6 sheets. Basically, it's just a lookup across multiple sheets. I
    don't understand your use of "precedence" ???????

    > (and which also contains the same volatile INDIRECT ?)


    See my "conclusion" !

    The problem with this is convincing people (maybe even oneself) that the
    use of a monster nested IF/VLOOKUP is the best way to go!

    Biff

    "Max" <[email protected]> wrote in message
    news:%[email protected]...
    >> I probably would've gone with what I did in my test 2.

    >
    > .. the super complex array <g> ?
    > filled down only in 1 col x 3000 rows
    > (and which also contains the same volatile INDIRECT ?)
    >
    > 2 observations:
    >
    > 1. The returns are different, ref explanation in my response to you
    > earlier
    > (pasted below), because of different interps of what the OP wanted. My
    > suggestion simply lines up all the returns from the 6 sheets for each of
    > the
    > 3000 lines in 6 cols. Yours return results in 1 col, with an implicit
    > precedence order assumed in the 6 data sheets. Nothing wrong there, just
    > different interps.
    >
    >> Btw, think there was a slight interp difference in that I presumed what

    > the
    >> OP would like to have was to match & extract separate returns from each
    >> of
    >> the 6 sheets for all the 3000 lines. Then he could assess further what
    >> he
    >> wants done.
    >> I didn't assume any "precedence" order in the 6 data sheets.

    >
    >> As you can see, the calc times were pretty much the same ..

    >
    > 2. Wondering whether the calc times would be any different if say, your
    > array formula were to be modified to return similarly as mine the results
    > in
    > 6 cols ?
    >
    > Just some thoughts, Biff <g>
    > --
    > Rgds
    > Max
    > xl 97
    > ---
    > Singapore, GMT+8
    > xdemechanik
    > http://savefile.com/projects/236895
    > --
    >
    >




  26. #26
    Roger Govier
    Guest

    Re: searching a large database with a long list of search terms

    Hi Biff

    Thank you for this mammoth effort, and for sharing the results with us.
    They make fascinating reading.
    Whilst for some while now I have tried to avoid INDIRECT functions when
    there are lots of formulae and/or data involved, I always used to use
    VLOOKUP, but more recently I have made much more use of INDEX() and
    MATCH().

    I wondered whether, with this mass of data, there would be any
    significant difference in calculation time if one used the form
    =INDEX(Sheet2!A:B,MATCH(A2,Sheet2!A:A,0),2)
    in place of
    =VLOOKUP(A2,Sheet2!A:B,2,0)
    throughout the formulae.
    The formulae would be longer, and look more horrendous, but I wonder
    whether there would be any speed difference.

    If you had the time (and inclination) to carry out this test with the
    data you already have set up, I would be most interested to see the
    results.

    --
    Regards

    Roger Govier
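
    To make the substitution Roger describes concrete, the Sheet2 branch of
    Biff's Test 3 formula might be rewritten along these lines (an
    illustration only, not a formula either of them posted):

    =IF(ISNUMBER(MATCH(A1,Sheet2!A:A,0)),INDEX(Sheet2!A:B,MATCH(A1,Sheet2!A:A,0),2),"")

    with the same pattern repeated for the other data sheets inside the
    nested IFs. One possible attraction of the MATCH form is that the match
    position could be computed once in a helper cell and reused, instead of
    the lookup effectively running twice as it does in each
    NOT(ISERROR(VLOOKUP(...)))/VLOOKUP(...) pair.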


    "Biff" <[email protected]> wrote in message
    news:[email protected]...
    > Here are the results of 3 tests:
    >
    > Computer specs:
    >
    > Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all
    > patches), Excel 2002 (XP) (all service packs)
    > Other than the operating system, Excel is the only app running.
    >
    > File configuration: (based on the OPs description)
    >
    > 7 sheets total, 1 summary, 6 data
    >
    > Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
    > A1:A3000
    > Data sheets(6): 2 columns x 65536 rows
    >
    > Test 1 (based on the reply from Max)
    >
    > File size (wo/formulas) - 26.6 Mb
    > File size (w/formulas) - 27.5 Mb
    >
    > This formula was copied to 6 columns x 3000 rows:
    >
    > =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
    > ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))
    >
    > I was unable to copy/drag in a single operation. When I tried, Excel
    > "froze-up". I had to use Task Manager to regain control. Tried twice
    > and Excel "froze" both times. I had to drag copy in increments of ~200
    > rows at a time. I didn't time this but to copy to all 3000 rows took
    > at least an hour. (calculation was on automatic) After all formulas
    > were copied:
    >
    > Calc time (data sheets unsorted) ~6:45 (m:ss)
    > Calc time (data sheets sorted ascending) ~3:30 (m:ss)
    >
    > Test 2
    >
    > Deleted all the above formulas, reset the used range.
    >
    > Used this array formula copied to 1 column x 3000 rows:
    >
    > =IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH(TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)>0,0))&"'!A:B"),2,0),"")
    >
    > After all formulas were copied:
    >
    > File size (w/formulas) - 27.2 Mb
    > Calc time (data sheets sorted ascending) ~3:35 (m:ss)
    >
    > Test 3
    >
    > Deleted all the above formulas. Decided to try a monster nested IF
    > formula but I hit the nested function limit so I split the formula
    > into 2 cells. I cell formula did the lookup on sheets 2,3,4. The other
    > cell formula did the lookup on sheets 5,6,7.
    >
    > =IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOKUP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sheet4!A:B,2,0),"")))
    >
    > =IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOOKUP(A1,Sheet7!A:B,2,0),"No
    > Entry"))),"")
    >
    > After all formulas were copied:
    >
    > File size (w/formulas) - 28.2 Mb
    > Calc time (data sheets sorted ascending) ~1 second
    >
    > I did not test using unsorted data sheets in tests 2 and 3.
    >
    > Conclusion:
    >
    > Sorting the data can speed up calc time significantly in "large"
    > files. The use of 1000's of volatile functions should be avoided at
    > all costs! Monster formulas aren't all bad!
    >
    > Comments/suggestions welcome!
    >
    > Biff
    >
    > "Biff" <[email protected]> wrote in message
    > news:%[email protected]...
    >> If anyone is still following this thread I'll do some tests and post
    >> the results. Stay tuned!
    >>
    >> Biff
    >>
    >> "Pete_UK" <[email protected]> wrote in message
    >> news:[email protected]...
    >>> Max,
    >>>
    >>> The OP didn't get back to me when I asked about sorting the
    >>> reference
    >>> data beforehand. Can you sort your random data in the 6 sheets then
    >>> re-apply your formula to take advantage of this to see if there is a
    >>> big increase in speed? In theory, the binary search technique
    >>> applied
    >>> if the data is sorted should make a massive difference to 6 * 65536
    >>> entries.
    >>>
    >>> Pete
    >>>

    >>
    >>

    >
    >




  27. #27
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > My interp is that on the summary are lookup values
    > that may or may not be on any one of 6 sheets.
    > Basically, it's just a lookup across multiple sheets. I
    > don't understand your use of "precedence" ???????


    My presumption was that there could be multiple "city" returns for the same
    lookup value in col A within the 6 sheets. And that the OP might want to
    see all of it before deciding next steps.

    An illustration ..
    For eg: for "aaa", there could be the data
    for "aaa" in sheets: 1,2,3,4 such as:

    aaa new york (in sheet: 1)
    aaa chicago (in sheet: 2)
    aaa miami (in sheet: 3)
    aaa houston (in sheet: 4)
    [ No "aaa" in sheets 5 & 6 (say) ]

    Using nested IF(ISNA(VLOOKUP1),IF(ISNA(VLOOKUP2), ... ), or, your array
    formula would return only the "1st" matching value, depending on how the
    nested "IF(ISNA(VLOOKUP.." is structured (i.e. the "precedence" order: Check
    sheet: 1 first, then check sheet; 2, then sheet: 3, and so on). Or, in your
    array, depending on the order that the sheets are listed in WSList. If I
    list: 1 as the 1st sheet (at the top in WSList), it returns: new york.
    Change the top to: 2, it'll return: chicago. "3" will return: miami. And so
    on.
    (Btw, it was a nice array, Biff.)

    Hope the above clarifies a little better what I meant by "precedence".
    Just slightly different interps on the OP's needs, nothing wrong either way.
    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  28. #28
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    >Hope the above clarifies a little better what I meant by "precedence".

    OK, now I see.

    Biff

    "Max" <[email protected]> wrote in message
    news:OSD7vK%[email protected]...
    >> My interp is that on the summary are lookup values
    >> that may or may not be on any one of 6 sheets.
    >> Basically, it's just a lookup across multiple sheets. I
    >> don't understand your use of "precedence" ???????

    >
    > My presumption was that there could be multiple "city" returns for the
    > same
    > lookup value in col A within the 6 sheets. And that the OP might want to
    > see all of it before deciding next steps.
    >
    > An illustration ..
    > For eg: for "aaa", there could be the data
    > for "aaa" in sheets: 1,2,3,4 such as:
    >
    > aaa new york (in sheet: 1)
    > aaa chicago (in sheet: 2)
    > aaa miami (in sheet: 3)
    > aaa houston (in sheet: 4)
    > [ No "aaa" in sheets 5 & 6 (say) ]
    >
    > Using nested IF(ISNA(VLOOKUP1),IF(ISNA(VLOOKUP2), ... ), or, your array
    > formula would return only the "1st" matching value, depending on how the
    > nested "IF(ISNA(VLOOKUP.." is structured (i.e. the "precedence" order:
    > Check
    > sheet: 1 first, then check sheet; 2, then sheet: 3, and so on). Or, in
    > your
    > array, depending on the order that the sheets are listed in WSList. If I
    > list: 1 as the 1st sheet (at the top in WSList), it returns: new york.
    > Change the top to: 2, it'll return: chicago. "3" will return: miami. And
    > so
    > on.
    > (Btw, it was a nice array, Biff.)
    >
    > Hope the above clarifies a little better what I meant by "precedence".
    > Just slightly different interps on the OP's needs, nothing wrong either
    > way.
    > --
    > Rgds
    > Max
    > xl 97
    > ---
    > Singapore, GMT+8
    > xdemechanik
    > http://savefile.com/projects/236895
    > --
    >
    >




  29. #29
    Pete_UK
    Guest

    Re: searching a large database with a long list of search terms

    Biff,

    my congratulations also on a mammoth effort!

    My interpretation of the OP's original request was that he wanted one
    result - I had imagined that he had one massive lookup table (of 393000
    rows) which had to be split into 6 because of Excel's row limit of 64k.
    If this were the case, I'm not sure if he (or you or Max) ensured that
    there were no duplicates between the sheets.

    Very interesting - well done!

    Pete


  30. #30
    Max
    Guest

    Re: searching a large database with a long list of search terms

    > ... I'm not sure if he (or you or Max) ensured that
    > there were no duplicates between the sheets.


    Pete,

    I followed the OP's response (below)
    to David (the first responder) as the "state-of-events":

    David asked:
    > .. The other 6 sheets, where the lookup is taking place, there are not
    > duplicate lookup values, ie aaa, bbb, ccc would only have a single entry
    > somewhere on the six other sheets and only one city associated with each
    > lookup value?


    OP's reply to David:
    > Yes, these are unique.


    Also, as clarified in my responses since:

    > .. I presumed what the OP would like to have
    > was to match & extract separate returns from each of the 6 sheets
    > for all the 3000 lines. Then he could assess further what he wants done.


    > My suggestion simply lines up all the returns from the 6 sheets
    > for each of the 3000 lines in 6 cols.

    --
    Rgds
    Max
    xl 97
    ---
    Singapore, GMT+8
    xdemechanik
    http://savefile.com/projects/236895
    --



  31. #31
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    Roger,

    I still have my test files. I'll tinker around with this sometime Sunday
    night.

    The thing that surprised me the most was the file size. I knew it would be
    big but did not think it would be 27 - 28 Mb big!

    Biff

    "Roger Govier" <[email protected]> wrote in message
    news:[email protected]...
    > Hi Biff
    >
    > Thank you for this mammoth effort, and for sharing the results with us.
    > They make fascinating reading.
    > Whilst for some while now I have tried to avoid Indirect functions when
    > there are lots of formulae and/or data involved, I always used to use
    > Vlookup, but more recently I have made much more use of INDEX(), MATCH().
    >
    > I wondered whether, with this mass of data, there would be any significant
    > difference in calculation time if one used the format
    > =INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
    > in place of
    > =VLOOKUP(A2,Sheet1!A:B,2,0)
    > throughout the formulae.
    > The formulae would be longer, and look more horrendous, but I wonder
    > whether there would be any speed difference.
    >
    > If you had the time (and inclination) to carry out this test with the data
    > you already have set up, I would be most interested to see the results.
    >
    > --
    > Regards
    >
    > Roger Govier
    >
    >
    > "Biff" <[email protected]> wrote in message
    > news:[email protected]...
    >> Here are the results of 3 tests:
    >>
    >> Computer specs:
    >>
    >> Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all patches),
    >> Excel 2002 (XP) (all service packs)
    >> Other than the operating system, Excel is the only app running.
    >>
    >> File configuration: (based on the OPs description)
    >>
    >> 7 sheets total, 1 summary, 6 data
    >>
    >> Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
    >> A1:A3000
    >> Data sheets(6): 2 columns x 65536 rows
    >>
    >> Test 1 (based on the reply from Max)
    >>
    >> File size (wo/formulas) - 26.6 Mb
    >> File size (w/formulas) - 27.5 Mb
    >>
    >> This formula was copied to 6 columns x 3000 rows:
    >>
    >> =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
    >> ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))
    >>
    >> I was unable to copy/drag in a single operation. When I tried, Excel
    >> "froze-up". I had to use Task Manager to regain control. Tried twice and
    >> Excel "froze" both times. I had to drag copy in increments of ~200 rows
    >> at a time. I didn't time this but to copy to all 3000 rows took at least
    >> an hour. (calculation was on automatic) After all formulas were copied:
    >>
    >> Calc time (data sheets unsorted) ~6:45 (m:ss)
    >> Calc time (data sheets sorted ascending) ~3:30 (m:ss)
    >>
    >> Test 2
    >>
    >> Deleted all the above formulas, reset the used range.
    >>
    >> Used this array formula copied to 1 column x 3000 rows:
    >>
    >> =IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH(TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)>0,0))&"'!A:B"),2,0),"")
    >>
    >> After all formulas were copied:
    >>
    >> File size (w/formulas) - 27.2 Mb
    >> Calc time (data sheets sorted ascending) ~3:35 (m:ss)
    >>
    >> Test 3
    >>
    >> Deleted all the above formulas. Decided to try a monster nested IF
    >> formula but I hit the nested function limit so I split the formula into 2
    >> cells. I cell formula did the lookup on sheets 2,3,4. The other cell
    >> formula did the lookup on sheets 5,6,7.
    >>
    >> =IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOKUP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sheet4!A:B,2,0),"")))
    >>
    >> =IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOOKUP(A1,Sheet7!A:B,2,0),"No
    >> Entry"))),"")
    >>
    >> After all formulas were copied:
    >>
    >> File size (w/formulas) - 28.2 Mb
    >> Calc time (data sheets sorted ascending) ~1 second
    >>
    >> I did not test using unsorted data sheets in tests 2 and 3.
    >>
    >> Conclusion:
    >>
    >> Sorting the data can speed up calc time significantly in "large" files.
    >> The use of 1000's of volatile functions should be avoided at all costs!
    >> Monster formulas aren't all bad!
    >>
    >> Comments/suggestions welcome!
    >>
    >> Biff
    >>
    >> "Biff" <[email protected]> wrote in message
    >> news:%[email protected]...
    >>> If anyone is still following this thread I'll do some tests and post the
    >>> results. Stay tuned!
    >>>
    >>> Biff
    >>>
    >>> "Pete_UK" <[email protected]> wrote in message
    >>> news:[email protected]...
    >>>> Max,
    >>>>
    >>>> The OP didn't get back to me when I asked about sorting the reference
    >>>> data beforehand. Can you sort your random data in the 6 sheets then
    >>>> re-apply your formula to take advantage of this to see if there is a
    >>>> big increase in speed? In theory, the binary search technique applied
    >>>> if the data is sorted should make a massive difference to 6 * 65536
    >>>> entries.
    >>>>
    >>>> Pete
    >>>>
    >>>
    >>>

    >>
    >>

    >
    >




  32. #32
    Biff
    Guest

    Re: searching a large database with a long list of search terms

    Hmmm........

    This is turning into a "major" undertaking!

    Although the calc times I reported in my other post are correct and
    "accurate", I failed to disclose that they were measured with no changes
    made to any of the lookup table data. This means that the lookup formulas
    did not have to recalculate (excluding those with volatile functions) on
    any subsequent workbook calculations. I just assumed that at this point
    the formulas and tables would remain "static", so the calc times I
    reported were for any new data or formulas entered that were not
    associated with the lookups!

    That was a major gaffe on my part!

    So, that ~1 second calc time for the monster IF/VLOOKUP only pertains to
    calc time for new data/formula entry.

    To thoroughly test these operations/concepts in this context takes a lot of
    time and patience. I think it has already been demonstrated that under
    these parameters Excel is going to be "very" slow. For a non-static,
    actively used spreadsheet to take upwards of 3 to 4 minutes (at the
    minimum end) to calculate is, to me, unacceptable.

    I guess this would be "easier" to do if one had benchmarking software!

    And with those observations I think I'll end my "tests".

    Biff
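
    For anyone wanting to repeat these timings without dedicated
    benchmarking software, a rough approach is a short macro built on VBA's
    Timer function (a generic sketch, not code posted in this thread):

    Sub TimeFullRecalc()
        ' Generic calc-timing sketch: forces every formula in the workbook
        ' to recalculate and reports the elapsed time in seconds.
        Dim t As Single
        t = Timer
        Application.CalculateFull
        MsgBox "Full recalc took " & Format(Timer - t, "0.00") & " seconds"
    End Sub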

    "Biff" <[email protected]> wrote in message
    news:[email protected]...
    > Roger,
    >
    > I still have my test files. I'll tinker around with this sometime Sunday
    > night.
    >
    > The thing that surprised me the most was the file size. I knew it would be
    > big but did not think it would be 27 - 28 Mb big!
    >
    > Biff
    >
    > "Roger Govier" <[email protected]> wrote in message
    > news:[email protected]...
    >> Hi Biff
    >>
    >> Thank you for this mammoth effort, and for sharing the results with us.
    >> They make fascinating reading.
    >> Whilst for some while now I have tried to avoid Indirect functions when
    >> there are lots of formulae and/or data involved, I always used to use
    >> Vlookup, but more recently I have made much more use of INDEX(), MATCH().
    >>
    >> I wondered whether, with this mass of data, there would be any
    >> significant difference in calculation time if one used the format
    >> =INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
    >> in place of
    >> =VLOOKUP(A2,Sheet1!A:B,2,0)
    >> throughout the formulae.
    >> The formulae would be longer, and look more horrendous, but I wonder
    >> whether there would be any speed difference.
    >>
    >> If you had the time (and inclination) to carry out this test with the
    >> data you already have set up, I would be most interested to see the
    >> results.
    >>
    >> --
    >> Regards
    >>
    >> Roger Govier
    >>
    >>
    >> "Biff" <[email protected]> wrote in message
    >> news:[email protected]...
    >>> Here are the results of 3 tests:
    >>>
    >>> Computer specs:
    >>>
    >>> Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all patches),
    >>> Excel 2002 (XP) (all service packs)
    >>> Other than the operating system, Excel is the only app running.
    >>>
    >>> File configuration: (based on the OPs description)
    >>>
    >>> 7 sheets total, 1 summary, 6 data
    >>>
    >>> Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
    >>> A1:A3000
    >>> Data sheets(6): 2 columns x 65536 rows
    >>>
    >>> Test 1 (based on the reply from Max)
    >>>
    >>> File size (wo/formulas) - 26.6 Mb
    >>> File size (w/formulas) - 27.5 Mb
    >>>
    >>> This formula was copied to 6 columns x 3000 rows:
    >>>
    >>> =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
    >>> ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))
    >>>
    >>> I was unable to copy/drag in a single operation. When I tried, Excel
    >>> "froze-up". I had to use Task Manager to regain control. Tried twice and
    >>> Excel "froze" both times. I had to drag copy in increments of ~200 rows
    >>> at a time. I didn't time this but to copy to all 3000 rows took at least
    >>> an hour. (calculation was on automatic) After all formulas were copied:
    >>>
    >>> Calc time (data sheets unsorted) ~6:45 (m:ss)
    >>> Calc time (data sheets sorted ascending) ~3:30 (m:ss)
    >>>
    >>> Test 2
    >>>
    >>> Deleted all the above formulas, reset the used range.
    >>>
    >>> Used this array formula copied to 1 column x 3000 rows:
    >>>
    >>> =IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH(TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)>0,0))&"'!A:B"),2,0),"")
    >>>
    >>> After all formulas were copied:
    >>>
    >>> File size (w/formulas) - 27.2 Mb
    >>> Calc time (data sheets sorted ascending) ~3:35 (m:ss)
    >>>
    >>> Test 3
    >>>
    >>> Deleted all the above formulas. Decided to try a monster nested IF
    >>> formula but I hit the nested function limit so I split the formula into
    >>> 2 cells. I cell formula did the lookup on sheets 2,3,4. The other cell
    >>> formula did the lookup on sheets 5,6,7.
    >>>
    >>> =IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOKUP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sheet4!A:B,2,0),"")))
    >>>
    >>> =IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOOKUP(A1,Sheet7!A:B,2,0),"No
    >>> Entry"))),"")
    >>>
    >>> After all formulas were copied:
    >>>
    >>> File size (w/formulas) - 28.2 Mb
    >>> Calc time (data sheets sorted ascending) ~1 second
    >>>
    >>> I did not test using unsorted data sheets in tests 2 and 3.
    >>>
    >>> Conclusion:
    >>>
    >>> Sorting the data can speed up calc time significantly in "large" files.
    >>> The use of 1000's of volatile functions should be avoided at all costs!
    >>> Monster formulas aren't all bad!
    >>>
    >>> Comments/suggestions welcome!
    >>>
    >>> Biff
    >>>
    >>> "Biff" <[email protected]> wrote in message
    >>> news:%[email protected]...
    >>>> If anyone is still following this thread I'll do some tests and post
    >>>> the results. Stay tuned!
    >>>>
    >>>> Biff
    >>>>
    >>>> "Pete_UK" <[email protected]> wrote in message
    >>>> news:[email protected]...
    >>>>> Max,
    >>>>>
    >>>>> The OP didn't get back to me when I asked about sorting the reference
    >>>>> data beforehand. Can you sort your random data in the 6 sheets then
    >>>>> re-apply your formula to take advantage of this to see if there is a
    >>>>> big increase in speed? In theory, the binary search technique applied
    >>>>> if the data is sorted should make a massive difference to 6 * 65536
    >>>>> entries.
    >>>>>
    >>>>> Pete
    >>>>>
    >>>>
    >>>>
    >>>
    >>>

    >>
    >>

    >
    >




  33. #33
    Roger Govier
    Guest

    Re: searching a large database with a long list of search terms

    Hi Biff

    Thank you for your efforts and observations.
    I must admit I was very surprised at the ~ 1 second time, especially for
    such a large file size. Thankfully I have never had to deal with
    anything of that size but one client's data file does grow to around 9
    MB by the end of each year, and that is where I am always looking to see
    if I can make changes to improve the existing performance, which does at
    times border on the limits of patience!!!

    --
    Regards

    Roger Govier


    "Biff" <[email protected]> wrote in message
    news:[email protected]...
    > Hmmm........
    >
    > This is turning into a "major" undertaking!
    >
    > Although the calc times I reported in my other post are correct and
    > "accurate", I failed to disclose that the calc times do not reflect
    > the fact that no changes were made to any of the lookup table data.
    > This means that the lookup formulas did not have to recalculate
    > (excluding those with volatile functions) with any subsequent workbook
    > calculations. I just assumed that at this point the formulas and
    > tables would remain "static" and the calc times I reported were for
    > any new data or formulas entered that were not associated with the
    > lookups!
    >
    > That was a major gaff on my part!
    >
    > So, that ~1 second calc time for the monster IF/VLOOKUP only pertains
    > to calc time for new data/formula entry.
    >
    > To thoroughly test these operations/concepts in this context takes a
    > lot of time and patience. I think that it has already been
    > demonstrated that under these parameters Excel is going to be "very"
    > slow. For a non-static, actively used spreadsheet to take upwards of
    > 3 to 4 minutes (at the minimum end) to calculate, is to me,
    > unacceptable.
    >
    > I guess this would be "easier" to do if one had benchmarking software!
    >
    > And with those observations I think I'll end my "tests".
    >
    > Biff
    >
    > "Biff" <[email protected]> wrote in message
    > news:[email protected]...
    >> Roger,
    >>
    >> I still have my test files. I'll tinker around with this sometime
    >> Sunday night.
    >>
    >> The thing that surprised me the most was the file size. I knew it
    >> would be big but did not think it would be 27 - 28 Mb big!
    >>
    >> Biff
    >>
    >> "Roger Govier" <[email protected]> wrote in message
    >> news:[email protected]...
    >>> Hi Biff
    >>>
    >>> Thank you for this mammoth effort, and for sharing the results with
    >>> us. They make fascinating reading.
    >>> Whilst for some while now I have tried to avoid Indirect functions
    >>> when there are lots of formulae and/or data involved, I always used
    >>> to use Vlookup, but more recently I have made much more use of
    >>> INDEX(), MATCH().
    >>>
    >>> I wondered whether, with this mass of data, there would be any
    >>> significant difference in calculation time if one used the format
    >>> =INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
    >>> in place of
    >>> =VLOOKUP(A2,Sheet1!A:B,2,0)
    >>> throughout the formulae.
    >>> The formulae would be longer, and look more horrendous, but I wonder
    >>> whether there would be any speed difference.
    >>>
    >>> If you had the time (and inclination) to carry out this test with
    >>> the data you already have set up, I would be most interested to see
    >>> the results.
    >>>
    >>> --
    >>> Regards
    >>>
    >>> Roger Govier
    >>>
    >>>
    >>> "Biff" <[email protected]> wrote in message
    >>> news:[email protected]...
    >>>> Here are the results of 3 tests:
    >>>>
    >>>> Computer specs:
    >>>>
    >>>> Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all
    >>>> patches), Excel 2002 (XP) (all service packs)
    >>>> Other than the operating system, Excel is the only app running.
    >>>>
    >>>> File configuration: (based on the OPs description)
    >>>>
    >>>> 7 sheets total, 1 summary, 6 data
    >>>>
    >>>> Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
    >>>> A1:A3000
    >>>> Data sheets(6): 2 columns x 65536 rows
    >>>>
    >>>> Test 1 (based on the reply from Max)
    >>>>
    >>>> File size (wo/formulas) - 26.6 Mb
    >>>> File size (w/formulas) - 27.5 Mb
    >>>>
    >>>> This formula was copied to 6 columns x 3000 rows:
    >>>>
    >>>> =IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
    >>>> ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))
    >>>>
    >>>> I was unable to copy/drag in a single operation. When I tried,
    >>>> Excel "froze-up". I had to use Task Manager to regain control.
    >>>> Tried twice and Excel "froze" both times. I had to drag copy in
    >>>> increments of ~200 rows at a time. I didn't time this but to copy
    >>>> to all 3000 rows took at least an hour. (calculation was on
    >>>> automatic) After all formulas were copied:
    >>>>
    >>>> Calc time (data sheets unsorted) ~6:45 (m:ss)
    >>>> Calc time (data sheets sorted ascending) ~3:30 (m:ss)
    >>>>
    >>>> Test 2
    >>>>
    >>>> Deleted all the above formulas, reset the used range.
    >>>>
    >>>> Used this array formula copied to 1 column x 3000 rows:
    >>>>
    >>>> =IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH(TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)>0,0))&"'!A:B"),2,0),"")
    >>>>
    >>>> After all formulas were copied:
    >>>>
    >>>> File size (w/formulas) - 27.2 Mb
    >>>> Calc time (data sheets sorted ascending) ~3:35 (m:ss)
    >>>>
    >>>> Test 3
    >>>>
    >>>> Deleted all the above formulas. Decided to try a monster nested IF
    >>>> formula but I hit the nested function limit so I split the formula
    >>>> into 2 cells. I cell formula did the lookup on sheets 2,3,4. The
    >>>> other cell formula did the lookup on sheets 5,6,7.
    >>>>
    >>>> =IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOKUP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sheet4!A:B,2,0),"")))
    >>>>
    >>>> =IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOOKUP(A1,Sheet7!A:B,2,0),"No
    >>>> Entry"))),"")
    >>>>
    >>>> After all formulas were copied:
    >>>>
    >>>> File size (w/formulas) - 28.2 Mb
    >>>> Calc time (data sheets sorted ascending) ~1 second
    >>>>
    >>>> I did not test using unsorted data sheets in tests 2 and 3.
    >>>>
    >>>> Conclusion:
    >>>>
    >>>> Sorting the data can speed up calc time significantly in "large"
    >>>> files. The use of 1000's of volatile functions should be avoided at
    >>>> all costs! Monster formulas aren't all bad!
    >>>>
    >>>> Comments/suggestions welcome!
    >>>>
    >>>> Biff
    >>>>
    >>>> "Biff" <[email protected]> wrote in message
    >>>> news:%[email protected]...
    >>>>> If anyone is still following this thread I'll do some tests and
    >>>>> post the results. Stay tuned!
    >>>>>
    >>>>> Biff
    >>>>>
    >>>>> "Pete_UK" <[email protected]> wrote in message
    >>>>> news:[email protected]...
    >>>>>> Max,
    >>>>>>
    >>>>>> The OP didn't get back to me when I asked about sorting the
    >>>>>> reference
    >>>>>> data beforehand. Can you sort your random data in the 6 sheets
    >>>>>> then
    >>>>>> re-apply your formula to take advantage of this to see if there
    >>>>>> is a
    >>>>>> big increase in speed? In theory, the binary search technique
    >>>>>> applied
    >>>>>> if the data is sorted should make a massive difference to 6 *
    >>>>>> 65536
    >>>>>> entries.
    >>>>>>
    >>>>>> Pete
    >>>>>>
    >>>>>
    >>>>>
    >>>>
    >>>>
    >>>
    >>>

    >>
    >>

    >
    >



