# Longer Formulas vs. More Formulas?

1. ## Longer Formulas vs. More Formulas?

I am having to use Excel to analyze big data at work (inefficient, I know, but at least I have 2 laptops). I need to run a lookup to pull order cost values into 240K rows of one sheet from roughly 1.5 mil rows of data that is separated on 3 other sheets. I also need to know the category of order I am dealing with - "Type A" or "Type B" (types mixed through 3 reference sheets). I am trying to choose the quickest (lol) method for how to handle this. With index/match being easier for me to use than vlookup (and taking less file space), I am using that for lookups:

Option 1: run a nested iferror/index/match lookup down 240K rows analyzing 1.5mil. records: (if not on sheet 1, pull from sheet 2, if not on sheet 2, pull from sheet 3). While pulled from someone else's solution, formula would look something like this (tested on small sample and does work):

=IFERROR(IFERROR(INDEX(ANGLES!\$H\$6:\$H\$85,MATCH([@StockCode],Table_Query_from_SysproCompanyA[StockCode],0)),INDEX(CHANNELS!\$H\$6:\$H\$85,MATCH([@StockCode],Table_Query_from_SysproCompanyA5[StockCode],0))),"")

Option 2: create 4 columns - 3 with index/match formulas referencing a different sheet [so 3x the formulas - each formula analyzing 500K rows instead of 1.5mil) and one with a helper formula. Each lookup line *should* return "#N/A" in 2 of the 3 columns and a sales value in another. Helper column will report the non/#N/A value via nested iferror (iferror column L, col. M, iferror col. M, col. N)

Please let me know your thoughts. I can work on my other laptop while whichever monstrosity I choose loads. Unfortunately, using Access/Tableau/something else is not an option for this project today and I do not have VBA knowledge.

2. ## Re: Longer Formulas vs. More Formulas?

Considering how much better a database program would be for this than Excel, there must be significant barriers to using a database on this project.

The first thing I notice about your sample formula is that it is using very slow linear "exact match" lookups (3rd argument in the MATCH() function is 0). If I were designing a spreadsheet to search 1.5 million records 1/4 million times, it would take threats to life, liberty, or national security for me to use linear lookups. I would do anything I could to use the more efficient binary "approximate match" lookup algorithm (3rd argument to MATCH() function is 1 or -1). With luck, that should only require sorting the database(s) on the lookup value. If you are less lucky, that will require more effort to get the database into a good format for a binary search.

As a minimalist test, I built a quick 2 million record x 1 field database of row numbers across 2 sheets (records 1 to 1 million on sheet1, records 1million to 2 million on sheet2) and searched those 2million records 1 million times.

sheet1!A1 =ROW() copied down to the bottom
sheet2!A1 =Sheet1!\$A\$1045876+ROW() copied down to the bottom
sheet3!A1 =RANDBETWEEN(1,2000000) copied down to the bottom
sheet3!B1 =IF(A1<=sheet1!\$A\$1045876,MATCH(A1,sheet1!\$A:\$A,1),MATCH(A1,sheet2!\$A:\$A)) copied down to the bottom.

After setting up (which took several seconds for each paste operation), each calculate event is about 4 seconds. Note that, where you examples are performing up to 2 (option 1) or 3 (option 2) lookups per lookup cell, my approach is only doing 1 lookup per cell. Because my data is sorted, a simple IF() check is all that is needed to determine whether to search in sheet1 or sheet2.

That's probably how I would begin to approach this. Lookups are a very common bottleneck in spreadsheet programming, so you will want to spend some time with those lookups (both database structure and lookup formula) to make your lookups as efficient as possible.

There are currently 1 users browsing this thread. (0 members and 1 guests)