+ Reply to Thread
Results 1 to 11 of 11

Index Match vs D Function to summarize Data

  1. #1
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11

    Index Match vs D Function to summarize Data

    Hello,
    I have a project that has several worksheets of "summarized" data all pulled from one worksheet. The data worksheet is run through an external program and pasted into the workbook to update the information.

    To this point I have used tons of index match array formula's to pull the information but I've found this to be extremely inefficient because there are hundreds if not thousands of formulas, and it takes several minutes to update. The file is also around 3MB which is way too big.

    I was thinking about using VBA to simplify the worksheets, but then I came across an article about D_Functions and I'm thinking they may be the solution to my problems.

    However there is one catch...I'm not sure how I would build these formulas to work in my worksheets. I understand the principle behind the D_Functions, but I can't seem to grasp the concept of integration and making it work in my scenario.

    I've attached an example of this workbook, it has most of the forumla's deleted out to reduce the file size, but it should give you an idea of what I'm doing. Once you open it you'll have to unhide some columns to get to the nitty gritty of the sheet.

    Feel free to take a gander...I'm open to any suggestions you may have at this point (except to trash the whole project hehehehe ). If you find a solution please explain the concept behind it. I really want to learn how to make this work rather than plugging in someone elses formulas.

    Thanks in advance.
    Attached Files Attached Files
    Last edited by VBA Noob; 10-25-2008 at 02:16 AM.

  2. #2
    Forum Moderator - RIP Richard Buttrey's Avatar
    Join Date
    01-14-2008
    Location
    Stockton Heath, Cheshire, UK
    MS-Off Ver
    Office 365, Excel for Windows 2010 & Excel for Mac
    Posts
    29,464
    Hi,

    I know this is not answering your specific question, but were you using the Index() function in many formulae in the same row and/or column?

    If so you will find that you can speed things up considerably by adding an extra helper column (or row) which contains the index function, then refer to that Index value in the helper column (or row) in your formulae.

    I had exactly the same problem several years ago. Incidentally 3 Mb is not necessarily way too big. One of my workbooks I wrote for a previous employer was regularly over 40 Mb. It used the Index() function in the manner described above along with many Vlookups()

    HTH

  3. #3
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11
    Most of the formula's are almost identical...so yeah I do have a lot of index formulas in each column.

    I may try this and see if it helps. I think my calculation problems may be due to the fact that I'm using pretty much all Array Formula's. My understanding is that array formulas, when used in mass, slow things down considerably.

  4. #4
    Registered User
    Join Date
    10-25-2008
    Location
    Vancouver
    Posts
    18
    try to use manual calculations instead of using calculate automatically.
    =If("Y"& ou & " fall into love with me", "Send" & Me & "an Email", indirect("$C$100 & match("your heart", offset('[Canada.xls]BC'!$at$20, Row(int(rand(street)*100)),column(avenue),,3000),))

  5. #5
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531
    EDIT: most of the points I make about your file you yourself make -- I'm just elaborating on them ;-)

    I've just taken a quick look at this... firstly I'd advise making sure everything's visible when you post up -- people won't want to faff about unhiding columns to see your formulae ;-)

    As previously stated firstly always try to avoid recalculating the same thing over and over... the best way to explain it is using a simple example of creating a running total in Col B based on values in A... assuming data started in Row 2 -- if in each row of B you had the formula:

    =SUM($A$2:$A2)

    so in B10 that reads:

    =SUM($A$2:$A10)

    it implies your constantly recalculating the preceeding range...
    it is obviously much more efficient to use the preceeding running total and just add current value, eg:

    =N($B1)+$A2

    such that row 10 becomes

    =N($B9)+$A10

    does that make sense ?

    The main point I would make having looked at your file is that you're using a lot of Array formulae to essentially do multi conditional SUMIF / COUNTIFs etc... this is common though most people would use SUMPRODUCT (little more robust in terms of user error).... the greater in number you use arrays and the larger the array dimension the greater the impact on the performance of your file as a whole. I would nearly always advocate the use of helper cells to avoid the requirement for array formulae if you are otherwise required to have a lot of them (ie a few arrays is no big deal, lots of them is).

    You can often (not always) use CONCATENATION in helper cells to rid the need for SUMPRODUCT/array approach.

    For ex... let's say I had a table of 3 columns A:C - let's call it TABLE1
    In A I have names
    In B I have depts
    In C I have values
    The values in this table can repeat -- ie lots of the same A:B combination ... let's say the table has 1000 rows of data.

    In another summary table (let's call in TABLE2) I want to list in A:C
    A: name
    B: dept
    C: sum of values

    Now the most common approach to produce C in TABLE2 is to use a SUMPRODUCT / Array approach as (pre XL07) I can't SUMIF with 1+ criteria, so I'd have

    =SUMPRODUCT(--(Table1!$A$1:$A$1000=$A1),--(Table1!$B$1:$B$1000=$B1),Table1!$C$1:$C$1000)

    Now if I only had 1 of the above that would be so bad... if I have lots and lots (say for 100 combinations of A & B) this would not be so good...

    What I would advocate is... add a column to Table1 (D) that concatenates the important bits of info you need, in this case A:B, so Table1!D1:D1000 would have the following:

    =$A1&":"&$B1

    Now what this means is that in Table2 column C I no longer need to use a SUMPRODUCT/Array approach as I can now use a SUMIF

    =SUMIF(Table1!$D$1:$D$1000,$A1&":"&$B1,Table1!$C$1:$C$1000)

    The above formula a 100 times is nowhere near as big a deal as 100 SUMPRODUCTS/Arrays.

    I hope this makes sense...

    Sometimes (if not most of the time) the less elegant approach can be more efficient... sad but true.

    Also, if you're not aware be wary of using Volatile functions in big files -- for more info on volatile functions/actions see:

    http://www.decisionmodels.com/calcsecretsi.htm

    Your file implies to me you know what you're doing so if this sounds condescending it's not meant to... others may read and find useful who may not be at your level.

    Cheers
    Last edited by DonkeyOte; 10-26-2008 at 04:30 AM.

  6. #6
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11
    Your file implies to me you know what you're doing so if this sounds condescending it's not meant to
    No offense taken...I am open for any and all suggestions and input. I'm in no way an expert so I try to learn all I can. I will try some of your suggestions and see how it comes out, I think I get what you're saying.



    try to use manual calculations instead of using calculate automatically.
    I've tried this, but the problem I've run into is when I give the file to others to use it seems to go back to automatic. I've been trying to figure out a vba procedure to turn off calculation then add a macro attached to a button to run the calculation, but have been unsuccessful so far.




    If so you will find that you can speed things up considerably by adding an extra helper column (or row) which contains the index function, then refer to that Index value in the helper column (or row) in your formulae
    I will give this a shot as well, I want to try a couple things to see what works best.


    Also, does anyone have any good information or a good tutorial on using D_Functions? I think the DGET function could help me as well, but I'm not sure how to integrate it.

  7. #7
    Forum Expert
    Join Date
    01-03-2006
    Location
    Waikato, New Zealand
    MS-Off Ver
    2010 @ work & 2007 @ home
    Posts
    2,243
    Quote Originally Posted by ChromeDome View Post
    Quote Originally Posted by MMULT View Post
    try to use manual calculations instead of using calculate automatically.
    I've tried this, but the problem I've run into is when I give the file to others to use it seems to go back to automatic. I've been trying to figure out a vba procedure to turn off calculation then add a macro attached to a button to run the calculation, but have been unsuccessful so far.
    I recommend not using manual calculation especially on files that are being distributed to other users - at some point it is highly likely that someone will forget to turn the calculation back on & then there is huge potential/risk for inaccurate data to be relied on...
    (My understanding is that Excel's calculation mode is initially set by the first workbook that is opened & I think it can is saved with each workbook that is saved while the calculation is set to manual.)


    Also, I've only had a quick glance at the file but is there anyway that you could whip up a Pivot Table (PT) that does what you need?
    I'm guessing it's unlikely because of the range of combinations & your desired/current layout but I've attached an example* which you can have a play with...

    *I've deleted some data & some of the other sheets because it wasn't until after I uploaded that I realised that attached xls files can be larger than 100kb (when I'd already zipped it to minimise the attachment size).

    Note: if you were to use a PT approach, I suggest 1) using a dynamic named range as the PT source, 2) basing sucessive PT's off the original PT cache, & 3) putting some sort of macro in the file to reformat the column widths as needed after the PT's are updated.

    hth
    Rob
    Attached Files Attached Files
    Rob Brockett
    Kiwi in the UK
    Always learning & the best way to learn is to experience...

  8. #8
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11

    Thumbs up Success At Last

    Thank you all so much for your help. I've tried a couple of your suggestions and seem to be successful, however I do have some other questions.

    Sometimes (if not most of the time) the less elegant approach can be more efficient... sad but true.
    This certainly proved to be true, I was able to successfully implement the Concatenated SUMIF formula into my file and not only did it speed up calculation drastically...to a tune of 6-10 minutes before the changes and less than 1 minute after...It also allowed me to eliminate more than half of my formula's cutting the file size by more than half!

    So now I have an additional column on my ONHAND worksheet (which is where the data pull is input). This column concatenates the org number, subinventory, and part number with this formula:

    =$A2&":"&$F2&":"&$D2

    and gives me a result such as this when data is entered

    863:WHS1:126459

    Then on my summary sheets I use something like the below formula to give me the sum of the quantity in all rows that match that org:sub:part number

    =SUMIF('ON HAND'!$J$2:$J$2509,$C$4&":"&$C$5&":"&$A37,ONHAND_QTY)

    (i'm going to make the concatenated column on the onhand sheet a dynamic range so i can refer to it by name rather than always typing it in)

    This works perfectly. So here's the deal.

    Each column on the onhand worksheet needs to be dynamic because for some people the data pull may be 500 rows, for others it could be 5000 rows. In order to concatenate the information in the rows i have to enter the concatenation formula into each cell of the concatenation column.

    So how do I have the worksheet concatenate the information (or have the formula present) only if there is data in the other 9 columns? meaning if 10 rows have information, only 10 rows contain concatenated data in the concatenation column, or if 2000 rows have information all 2000 rows have the concatenated column filled in.

    The only way I can think of to do this is to actually copy the formula down several thousand rows to make sure nobody enters more information than there are concatenated formulas.

    My second question is...due to the nature of the SUMIF formula, if no information is found it returns a zero. How can I make excel show nothing in those fields rather than a zero? I could simply use conditional formating and say that if the cell is equal to 0 then the font is white....but you'll notice most sheets have every other line colored in to make reading the info less streneous.

    Any Thoughts?
    Attached Files Attached Files

  9. #9
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531
    Quote Originally Posted by ChromeDome View Post
    ...I was able to successfully implement the Concatenated SUMIF formula into my file and not only did it speed up calculation drastically...to a tune of 6-10 minutes before the changes and less than 1 minute after...It also allowed me to eliminate more than half of my formula's cutting the file size by more than half!
    Excellent news.... I've had similar results myself in the past so am always keen for other people to utilise the same approach... you discover the SUMPRODUCT / Array formula and think "wow! this is going to be great for my big spreadsheet!" without really realising the impact that a LOT of these formulae will have on performance... it's happened to us all I think.

    Each column on the onhand worksheet needs to be dynamic because for some people the data pull may be 500 rows, for others it could be 5000 rows....

    So how do I have the worksheet concatenate the information (or have the formula present) only if there is data in the other 9 columns? meaning if 10 rows have information, only 10 rows contain concatenated data in the concatenation column, or if 2000 rows have information all 2000 rows have the concatenated column filled in.
    Question -- the variable number of rows is controlled by what ? Manual user input / imported data ?

    The only way I can think of to do this is to actually copy the formula down several thousand rows to make sure nobody enters more information than there are concatenated formulas.
    This is technically true and the best way to do this, however, you could use a VBA Worksheet_Change event on your sheet such that when any value is added/removed from the relevant range (A,D,F) the formula is either added or removed automatically -- this would though of course require users to enable Macros. To reiterate though IMHO storing x 1000 Concatentation formulas is not a huge deal.

    You could think about having a warning message display in Excel if the number of entries exceeds number of concatenation formulae:

    =IF(MAX(COUNTA(A2:A10000),COUNTA(D2:D10000),COUNTA(F2:F10000))>COUNTA(X2:X10000),"Not enough Concatenation Formulae!","")

    (Where X holds concatenation formulae)

    Though I would always just opt to store as many concatenations as I thought was realistically required (ie build in significant excess capacity)

    My second question is...due to the nature of the SUMIF formula, if no information is found it returns a zero. How can I make excel show nothing in those fields rather than a zero? I could simply use conditional formating and say that if the cell is equal to 0 then the font is white....but you'll notice most sheets have every other line colored in to make reading the info less streneous.

    Any Thoughts?
    Well here you have 2 obvious choices:

    1 - double evaluate the SUMIF such that if on first evaluation answer is 0 return "" else recalculate the SUMIF again to return the answer... not great.

    2 - use formatting

    You mention 2 already by setting up a conditional format rule and white font... technically you don't need to do that as you can just create a custom format of:

    #,##0.00;[red](#,##0.00);-

    The above creates a format to 2 decimal places with , separator where negatives are encased in () and formatted red... zeroes are merely displayed as a hyphen. You could of course remove the hyphen and just have blank for 0.

    Custom Formats in general follow the rule of:

    positive number format; negative number format; zero format

    I hope the above helps.

    Post back with more questions.

    If you already use VBA in your file then your concatenation issue can be addressed dynamically... out of interest what is the performance impact you see of having x 1000 concatenation formulae ?
    Last edited by DonkeyOte; 11-01-2008 at 04:20 AM.

  10. #10
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11
    The above creates a format to 2 decimal places with , separator where negatives are encased in () and formatted red... zeroes are merely displayed as a hyphen. You could of course remove the hyphen and just have blank for 0.
    Ya know I had learned this quite a while back in a class but it completely slipped my mind...Of course, it worked perfectly. So that problem is solved!

    Question -- the variable number of rows is controlled by what ? Manual user input / imported data ?
    Users will click on the button that say's collect data...It takes them to a reporting data harvester that pulls the data for the user. When the report is done running it copies all the data onto the clipboard. Then the user goes back into this workbook and clicks the Paste button (which selects selects cell A1 and pastes the data, keeping the user from pasting data in the wrong place).

    So the # of rows is simply determined by how much data is being copied and pasted into the workbook.


    If you already use VBA in your file then your concatenation issue can be addressed dynamically... out of interest what is the performance impact you see of having x 1000 concatenation formulae ?
    I really need something like 10,000 concatenated formulae to make sure nobody goes over. Under a normal situation my data pull may produce up to 7000 rows of data, one of my counterparts could have a data pull of only 1800 rows of data, and another could have 9000 rows of data.

    I have tried the option of putting in 10,000 concatenation formulae, but it seems to have increased the file size by 400kb...and as I add more capability and function to the workbook it could add even more.

    I'm interested in the VBA option, however I'm very novice with VBA and am just learning it at this point so I don't know how I would go about making it dynamic using VBA.

    -- this would though of course require users to enable Macros.
    No problemo on this, we use various macro's every day so anyone who would be using this will already have macro's enabled.

    Thanks again for all your help and guidance

  11. #11
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531
    I guess I'm still intrigued by the data retrieval itself as this really is the key piece of the jigsaw as you can use this to trigger the concatenation formulae insertion etc...:

    Then the user goes back into this workbook and clicks the Paste button (which selects selects cell A1 and pastes the data, keeping the user from pasting data in the wrong place).
    So question regards the above is: what is the "paste button" ? Given it forces A1 insertion I'm assuming this is coded else how do you prevent user from activating say cell A10 and pressing paste ?

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1