+ Reply to Thread
Results 1 to 11 of 11

Index Match vs D Function to summarize Data

Hybrid View

  1. #1
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11

    Index Match vs D Function to summarize Data

    Hello,
    I have a project that has several worksheets of "summarized" data all pulled from one worksheet. The data worksheet is run through an external program and pasted into the workbook to update the information.

    To this point I have used tons of index match array formula's to pull the information but I've found this to be extremely inefficient because there are hundreds if not thousands of formulas, and it takes several minutes to update. The file is also around 3MB which is way too big.

    I was thinking about using VBA to simplify the worksheets, but then I came across an article about D_Functions and I'm thinking they may be the solution to my problems.

    However there is one catch...I'm not sure how I would build these formulas to work in my worksheets. I understand the principle behind the D_Functions, but I can't seem to grasp the concept of integration and making it work in my scenario.

    I've attached an example of this workbook, it has most of the forumla's deleted out to reduce the file size, but it should give you an idea of what I'm doing. Once you open it you'll have to unhide some columns to get to the nitty gritty of the sheet.

    Feel free to take a gander...I'm open to any suggestions you may have at this point (except to trash the whole project hehehehe ). If you find a solution please explain the concept behind it. I really want to learn how to make this work rather than plugging in someone elses formulas.

    Thanks in advance.
    Attached Files Attached Files
    Last edited by VBA Noob; 10-25-2008 at 02:16 AM.

  2. #2
    Forum Moderator - RIP Richard Buttrey's Avatar
    Join Date
    01-14-2008
    Location
    Stockton Heath, Cheshire, UK
    MS-Off Ver
    Office 365, Excel for Windows 2010 & Excel for Mac
    Posts
    29,464
    Hi,

    I know this is not answering your specific question, but were you using the Index() function in many formulae in the same row and/or column?

    If so you will find that you can speed things up considerably by adding an extra helper column (or row) which contains the index function, then refer to that Index value in the helper column (or row) in your formulae.

    I had exactly the same problem several years ago. Incidentally 3 Mb is not necessarily way too big. One of my workbooks I wrote for a previous employer was regularly over 40 Mb. It used the Index() function in the manner described above along with many Vlookups()

    HTH

  3. #3
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11
    Most of the formula's are almost identical...so yeah I do have a lot of index formulas in each column.

    I may try this and see if it helps. I think my calculation problems may be due to the fact that I'm using pretty much all Array Formula's. My understanding is that array formulas, when used in mass, slow things down considerably.

  4. #4
    Registered User
    Join Date
    10-25-2008
    Location
    Vancouver
    Posts
    18
    try to use manual calculations instead of using calculate automatically.
    =If("Y"& ou & " fall into love with me", "Send" & Me & "an Email", indirect("$C$100 & match("your heart", offset('[Canada.xls]BC'!$at$20, Row(int(rand(street)*100)),column(avenue),,3000),))

  5. #5
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531
    EDIT: most of the points I make about your file you yourself make -- I'm just elaborating on them ;-)

    I've just taken a quick look at this... firstly I'd advise making sure everything's visible when you post up -- people won't want to faff about unhiding columns to see your formulae ;-)

    As previously stated firstly always try to avoid recalculating the same thing over and over... the best way to explain it is using a simple example of creating a running total in Col B based on values in A... assuming data started in Row 2 -- if in each row of B you had the formula:

    =SUM($A$2:$A2)

    so in B10 that reads:

    =SUM($A$2:$A10)

    it implies your constantly recalculating the preceeding range...
    it is obviously much more efficient to use the preceeding running total and just add current value, eg:

    =N($B1)+$A2

    such that row 10 becomes

    =N($B9)+$A10

    does that make sense ?

    The main point I would make having looked at your file is that you're using a lot of Array formulae to essentially do multi conditional SUMIF / COUNTIFs etc... this is common though most people would use SUMPRODUCT (little more robust in terms of user error).... the greater in number you use arrays and the larger the array dimension the greater the impact on the performance of your file as a whole. I would nearly always advocate the use of helper cells to avoid the requirement for array formulae if you are otherwise required to have a lot of them (ie a few arrays is no big deal, lots of them is).

    You can often (not always) use CONCATENATION in helper cells to rid the need for SUMPRODUCT/array approach.

    For ex... let's say I had a table of 3 columns A:C - let's call it TABLE1
    In A I have names
    In B I have depts
    In C I have values
    The values in this table can repeat -- ie lots of the same A:B combination ... let's say the table has 1000 rows of data.

    In another summary table (let's call in TABLE2) I want to list in A:C
    A: name
    B: dept
    C: sum of values

    Now the most common approach to produce C in TABLE2 is to use a SUMPRODUCT / Array approach as (pre XL07) I can't SUMIF with 1+ criteria, so I'd have

    =SUMPRODUCT(--(Table1!$A$1:$A$1000=$A1),--(Table1!$B$1:$B$1000=$B1),Table1!$C$1:$C$1000)

    Now if I only had 1 of the above that would be so bad... if I have lots and lots (say for 100 combinations of A & B) this would not be so good...

    What I would advocate is... add a column to Table1 (D) that concatenates the important bits of info you need, in this case A:B, so Table1!D1:D1000 would have the following:

    =$A1&":"&$B1

    Now what this means is that in Table2 column C I no longer need to use a SUMPRODUCT/Array approach as I can now use a SUMIF

    =SUMIF(Table1!$D$1:$D$1000,$A1&":"&$B1,Table1!$C$1:$C$1000)

    The above formula a 100 times is nowhere near as big a deal as 100 SUMPRODUCTS/Arrays.

    I hope this makes sense...

    Sometimes (if not most of the time) the less elegant approach can be more efficient... sad but true.

    Also, if you're not aware be wary of using Volatile functions in big files -- for more info on volatile functions/actions see:

    http://www.decisionmodels.com/calcsecretsi.htm

    Your file implies to me you know what you're doing so if this sounds condescending it's not meant to... others may read and find useful who may not be at your level.

    Cheers
    Last edited by DonkeyOte; 10-26-2008 at 04:30 AM.

  6. #6
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11
    Your file implies to me you know what you're doing so if this sounds condescending it's not meant to
    No offense taken...I am open for any and all suggestions and input. I'm in no way an expert so I try to learn all I can. I will try some of your suggestions and see how it comes out, I think I get what you're saying.



    try to use manual calculations instead of using calculate automatically.
    I've tried this, but the problem I've run into is when I give the file to others to use it seems to go back to automatic. I've been trying to figure out a vba procedure to turn off calculation then add a macro attached to a button to run the calculation, but have been unsuccessful so far.




    If so you will find that you can speed things up considerably by adding an extra helper column (or row) which contains the index function, then refer to that Index value in the helper column (or row) in your formulae
    I will give this a shot as well, I want to try a couple things to see what works best.


    Also, does anyone have any good information or a good tutorial on using D_Functions? I think the DGET function could help me as well, but I'm not sure how to integrate it.

  7. #7
    Forum Expert
    Join Date
    01-03-2006
    Location
    Waikato, New Zealand
    MS-Off Ver
    2010 @ work & 2007 @ home
    Posts
    2,243
    Quote Originally Posted by ChromeDome View Post
    Quote Originally Posted by MMULT View Post
    try to use manual calculations instead of using calculate automatically.
    I've tried this, but the problem I've run into is when I give the file to others to use it seems to go back to automatic. I've been trying to figure out a vba procedure to turn off calculation then add a macro attached to a button to run the calculation, but have been unsuccessful so far.
    I recommend not using manual calculation especially on files that are being distributed to other users - at some point it is highly likely that someone will forget to turn the calculation back on & then there is huge potential/risk for inaccurate data to be relied on...
    (My understanding is that Excel's calculation mode is initially set by the first workbook that is opened & I think it can is saved with each workbook that is saved while the calculation is set to manual.)


    Also, I've only had a quick glance at the file but is there anyway that you could whip up a Pivot Table (PT) that does what you need?
    I'm guessing it's unlikely because of the range of combinations & your desired/current layout but I've attached an example* which you can have a play with...

    *I've deleted some data & some of the other sheets because it wasn't until after I uploaded that I realised that attached xls files can be larger than 100kb (when I'd already zipped it to minimise the attachment size).

    Note: if you were to use a PT approach, I suggest 1) using a dynamic named range as the PT source, 2) basing sucessive PT's off the original PT cache, & 3) putting some sort of macro in the file to reformat the column widths as needed after the PT's are updated.

    hth
    Rob
    Attached Files Attached Files
    Rob Brockett
    Kiwi in the UK
    Always learning & the best way to learn is to experience...

  8. #8
    Registered User
    Join Date
    10-07-2008
    Location
    Kansas City
    Posts
    11

    Thumbs up Success At Last

    Thank you all so much for your help. I've tried a couple of your suggestions and seem to be successful, however I do have some other questions.

    Sometimes (if not most of the time) the less elegant approach can be more efficient... sad but true.
    This certainly proved to be true, I was able to successfully implement the Concatenated SUMIF formula into my file and not only did it speed up calculation drastically...to a tune of 6-10 minutes before the changes and less than 1 minute after...It also allowed me to eliminate more than half of my formula's cutting the file size by more than half!

    So now I have an additional column on my ONHAND worksheet (which is where the data pull is input). This column concatenates the org number, subinventory, and part number with this formula:

    =$A2&":"&$F2&":"&$D2

    and gives me a result such as this when data is entered

    863:WHS1:126459

    Then on my summary sheets I use something like the below formula to give me the sum of the quantity in all rows that match that org:sub:part number

    =SUMIF('ON HAND'!$J$2:$J$2509,$C$4&":"&$C$5&":"&$A37,ONHAND_QTY)

    (i'm going to make the concatenated column on the onhand sheet a dynamic range so i can refer to it by name rather than always typing it in)

    This works perfectly. So here's the deal.

    Each column on the onhand worksheet needs to be dynamic because for some people the data pull may be 500 rows, for others it could be 5000 rows. In order to concatenate the information in the rows i have to enter the concatenation formula into each cell of the concatenation column.

    So how do I have the worksheet concatenate the information (or have the formula present) only if there is data in the other 9 columns? meaning if 10 rows have information, only 10 rows contain concatenated data in the concatenation column, or if 2000 rows have information all 2000 rows have the concatenated column filled in.

    The only way I can think of to do this is to actually copy the formula down several thousand rows to make sure nobody enters more information than there are concatenated formulas.

    My second question is...due to the nature of the SUMIF formula, if no information is found it returns a zero. How can I make excel show nothing in those fields rather than a zero? I could simply use conditional formating and say that if the cell is equal to 0 then the font is white....but you'll notice most sheets have every other line colored in to make reading the info less streneous.

    Any Thoughts?
    Attached Files Attached Files

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1