Whoa! good job!The last full calc of the old sheet ended up taking ~165mins...just timed the new one at 28 seconds!
EDIT: And for our listeners, thats a 300X improvemnt. Take heed.
Last edited by shg; 03-12-2010 at 01:15 AM.
Microsoft MVP - Excel
Entia non sunt multiplicanda sine necessitate
Fantastic improvement!
If you're happy with the current improvement can you please mark the thread as Solved (see rules for instructions)?
However, I'm still gunning for more improvement - you can't steal the teaser of an optimisation brain teaser that easily
It's a little late in the thread for this..., but, because I haven't dealt with many large data sets, I still think 28 seconds is a long time - although, I stand to be over-ruled!
(see http://www.decisionmodels.com/optspeed.htm).
- Are all "repetitive calculations" (mentioned by DO) split into different columns, so they are only calculated once with the result cell then being referenced in subsequent formulae?
- I may be way off the mark here, but with your use of conditional summing etc, would a pivot table approach (see Jon's site) be possible?
(even if it is just as an intermediate step)
Rob
Rob Brockett
Kiwi in the UK
Always learning & the best way to learn is to experience...
Echoing the others - top effort - and good on you for resolving under your own steam (even if we did miss out)
sad but true ... elegance <> efficiencyOriginally Posted by JonnyB
remember that though theyOriginally Posted by JonnyB
a) create reduced range references
b) improve readability (shorten syntax)
Dynamic Named Ranges do involve overheads in so far as the calculations used to generate them are recalculated each time they are used.
Note also that if they DNR's are created with OFFSET / INDIRECT etc they make all formulae utilising them Volatile by association.
If we add to the above the fact that the greater majority of formulae use only the Used Range of whatever range is specified [edit: at least that is my understanding], eg:
=SUMIF(A:A,A1,B:B)
will not process all the rows of A & B - only the used range intersect of A:B
then I would say that sometimes (ie where calculation speed is of paramount importance) the performance "cost" of using DNR's can actually outweigh the benefits previously outlined.
The most obvious exceptions to the "Used Range" rule are SUMPRODUCT and Arrays (and I suspect SUMIFS / COUNTIFS though perhaps to a lesser extent) ... in such instances the use of non-volatile Named Ranges is often a no-brainer.
Needless to say it all boils down to the context in which they're being used ie:
1. which formulae they're being used with (might not be necessary)
2. whether they're Volatile (created with OFFSET or INDEX for ex.)
If you feel your model might include some of the above you could perhaps create an alternative version without them (where appropriate) and see if performance improves any further.
Last edited by DonkeyOte; 03-12-2010 at 07:03 AM.
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
This one should be thread of the month.
... now there's an idea! ... off to the Water Cooler ...
teylyn
Microsoft MVP - Excel
At Excelforum, you can say "Thank you!" by clicking theicon below the post.
Avoid pie charts with more than two data points. Why? See here (pdf, 559 kb). The only acceptable pie chart is here.
Yes, I do think I caught all of these. And thanks for the pivot table suggestion, I could potentially use it in the results summary, but I don't think it would apply where the conditional formulas were originally doing the heavy lifting in the sheet.- Are all "repetitive calculations" (mentioned by DO) split into different columns, so they are only calculated once with the result cell then being referenced in subsequent formulae?
Yep, this was one part of the reconstruction that gave me pause. The DNRs use OFFSET(). The amount of data input into the sheet will vary wildly...say from 10k rows to 150k or more. So the choice was either to make the named ranges dynamic using OFFSET() or make them static with more rows than I ever think I'll need, say 500k. Actually now that I think about it, I may have made the wrong choice. Just about the only function that is referencing these named ranges is INDEX() so I don't think having extraneous rows included would hurt performance at all (unless it came from increased memory demands). So by going with static named ranges, the sheet would be free of anything volatile. The obvious downside to going static of course is that if i ever exceed 500k rows, I'll need to redo all the ranges (if i remember, that isDynamic Named Ranges do involve overheads in so far as the calculations used to generate them are recalculated each time they are used.
Note also that if they DNR's are created with OFFSET / INDIRECT etc they make all formulae utilising them Volatile by association....will need to put a big red alarm cell somewhere).
Anyway, I need to sleep on this potential change. Thank you all once again. If I don't make the change tomorrow and report back with the performance difference I'll be sure to have this marked as SOLVED.
Last edited by DonkeyOte; 03-13-2010 at 04:21 AM. Reason: corrected tag
It is possible to create DNR's without them being Volatile c/o INDEX.Originally Posted by JonnyB
Consider (for sake of example):
Data in A:C
Data type in A is Numeric
Last row of data in Col A dictates the height of the 3 Column Range being created.
Using OFFSET (all functions utilising the below DNR become Volatile by association):
=OFFSET($A$1,,,COUNTA($A:$A),3)
Using INDEX (semi-volatile - ie only on file open not thereafter)
=$A$1:INDEX($C:$C,MATCH(9.99E+307,$A:$A))
the latter will work even with non-contiguous data sets
(if the data type is not consistent there are alternatives to cater for that)
Whether or not a Volatile DNR will have a significant impact depends largely on the functions utilising it - if they are lightweight and efficient then the impact is lessened ... obviously if the opposite is true then avoiding the Volatility in the DNR becomes something of a no-brainer IMO.
To reiterate though - if the formula are relatively efficient then hard wiring a range may not be a bad idea - best way would be to test of course.
Let us know how you get on - I for one am quite interested in the performance impact (if any - and what it amounts to).
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
hi all,
I'm interested too :-)
If you have a lot of columns that you'd like to create "index based" DNR's for you may find the below macro useful. Note, the constants will need to be changed to match your file layout & the number of rows in the DNR's on the sheet are all based the last row in one column (Colno). This makes the created DNR's less flexible than DonkeyOte's version, but on the plus side, it is much quicker to set up a lot of them for a specific sheet.
DO/Shg/others,
Feel free to make comments for improving my modified version of Roger's code or if there are other major weaknesses. I've been tempted to put it in its own thread but it works "as is" so I haven't bothered.
(one day I might throw in an inputbox to get user interaction... + change the IIF structure based on comments in another thread & check the contents of the specchararr)
Code:Option Explicit Sub CreateNames() ' written by Roger Govier, Technology4U 'RB: 14/10/2009, modified after being sourced from http://www.contextures.com/xlNames03.html Dim WB As Workbook, ws As Worksheet Dim lrow As Long, lcol As Long, i As Long Dim myName As String, Start As String '26/02/2010, RB: Define a sheetname prefix to prevent confusion with existing dynamic ranges on other sheets. Const wsAbbrev As String = "RT_" ' set the row number where headings are held as a constant ' change this to the row number required if not row 1 Const Rowno As Long = 5 ' set the Offset as the number of rows below Rowno, where the ' data begins Const ROffset As Long = 1 ' set the starting column for the data, in this case 1 ' change if the data does not start in column A Const Colno As Long = 1 ' Set an Offset from the starting column, for the column number that ' will always have data entered, and will therefore be used in calculating lrow Const COffset As Long = 0 ' in this case, the first column will always contain data. On Error GoTo CreateNames_Error Set WB = ActiveWorkbook Set ws = ActiveSheet ' count the number of columns used in the row designated to ' have the header names With ws lcol = .Cells(Rowno, .Columns.Count).End(xlToLeft).Column lrow = .Cells(.Rows.Count, Colno).End(xlUp).Row Start = .Cells(Rowno, Colno).Address End With With WB.Names .Add Name:=wsAbbrev & "lcol", RefersTo:="=COUNTA($" & Rowno & ":$" & Rowno & ")" .Add Name:=wsAbbrev & "lrow", RefersToR1C1:="=COUNTA(R" & Rowno & "C" & Colno + COffset & ":R" & Rows.Count & "C" & Colno + COffset & ")" .Add Name:=wsAbbrev & "DynSourceDataForPT", RefersTo:= _ "=" & Start & ":INDEX($" & Rowno & ":$" & Rows.Count & "," & wsAbbrev & "lrow," & wsAbbrev & "Lcol)" End With For i = Colno To lcol ' if a column header contains spaces, replace the space with an underscore ' spaces are not allowed in range names. 'RB: originally myName = Replace(Cells(Rowno, i).Value, " ", "_") 'modified to... myName = wsAbbrev & CleanStr(Cells(Rowno, i).Value) '26/02/2010, RB: the next IF statement should not bew needed due to the addition of "wsAbbrev & " If IsNumeric(Left(myName, 1)) Then myName = "_" & myName If myName = "" Then ' if column header is blank, warn the user and stop the macro at that point ' names will only be created for those cells with text in them. 'MsgBox "Missing Name in column " & i & vbCrLf _ & "Please Enter a value in the Header row and run macro again" 'Exit Sub Else WB.Names.Add Name:=myName, RefersToR1C1:="=R" & Rowno + ROffset & "C" & i & ":INDEX(R" & Rowno + ROffset & "C" & i & ":R" & Rows.Count & "C" & i & "," & wsAbbrev & "lrow - 1)" End If Next i On Error GoTo 0 MsgBox "All dynamic Named ranges have been created" Set ws = Nothing Set WB = Nothing Exit Sub CreateNames_Error: MsgBox "Error " & Err.Number & " (" & Err.Description & _ ") in procedure CreateNames of Module Technology4U" Stop 'to allow for investigation while the macro is still "running" Set ws = Nothing Set WB = Nothing End Sub Private Function CleanStr(ByVal Str As String) As String '14/10/2009 modified from "CleanFolderName" to meet the needs of Excel Names ie no spaces or &'s ' 14/11/2008, sourced from: _ http://www.dailydoseofexcel.com/archives/2006/05/24/creating-folders-with-mkdir/ 'I changed it from the posted version of "For i = 1 To Len(Str) _ Select Case Mid$(Str, i, 1) _ Case "/", "\", ":", "*", "?", "", "|" _ ..." to use "specchararr" Dim i As Long Dim sTemp As String Dim SpecCharArr As Variant SpecCharArr = Array("/", "\", ":", "*", "?", "", "|", " ", "&", "'", "(", ")", "/", "-") For i = LBound(SpecCharArr) To UBound(SpecCharArr) sTemp = Replace(IIf(i = LBound(SpecCharArr), Str, sTemp), SpecCharArr(i), "_") Next i CleanStr = sTemp End Function
hth
Rob
Last edited by broro183; 03-13-2010 at 09:20 AM.
Rob Brockett
Kiwi in the UK
Always learning & the best way to learn is to experience...
Hi Everyone,
My apologies for being unattentive to this thread, I came down with a virus and didn't do much for the better part of a week. I'm finally just getting caught up on my work now.
As far as the changes to the DNRs are concerned (removing OFFSET())...unfortunately, there was no material performance difference. However, I did restructure the sheet and add some VBA code which further cut the recalc time from 28 secs to 18 secs. Which is great, but I'm noticing that of those 18 seconds, 10 of them are coming from one column. So I think I can get it down even further (I'm finally making good on providing an optimization challenge).
The attached sheet demonstrates the problem (Column C). The signal IDs in column A represent a unique pair (i.e. there are two #1s, two #2s, etc.). On the second occurrence of a signal ID (reading top-to-bottom), Column C aims to return the value from Column B when the signal ID from the current row first occurred (that is rather confusing to read - it should be made clear in the example). As you'll see the calc time is approximately 10 seconds (actually only around 8 in this sample sheet). I could live with this, but the real problem is that filling down and deleting the formulas takes forever.
[I cannot sort the order of signal IDs and have everything running on manual calc]
Would greatly appreciate any ideas -- even if the calc time doesn't improve, something that will delete faster would help a lot.
-JonnyB
If you can't sort the data you will not really be in a position to improve performance with formulae I'm afraid - sorting the data allows for binary search which on large data sets is hugely beneficial (only a tiny subset of the data is ever reviewed).
You could perhaps adopt a VBA based solution - ie something along the lines of:
I'm sure the VBA guys can improve on the above - but on your data set the above would be pretty instantaneous.Code:Sub Example() Dim rngCopy As Range Application.ScreenUpdating = False With Range(Cells(6, "A"), Cells(Rows.Count, "A").End(xlUp)).Resize(, 2) .Copy .Offset(, 4) With .Offset(, 6).Resize(, 1) .FormulaR1C1 = "=ROW(RC)" .Value = .Value End With Set rngCopy = .Offset(, 4).Resize(, 3) With rngCopy .Sort Key1:=.Columns(1), Order1:=xlAscending, Key2:=.Columns(3), order2:=xlDescending End With With .Offset(, 2).Resize(, 1) .FormulaR1C1 = "=IF(VLOOKUP(RC1,C5:C7,3)=ROW(RC),"""",VLOOKUP(RC1,C5:C6,2))" .Value = .Value End With rngCopy.Clear Set rngCopy = Nothing End With Application.ScreenUpdating = True End Sub
In essence it copies the existing data set (Cols A:B) to a new temp table, sorts it, applies a formula (col C) which utilises binary search against the sorted table and then subsequently overrides the formulae with results and removes the temp table.
Last edited by DonkeyOte; 03-27-2010 at 05:01 AM. Reason: modified formula to account for 0 being first value
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
Thanks for taking the time with your suggestion DonkeyOte. That would work nicely as you displayed...the only issue is that Column B is actually formula driven as well, so using VBA code wouldn't quite fit. I was able to come up with a non-VBA solution, and to be honest I'm not quite sure why it works so much better, but it deletes instantly. I know part of it is because the if statement reduces by half the number of times the formula is called, but they still doesn't explain all of it. In case its of interest to anyone, I've attached a sheet with the solution (had to delete a bunch of rows to get the file size down, but the gist of it is there). Thanks again.
I'm not really sure why the use of formulae in B precludes VBA - you would want to avoid using .Copy in the form presented but you could paste values only to the temp table either via pastespecial or by simply using range.value = range.value approach (avoiding clipboard).Originally Posted by JonnyB
As you say the pre-emptive IF will have a large impact - esp. with 20k rows.Originally Posted by JonnyB
OFFSET is regarded as one of the quicker functions (albeit Volatile) and INDEX one of slower functions (albeit non-Volatile) though I have to say using your sample file and the following approaches:
=IF(A6=1,"",OFFSET(C5,$D$2-ROW()+MATCH(B6,$B$5:B5,0),))
=IF(A6=1,"",VLOOKUP(B6,$B$5:$C5,2,0))
=IF(A6=1,"",INDEX($C$5:$C5,MATCH($B6,$B$5:$B5,0)))
all calculated in around 2 seconds over the 14k rows... the VLOOKUP was the slowest, INDEX second slowest and OFFSET was quickest (but being Volatile were the formulae to persist they would calculate more often than perhaps necessary).
It might be worth outlining what you're doing exactly - given you mention deleting etc ...
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
DonkeyOte- The problem I face with a VBA solution and sorting the entire range is that there are formula dependencies in Column B on previous values of Column C. So I cannot assume that all Column B values are known and then just sort and lookup for the entirety of Column C. I'm not sure how to get around this...I would guess that a loop would take a while. I'm not as encouraged as I first was about the OFFSET() solution...I tested on 35k rows and it was considerably slower (20 seconds).
Last edited by JonnyB; 03-29-2010 at 01:04 AM.
I confess you've lost me now I'm afraid.Originally Posted by JonnyB
Your prior file(s) implied that PrevSize was derived from Size based on first instance of SignalID ... you're now implying that PrevSize is itself derived from Size - ie a circular reference of sorts.
(edit: are you perhaps saying that Size is derived from PriorSize in prior rows ?)
To be clear though - the suggestion I posted previously does not in any way affect the ordering of the original table.
The code:
i) copied the original table to a new temporary location
(given we now know formulae reside in original we would revise to a values based approach)
ii) sorted the temp table
The sorting of the data set thus permits use of Binary Search based Lookups to calculate PrevSize which will obviously have significance gain when compared to the 20k exact match tests.
Once PrevSize was calculated the code removed the formulae (references to the temp table) and the temp table itself was purged.
It would seem however that I'm missing something pretty fundamental in the process so I'll keep quiet until such time as the "penny drops" (if and when).
(edit: perhaps post an example which illustrates the relationship between Size and PrevSize - ie formulae in place re: Size)
Last edited by DonkeyOte; 03-29-2010 at 04:35 AM.
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
Okay, I realize I have done a poor job demonstrating this. I hope this sheet will illustrate what I'm saying. Basically, one of the columns that you want to include in the table for binary search depends on previous values from the column that we are aiming to populate. The column titled "Exit Size" in this sheet is the one of interest.
What I have in here is the best solution I was able to come up with. The values and order of signal IDs do not change during the scenario runs, so I initially set up the sheet by having VBA calculate the offset numbers in Column C and then setting them equal to values so that deleting won't be an issue later. From there, the 'Exit Size' column uses those helper cells to do an offset. What do you think? The recalc time is now excellent ~1 second...but the set up of the sheet with VBA takes 2-3 mins...not great, but acceptable.
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks