+ Reply to Thread
Results 1 to 10 of 10

Macro to replace very slow array calculation

  1. #1
    Registered User
    Join Date
    01-26-2018
    Location
    Melbourne
    MS-Off Ver
    Excel 2016
    Posts
    13

    Macro to replace very slow array calculation

    Hi,

    I would like to know whether my goal can be achieved with a macro instead of formula.
    I did what I wanted with an array index match formula, however it literally took a week to calculate.

    The spreadsheet is about 30mb currently
    I reduced the data to only one month (1990) however it is still about 13mb.
    I had to upload it to my outlook public folder but cannot post it here.

    Sheet: Hours
    • 12 different sources of data exist (columns C-N) per hour, per date between Jan 1990 and Dec 2016.
    • Time/date rows where no data exists are missing and it is not a complete set of time/date rows (234,229 rows of data).
    • Blanks and zeros exist in the data.


    Sheet: Full-hours_1990-2016
    • This is complete list of times/dates (236,688 rows of data)
    • It is empty of data at the moment. This is where I want to place the data form the first sheet.

    My goal is place the data from sheet 'Hours' to sheet 'Full-hours_1990-2016 into the relevant time/date rows.
    Blanks need to show as blanks.
    Zeros need to show as zeros.

    Can this be achieved with a macro and will it be just as slow in calculating?

    Thanks

  2. #2
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Macro to replace very slow array calculation

    To best describe or illustrate your problem you would be better off attaching a dummy workbook. The workbook should contain the same structure and some dummy data of the same type as the type you have in your real workbook - so, if a cell contains numbers & letters in this format abc-123 then that should be reflected in the dummy workbook.

    To attach a file to your post,
    click advanced (next to quick post),
    scroll down until you see "manage file",
    click that and select "add files" (top right corner).
    click "select files" find your file, click "open" click "upload" click 'done" bottom right. click "submit reply"

    Once the upload is completed the file name will appear below the input boxes in this window.
    You can then close the window to return to the new post screen.
    1. Use code tags for VBA. [code] Your Code [/code] (or use the # button)
    2. If your question is resolved, mark it SOLVED using the thread tools
    3. Click on the star if you think someone helped you

    Regards
    Ford

  3. #3
    Registered User
    Join Date
    01-26-2018
    Location
    Melbourne
    MS-Off Ver
    Excel 2016
    Posts
    13

    Re: Macro to replace very slow array calculation

    Hi FDibbins,

    Unfortunately the file is very large and exceeds the limit allowable here.

    I reduced it well below the 1 mb by limiting the data to only hours in January 1990.

    In this reduced example the total number of rows between the two sheets matches.
    However, in the full data set they do not match:
    The sheet (Hours) has missing time/date rows where there is no data and it is thus an incomplete set of time/date rows (234,229 rows of data).
    The sheet 'Full-hours_1990-2016' is what should be a complete list of times/dates (236,688 rows of data).

    Hopefully this clarifies my earlier post.

    Thanks.
    Attached Files Attached Files

  4. #4
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,803

    Re: Macro to replace very slow array calculation

    I would have liked to see your single cell array mega formula to better understand what it is doing.

    Are you required to use VBA for this? VBA can probably be faster than whatever array formula you are using, but VBA has a certain overhead to it that will make a spreadsheet based only on worksheet formulas faster than anything VBA can do. There are exceptions to every rule, but the general rule when talking about computation efficiency is to avoid VBA.

    Without seeing your formula(s), it is difficult to suggest improvements. However, there are some common inefficiencies that I see in these questions:
    A) Inefficient linear lookups: The heart of this task would be the lookup function, and almost all examples around the internet are for "exact match" linear lookups (3rd argument of MATCH() = 0) which are horribly slow. A binary lookup (3rd argument of MATCH() = 1 or -1 and data are sorted) is infiinitely more efficient. For multiple lookups on a large data set like this, I would do anything I could to use a binary lookup.
    B) Volatile functions (OFFSET() and INDIRECT() in particular): In isolation, these functions are fast, but thousands of copies of volatile functions override Excel's smart recalculation engine, forcing every copy of every function to recalculate each time. Again, I would do anything I could to eliminate or avoid these.
    C) Duplicated effort: One common problem with large array formulas is that the function itself repeats tasks, and different copies of the function will also repeat those tasks. In your case, I would expect that each copy of the function needs to perform the same lookup multiple times (to distinguish between blank and 0 and to return the result). You also have 12 values in each row that are performing the exact same lookups. Without thinking about it too hard, I could see your spreadsheet performing the exact same slow lookup 30 to 50 times in each row. Reduce or eliminate duplicated effort by using helper columns (I recall one user here who posted a spreadsheet that was "nearly unusable" which went to "nearly instantaneous" by simply adding a helper column or two).

    Without seeing your existing formula, here are some things I would probably do along the way to getting this done:

    1) Your source data has date and time separate. I would probably combine these together into a helper column where each row is represented by a unique date + time serial number. If needed, make sure the source data is sorted on this column. Your lookup function will search this column.
    2) I expect there is some duplicated effort in the attempt to distinguish between blank and 0 -- move this task to helper columns in the source data. Something simple like =IF(ISBLANK(C3),blank placeholder,C3). I assume you will use null string "" as your blank placeholder. If there are going to be charts build on this data, you may review this to see if null string is your best choice for blank placeholder: https://peltiertech.com/mind-the-gap...g-empty-cells/
    2a) If this is a one time or rare task, I might take and copy - paste special as values the formulas from step 2 over the original values.
    3) Use a single MATCH() function in a helper column in the destination table to perform the lookup. =MATCH(A3,helper_column_from_step_1,1)
    4) A simple INDEX() function to pull the result from the source table =INDEX(source_table,reference_to_MATCH()_function,column#). I will often have a row at the top or bottom of my destination table with the column#'s, so that the column numbers need not be hardcoded into the function.

    I don't know if I have understood everything this needs to do, but those are the kinds of things I would be looking at and trying to do. I would expect that a well designed spreadsheet will calculate in a tolerable amount of time.

    Edit to add: My recommendation are based on a spreadsheet formula approach. With no experience, I can't say for sure, but I wonder if a tool like Get and Transform could do this even more efficiently.
    Quote Originally Posted by shg
    Mathematics is the native language of the natural world. Just trying to become literate.

  5. #5
    Registered User
    Join Date
    01-26-2018
    Location
    Melbourne
    MS-Off Ver
    Excel 2016
    Posts
    13

    Re: Macro to replace very slow array calculation

    Thank you for your detailed advice MrShorty,

    I read through and considered each of your points.

    VBA crossed my mind as I was wondering if there were any alternatives available.

    (A) Regarding the types of matches, binary is a good point. Given my data are sorted in ascending order with date/time I suppose that should give the same result, however it appears to result in #N/A after the first row if a 1 or -1 is adopted.
    (B) I'm not aware of the OFFSET and INDIRECT functions- but it sounds like they're not worth exploring.
    (C) It seems very likely that I have included duplicated effort. The formula I am using requires the date and time columns to be separate. If I were to combined them a use a serial number, I presume this would reduce the duplicated effort?

    This is should be a one off task but I'd like to repeat the same for other analytes in other spreadsheets. I intend to insert the data once processed into my analyses and plots.
    By adopting a helper column to specify that a cell is blank, would that also mean a helper column for each data column? That seems to be to increase the amount of columns.

    As you have suggested, I attempted to simplify the INDEX and MATCH functions and use the combined date/time helper column, however it does not progress further than the first row. #N/A are returned thereafter.

    I have attached the spreadsheet which includes an example of the function I'm using. In the full data set, leaving it overnight only calculates about 5-10%.

    Do you have any specific advice regarding the function I'm currently using?

    Thanks
    Attached Files Attached Files

  6. #6
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,803

    Re: Macro to replace very slow array calculation

    Since lookups are frequently at the heart of slow calculations, focus on the MATCH() part of the function for now MATCH(1,(Hours!R3C2:R1515C2=RC2)*(TEXT(Hours!R3C1:R1515C1,"yyymmdd")=TEXT(RC1,"yyymmdd")),0) Observe that this function is making 1500 comparisons to check if the hours matches up, 1500 more comparisons to see if the day matches, and then 1500 "and" comparison to determine where both day and hours match up to build a lookup array of TRUE/1 and FALSE/0 values, which are not sorted (nor can they really be). Then, the MATCH() function tests each value until it finds the first 1/TRUE in the resulting array, yielding up to another 1500 operations. And this analysis does not include the overhead that Excel imposes on the operation (operations needed to maintain calculation dependencies, other functions that need to calculate, and so on). 1 copy of this function needs 4500 to 6000 operations just for the lookup part. Scale up to your 250000 rows, and each copy of this function needs 750000 to 1000000 operations. Then you make a couple million copies of that function, and it should be no surprise that it takes a week to calculate.

    At this point, I strongly recommend rethinking the lookup part. I may be missing something, but what is the effective difference between your MATCH() function and MATCH(A3,Hours!$C$3:$C$1515,1)? It looks to me like they should give the same exact result (my tests suggest that there is some floating point errors present that will need to be handled -- strategically placed ROUND() functions should be adequate), and should do so infinitely faster.

    By adopting a helper column to specify that a cell is blank, would that also mean a helper column for each data column? That seems to be to increase the amount of columns.
    Yes, it will increase the number of columns. The main point to understand is that any performance penalties incurred by having a few extra columns will be insignificant compared to the potential performance savings of eliminating duplicated effort.

    After testing, here's what I did:
    1) To deal with the floating point error, in Hours, I added a column =ROUND(C3,6) to get a suitable lookup column.
    2) In Full Hours, I added a column for the MATCH() function =MATCH(ROUND(A3,6),'Full-hours_1990-2016'!$P$3:$P$1515,1). Note the ROUND() function to handle the expected floating point errors.
    3) In the main output table, a simple INDEX() function referring the the MATCH() function in step 2. =IF(ISNUMBER(INDEX(D$3:D$1515,$P3)),INDEX(D$3:D$1515,$P3),"")

    That should be much faster than your current formula.

  7. #7
    Forum Contributor
    Join Date
    10-08-2018
    Location
    Lima, Peru
    MS-Off Ver
    Office 365 ProPlus
    Posts
    148

    Re: Macro to replace very slow array calculation

    While MATCH(searched value, lookup range, 1) can be used here as the values are sorted, and it performs extremely fast; care has to be taken if the searched value does no exist in the lookup range, as it will return a wrong answer.

    Given that one of the conditions is:

    Sheet: Hours
    Time/date rows where no data exists are missing and it is not a complete set of time/date rows.
    It means that there are missing times in the lookup range.

    The example file contains all hours in Sheet Hours, it we delete some hours to mimic the real worksheet, MATCH will return a wrong value if the hour searched from Full-hours_1990-2016 doesn't exist, it will return the highest existing time that is less than the searched hour.

    To prevent this, the formula in helper column in sheet Full-hours_1990-2016 should be:


    Formula: copy to clipboard
    Please Login or Register  to view this content.


    This will compare to another helper in column Q in sheet Hours, which simply contains another rounding of the times in column C. (only 3 decimal places are needed, as times are given full hours, an hour is 0.041666~, rounded to 0.042).

    Sheet Hours:
    Formula: copy to clipboard
    Please Login or Register  to view this content.


    The formula in Full-hours_1990-2016 will return the row number to be used by INDEX, when the hour indeed exists in sheet Hour, and FALSE otherwise.

    Then, the formula to populate the columns C through L in Full-hours_1990-2016:

    Formula: copy to clipboard
    Please Login or Register  to view this content.



    This leaves out any error or texts from sheet Hours.
    Last edited by LeoSkywalker; 11-08-2018 at 02:21 AM.
    Leo Skywalker
    May the force be with you.

  8. #8
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Macro to replace very slow array calculation

    Perhaps consider using 1 or more helper columns to do some of the leg work for you?

  9. #9
    Registered User
    Join Date
    01-26-2018
    Location
    Melbourne
    MS-Off Ver
    Excel 2016
    Posts
    13

    Re: Macro to replace very slow array calculation

    MrSHorty- Thanks for pointing out each logical step in sequence from the function I was using.
    I'm amazed at the sheer volume of calculations I was unwittingly asking Excel to undertake.
    Your suggestions made sense to me and I appreciate the advice. The helper columns are indeed helpful!

    LeoSkywalker- Thank you for providing additional advice.
    The rounding to 3 decimal places and additional match functions make kinda sense to me.

    I adopted and tested the suggestions and it appears to address the key issue of having missing date/time rows in the Hours sheet.
    It only took 5 seconds to calculate the 236,690 rows of data across the 12 columns.

    Thank you very much for the advice, I really appreciate it.
    Now I can do some science.

  10. #10
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Macro to replace very slow array calculation

    Happy to help and thanks for the feedback

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replace slow formula with macro or UDF
    By kishore013 in forum Excel Programming / VBA / Macros
    Replies: 5
    Last Post: 08-30-2017, 09:22 AM
  2. [SOLVED] Macro to create array and then perform calculation in column C
    By jaredmccullough in forum Excel Programming / VBA / Macros
    Replies: 27
    Last Post: 06-29-2017, 04:52 PM
  3. Macro which replace large array formula
    By ALEZI in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 04-22-2014, 02:32 PM
  4. Macro is running real slow and makes navigating the worksheet really slow after execution.
    By MichWolverines in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 10-03-2013, 04:29 PM
  5. [SOLVED] slow computer (slow clipboard) breaks my macro
    By twilsonco in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 06-05-2013, 09:16 PM
  6. [SOLVED] Slow replace macro
    By jomili in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 07-23-2012, 01:02 PM
  7. Replace Macro is really slow
    By SandiM in forum Excel Programming / VBA / Macros
    Replies: 6
    Last Post: 02-25-2011, 05:13 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1