Weighting values in LINEST or LOGEST

**Technetium** · 04-13-2022, 06:51 PM

I have used LINEST and LOGEST a lot, but the one issue that has had me looking at alternative ways of solving problems is an inability to find a good way to weight the values. I have my data such that X is a date and Y is a number. I want recent values to be worth 100% weight, and for older values to scale down to 5% weight. Now I already have a good algorithm for generating a specific weight value in that range for each pair based on the X value. That's not the issue. So just to reiterate, for each X,Y pair, I already have a weight value associated with it. I'm just not sure how to incorporate these values when doing a linear regression.

I can't just make the Y value multiplied by the weight, because that will actually cause that Y value to impact the linear regression more rather than less. For example. Suppose I have Y values that are generally around 20. I have a single Y value that is much lower, at 8. This Y value is an old value. If I were to multiply the 8 * 0.05, I'd have 0.4, which would then impact the curve even more than 8 would.

The closest thing I can think of for a solution here is, what I really want to do is have the weight value impact how far the point deviates from... well, I'm not really sure. From the average? From the curve without it? From the most recent value? Going by the last one, I came up with this:

Take the most recent Y value, n.
For any older pair of data with Y value m and weight value w:
1. calculate the difference: d = (n - m)
2. multiply the difference by the weight: e = d * w
3. Subtract the weighted difference from the recent Y value: k = n - e
use the k set of values instead of the Y set of values for the linear regression.

Of course, the problem is a new value that is wildly different than previous values: This would suddenly shift all the old values considerably using this approach and a single new data point that is different from a previous one will cause the regression to produce something very different than it did before. I don't want the addition of a single point to a large set to have profound changes on the regression line.

I wonder if the right approach is more like... what is the linear regression without this point, and then what is the linear regression with this point, and measure the difference, and then apply the weight value to that difference and filter the value that way into the actual regression. UGH.

If anyone has thoughts on this, let me know.

**MarvinP** · 04-13-2022, 07:40 PM

Hi Technetium,

If you have 10 dates and 10 y values and the most recent is more important, why not make 10 of them and 9 of the date before. By repeating the point on the graph would be "weighting" it.

Start with your first past date and make only one of them. The next date gets two of them, etc until you get to the most recent. Then see what your linest and logest do.

I think this is just another way of looking at the problem. Hope this helps.

**MrShorty** · 04-13-2022, 10:14 PM

I'm going to second MarvinP's suggestion. When I do weighted regressions, I use integer weight factors, then create new known_x and known_y arrays where there "weight" copies of each data point. Part of doing it that way is that I don't want to delve into the statistics behind weighted regression to figure out how it would be done differently. In my case, I use a VBA UDF that loops through the data, making "weight" copies of each data point into a new array, then passing that array to the LINEST() (or LOGEST() ) function. It's not the only way to do it, but it works well enough for me.

Assuming you are content with this kind of proposal, the first step would be to decide on your programming language -- VBA (like mine), worksheet functions, or Power Query/Get and Transform (I have no familiarity with PQ M language, but I expect it should have no trouble with taking an array of weights and making copies of each data point as specified in the weight array and write new arrays to Excel, where LINEST()/LOGEST() could easily handle the regression).

How would you like to proceed?

**Technetium** · 04-13-2022, 10:58 PM

I'm currently working with Excel formulas at the moment. I've used VBA in the past; part of my incentive here is to try and whittle out the use of VBA since I know that some of my use of it in the past has been a bandaid for not knowing how to do it using just regular Excel formulas.

I'm not sure how to create duplicate points. LINEST and LOGEST takes range for the X & Y values. Suppose that it has an X,Y pair and a weight value telling it that it needs 37 copies of that point. If that point is data derived from A45,B45, and my LINEST uses the range A2:A500 and B2:B500, how do I get it to know to make 37 copies of that point?

I should mention that this is a fluid application - the values all change when updated, including the weight values, so I can't just create 37 rows of the same point, since I'd constantly be coming back and removing rows and adding rows whenever the data changes.

Also, I want to mention that I tried something that seemed interesting but ended up not being a good solution. Working with the ideas I had in the previous post, I calculated the average Y value, and then the difference of the average and each point's Y value, then multiplied each difference by the matching weight (a value between 0.05 and 1.0), and then created a new set of Y values based on that. The result of this was that old points were very close to the average and new points were far from the average. But the downside of this is that the distribution made for extremely poor R^2 values (less than 0.1).

**MrShorty** · 04-14-2022, 10:04 AM

I'm not sure how to create duplicate points. LINEST and LOGEST takes range for the X & Y values. Suppose that it has an X,Y pair and a weight value telling it that it needs 37 copies of that point. If that point is data derived from A45,B45, and my LINEST uses the range A2:A500 and B2:B500, how do I get it to know to make 37 copies of that point?

I should mention that this is a fluid application - the values all change when updated, including the weight values, so I can't just create 37 rows of the same point, since I'd constantly be coming back and removing rows and adding rows whenever the data changes.

I don't usually find "fluid" applications very difficult. Most Excel functions (LINEST() and LOGEST() being notable exceptions) usually have no trouble ignoring blank rows, so I simply give the function an input range larger than I ever expect to use. I also have found that it may be important to understand how Excel modifies (or doesn't) those references when using insert row/cells, delete row/cells, clear row/cells, copy/paste, cut/paste, etc -- in case you tend to use those commands to manipulate your input data so that you know what to expect when you use those commands.

That said, here's how I did it (in Gnumeric, so this should work in any spreadsheet. Where you have Excel 365, there are newer, Excel specific functions that may make this cleaner if you know how to use them. I will let someone else familiar with those functions make any suggestions they want). The basic idea is to figure out index numbers, then use a lookup function to "copy" the values into a new range.

1) Start by entering "x" in A1, "y" in B1, "wt" in C1, "index" in D1 just to label columns.
2) In A2:C2, I enter random numbers for each then copied the random number formulas down. I entered about 15 data points, as I have no idea how many data points you typically have.
3) In D, I will sum the weights that will tell the spreadsheet which index numbers go with each point. In D2 I enter 0 (assuming 0 is a valid wt meaning do not use this point). In D3, I enter =SUM(C2:C$2,1). Note the mix of relative and absolute referencing here so that Excel gives me a running total (+1) of the wts in column C. These become index points where the lookup function changes the return value.
4) To set up the new regression input range, I start by entering column labels H1:J1 -- H1 is "index", I1 is "x", J1 is "y".
5) In H2, I enter 1 (since I don't want to return a 0 weight). In H3, I enter something like =H2+IF(H2>max($D$2:$D$200),NA(),H2+1). D2:D200 means I expect no more than about 200 x,y,wt triplets to be entered. This will give numbers when appropriate and N/A errors when we get beyond the input data.
4) A simple lookup can now return the input values. I used =INDEX(A$2:A$200,MATCH($H2,$D$2:$D$200,1)) in I2 and copied into J2 and down (note again the mix of relative and absolute references that make this copy/paste/fill possible).
5) The trickiest part is that LINEST() and LOGEST() do not tolerate blanks or non-numerics in their input ranges. We now need a dynamic range to fill the arguments for the regression function. A relative simple OFFSET() + COUNT() function is a common approach to this. Something like OFFSET($H$2,0,1,COUNT($H$2:$H$4000),1) might represent the x value input range (a similar function for the y input range).
5b) Nest those OFFSET() functions inside of your regression function =LINEST(OFFSET(known y version),OFFSET(known x version)).

Something like that should work. My only question at this point (and a counter to my claim of universality earlier) is that I'm not sure if Excel's COUNT() function will handle the errors the same as Gnumeric. Gnumeric ignores the errors in column H and returns the correct count. Excel may propagate rather than ignore the errors. If that is the case, I expect that rewriting column H to return a text string when beyond the max index will work better.

Does that help? Do you get stuck implementing any of those steps?

Weighting values in LINEST or LOGEST

LinkBack

Thread Tools

Rate This Thread

Display

Weighting values in LINEST or LOGEST

Re: Weighting values in LINEST or LOGEST

Re: Weighting values in LINEST or LOGEST

Re: Weighting values in LINEST or LOGEST

Re: Weighting values in LINEST or LOGEST

Thread Information

Users Browsing this Thread

Similar Threads

LOGEST returns #NUM! error, same data range works fine in LINEST

[SOLVED] LOGEST producing values that will produce 0 for any value of X.

Reconciling LOGEST with only known Ys with LOGEST with known Ys and Xs as an array

weighting calculation with the ability to remove and add specific stocks in weighting

How to solve to maximize a value using weighting values

Logarithmic curve formulas in charts and LOGEST fx values?

[SOLVED] LINEST, LOGEST, GROWTH or TREND??

Bookmarks

Bookmarks

Posting Permissions