I have used LINEST and LOGEST a lot, but the one issue that has had me looking at alternative ways of solving problems is an inability to find a good way to weight the values. I have my data such that X is a date and Y is a number. I want recent values to be worth 100% weight, and for older values to scale down to 5% weight. Now I already have a good algorithm for generating a specific weight value in that range for each pair based on the X value. That's not the issue. So just to reiterate, for each X,Y pair, I already have a weight value associated with it. I'm just not sure how to incorporate these values when doing a linear regression.
I can't just make the Y value multiplied by the weight, because that will actually cause that Y value to impact the linear regression more rather than less. For example. Suppose I have Y values that are generally around 20. I have a single Y value that is much lower, at 8. This Y value is an old value. If I were to multiply the 8 * 0.05, I'd have 0.4, which would then impact the curve even more than 8 would.
The closest thing I can think of for a solution here is, what I really want to do is have the weight value impact how far the point deviates from... well, I'm not really sure. From the average? From the curve without it? From the most recent value? Going by the last one, I came up with this:
Take the most recent Y value, n.
For any older pair of data with Y value m and weight value w:
1. calculate the difference: d = (n - m)
2. multiply the difference by the weight: e = d * w
3. Subtract the weighted difference from the recent Y value: k = n - e
use the k set of values instead of the Y set of values for the linear regression.
Of course, the problem is a new value that is wildly different than previous values: This would suddenly shift all the old values considerably using this approach and a single new data point that is different from a previous one will cause the regression to produce something very different than it did before. I don't want the addition of a single point to a large set to have profound changes on the regression line.
I wonder if the right approach is more like... what is the linear regression without this point, and then what is the linear regression with this point, and measure the difference, and then apply the weight value to that difference and filter the value that way into the actual regression. UGH.
If anyone has thoughts on this, let me know.
Bookmarks