I have a data of 12 years from 2008-19. I want to compare 2019 with all the years and find which is the most 'similar' year to it statistically.
I have a data of 12 years from 2008-19. I want to compare 2019 with all the years and find which is the most 'similar' year to it statistically.
Last edited by Sarangsood; 01-17-2020 at 01:12 AM.
Anyone....?
i am not sure, what is your criteria. As per my assuming.
In "M12"Formula:Please Login or Register to view this content.
Ensure simultaneously pressing shift+ctrl+enter
N12 is criteria value. [In above formula you can change MIN to MAX also.
See the attachment file.
atul
If my answer (or that of other members) has helped you, please say "Thanks" by clicking the Add Reputation button at the foot of one of their posts.
Also: if your problem is solved, please take the time to mark your thread as SOLVED by going to the top of your first post, selecting "Thread Tools" and then "Mark thread as solved".
i want to compare 2019 to other years and find which year resembles it the most. for example if we draw a line graph of 2019, which year would it most comparable to. So in N4 it should compare 2019 jan-feb to other years jan-feb and give the result of the closest year (in this case it would be 2013). I have explained the same in the workbook below. Thanks
As per my understanding, you want to compare datasets to see if they are different statistically.
T-test can come handy...
Calculate Mean, STDEV and then perform T-Test.
You will see that data from 2016 and 2018 are significantly similar to 2019 upto September (T-Value >5%).
You know the data better than any of us here so, it's your call whether to pull data from 2016 or 2018.
Use INDEX/MATCH to pull the data based on T-Value.
Apologies if I didn't get your problem right.
Shashank Mishra
Please hit "Add Reputation" Button if you liked the answer.
I am not sure of criteria either. I am also not too 'swift' when it come to statistics. I took some guesses and used standard deviations and from there FORECAST.
Two approaches. You may have to array enter the final formulas. If you aren’t familiar with array-entered formulas array enter means the formula must be committed from edit mode by simultaneously pressing and holding down Ctrl and Shift while hitting Enter.
Approach 1:
Uses helper row 6 with STDEV.P for each column's data.andFormula:Please Login or Register to view this content.Then finds the absolute differences of the first 11 from last, matches the MINIMUM of those to the differences and INDEXes the headers based upon that match. In N6.Formula:Please Login or Register to view this content.Formula:Please Login or Register to view this content.
Approach 2 does the same as 1 but is contained in a one off formula. In N7Formula:Please Login or Register to view this content.
Then in either event the forecast numbers in M17:M19. Entered in M17 and filled down.Formula:Please Login or Register to view this content.
Dave
How do you define most similar? Highest correlation? Minimum average absolute difference? Minimum squared difference? Similar curve fits?
Similar would mean that if i draw a line graph of all the years, i want to find the most similar looking. I'm not able to categorize it into exactly which statistical formula will it fit into.
Last edited by AliGW; 01-17-2020 at 02:27 AM. Reason: Please don't quote unnecessarily!
Administrative Note:
Please don't quote whole posts, especially when you are responding to the one immediately preceding your own - it's just clutter. It's OK to quote if you are responding to a post out of sequence, but limit quoted content to a few relevant lines that makes clear to whom and what you are responding. Thanks!
For normal conversational replies, try using the QUICK REPLY box below or the REPLY button instead of REPLY WITH QUOTE.
Ali
Enthusiastic self-taught user of MS Excel who's always learning!
Don't forget to say "thank you" in your thread to anyone who has offered you help. It's a universal courtesy.
You can reward them by clicking on * Add Reputation below their user name on the left, if you wish.
NB: as a Moderator, I never accept friendship requests.
Forum Rules (updated August 2023): please read them here.
Okay, will keep that in mind. Thanks
In trying to find the closest fit to a given line I believe that R2 is what you want.i want to find which is the most similar year to 2019
Similar to what Dave set up in post #6 cells B6:M6 are populated using: =RSQ(B8:B16,$M8:$M16)
Cell N6 displays the year matching the highest R2 value using: =INDEX(B7:L7,MATCH(MAX(B6:L6),B6:L6,0))
The first sentence of the article linked below may help explain why I feel that R2 is the best measure for your purposes.
https://www.techwalla.com/articles/h...ulate-r2-excel
Let us know if you have any questions.
Consider taking the time to add to the reputation of everybody that has taken the time to respond to your query.
R2 is akin to DEVSQ/SSE (sum of squared errors). A straight line can be a closer match for a parabola near the parabola's extreme point, but other parabolas would be much closer to the same shape. Even just considering 2 straight lines, one line with negative slope could be closer to another line with positive slope within some range than another line with positive slope. OP has mentioned closeness of graphs, which strikes me as caring at least as much about shape as local proximity.
ADDED: R2 could be useful, but in a different sense. It could be used to decide whether linear, exponential, polynomial, exponentiated polynomial or mixed curve fits best represented each column, then select closest from among the curves of the same general shape.
Last edited by hrlngrv; 01-18-2020 at 09:56 PM.
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks