Dear exalted guru(s):
I'm trying to join a single X- and multiple Y-variables into a dataset for histogram, scatterplot, and OLS regression analysis. I've been spinning at this one for days. Excel file is attached.
X and Ys are defined as follows:
X: "xy_rating" column as shown in tab "problem_1" per date of rating action
Ys: store count changes, both annual and cumulative, under date columns in the nine yellow-colored tabs per year y-action occurred, since there are multiple Ys, let's call it "y-actions" for now.
key: names in column "xy_tic"
Dates are the non-key, common element: first, date of rating with respect to X and second, year y-action occurred with respect to Y. Because I want to test relationship of "xy_rating" on each subsequent year's Y store count, only until the next X for each xy_tic, the final data set will have hundreds of XY combinations. I planned to do this in three steps:
STEP ONE: Lookups with INDEX/MATCH/INDIRECT
First, apply a lookup to populate cells P4:ET124 in "example_1" - the first section has my bad formula. My formula seems to fail evaluating dates, among possibly other errors. I haven't even gotten to the INDIRECT part - which confuses the heck out of me.
For each xy_tic, the lookup should return the "y-action" that occurred during the period the rating was in effect. This period is defined as the years following the "y-action" but[INDENTbefore the next rating or when the rating ends.
STEP TWO: join observations into XY dataset observations
Complete the join by putting together the lookup Ys with xy_rating data. This one is harder to explain, so I'll do it with pictures.
Screen Shot 2019-09-07 at 6.30.59 AM.png
The data in the first picture results in six different observations, as shown in the second picture. I have to do it for ALL NINE Ys, with the table in columns A:J in tab "example_2" with subsequent Ys added on.
Screen Shot 2019-09-07 at 6.32.14 AM.png
How can I arrange the data in the first picture into the form of the second picture given how I set it up in "example_1"? Is there a better way to set it up? Can I do this with Power BI or some relational database function within Excel? Or is this only a job for VBA?
Unless I can automate that, it seems like a very long, error-prone slog.
STEP THREE: histograms, scatterplot, and OLS regression
Run pivot table on data per "example_2", then manually a routine of set up bins, histograms, descriptive stats, scatterplots, and OLS regression. That's a lot of steps for each subset.
If I isolate each XY subset, can I automate the routine?
FINAL QUESTIONS:
While my data set isn't huge, the way I'm going about the task is a bit unruly; I don't think I could manage more. Is there a better way to organize it all within Excel?
Thank you so very much for even reading this! I'm grateful for any suggestions, even if only for one of the parts!
Bookmarks