+ Reply to Thread
Results 1 to 5 of 5

Need Help with "Regression" Analysis

  1. #1
    Forum Contributor
    Join Date
    08-06-2018
    Location
    Virginia
    MS-Off Ver
    2019
    Posts
    333

    Need Help with "Regression" Analysis

    Experts:

    I need some assistance with *REGRESSION* analysis in Excel. While I have some fundamental understanding of how (linear) regression works, I do need some clarification on both a) interpretation and b) potentially restructuring the sample data. Allow me to provide some background on the XLS (attached) and process first.

    Sample File:
    - Sample file includes purely made-up/random data; however, it mimics my organization's actual concept.
    - Attached XLS includes 3 tabs:
    1. "SampleData" -- contains 2 columns (Crime and State)
    2. "SampleData_Recoded" -- given the error (i.e., "Input range contains non-numeric data"), I re-coded the string values into numeric values (see tab "Lookup")
    3. "Lookup" -- summarizes the re-coding from string to numeric value

    Regression Process:
    - On tab "SampleData_Recoded", the SUMMARY OUTPUT is included (starting in column P)
    - Based on the regression analysis, the "Coefficients" column gives me both the intercept (3.029850746) and State value (0.179104478)

    Regression Model (based on the Coefficients information):
    - Cell P23 includes a drop-down list (1, 2, 3 indicating CA, FL, and TX, respectively)
    - Cell Q23 uses the formula which **predicts** the dependent variable (i.e., Crime).
    - Given cell Q23 includes decimals, I am using cell R24 to round to 0 decimals.
    - Lastly, cell S23 uses the INDEX/MATCH to cross-reference the numeric value to the crime (string).

    My question(s):
    1. Based on this (small) **sample** data set, does it mean that the most prevalent crime = "Homicide" if CA (1) OR FL (2) is selected in cell P23?
    2. Alternatively, does it also mean that "Kidnapping" is the most frequently occurring crime if TX (3) is selected in cell P23?
    3. If my interpretation of Q1 and Q2 are incorrect, what does the regression (model) tell me in this example?
    4. Finally, is there another way to restructure the data set and perform the regression differently so that I get some form of matrix (for each state) and their respective crimes?

    Thank you for your help in advance,
    EEH
    Attached Images Attached Images
    Attached Files Attached Files
    Last edited by skydivetom; 09-29-2020 at 01:08 PM.

  2. #2
    Forum Expert CK76's Avatar
    Join Date
    06-16-2015
    Location
    ONT, Canada
    MS-Off Ver
    Office 365 ProPlus
    Posts
    5,855

    Re: Need Help with "Regression" Analysis

    Your p value is too small and would be considered as strong evidence against null hypothesis (i.e. claim made about the population).

    In this case I'd interpret it as... what state crime occurs in is not valid predictor of what type of crime is committed.
    ?Progress isn't made by early risers. It's made by lazy men trying to find easier ways to do something.?
    ― Robert A. Heinlein

  3. #3
    Forum Contributor
    Join Date
    08-06-2018
    Location
    Virginia
    MS-Off Ver
    2019
    Posts
    333

    Re: Need Help with "Regression" Analysis

    CK76 - yes, in principle, I agree that the p-value is NOT statistically significant. Keep in mind that this -- as mentioned -- is random sample data.

    At this time, I merely want to explore the concept and determine if my **interpretation** of the regression output is stately correctly. Also, per Q4, do you know of another way to run the data differently to obtain a broader perspective? That is, I do NOT want to use any form of COUNTIFs and histograms to outline "how many crimes/state"? Instead, is there another statistical method that I can use to somehow **predict** the # of crimes per state (always keep in mind this is random/sample data only).

    Thanks!
    EEH

  4. #4
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,802

    Re: Need Help with "Regression" Analysis

    These questions feel more statistical than Excel.

    When I want to think through a regression analysis, I like to include a scatter chart with the resulting trendline that allows me to visualize what is going on. If you add a scatter chart, switch the x and y ranges (so column A is Y and column B is X -- Excel will default to A is X and B is Y), then add a trendline to the chart, you can see what you linear regression is doing. A scatter chart should show you that your recoding of the data is simply creating a "grid" in the chart, and then the trendline simply represents "the middle" of the grid.

    Questions 1 and 2) "Prevalence" (or other expression of frequency) is at best only weakly represented in the raw data, so the regression really cannot say anything about prevalence.
    Question 3) While it is true that Excel cannot perform a regression on text, simply assigning arbitrary numbers that have no meaning does not assure that your regression will have meaning. I see no meaning to be derived from the regression.
    Question 4) I'm sure there are many ways to restructure the data set to get something meaningful out. The first step is to replace the text data with meaningful (not arbitrary) numbers. Q1 and Q2 suggest that you are trying to understand something about "prevalence" -- a count of how frequently each crime occurs as a function of state. If that is the interest, then I would expect part of the restructure to include a counting step (whether by pivot table or COUNTIFS() functions or however you like to count data in a spreadsheet). "State" is difficult to use as the independent variable, so you would want to substitute some meaningful number that will represent something quantifiable about each state. Maybe "year of admission to the union" to explore how and if older states impacts crime rates. Or maybe population or average elevation or land area or average price of milk or price of gasoline or ... (which leads me to reflect on Tyler Vigen's spurious correlations: http://tylervigen.com/page?page=1 ).

    In short -- in order for regression to have meaning, the dependent and independent variables need to have some real meaning. Understanding what the inputs means will go a long way to understanding what the regression output means.
    Quote Originally Posted by shg
    Mathematics is the native language of the natural world. Just trying to become literate.

  5. #5
    Forum Contributor
    Join Date
    08-06-2018
    Location
    Virginia
    MS-Off Ver
    2019
    Posts
    333

    Re: Need Help with "Regression" Analysis

    MrShorty -- you made some very excellent points.

    - Copy on the scatter plot... I also got the grid and realized that it's not really a scatter after all
    - Frequency counts on the crime rates is very helpful
    - I'll continue to think how to best represent "state"... I'm thinking to use population estimate (normalized)

    Again, thank you for the great info... VERY HELPFUL!

    Cheers,
    EEH

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 5
    Last Post: 02-05-2019, 12:03 AM
  2. Code VBA for "Linear regression's summary output"
    By 340dave in forum Excel Programming / VBA / Macros
    Replies: 10
    Last Post: 12-04-2017, 07:37 AM
  3. [SOLVED] Column X-Ref list - Sheet1 Col A "pages", Col B:FL "Req" to Sheet2 ColA "req", ColB "page"
    By excel-card-pulled in forum Excel Formulas & Functions
    Replies: 10
    Last Post: 04-07-2017, 09:30 AM
  4. [SOLVED] How to USE """"" cells count """"" change font color
    By austin123456 in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 10-09-2013, 06:14 AM
  5. [SOLVED] Complex "if" "or" "else" statement for ABC analysis
    By TheDanee in forum Excel Formulas & Functions
    Replies: 3
    Last Post: 02-06-2013, 08:43 AM
  6. [SOLVED] How to clear "linest ( ) function error " in regression analysis?
    By dev in forum Excel - New Users/Basics
    Replies: 0
    Last Post: 08-21-2005, 04:05 PM
  7. Why " data analysis plus " override " data analysis " once instal.
    By Alfred H K Yip in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 03-20-2005, 05:06 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1