Multiple linear regression in Excel with categorical dependent variable

  1. #1
    Forum Contributor
    Join Date
    05-26-2004
    Location
    Halifax, UK
    MS-Off Ver
    Office 2016
    Posts
    260

    Does anyone know if it's possible to perform a multiple linear regression in Excel when the dependent variable is categorical rather than numerical?

    I have to try to find a formula to determine whether each person in a population belongs in group A or group B. I have quite a lot of sample data, about 390,000 rows in Excel. The independent variables are a mixture of categorical and numerical values, but at a push I could remove the categorical independent variables and just leave the numerical ones.

    I am not sure a linear regression is possible, though, because of the categorical dependent variable. What would the equation look like and what would it be saying? Do I need to use another type of regression?

    Also, whenever I try to run the analysis in Excel I get "Input data is non-numerical". However, it isn't; I checked it all, and even tried replacing it with randomly generated 1s and 0s, and I still get the same problem. I thought it might be something to do with trying to run a linear regression with a categorical dependent variable. Has anyone got any advice?

    Thanks in advance, and sorry for the stats question but hoping someone can help.

    Rob

  2. #2
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,829

    "What would the equation look like and what would it be saying?"
    In many ways, I think this is the real question, and it is not necessarily an Excel question. We would be asking the same question if we were trying to do this in Minitab, or Matlab, or insert-favorite-programming-language-here. With categorical data, what does "X" mean in the analysis?

    I recall a question posed here where the user was doing something with cars. In this problem the cars were categorized as compact, mid-size, and full-size (maybe more). It was clear to me that the categorization was meant to indicate size, so it seemed obvious that the user would want to assign each category a number related to size: small numbers for compact, large numbers for full-size, with mid-size in between. If I were doing this for real, I might even try to find an underlying real variable that would indicate size, such as vehicle weight, wheelbase, or engine displacement.

    You haven't said what kind of data you are looking at, but I think that is the basic idea. You want to look at your categories and come up with some kind of numeric descriptor that can "measure" the difference (or differences) between categories. Without that, the regression will likely not have very much meaning.
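
To make the idea concrete, here is a minimal Python sketch of coding the car categories as ordered numbers. The mapping `SIZE_CODE` and the helper `encode_sizes` are made up for illustration (they are not from the original question), and the equal spacing 1, 2, 3 is itself an assumption about how far apart the categories are.

```python
# Hypothetical ordinal codes: larger number means larger car.
# The equal spacing (1, 2, 3) is an assumption, not a measured quantity.
SIZE_CODE = {"compact": 1, "mid-size": 2, "full-size": 3}


def encode_sizes(categories):
    """Replace each category label with its ordinal size code."""
    return [SIZE_CODE[label.strip().lower()] for label in categories]


# Made-up sample column of category labels.
cars = ["Compact", "full-size", "mid-size", "compact"]
codes = encode_sizes(cars)
print(codes)
```

Swapping the codes for a measured quantity such as vehicle weight would remove the equal-spacing assumption, which is why an underlying real variable may be the better choice.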

  3. #3

    Hi MrShorty,

    Thanks for the reply. I was thinking of a better way of explaining, and maybe it's easier if I come up with an analogous study rather than try to explain the actual data I've got. Okay, so suppose I want to split the population of a country into two groups: those that will live to less than 90 years, and those that will live 90 years or more. I have sample data from several hundred people and want to build a model or some kind of decision tree which predicts which category everyone outside the sample will fall into.

    So the dependent variable is categorical: it's not an age predictor, just "lives to 90" or "does not live to 90". My predictor variables are then a mixture of categorical and numerical: say gender, current age, ethnicity, number of cigarettes smoked per day, units of alcohol drunk per week, parents alive/dead, etc.

    What is the best method of building my model/decision logic? I am not sure if this is even a stats question, as I don't think a linear regression is going to work here. But what else would?

    Regards
    Rob

  4. #4

    I really don't know how statisticians perform that sort of analysis. My first impression is that one would perform the regression as if predicting age. Then, from a population/sampling distribution based on the regression, compute a "probability" of reaching a certain threshold. But, again, I'm not familiar with the statistics of this sort of analysis, so there might be something else that statisticians use.
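
A rough Python sketch of that idea, assuming a single numeric predictor and normally distributed residuals: fit a straight line for age at death, then use the residual spread to estimate the chance of reaching the threshold. The data and the function names (`fit_line`, `residual_sd`, `prob_reaching`) are invented for illustration, and this is not a claim about how statisticians actually do it.

```python
from math import sqrt
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_sd(x, y, intercept, slope):
    """Residual standard deviation with n - 2 degrees of freedom."""
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (len(x) - 2))


def prob_reaching(threshold, x_new, intercept, slope, sd):
    """P(age at death >= threshold), ASSUMING normal residuals."""
    predicted = intercept + slope * x_new
    return 1.0 - NormalDist(mu=predicted, sigma=sd).cdf(threshold)


# Made-up sample: cigarettes smoked per day vs. age at death.
cigarettes = [0, 0, 5, 10, 15, 20, 30, 40]
age_at_death = [92, 88, 85, 84, 80, 76, 72, 65]

b0, b1 = fit_line(cigarettes, age_at_death)
sd = residual_sd(cigarettes, age_at_death, b0, b1)

p_nonsmoker = prob_reaching(90, 0, b0, b1, sd)
p_heavy = prob_reaching(90, 40, b0, b1, sd)
print(f"P(reach 90 | 0 per day)  = {p_nonsmoker:.3f}")
print(f"P(reach 90 | 40 per day) = {p_heavy:.3f}")
```

This ignores the uncertainty in the fitted coefficients, so the probabilities are only a first approximation. For a two-group outcome like this one, logistic regression is the technique usually reached for; it is not shown here.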

  5. #5
    Forum Guru TMS
    Join Date
    07-15-2010
    Location
    The Great City of Manchester, NW England ;-)
    MS-Off Ver
    MSO 2007,2010,365
    Posts
    44,461

    Maybe Google "how does an actuary calculate life expectancy". There is some interesting stuff which may, or may not, help.


    Regards, TMS
    Trevor Shuttleworth - Retired Excel/VBA Consultant
