+ Reply to Thread
Results 1 to 6 of 6

Regression Coefficients Estimated Incorrectly When Centered Predictors Used

  1. #1
    Registered User
    Join Date
    05-23-2007
    Posts
    2

    Regression Coefficients Estimated Incorrectly When Centered Predictors Used

    Recently, I have encountered some problems related to getting accurate regression coefficients in Excel 2003 (all Office updates installed) from a dataset with *small* (standard) numbers, which contains centered predictors. More specifically, I have a dataset with 18 observed data points containing a criterion (y), a centered predictor variable (x), another centered predictor variable (z), and the interaction of the two centered predictor variables (xz). This multiple regression equation is structured to test for interactions between the two continuous predictor variables (x and z) as prescribed by Aiken and West (1991) in their classic book.

    When I run the regression in Excel with the centered predictors, some of the regression coefficents in the output are estimated to be 0, although they are clealry *not* 0 as estimated by SPSS 14.0.2. I have spent many hours troubleshooting this problem (and searched many forums on the internet) and still do not know why this is happening.

    Initially, I thought the problem might have to do with the cross-product of the centered predictors, but even just doing a regression with one of the centered predictors (for certain centered predictors) yields a regression coefficient of 0 (although it should be non-zero as per SPSS 14.0.2). When doing these multiple regressions with non-centered predictors, all regression coefficients are estimated accurately.

    I was wondering if anyone had any insights on why I am experiencing these problems.... If anyone wants a sample of some test data I have used to troubleshoot these problems, you can download a file from: http://publish.uwo.ca/~elebel/test_222.xls

    Thank you so much for your time.
    Sincerely,
    eplebel

    p.s. It is interesting to note that I just read an article found at www.mis.coventry.ac.uk/~nhunt/pottel.pdf which stated that regressions
    involving very *large* numbers may not yield accurate results due to problems with the STDEV function, but that centering the data before
    doing the analysis will correct this problem in most cases.

  2. #2
    Forum Contributor
    Join Date
    10-27-2006
    Location
    United Kingdom
    MS-Off Ver
    EXCEL 2003 (2007 at home)
    Posts
    517

    Normalize

    I have also come across this problem.
    In my experience it seems to occur when the values for some coefficients are several orders of magnitude different from the others.
    This can easily occur when dealing with polynomials when the nth power is getting huge compared to the 1st power.
    You can nearly always aviod the problem by scaling your initial numbers to run between -1 and 1, a practice known as "normalising" the data.
    This used to be standard practice in the old days of single precision calculations, but since double precision has become standard the problem is less frequently encountered.
    Note that while this gives you an accurate regression, the coefficients you get are appropriate to the NORMALISED data.
    It is possible to recover the correct coefficients using binomial expansion...

  3. #3
    Forum Contributor
    Join Date
    10-27-2006
    Location
    United Kingdom
    MS-Off Ver
    EXCEL 2003 (2007 at home)
    Posts
    517

    Smile Having Scared You

    It is generally possible to avoid the regresion problems without full normalisation.
    Just multiply data for the coefficients with the largest and smallest numeric range by a suitable power of 10 bring the overall range down to (roughly) within +/- 100.

    The "Correct" coefficient can then be recovered by applying the same multiplier.

    Good luck

    Mark.

  4. #4
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,889
    I think you may have just bumped up against one of the issues that has made me very leery of upgrading to Excel 2003 or higher. As I understand, when Microsoft reworked the algorithms for the spreadsheet regression routines (LINEST etc.) for the 2003 release, they included an algorithm that looks at each coefficient and arbitrarily sets to exactly zero (0) any coefficient that it believes should be 0. I do not know the details of this algorithm (you know how secretive MS can be about their code).

    Do you have access to an earlier version of Excel? Earlier versions employ a reportedly less stable numerical algorithm for performing regressions, but they don't arbitrarily 0 coefficients. If an earlier version of Excel can generate the correct coefficients, then that should solve your problem.

    I'm not familiar with this particular type of regression, but, if you can set it up so you can plot the pertinent data in a chart and use the chart trendline feature, MS didn't apply this "arbitrary 0" algorithm to the chart trendline routiines.

    Is there a way to set up the regression so the parameters are "scaled?" For example, I perform a regression quite frequently where one of the parameters is on the order of 1e-18. I factor out the 1e-18, so LINEST returns a value between 0 and 20 rather than 0e-18 to 20e-18. Not knowing exactly how MS implemented this arbitrary 0, I don't know for sure if this will work, but it might be worth a shot.

    I'm not sure if the routines in Solver will be stable enough, but you might try using Solver to perform the regression.

    Hope that helps. Let us know what happens.

  5. #5
    Forum Contributor
    Join Date
    10-27-2006
    Location
    United Kingdom
    MS-Off Ver
    EXCEL 2003 (2007 at home)
    Posts
    517

    Low Correlation

    I've had a quick look at the workbook on your link,
    I think that your problem is that the correlation on your data is extremely low!

    I suspect that EXCEL is setting the coefficient to zero when the standard error on a coefficient is so much bigger than the coefficient that it becomes nonsense to talk about a correlation between that coefficient and the data.
    This can happen when the relative sizes of coefficients become very disimilar, in this case centreing and scaling can help.
    With your data it is happening because there is no correlation, in this situation centering and scaling actually work to exacerbate the problem!

    If you squint there may be a vague correlation between Y and X and between Y and Z but it is not strong enough to support the XZ cross term.

    Mark.
    Last edited by Mark@Work; 05-23-2007 at 02:39 PM.

  6. #6
    Registered User
    Join Date
    05-23-2007
    Posts
    2

    problem "solved"...kind of!

    MrShorty's post hit the hammer right on the head.

    when i do the exact regressions using EXCEL 2000, everything is perfect and completely convergent with SPSS 14.

    so it seems like MICROSOFT's "improved" algorithm for LINEST in Excel 2003 wasn't so improved after all...!

    thanks again to all that posted useful comments.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1