+ Reply to Thread
Results 1 to 7 of 7

Regression with more than 16 variables

  1. #1
    showcase48
    Guest

    Regression with more than 16 variables

    I noticed linest and the regrssion tool doesn't allow more than 16 variables.
    Does anyone know the most convenient way to do this? I currently don't have
    SAS. Is it possible in Access or anything else?

  2. #2
    Alex
    Guest

    RE: Regression with more than 16 variables

    Without being pessimistic I think you'll struggle to do this mutlivariate
    regression in excel. There may be a way but it will be long-winded I am sure.

    As a suggestion, stats websites often allow you to input data and see the
    output running off Javascript as an example. Try and find a website that you
    can cut and paste your data to and crank the analysis...

    SAS could do it easily. The R project (google it to find it) offers a free
    stats package. It is the update of S and is pretty much as good as it gets.
    Upside is that it is free, downside is that it is not user friendly in that
    you would have to put some hours in before you got around to doing your
    multivariate regression.

    Sorry to be a naysayer....


    Alex

    "showcase48" wrote:

    > I noticed linest and the regrssion tool doesn't allow more than 16 variables.
    > Does anyone know the most convenient way to do this? I currently don't have
    > SAS. Is it possible in Access or anything else?


  3. #3
    Jerry W. Lewis
    Guest

    Re: Regression with more than 16 variables

    You could directly implement the equations that were used in Excel prior
    to 2003
    http://groups-beta.google.com/group/...595e596d?hl=en

    Also, the address of the R project is www.r-project.org, and there is an
    supplemental package for an R-Excel interface

    Jerry

    showcase48 wrote:

    > I noticed linest and the regrssion tool doesn't allow more than 16 variables.
    > Does anyone know the most convenient way to do this? I currently don't have
    > SAS. Is it possible in Access or anything else?



  4. #4
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,876
    How many rows/columns will the built in matrix functions (mmult, minverse, etc.) handle? Least squares regression "reduces" to some fairly simple matrix algebra which can be performed using the built in matrix functions. Most of the LSQ regression I do, I prefer to use the matrix notation rather than LINEST or LOGEST, but I've never had to regress on more than 5 or 6 parameters. I haven't heard of any limit to the size of matrix these equations can handle, so, in theory anyway, you should be able to perform the regression this way.

  5. #5
    Mike Middleton
    Guest

    Re: Regression with more than 16 variables

    MrShorty -

    > How many rows/columns will the built in matrix functions (mmult, minverse,
    > etc.) handle? Least squares regression "reduces" to some fairly simple
    > matrix algebra which can be performed using the built in matrix functions.
    > Most of the LSQ regression I do, I prefer to use the matrix notation
    > rather than LINEST or LOGEST, but I've never had to regress on more than 5
    > or 6 parameters. I haven't heard of any limit to the size of matrix these
    > equations can handle, so, in theory anyway, you should be able to perform
    > the regression this way. <


    Some of the limitations, e.g., maximum 52 columns by 52 rows for MINVERSE,
    are described at the end of the following Knowledge Base article:

    http://support.microsoft.com/default...b;en-us;166342

    - Mike
    www.mikemiddleton.com



  6. #6
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,876
    Based on the information in that article, once could (assuming enough RAM, processor time, and such) theoretically do a least squares regression for up 52 parameters, then.

  7. #7
    Jerry W. Lewis
    Guest

    Re: Regression with more than 16 variables

    MrShorty wrote:

    > Based on the information in that article, once could (assuming enough
    > RAM, processor time, and such) theoretically do a least squares
    > regression for up 52 parameters, then.


    Theoretically yes, but forming the normal equations is a numerically
    poor way to do least squares, since numerical singularity is then
    governed by the ratio of largest to smallest eigenvalues in X'X instead
    of that ratio in X. MINVERSE will often give a numerically inaccurate
    solution instead of warning of approximate numerical singularity.
    Consequently I would tend distrust the output.

    This issue in 1 dimension is the difference between calculating
    =SUMSQ(data)-SUM(data)^2/COUNT(data)
    (as was done by VAR, STDEV, etc prior to Excel 2003) and calculating
    =DEVSQ(data)
    The first form becomes more and more inaccurate as the CV of the data
    decreases. For example, STDEV(x,x+0.1) should be 0.0707106781186548
    =SQRT(0.005) regardless of x. But for x=100000 (CV=0.00007%) the first
    form (and STDEV in Excel 2002 and earlier) is inaccurate in the 4th
    significant figure of the standard deviation calculation. Between
    1000000 and 10000000, all significant figures are lost in the first
    calculation (summing squared x's has pushed the interesting part of the
    calculation beyond the accuracy if IEEE double precision). While the
    second form continues to work well.

    Personally, I can't remember ever being asked to analyze data with a CV
    that small, (and if I did, I would have used coding (linear
    transformation to a more practical range) to avoid numerical problems,
    but things are not so clear in the multiple dimensions of the general
    linear model. For instance, if you fit a 6th degree polynomial where
    the x data is 20.0, 20.5, 21.0, 21.5, ..., 27.0, the condition number of
    X'X is around 10^30, suggesting that direct solution of the normal
    equations would require quadruple precision to calculate it
    http://groups-beta.google.com/group/...9a2bb33e6cdbb8
    Pre-2003 LINEST (or the direct MINVERSE() formula linked from my
    previous post) will try, but will get no figures correct without
    warning. SAS PROC GLM and PROC REG will complain of numerical
    singularity (SAS's use of the sweep operator necessarily requires it to
    form the normal equations). That is why books on statistical computing
    recommend other approaches, such as Givens rotations, as implemented in
    http://lib.stat.cmu.edu/apstat/274
    http://lib.stat.cmu.edu/apstat/75
    the lm() function in S-PLUS and R will get 8 figures correct for each
    coefficient in the preceding example despite the ill-conditioning.
    Excel's trendline will get 9 figures correct, but will not go beyond a
    6th degree polynomial. There is a contributed package for R that would
    allow you to pass data from Excel to R, analyze it in R, and pass the
    result back to Excel. To learn more about R, go to
    http://www.r-project.org

    Jerry


+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1