+ Reply to Thread
Results 1 to 10 of 10

Clustering/Classification using Excel

  1. #1
    Registered User
    Join Date
    02-09-2005
    Posts
    79

    Clustering/Classification using Excel

    I am trying to do clustering/classification using the shortest euclidean distance. In the attached Excel spreadsheet, I am trying to classify new visits in Table 2 into one of the three visits given in Table 1. I have calculated the euclidean distance in Table 3 and classified them into one of the three visits.

    The issue I have is that the number of columns in the original data is more than 100 and I don't want to do the euclidean distance for each column (square the difference for each columns and add them) separately. Is there any efficient way to do this classification? Thank you.
    Attached Files Attached Files
    Last edited by vioravis; 02-22-2009 at 12:13 PM.

  2. #2
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Clustering/Classification using Excel

    I'm no mathematician so I'll accept you know what you want to do all I can do is tell you the way to calculate it using formulae...

    Please Login or Register  to view this content.
    The above can be copied across you entire results matrix.

  3. #3
    Registered User
    Join Date
    02-09-2005
    Posts
    79

    Re: Clustering/Classification using Excel

    Thanks. This worked fine. Is there any way to obtain the classification given in Column E of Clustering Table based on the minumum values on each row?

  4. #4
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Clustering/Classification using Excel

    Yes

    Please Login or Register  to view this content.

  5. #5
    Registered User
    Join Date
    02-09-2005
    Posts
    79

    Re: Clustering/Classification using Excel

    Thanks, this worked fine again.

    In the first formula you gave for calculating the euclidean distances, can you please explain the logic behind choosing zero for col_num in the INDEX function? Thank you.

  6. #6
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Clustering/Classification using Excel

    The zero is an argument to MATCH function, not INDEX. It specifies an exact match.
    Entia non sunt multiplicanda sine necessitate

  7. #7
    Forum Expert martindwilson's Avatar
    Join Date
    06-23-2007
    Location
    London,England
    MS-Off Ver
    office 97 ,2007
    Posts
    19,320

    Re: Clustering/Classification using Excel

    this bit
    =SUMPRODUCT((($B10:$E10)-(INDEX($B$3:$E$5,MATCH(B$17,$A$3:$A$5,0),0)
    i was puzzling on that too
    "Unless otherwise stated all my comments are directed at OP"

    Mojito connoisseur and now happily retired
    where does code go ?
    look here
    how to insert code

    how to enter array formula

    why use -- in sumproduct
    recommended reading
    wiki Mojito

    how to say no convincingly

    most important thing you need
    Martin Wilson: SPV
    and RSMBC

  8. #8
    Registered User
    Join Date
    02-09-2005
    Posts
    79

    Re: Clustering/Classification using Excel

    shg,

    I am still puzzled. I think the zero you are referring to is inside the MATCH bracket. I am referring to the one outside. Am I missing something here? Thanks.

  9. #9
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Clustering/Classification using Excel

    Quote Originally Posted by DonkeyOte View Post
    B18:
    =SUMPRODUCT((($B10:$E10)-(INDEX($B$3:$E$5,MATCH(B$17,$A$3:$A$5,0),0)))^2)
    Both MATCH & INDEX are using 0

    This...

    Please Login or Register  to view this content.
    Will return the Row index position in which B17 appears within A3:A5... this value is used in the index to establish our range of values that are to be subtracted from B10:E10...

    Please Login or Register  to view this content.
    The use of 0 in the INDEX will ensure an array of values is returned - ie all columns are returned in the referenced range... so in essence the above will reference B3:E5 and return contents of B:E for the row as determined by the MATCH function.

    The Sumproduct then "iterates" through B10:E10 and for each subtracts the appropriate value from Bx:Ex (where x is determined by the MATCH) and squares the difference... the result = sum of all these "iterations".
    Last edited by DonkeyOte; 02-22-2009 at 04:49 AM.

  10. #10
    Registered User
    Join Date
    02-09-2005
    Posts
    79

    Re: Clustering/Classification using Excel

    Thanks. This really helped.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1