+ Reply to Thread
Results 1 to 36 of 36

1st, 2nd & 3rd most common ranges within a single column of values

  1. #1
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    1st, 2nd & 3rd most common ranges within a single column of values

    Hi. Thanks for stopping by my question. Would you know how I can analyze data to give you the most common range, for a more meaningful analysis? I guess it would require that we define some parameters, but what are those parameters, and would this be a formula or array? Essentially, it would be, conceptually, like a scatterplot, but without the plot itself. I just want the data to be automatically analyzed to output the most common ranges. You probably already got what I'm asking, but here's a quick, unsophisticated example, assume 1 column like this:

    3
    91
    95
    96
    96.5
    89

    157
    177
    175
    181

    250
    299
    301

    793

    So, I'd want the output for the above example to be:

    1st most common range of values: 89-96.5
    2nd most common range of values: 177-181
    3rd most common range of values: 299-301


    I'm not sure how/where I'd tell Excel what constitutes a "range", or how this even works.

    Thanks for showing me the right direction/formula/array, since I think this might be more meaningful than just looking for 1 value that represents "mode", when often, there is no single number that occurs more than once in a data set and more meaningful data appears in clusters that are probably just ranges. Thanks for helping me out and guiding me down the right path.

  2. #2
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Hi.

    How are you defining "most common" in terms of a "range" of values?

    How is it determined what constitutes the start and end values for one of these "most common" ranges?

    Regards
    Click * below if this answer helped

    Advanced Excel Techniques: http://excelxor.com/

  3. #3
    Forum Expert sourabhg98's Avatar
    Join Date
    10-22-2014
    Location
    New Delhi, India
    MS-Off Ver
    Excel 2007, 2013
    Posts
    1,899

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Hello
    Just a wild try
    May help...
    Attached Files Attached Files
    Happy to Help

    How to upload excel workbooks at this forum - http://www.excelforum.com/the-water-...his-forum.html

    "I don't get things easily, so please be precise and elaborate"

    If someone's post has helped you, thank by clicking on "Add Reputation" below the post.
    If your query is resolved please mark the thread as "Solved" from the "Thread Tools" above.

    Sourabh

  4. #4
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,827

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    My first thought would be to start with the FREQUENCY() function: https://support.office.com/en-us/art...rs=en-US&ad=US

    I'm not exactly sure how you would come up with "bins" (maybe every 50 -- 0,50,100,150,...?). Once you come uyp with suitable bins, the FREQUENCY() function will calculate how many are in each bin, then you can use the LARGE() function ( https://support.office.com/en-us/art...rs=en-US&ad=US )to return the 3 largest frequencies. If needed, you could then use a lookup function to return the bin associated with that frequency.
    Quote Originally Posted by shg
    Mathematics is the native language of the natural world. Just trying to become literate.

  5. #5
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Exactly. Those are the key questions I'm basically also asking. In a way, what I'm looking to do is find a mathematical solution to what you and I normally do intuitively when we scan and automatically "make sense of" a scatterplot and summarize, "ok, so basically, values between here & here are important". I'm no mathematician, that's for sure, but I think, to answer these questions, it seems like you'd almost have to analyze each data set (my 1 column) uniquely first, in order to come up with statistical things like "variance", etc ... but I'm not trained in that, so I can't quite think how a statistician would think. So we'd want to find the most common ranges within any data set, where, indeed, the span between clusters or bins or ranges may be highly variable based on the data set. That's all I can think of for now. Hope it helps some. Please let me know if you have additional thoughts. Thanks for your post.

  6. #6
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Would this be something along the lines that you are looking for?
    A
    B
    C
    D
    E
    F
    G
    1
    Data Bins Frequency Range Large to Small
    2
    3
    0
    0
    100
    6
    3
    91
    100
    6
    200
    4
    4
    95
    200
    4
    300
    2
    5
    96
    300
    2
    6
    96.5
    400
    1
    7
    89
    500
    0
    8
    157
    600
    0
    9
    177
    700
    0
    =IF(ROW()-1>3,"",LARGE($C$1:$C$11,ROW(INDIRECT("1:3"))))
    10
    175
    800
    1
    11
    181
    900
    0
    =INDEX($B$1:$B$11,MATCH(G1,$C$1:$C$11,0))
    12
    250
    1000
    0
    13
    299
    =FREQUENCY(A1:A14,B1:B11)
    14
    301
    15
    793
    <---------If you like someone's answer, click the star to the left of one of their posts to give them a reputation point for that answer.
    Ron W

  7. #7
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Quote Originally Posted by MrShorty View Post
    Mathematics is the native language of the natural world.
    MrShorty,

    Thanks for your thoughts there re frequency and bins. Yeah, the bins thing is really the key, but I need a mathematical solution (your quote is perfect for this post) to what our brains do intuitively/second nature when we visually scan, say, a scatterplot and are able to quickly come up with a solution. This question kind of feels like a "Brain Games meets Excel" challenge. Thanks for your post and hopefully we can all figure it out. My post to Xor LX has some related thoughts. Feels like someone with a statistics type of background could really lend some valuable insights here for us as well. I'm really interested to learn more and find a solution to this puzzle. Thanks

  8. #8
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    newdoverman

    let me have a looksee, and get back to you. thank you

  9. #9
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    newdoverman,

    First, thanks for your post. Ok, to answer your question directly, no, I don't think bins with arbitrary 100,200, 300 etc cutoffs is the way to go because those are not the natural cutoffs inherent to the data. It's not what the human eye/our brain would intuitively do, eg, when examining a scatterplot, to use my example above.

    I almost feel the solution would be like this (I'm a beginner honestly):

    There would be "2 runs" (at minimum), where (I guess) an array would do a "first pass" evaluating the data for conformity, variance, or whatever statisticians would understand way better than me. That first run will essentially "create the bins". The 2nd pass will do the counts. The key is "what are the bins (the ranges)?" The funny thing is that the DATA clusters themselves should "tell us" what those bins are....and maybe someone will have a frequency solution, combined with some stats that won't even need "bins". These are just some thoughts.

    PS A follow up thought I had: I wonder if an array would help out. Here's why I'm thinking that: Basically an array could (?) continually "sweep" or do passes thru the data, using unique, variable increments with each "sweep" (each pass) until if finds the right "bin" (range). I really don't know the solution though. Thanks to everyone for their help so far. It will be interesting to see what the solution is in the end.
    Last edited by IO4lZE; 02-22-2015 at 04:37 PM.

  10. #10
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Hi IO4IZE,

    I have a little different spreadsheet. See if this helps.
    Attached Files Attached Files
    One test is worth a thousand opinions.
    Click the * Add Reputation below to say thanks.

  11. #11
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    sourabh, will review your earlier .xlsx and respond shortly, thank you.

  12. #12
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Here is another way of looking at this. Use of Pivot Table ranges:
    Attached Files Attached Files

  13. #13
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    sourabh & marvinp, can you explain the .xlsx, and the logic as we go from A thru I (sourabh), or A thru N (marvinp)?

    I'd appreciate it. That will help me follow along.

  14. #14
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    To all, Please don't just post excel files. Please include an explanation of your logic so I can follow along. Thanks a lot.

  15. #15
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    In the meantime, I'm trying to go thru the 3 .xlsx posted so far, to see if I can make sense of them. Right now I'm looking at MarvinP's since it looks like an updated version of sourabh's.

  16. #16
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,827

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Quote Originally Posted by IO4lZE View Post
    MrShorty,

    Thanks for your thoughts there re frequency and bins. Yeah, the bins thing is really the key, but I need a mathematical solution (your quote is perfect for this post) to what our brains do intuitively/second nature when we visually scan, say, a scatterplot and are able to quickly come up with a solution. This question kind of feels like a "Brain Games meets Excel" challenge. Thanks for your post and hopefully we can all figure it out. My post to Xor LX has some related thoughts. Feels like someone with a statistics type of background could really lend some valuable insights here for us as well. I'm really interested to learn more and find a solution to this puzzle. Thanks
    It is an interesting question. I might suggest that, in many ways, in much the same way as we might see this as a mathematical problem first, I might also suggest that this is more of a generic "algorithm development" question. The way I see this sort of thing happening:

    1) mathematical definitions and concepts.
    2) Algorithm development -- this is where we take those "intuitive" steps and try to formalize and systematize them into something a series of steps and processes that will accomplish the intuitive part with the mathematical parts.
    3) Then, we work on implementing our algorithm in our chosen programming language (in this case -- spreadsheet formulas and functions).

    Since we are kind of stuck on step 2, I would probably first start with some research into others' algorithms for similar analyses (since I have a hard time believing that we are the first people in all of human history to wonder about this kind of analysis). A quick internet search comes up with this from Wikipedia: http://en.wikipedia.org/wiki/Cluster_analysis
    This one: http://www.readperiodicals.com/201006/2108164771.html
    This one: http://www.data-miners.com/companion/Chapter11.ppt
    and many others.

    I haven't got time right now to digest such a broad topic, but I would start by researching these algorithms and see how they are implemented. Then come back to Excel and look at how I might implement these in Excel.

  17. #17
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    ignore this post #17. Pls see instead my next post #21, since I completely rewrote this post. Thanks!
    Last edited by IO4lZE; 02-22-2015 at 05:44 PM. Reason: completed edited

  18. #18
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Mr Shorty, thank you for that. Good. Yeah, it looks like we're making progress right now with sourab & marvinp, and I'm reviewing newdoverman's last post. Have an awesome day, and thanks again.

  19. #19
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    newdoverman, I see where you're going for sure, thanks! So I can see you've tightened the ranges from 100s down to 20s. But the prob with that, respectfully, is that they're still arbitrary (you make them up), instead of the perhaps preferred approach in marvinp et al, which is (almost) fun 'cuz you can try and walk thru the logic one column at a time (ie helper columns...which is the premie stage I'm at, lol). I'm still awaiting a response to my last question to them (re logic in Column D). newdoverman, I really appreciate you.

  20. #20
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,827

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    From an internet search for "cluster analysis in Excel":

    http://www.lynda.com/Excel-tutorials.../165438-2.html
    http://www.neilson.co.za/k-means-clu...crosoft-excel/ (though this one is more VBA than Excel).

    There were many other hits as well.

  21. #21
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    sourabh & marvinp,

    I've figured out what the "60" is in your .xlsx in $D$1.

    It's an Average difference from 1 value to the next (an average of the column C differences).

    Since all the bins are ultimately based on Column D, and Column D2:D14 are relative to the "60" value, I'm trying to determine how you got the right Upper & Lower limits of those ranges (Column E & F) based on the yes/no comparison to D$1$. Still reviewing that.

  22. #22
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    sourabh & marvinp,

    I'm presently evaluating D & E ... pretty thoughtful approach. I'll need to think more on it. First, thanks. Second, I think I see an error:

    Why is the value "89" from column A missed? Shouldn't that be the first lower limit of the first range, instead of 91, esp since the very first value is only 3?

    Thanks for your thoughts.

    I'm stepping out for a quick bite. Will check back later. Kind regards and thanks for the help so far...

  23. #23
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Hi IO4IZE,

    Lots of problems are clearer if looked at from a different direction. This problem seemed to have "funny" ranges or Bin limits in it. They weren't evenly spaces, just kind of random. I was interested to see what Frequency would do with your numbers with funny bin limits. I needed to do an Array Formula (CSE formula) to get the Frequency Function to work.

    Then it occurred to me we could do this problem easier/differently just using CountIF. See the attached where I simply counted the number of numbers less than the bin limits in Column A. Countif() isn't an Array function and works a lot faster. Then I simply subtracted the countif below each row to see how many were in that specific bin.

    When I look at the problem in this way, who needs Frequency when we have Countif? See the attached. Does this help? Do you still want to see which bin is fullest?
    Attached Files Attached Files

  24. #24
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Marvin,

    Thanks for the follow up.

    Here's the problem imho. In this more recent CountIF version, you're falling into a similar trap encountered earlier in this thread, which is that you're assigning arbitrary bin cutoffs. That's exactly what I don't want. I want the opposite. I want the bins (ranges) to be automatically detected as shown below. I followed the counts on your previous version (array in M, etc). The primary challenge is not so much the counting, but rather first finding those Lower & Upper limits, not unlike you had previously.

    In a sense, the "smart" bins, or ranges, will automatically detect "clusters" of data, and then report that back. Using my initial example from my first post, that would be like:

    3
    91
    95
    96
    96.5
    89

    157
    177
    175
    181

    250
    299
    301

    793

    where I'd want the output for the above example to be:

    1st most common range of values: 89-96.5
    2nd most common range of values: 177-181
    3rd most common range of values: 299-301

    Where to go from here...

    I think the previous version, which focused on the main challenge, namely determining the ranges, is the best place to continue from. However, since you've already shown a lot of sophistication with that, I'm wondering (dare I ask?) if you or others might be able to make it more sophisticated (more accurate), as follows: It's good but not great in one respect: It's basically saying "as soon as you take a leap that's greater than the average leap, then mark that as the lower limit" (and related logic for upper limit). But, the fault (or area for improvement) is if we can figure out a smarter way (statistically speaking), a logic that's more accurate than "average leap", since, if there's a HUGE gap somewhere in the data, then that gap will ultimately be responsible for missing (not calling) some ranges of values.

    Also, a quick side note: I'm learning that thinking in terms of "bins" is the wrong way to go, since we don't want predetermined, arbitrary bins with arbitrary cutoffs since assigning arbitrary cutoffs (funny or not) misses the point completely; rather we want the formulas/arrays to report back the ranges (lower & upper limit) of the actual data "clusters", if you will, that is, where values are most common, using an approach like your initial one. So just wondering if we can improve from "average leap" to something more accurate, and avoid the notion of "bins" altogether, continuing from Frequency with strange Bin Ranges.xlsx, and simply modifying/improving the logic from D thru F. Thanks for your help so far.

  25. #25
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Ok,

    So you don't know the low and high value for each bin. You want "the code" (us) to tell you what these are. Can you tell me how many bins there will be? How many ranges/buckets/bins are we supposed to come up with? That might help!!

    This sounds more like a statistics problem now. I'm thinking of bell shaped curves over "clusters" of points/bins. How do we know if 5 clusters are better than 4? Some Least Squares Regressions tool?

    I'm starting to get a feeling for the problem but this might be one of those without enough information to give an exact answer.

  26. #26
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    How many ranges/buckets/bins are we supposed to come up with? That might help!!
    marvin? Thanks for your post. You'll notice that that's already been discussed at length in this thread. Also please don't be angry or pretend to be angry at me. From your eyes, emoji and extra punctuation, it looks like you are bugging out. Please keep focused and professional thanks. Nah, I didn't want "the code". That's weird cuz I never even used that term and I'm not like that; and for anyone reading the whole thread, they can see that's an unfair mischaracterization. I've been very open-minded, inspired, and have been just as interested in the theory and the math as much as any solution. Actually, you're asking me to provide the
    ranges/buckets/bins
    but that's actually the point of the post is to come up with the ranges, very much like I described in post#24 which is the same as I posted originally. Appointing arbitrary "buckets" or "bins" is like trying to push a square peg through a circular hole, pigeon holing really. And it means you (theoretically) didn't understand the initial post or other posts where I talk about "scatterplots" and "intuitive" and math, etc.

    Marvin, what's strange is that you specifically did not use bins in your first submission and had a completely different solution that didn't involve bins in your attachment Frequency with strange Bin Ranges.xlsx, so I thought you already understood what we're trying to do here? I thought I was being very genuine and kind and helpful to others in explaining the .xlsx you submitted...but simply attached but without an explanation or any discussion. I think you misread my entire last post #24 or are acting like you took it personally because your last post was very defensive. You misread or want to be perceived as having misread my previous post before this one.

    When you say
    I'm starting to get a feeling for the problem but this might be one of those without enough information to give an exact answer.
    , please don't speak on behalf of others. Let others make their own decisions and help and come up with their own methods. No, there is enough information. You may just not have the answer or want to say that. This is not a simple black and white question as you apparently already knew from your Frequency with strange Bin Ranges.xlsx, Columns D-F. So you knew all this already. This is quite strange.

    I thought you would be more gratious since the xlsx you submitted was a carbon copy of the one initially posted by another user (sourabh), buit just added a count? That part confused me. Who came up with that non-bin logic? Like I stated in my post before yours, or post #24, that seemed to show an understanding of the challenge initially posted, quite clearly I might add. No need to post back if you have additional negative comments, sarcasm, or insults. Those were inappropriate.

    In my last paragraph there I provided additional clarity in response to others' (not mine) question about "bins":

    I'm learning that thinking in terms of "bins" is the wrong way to go, since we don't want predetermined, arbitrary bins with arbitrary cutoffs since assigning arbitrary cutoffs (funny or not) misses the point completely; rather we want the formulas/arrays to report back the ranges (lower & upper limit) of the actual data "clusters", if you will, that is, where values are most common, using an approach like your initial one. So just wondering if we can improve from "average leap" to something more accurate, and avoid the notion of "bins" altogether, continuing from Frequency with strange Bin Ranges.xlsx, and simply modifying/improving the logic from D thru F. Thanks for your help so far.
    For others, post#24, and really all my posts, summarizes the issue well, and solutions are welcomed.

  27. #27
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,827

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    I don't know if anyone looked at some of the links I posted. I looked a little closer at the k-means algorithm page. Seems simple enough, especially for 1D data like we have here:

    1) Choose how many "clusters" there should be. Note that the documentation says that this is more difficult if you want the algorithm to determine how many clusters to find, so, as a first look at the algorithm, specify how many clusters. Your intuitive analysis suggest 3 clusters.
    2) "Guess" at the means for the three clusters (maybe 90, 150, and 300). Some suggest that these could be "random" for the first iteration, though random choices don't seem to work well with this set and our first ever look at this kind of algorithm.
    3) "Assign" each data point to an "average".
    4) Compute three new averages based on these assignments.
    5) Return to step 2 with the new averages, reassign data points to the appropriate averages.
    6) Repeat until the three averages stop changing.

    It's not very good, it was put together quickly by a novice in these cluster algorithms, but maybe it is a start. In any case, I think that, if someone were to take the time and effort to become familiar and somewhat proficient in these kind of algorithms, such a problem shoudl be solvable.
    Attached Files Attached Files

  28. #28
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    MrShorty,

    Your kungfu is strong. First, I apologize because, for some reason, I didn't see your post #20. Second, huge thanks for providing a sense of direction. Ok, it's quite late here and I want to study up on these most recent 2 posts from you when I get back to the office tomorrow morning. I'm looking forward to checking that out for sure. and following up here of course with you.

    Have an awesome evening.

  29. #29
    Forum Expert sourabhg98's Avatar
    Join Date
    10-22-2014
    Location
    New Delhi, India
    MS-Off Ver
    Excel 2007, 2013
    Posts
    1,899

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Hi
    Sorry for the late reply
    Actually there is no specific way to extract ranges from a large array that are most frequent because there is no specific criteria... until we get some specific ranges so that we can extract frequencies of the specified ranges and then them most frequent ones...so what I have done is this...

    --In the column B I have arranged the observations from column A in ascending order
    --In column C I subtract each value in column B with the value above it...
    --Now in cell D1 I have the average of those differences which we got in column C
    --Now D column Checks if the difference of each observation (with the value above it) is smaller than the average difference or not...if yes then gives a 1 otherwise returns 2
    --Now if see D column we have values in the form 2,1,1,1,1,2,1,1,1,2,1,1,2 ..we can observe that every 1 after a 2 represents the lower limit and every 1 with a 2 after it represents the upper limit...
    --So column E extracts the upper limits and F the upper limits
    --Now G and H columns just remove the blank cells from column E and F
    --Finally we combine the columns G and H with a " - " in column I

    Hope this helps!
    Last edited by sourabhg98; 02-23-2015 at 09:44 AM.

  30. #30
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Sourabh,

    Yes, I was able to figure all that out on my own back on posts #15 & #21. thanks anyway.

    And I even summarized it in post #24, which is the current direction of this thread:

    It's good but not great in one respect: It's basically saying "as soon as you take a leap that's greater than the average leap, then mark that as the lower limit" (and related logic for upper limit). But, the fault (or area for improvement) is if we can figure out a smarter way (statistically speaking), a logic that's more accurate than "average leap", since, if there's a HUGE gap somewhere in the data, then that gap will ultimately be responsible for missing (not calling) some ranges of values.
    You wrote,

    Actually there is no specific way to extract ranges from a large array
    However, that's actually not true. I think you also may have missed MrShorty's post #27. Damn, I feel like a moderator, lol. Remember, if you're gonna set a fire, stick around.

    OK, thanks.

  31. #31
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    MrShorty, Just a quick note here to let you know I've fallen a little behind, and haven't forgotten, but mostly to express thanks, and let you know I promise to follow up here as soon as I can...thanks/thanks all. We had some wild weather here today with the first snow in decades, perhaps like most of the country & probably world, so had to get out with the fam for scenic photos and that whole thing. I hope you're doing great and sending my respects along.

  32. #32
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    MrShorty,

    Right on! k means cluster analysis would definitely serve as that more sophisticated method and is pretty much what I was getting at. Reading thru that made me feel probably how a caveman felt when he did his first cave drawing, or when he first said "mama"- haha. I like that idea, esp as described:

    “In statistics and data mining, k-means clustering is a method of cluster analysis which
    aims to partition n observations into k clusters in which each observation belongs to the
    cluster with the nearest mean.
    That seems like a step up from what I was calling the "average leap". I was really surprised to see how much coding, that Sheldon Neilson (kudos), eg, put into this concept, and the k means cluster analysis really seems like a good answer to my original question in this thread. I like that it runs thru all the iterations (why I originally thought an array might be best) to find these clusters
    The k-Means algorithm is an iteration of the following steps until stability is achieved i.e. the cluster assignments of individual records are no longer changing
    this quote from the http://www.neilson.co.za/k-means-clu...crosoft-excel/ previously provided. And I also like that data observations belong to the cluster with the nearest mean...which is a variant, albeit, of "average"; only it doesn't compare it to the average of the entire data set, making it a more accurate approach I believe.

    Well, I still have to go thru your latest table, and will follow up thereafter for sure. And, this week I'll definitely want to do some more independent research on k means, as you wisely suggested. Thanks for providing the guidance you did thru this clusterf. It's cool if you think about all these posts as individual observations in a data set, and how it all kind of came together (like a cluster) that wasn't necessarily that obvious; just a neat notion I think. Equally cool was that, while I'd mentioned ranges and "scatterplots" early on, your post #20 was first to mention the term clusters, and I hadn't actually seen it (my bad) til after I'd mentioned "clusters" in post#24. This thread has been a pretty cool experience and I learned a lot, thanks to the collective enterprise. I'll follow up as promised. Thanks for your patience in the meantime.

  33. #33
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    I'm also learning a lot from this wikipedia page on:

    http://en.wikipedia.org/wiki/K-means_clustering

    There's a lot on that page that shed a lot of light on various aspects from this thread, things like cluster shape, some of the history of k means, initiation methods, Forgy vs Random partition methods, fuzzy stuff, algorithms, variations, and the discussion section, and the importance of k means in signal processing and data mining.

    Just wanted to share that follow on with everyone.

    Thanks again MrShorty. Will keep in touch.
    Last edited by IO4lZE; 02-25-2015 at 06:03 PM.

  34. #34
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,827

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    If you want more research and more complication, after looking briefly at marvinp's and sourabhg98's proposals, they look like variations on the "basic sequential algorithm" scheme: http://en.wikipedia.org/wiki/Basic_s...rithmic_scheme

    If you are interested in a different looking algorithm for this problem.

  35. #35
    Registered User
    Join Date
    01-17-2015
    Location
    California
    MS-Off Ver
    Excel 2010
    Posts
    27

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Thanks. Well, I looked at the BSA, and imho, sourabh/marvinp's didn't look like that. Theirs is simply finding the average distance between values, and then creates an upper or lower limit as soon as one number goes to the next but only if that distance > the average difference.

    In the process of reviewing your last link, though, I came upon another nice page with various clustering methods here: http://en.wikipedia.org/wiki/Categor...ing_algorithms , which, interestingly, includes "K-means ++ (basically an improvement over k means) with more info here http://en.wikipedia.org/wiki/K-means%2B%2B . However, perhaps even a basic k means would be cool to work out; either way, thanks. Ok, from here on I apologize if I wrote too much:

    Ok, I've checked out your most recent prop, and I do admire & appreciate that it takes what appears to be a more accurate approach...but I wonder how much the arbitrary 100, 180, 300 cutoffs had to do with that, since it sort of relies on a "visual first" (which I assume played a role in creating them). What I liked, of all things though, was how how you corrected for outliers, by comparing the distance to these arbitrarily-defined centroids to the SD of the resulting #s within that kluster.

    I was thinking the problem is really all about our initiation method. Your most recent prop kind of relies, first, on picking arbitrary cutoffs for your centroids (100, 180, 300). Plus we kind of picked "3" as our # of klusters, and then arbitrarily picked 3 (perhaps convenient) cutoffs. I was thinking that, if I were doing it, it would be easy to "cheat", meaning one could just pick # of klusters and the actual centroids for initiation, by sneaking a peak first (which any smart person would do if they could). But in cases with massive amounts of data, it's just not feasilble (like the way I've conveniently bolded certain sets for our outcome). What I'd like is to be able to not have to study the data first, visually, but to be able to do this "blind", if you will. It looks like the initiation method and the iterations are the 2 key steps. It seems we'd need to first figure out the smartest initiation method, but again blindly and randomly. From http://en.wikipedia.org/wiki/K-means_clustering:

    Initialization methods

    Commonly used initialization methods are Forgy and Random Partition.[9] The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing the initial mean to be the centroid of the cluster's randomly assigned points. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the data set. According to Hamerly et al.,[9] the Random Partition method is generally preferable for algorithms such as the k-harmonic means and fuzzy k-means. For expectation maximization and standard k-means algorithms, the Forgy method of initialization is preferable.
    I guess yours is more the Forgy partition style (referring to my first wiki link above), although I was wondering if there was an equally straight-forward way that didn't necessarily arbitrarily pick #klusters and the exact centroids initially. Wondering, eg (once we figure out best initiation method), if a modern day array would help us to quickly run thru iterations to help determine the both the # of clusters and the actual "centroids", a la some combination of one of these (preferably Random) partition initiation methods plus k means or k means ++, or something similar. Ultimately, I want to stick with the main idea of k means, namely partitioning n observations into k clusters in which each observation belongs to the
    cluster with the nearest mean.


    Having sort of stumbled here with too much written, what I'd like to achieve (life: a work in progress) is shown in the DIAGRAM below the paragraph on Initiation (shown above) within the link above. In that diagram, it shows 4 STEPS. K means++ has a different seeding method, and is apparently twice as fast computationally (on iterations) and 1000x more accurate. Proposed in 2007, it says:

    This seeding method yields considerable improvement in the final error of k-means. Although the initial selection in the algorithm takes extra time, the k-means part itself converges very quickly after this seeding and thus the algorithm actually lowers the computation time. The authors tested their method with real and synthetic datasets and obtained typically 2-fold improvements in speed, and for certain datasets, close to 1000-fold improvements in error. In these simulations the new method almost always performed at least as well as vanilla k-means in both speed and error.
    Generally, at least to me, it seems like the key to finding clusters is identifying the fact that the distance between each of the points in that cluster is (a lot) less than the larger (eg "average") difference between data points throughout the entire data set. So, I think an array that runs through iterations finding these "nearest means" is the key. A lot of this sounds like identifying star systems or galaxies by their relative proximity, etc.

    In fact, another method called "k-medians" made me think that one could also use that in combination, in other words, while running thru the means iterations, it tries to find the median or central number within the (draft) cluster, and then keeps feeding that back into the array. So, I was thinking a combination of iterations that use both mean and median to identify the clusters. I know I'm mostly babblying and probably looking foolish around now. But I do feel the thought process is important. I know you're the same. Enjoying this, even though reading up on all these methods makes my eyes a little blurry. I'd obviously be more than happy implementing something like k means. Well, thanks for allowing me to ramble. Sorry, I know I wrote too much today. Kind regards.
    Last edited by IO4lZE; 02-25-2015 at 09:50 PM.

  36. #36
    Forum Guru Pete_UK's Avatar
    Join Date
    12-31-2011
    Location
    Warrington, England
    MS-Off Ver
    Office 2019 (still learning)
    Posts
    24,726

    Re: 1st, 2nd & 3rd most common ranges within a single column of values

    Here's a recent thread that you might find useful:

    http://www.excelforum.com/excel-prog...ml#post3998817

    Hope this helps.

    Pete

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] need vba code to compare two column get result of common values of both column
    By breadwinner in forum Excel Programming / VBA / Macros
    Replies: 3
    Last Post: 11-12-2013, 05:42 AM
  2. Find common Items in two columns and add corresponding values and provide a single list
    By cell1cell2 in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 05-09-2013, 05:24 AM
  3. Find common Items in two columns and add corresponding values and provide a single list
    By cell1cell2 in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 05-09-2013, 05:23 AM
  4. Single trend line for different y values and common x value
    By vikrannt.patil20 in forum Excel Charting & Pivots
    Replies: 1
    Last Post: 04-17-2013, 03:29 AM
  5. Returning most common values in a column
    By Sixty7 in forum Excel General
    Replies: 2
    Last Post: 03-03-2011, 05:55 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1