+ Reply to Thread
Results 1 to 14 of 14

Finding Most Frequently Occurring Combinations in Large Data Set

  1. #1
    Registered User
    Join Date
    02-12-2013
    Location
    Connecticut
    MS-Off Ver
    Excel 2010
    Posts
    50

    Finding Most Frequently Occurring Combinations in Large Data Set

    Hi,

    I have a question about how I can find patterns in large group of data. I’m familiar with Pivot Tables and know formulas pretty well, but I’m stuck on how to find patterns in large group of data. More specifically, I can’t figure out how to get to the “items most frequently bought together” from the data set that I have.

    Here’s the situation. We have thousands of orders and thousands of items (SKUs) with two statuses (Shipped & Unshipped). Customers can buy only one item or in any combination of items. Any ordered items can either be Shipped or Unshipped. See attached spreadsheet for an example.

    We get a text file report of the orders. Each item gets its own line item (row), per order, per status. In other words, if a customer buys 10 of an item on the same order, but 6 of them are Shipped and 4 are Unshipped, then that item would be on 2 different rows, one for 6 Shipped and one for 4 Unshipped on the same order number.

    We need to get this information to give us a better sense of what to cross-promote to customers.

    BTW, I have Excel 2010 running in Windows 7.

    Any help/ insight/ideas with this would be greatly appreciated. Thank you in advance.

  2. #2
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    I have added a sheet to your workbook. The calculations should be self-explanitory.

  3. #3
    Forum Guru
    Join Date
    08-05-2004
    Location
    NJ
    MS-Off Ver
    365
    Posts
    13,578

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    I created a table in F6:L12 with SKU values across top and down side. Anything above the diagnol is set to NA since they would be duplicates (i.e Black White is the same as White Black)

    The formula is an Arrayed Formula (ie. enter with CNTRL SHFT ENTER instead of ENTER). In G8 for example is

    =COUNT(MATCH(IF($B$2:$B$35=$F8,$A$2:$A$35,""),IF($B$2:$B$35=G$6,$A$2:$A$35," "),0))
    which counts the number of times an order matches both F8 and G6.
    You can visually inspect the table for maximums or (better yet) use Conditional formulas to highlight them.
    Assuming that there isn't a match for the maximum, you can use a formula to pull out that maximum

    in G15 (again an array)
    =INDEX($F$7:$F$12,SUM(IF(MAX($G$7:$K$12)=$G$7:$K$12,ROW($G$7:$K$12),""))-6)&"-"&INDEX($G$6:$K$6,SUM(IF(MAX($G$7:$K$12)=$G$7:$K$12,COLUMN($G$7:$K$12),""))-6)

    I would think it should be easier but I can't find it.
    Attached Files Attached Files
    ChemistB
    My 2?

    substitute commas with semi-colons if your region settings requires
    Don't forget to mark threads as "Solved" (Edit First post>Advanced>Change Prefix)
    If I helped, Don't forget to add to my reputation (click on the little star at bottom of this post)

    Forum Rules: How to use code tags, mark a thread solved, and keep yourself out of trouble

  4. #4
    Registered User
    Join Date
    02-12-2013
    Location
    Connecticut
    MS-Off Ver
    Excel 2010
    Posts
    50

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Hi newdover man,

    Thank you very much for your reply.

    I have looked at the original data set and your response spreadsheet and I think I need to clarify a few things...

    What I'm looking for is counting the frequency of combinations on a PER ORDER basis, not just how frequent an item is. For example, if BLACK and WHITE are only bought together on one order, then the count for this combination would be "1"...this is regardless of how often BLACK or WHITE are bought individually on their own orders. In the instance where there is 1 order of BLACK and WHITE, 20 other orders in which only BLACK is ordered, 20 other orders in which only WHITE is ordered and 1 order where BLACK and YELLOW are ordered, the combinations of "BLACK and WHITE" and "BLACK and YELLOW" would be tied for the most frequent combinations with "1" each.

    In your example, BLACK and WHITE are counted as 6 occurrences, but in the original data sheet there are only 4 instances of these combinations (Orders 3, 7, 13, and 19 only). So, I'm not sure if your' taking the PER ORDER count into account.

    I didn't realize that you could use the COUNTIFS formula (thank you for that , but it's not exactly what I'm looking for...maybe it's just a "tweeking" of the formula.

    Would you mind clarifying this? I don't really care if the items are Shipped or Unshipped on a per order basis, just if they're ordered together.

    Thank you very much for helping with this situation.

  5. #5
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Chemist, that is a very clever solution.
    Entia non sunt multiplicanda sine necessitate

  6. #6
    Forum Guru
    Join Date
    08-05-2004
    Location
    NJ
    MS-Off Ver
    365
    Posts
    13,578

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Thanks shg!

  7. #7
    Registered User
    Join Date
    02-12-2013
    Location
    Connecticut
    MS-Off Ver
    Excel 2010
    Posts
    50

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Hi ChemistB,

    Thank you very much for your response. Your response looks interesting, even though I don't know how you did what you did . I just have a couple of thoughts/questions about your solution:

    1) In looking at your result in cell I10, it says that that combination (Black/White) shows up 5 times in the data set, which is incorrect. It actually shows up only 4 times (Order 3, 7, 13, and 19)...you can see this by checking the data set. Since I have no idea how you came up with your solution, I have no idea how to try to modify this to come up with the correct answer.

    2) In the cells that you have "N/A", are these manual inputs or a formula? Since we have thousands of items, I don't want to manually input this textual input into every empty cell. I see that in H8 you have a formula to automatically produce "N/A", but I don't know how to incorporate this along with cells that have an actual number. Worse case scenario, I'm OK with having numbers for both "Black/White" and "White/Black", since if I do Conditional Formatting, both will be highlighted.

    3) Your solution gives me the MOST frequently occurring combination. Is there a way to show the "Top X" of this list, like the Top 10, Top 20, etc.?

    Thank you again for your help on this. If you can please clarify my followup questions, I would truly appreciate it.

  8. #8
    Registered User
    Join Date
    02-12-2013
    Location
    Connecticut
    MS-Off Ver
    Excel 2010
    Posts
    50

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Hi ChemistB,

    Thank you very much for your response. Your response looks interesting, even though I don't know how you did what you did . I just have a couple of thoughts/questions about your solution:

    1) In looking at your result in cell I10, it says that that combination (Black/White) shows up 5 times in the data set, which is incorrect. It actually shows up only 4 times (Order 3, 7, 13, and 19)...you can see this by checking the data set. Since I have no idea how you came up with your solution, I have no idea how to try to modify this to come up with the correct answer.

    2) In the cells that you have "N/A", are these manual inputs or a formula? Since we have thousands of items, I don't want to manually input this textual input into every empty cell. I see that in H8 you have a formula to automatically produce "N/A", but I don't know how to incorporate this along with cells that have an actual number. Worse case scenario, I'm OK with having numbers for both "Black/White" and "White/Black", since if I do Conditional Formatting, both will be highlighted.

    3) Your solution gives me the MOST frequently occurring combination. Is there a way to show the "Top X" of this list, like the Top 10, Top 20, etc.?

    Thank you again for your help on this. If you can please clarify my followup questions, I would truly appreciate it.

  9. #9
    Registered User
    Join Date
    02-12-2013
    Location
    Connecticut
    MS-Off Ver
    Excel 2010
    Posts
    50

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Hi ChemistB,

    A quick update...I think I just realized why your I10 cell has "5" occurrences and not "4" as the number of orders that has this combination. In Order 19, there are two instances/rows of Black (one Shipped and one Unshipped) and I think your formula is double counting the Black (Shipped) with White as one combination and Black (Unshipped) with White as another combination, even though these are part of the same order and should only be counted once.

    One thought I had was to eliminate Duplicates of Colors that are part of the same order (since I don't care, for the frequency count, how many units are actually sold within an order, only that they are counted as part of the combination). The only problem is that I'm not sure how to eliminate Duplicates based on "duplicates on the same order". Maybe this can help with figuring out the solution.

    Again, thank you in advance for your help.

  10. #10
    Forum Guru
    Join Date
    08-05-2004
    Location
    NJ
    MS-Off Ver
    365
    Posts
    13,578

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Ahhh, it's ignoring shipped/unshipped so if you shipped a partial product in an order (so black appears as shipped and unshipped) it counts it twice. Probably best way to fix this is adding a column to the data itself.

    Meanwhile, I modified the equation in the table so it gives N/A's rather than duplicate black-white and white-black. it is

    Formula: copy to clipboard
    Please Login or Register  to view this content.


    I switched to the Large function instead of Max so we can get largest, 2nd largest, etc. However, just like MAX, if there's a tie, it doesn't know what to pick and returns an error. I used IFERROR to return "Multiple"
    IFERROR can only be used on 2007 and up.

    In G15 copied down (arrayed formula)

    Formula: copy to clipboard
    Please Login or Register  to view this content.

    In H15 copied down
    Formula: copy to clipboard
    Please Login or Register  to view this content.
    Attached Files Attached Files

  11. #11
    Registered User
    Join Date
    11-07-2008
    Location
    England
    Posts
    24

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Hi everyone.

    I need some help with a formula which is related to this thread.

    I need a formula which counts the unique number of orders when two items are bought together. The formulas in the thread above double count some orders if an order contains one of one item, and two of another item (hope that makes sense!).

    So in this example, item F & A are bought together in 4 separate transactions, and i need a formula to calculate that.

    Hope someone can help.

    Thanks

    Mike
    Attached Files Attached Files

  12. #12
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Unfortunately your post does not comply with Rule 2 of our Forum RULES. Do not post a question in the thread of another member -- start your own thread.

    If you feel an existing thread is particularly relevant to your need, provide a link to the other thread in your new thread.

    Old threads are often only monitored by the original participants. New threads not only open you up to all possible participants again, they typically get faster response, too.
    <---------If you like someone's answer, click the star to the left of one of their posts to give them a reputation point for that answer.
    Ron W

  13. #13
    Forum Expert Pepe Le Mokko's Avatar
    Join Date
    05-14-2009
    Location
    Belgium
    MS-Off Ver
    O365 v 2402
    Posts
    13,443

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Duplicate...

  14. #14
    Registered User
    Join Date
    11-07-2008
    Location
    England
    Posts
    24

    Re: Finding Most Frequently Occurring Combinations in Large Data Set

    Apologies, I will post a new thread

    Thanks

    Mike

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1