
Dealing with large data

  1. #1
    Forum Expert dflak | Join Date: 11-24-2015 | Location: North Carolina | MS-Off Ver: 365 | Posts: 7,920

    Dealing with large data

    This is a spin-off on the large spreadsheet thread - I don't want to derail it too much.

    One of my issues with one of my customers is that they often demand too much data. We have a report that tracks individual models (of which there are hundreds) and individual failure codes (of which there are also hundreds, which is way too many IMHO). So that's tens of thousands of data points every day. What's more, they want to see every day for a month and be able to "expand" the pivot table on any month for the past two years. So there are about 3.5 million pieces of data tracked on this one report. That's one huge haystack in which to be looking for needles.

    Also, the data are highly variable. We might process 100 of a particular model one day, 300 the next, 50 the day after that, and then none the day after. Weekly averages allow some regression towards the norm. Which brings up the question: how much data is enough? How much is too much? How granular do you need to be?
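    Just to make the weekly rollup idea concrete, here is a minimal Python/pandas sketch. The file name and the date / model / failure_code / count column names are my own assumptions, not the layout of the actual report.

        import pandas as pd

        # Assumed layout: one row per model / failure code / day, with a count column.
        daily = pd.read_csv("daily_failures.csv", parse_dates=["date"])  # hypothetical file

        # Weekly totals per model smooth out the day-to-day clumpiness described above.
        weekly = (
            daily.groupby(["model", pd.Grouper(key="date", freq="W")])["count"]
            .sum()
            .reset_index()
        )
        print(weekly.head())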

    We do try to use run charts with upper and lower control limits, but those depend on a normal distribution, which we don't have. We have growth (more people are buying the product) and seasonality - I can compensate for both of these, but the stuff coming down the pipe is "clumpy": stores wait until the box gets filled and then ship it to our shop. Also, the inspection lines change constantly - the inspectors set up to test a particular model, test it all day long or for several days, and then move on to another model until there's enough backlog to switch back to the first one.
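    For illustration only, here is one way to put a trailing centre line and control limits on such a series in pandas. This is not dflak's actual workbook; the 26-week window and the 3-sigma rule are assumptions, and as noted above the normality behind 3-sigma limits is questionable for clumpy count data.

        import pandas as pd

        def add_control_limits(series: pd.Series, window: int = 26) -> pd.DataFrame:
            """Trailing centre line and +/- 3 sigma limits (window length is a guess)."""
            centre = series.rolling(window, min_periods=8).mean()
            sigma = series.rolling(window, min_periods=8).std()
            return pd.DataFrame({
                "value": series,
                "centre": centre,
                "ucl": centre + 3 * sigma,
                "lcl": (centre - 3 * sigma).clip(lower=0),  # counts cannot go negative
            })

        # e.g. for one (hypothetical) model from the weekly table sketched earlier:
        # limits = add_control_limits(weekly[weekly["model"] == "X123"].set_index("date")["count"])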

    Also, models have a life cycle: they are introduced and everyone is happy for a couple of months. More get bought, and then they start coming back: first a trickle, then a surge, and then a tapering off to the point where the really old-timers come straggling in at one or two a month. How far do I go back under these circumstances to compute the control limits? Too far back and the limits are static and do not reflect the current situation; not far back enough and they are ephemeral.

    In other words, it's an out-of-control system. Nonetheless, the run charts do occasionally show something drastic - and there is usually a good reason for it.

    One of the metrics our user is fond of is period over period change rate such as (This Month - Last Month) / Last Month. So if you had 1 last month and have 2 this month, that's a 100% increase whereas if you had 100,000 last month and 150,000 this month, that's only a 50% increase. To compensate for this I have a sliding "sigma" on my charts. Low volume models get wide control margins. High volume models get narrower margins.
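    To make the arithmetic concrete, and to sketch what a "sliding sigma" might look like (my guess at the idea, not the formula actually used on the charts):

        def pct_change(last_month: float, this_month: float) -> float:
            """(This Month - Last Month) / Last Month, as described above."""
            return (this_month - last_month) / last_month

        print(pct_change(1, 2))              # 1.0  -> 100% increase on a tiny volume
        print(pct_change(100_000, 150_000))  # 0.5  ->  50% increase on a huge volume

        def sigma_multiplier(volume: float, lo: float = 2.0, hi: float = 6.0, ref: float = 500.0) -> float:
            """Wider control margins for low-volume models, narrower ones as volume approaches 'ref'.
            The 2/6/500 numbers are placeholders, not values from the real report."""
            return hi - (hi - lo) * min(volume / ref, 1.0)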

    I push hard for a Pareto analysis. Don't worry about the small stuff - concentrate on volume. Still, some people want the details on everything.
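    A Pareto cut on the same hypothetical weekly table sketched earlier might look like this: rank models by total volume and keep the ones that account for roughly the first 80% of it.

        # Rank models by total volume and keep those covering ~80% of it.
        totals = weekly.groupby("model")["count"].sum().sort_values(ascending=False)
        cumulative_share = totals.cumsum() / totals.sum()
        top_models = cumulative_share[cumulative_share <= 0.80].index.tolist()
        print(top_models)  # the "vital few" worth detailed tracking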
    One spreadsheet to rule them all. One spreadsheet to find them. One spreadsheet to bring them all and at corporate, bind them.

    A picture is worth a thousand words, but a sample spreadsheet is more likely to be worked on.

  2. #2
    Forum Expert | Join Date: 04-01-2013 | Location: East Auckland | MS-Off Ver: Excel 365 | Posts: 1,343

    Re: Dealing with large data

    Some slightly disorganized thoughts:

    In some situations we provide daily and hourly data (because it is asked for), but the end users are not experts in statistics, and I think it encourages people to spend time watching the numbers and trying to react to statistically insignificant changes and correlations.

    Also, I find that with many users of the same report, there is a natural tendency for them each to ask for all the data they think they might need, just in case. Once those items are included, generally no one wants to reduce the amount of data, even if it starts to make things harder to use, because they don't know who might need that data and they don't want to be responsible for removing it. This means complexity grows.

    Pareto analysis has some issues from the user's perspective, in that a lot of it is about risk avoidance. Doing the 80% well doesn't completely cover you when you get nailed for doing the 20% badly - when you get called up, the 80% will probably be assumed to be business as usual.
