This is a spin-off of the large spreadsheet thread - I don't want to derail it too much.
One of my issues with one of my customers is that they often demand too much data. We have a report that tracks individual models (of which there are hundreds) and individual failure codes (also hundreds - which is way too many, IMHO). So that's tens of thousands of data points every day. What's more, they want to see every day for a month and be able to "expand" the pivot table on any month for the past two years. So there are about 3.5 million pieces of data tracked on this one report. That's one huge haystack in which to look for needles.
The data are also highly variable. We might process 100 units of a particular model one day, 300 the next, 50 the day after, and then none the day after that. Weekly averages smooth this out by letting the counts regress toward the norm. Which brings up the question: how much data is enough? How much is too much? How granular do you need to be?
We do try to use run charts with upper and lower control limits, but those assume a normal distribution, which we don't have. We have growth (more people are buying the product) and seasonality - I can compensate for both of these - but the stuff coming down the pipe is "clumpy": stores wait until the box gets filled and then ship it to our shop. The inspection lines also change constantly - the inspectors set up to test a particular model, run it all day or for several days, then move on to another model until there's enough backlog to switch back to the first.
Models also have a life cycle: they are introduced, and everyone is happy for a couple of months. More units get bought, and then they start coming back: first a trickle, then a surge, then a tapering off to the point where the real old-timers come straggling in at one or two a month. How far back do I go under these circumstances to compute the control limits? Too far back and they are static and do not reflect the current situation. Not far enough back and they are ephemeral.
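One way to handle the lookback question is to compute the limits from a rolling window, so old data ages out automatically. This is just a sketch of the idea, not what our report actually does - the window length and 3-sigma width here are arbitrary illustrative choices:

```python
from statistics import mean, stdev

def rolling_limits(counts, window=8, n_sigma=3):
    """For each point, derive control limits from only the preceding
    `window` observations, so stale history stops influencing them.
    Returns None where there is not yet enough history, else True/False
    for "outside the limits"."""
    flags = []
    for i, x in enumerate(counts):
        history = counts[max(0, i - window):i]
        if len(history) < 2:
            flags.append(None)  # can't compute a stdev from < 2 points
            continue
        m, s = mean(history), stdev(history)
        lcl, ucl = m - n_sigma * s, m + n_sigma * s
        flags.append(x < lcl or x > ucl)
    return flags

# Steady weeks, then a surge: only the surge gets flagged.
weekly = [100, 110, 95, 105, 98, 102, 300]
print(rolling_limits(weekly))
# → [None, None, False, False, False, False, True]
```

A shorter window tracks the life cycle more closely but makes the limits jumpier; that trade-off is exactly the static-versus-ephemeral tension above.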
In other words, it's an out-of-control system. Nonetheless, the run charts do occasionally show something drastic - and there is usually a good reason for it.
One of the metrics our user is fond of is the period-over-period change rate, such as (This Month - Last Month) / Last Month. So if you had 1 last month and have 2 this month, that's a 100% increase, whereas if you had 100,000 last month and 150,000 this month, that's only a 50% increase. To compensate for this I have a sliding "sigma" on my charts: low-volume models get wide control margins, high-volume models get narrower ones.
I push hard for a Pareto analysis. Don't worry about the small stuff - concentrate on volume. Still, some people want the details on everything.
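For anyone unfamiliar with the technique, a Pareto cut just sorts the failure codes by volume and keeps the few that account for most of it. The codes and counts below are made up for the demo:

```python
def pareto(counts_by_code, cutoff=0.8):
    """Return the failure codes that together account for at least
    `cutoff` of total volume, highest-volume codes first."""
    total = sum(counts_by_code.values())
    vital, running = [], 0
    for code, n in sorted(counts_by_code.items(), key=lambda kv: -kv[1]):
        vital.append(code)
        running += n
        if running / total >= cutoff:
            break
    return vital

failures = {"F01": 500, "F02": 300, "F03": 120, "F04": 50, "F05": 30}
print(pareto(failures))  # → ['F01', 'F02']  (80% of total volume)
```

Two codes out of five carry 80% of the volume here - which is the argument for not chasing the other hundreds of codes individually.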