The “Grouping Error”
Observations can be turned into data by measurement. Measurements can be summarized by statistical analysis, and then decision-making ideas start to emerge from the numbers.
But wait! There is this thing called “the grouping error”.
Here is our classic “classroom” story illustrating this problem:
Ten machines are “bubble-packing” a consumer product, and a statistical summary concludes that about 1% of the packages are damaged: mangled or crushed in the packaging machines. It appears to be an alignment failure of conveyor, product and heat-sealing stamp/press.
This idea begins to emerge. What do these ten machines have in common that causes an occasional alignment failure? Is it a timing mechanism? Are there plastic parts that should be replaced with steel? Do we need to rebuild/replace these machines with precision stepping motor components? (Don’t raid the capital expenditure budget yet!)
Here is how the 1% came to be: ONE machine had a 10% scrap rate and the remaining nine had little or none. A DIFFERENT IDEA emerges from the numbers: what is different about machine number ten?!!
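To make the arithmetic concrete, here is a minimal sketch in Python. The counts are my own assumptions for illustration (1,000 packages per machine, zero damage on the nine good machines); the story only says "little or none". It shows how a single 10% machine rolls up into a 1% overall figure.

```python
# Hypothetical counts for illustration only: 1,000 packages per machine.
packages_per_machine = 1000

# Nine machines with no damage (assumed), one machine with a 10% scrap rate.
damaged = {f"machine_{i}": 0 for i in range(1, 10)}
damaged["machine_10"] = 100

# The rolled-up view: one number for all ten machines.
total_damaged = sum(damaged.values())
total_packages = packages_per_machine * len(damaged)
pooled_rate = total_damaged / total_packages

# The per-machine view: one number per machine.
per_machine_rate = {
    name: count / packages_per_machine for name, count in damaged.items()
}

print(f"Pooled scrap rate: {pooled_rate:.1%}")
for name, rate in per_machine_rate.items():
    print(f"{name}: {rate:.1%}")
```

The pooled number is correct as arithmetic, but only the per-machine breakdown points at where the problem actually lives.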
I have seen this exact issue in more than one industry/process, and of course there are ways to be vigilant and catch this mistake before the data is rolled up into a final report.
A data analyst might know that a histogram can show multiple peaks (“bimodal”), indicating that a single average does not describe the population. A statistician might look at data clusters or perform an F-test or test for a goodness of fit to a normal distribution. Any of these checks and more could be employed to examine suspicious data for a grouping error.
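As one possible sketch of such a check, the Python below computes a chi-square statistic for homogeneity by hand. This is a different test from the ones named above, chosen here because it suits damaged/not-damaged counts per machine. The function name `chi_square_homogeneity` and the counts are hypothetical; the critical value 16.919 is the standard table value for 9 degrees of freedom at the 5% level.

```python
def chi_square_homogeneity(damaged, totals):
    """Chi-square statistic for the hypothesis that every group
    shares the same damage rate (hypothetical helper, stdlib only)."""
    pooled = sum(damaged) / sum(totals)
    if pooled <= 0 or pooled >= 1:
        return 0.0  # no variation at all, nothing to test
    stat = 0.0
    for d, n in zip(damaged, totals):
        expected_bad = n * pooled
        expected_good = n * (1 - pooled)
        stat += (d - expected_bad) ** 2 / expected_bad
        stat += ((n - d) - expected_good) ** 2 / expected_good
    return stat


# Hypothetical counts: nine machines with no damage, one with 100 of 1,000.
damaged_counts = [0] * 9 + [100]
package_totals = [1000] * 10

# Table value for 10 groups (9 degrees of freedom) at the 5% level.
CRITICAL_VALUE_DF9 = 16.919

statistic = chi_square_homogeneity(damaged_counts, package_totals)
groups_differ = statistic > CRITICAL_VALUE_DF9

print(f"chi-square = {statistic:.1f}, groups differ: {groups_differ}")
```

A statistic far above the critical value says the machines should not be summarized with one shared rate. It does not say which machine is the odd one out; the per-machine rates do that.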
However, there are also the facts we know from simply observing the thing we measure. CNC machining data should probably not be merged too early in the analysis with “old school” machining technology or additive manufacturing. Defective/damaged products manufactured from wood should be studied apart from the same products made from metal. Call center calls with translators should not be prematurely grouped with calls handled by native speakers.
Working with data does NOT mean shutting out every other fact and observation available to us, and this other information guides us as we extract the right conclusions from the data we collect.