
Question: When is Six Sigma not Six Sigma?

Answer: When it's the Six Sigma Metric!!©

Arthur M. Schneiderman

Six Sigma (6σ) Quality is a popular approach to process improvement, particularly among technology-driven companies such as Allied Signal, General Electric, Kodak and Texas Instruments. Its objective is to reduce output variability through process improvement, and/or to widen customer specification limits through design for producibility (DfP), so that these specification limits lie more than ±"six" standard deviations, or σ's, from the process mean (I'll explain the quotation marks later). In this way, defect levels should be below 3.4 "defects per million opportunities" for a defect, or "dpmo" for short.
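To make the definition concrete, here is a minimal Python sketch of the dpmo arithmetic. The function name `dpmo` and the sample counts are hypothetical, chosen so that the result lands exactly on the 3.4 dpmo figure:

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities (hypothetical helper)."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("there must be at least one opportunity")
    # Multiply before dividing so integer inputs give a correctly rounded result.
    return defects * 1_000_000 / total_opportunities


# Hypothetical sample: 17 defects found in 5,000 units,
# each unit having 1,000 opportunities for a defect.
print(dpmo(17, 5_000, 1_000))  # 3.4
```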

Although originally introduced by Motorola in 1986 as a quality performance measurement, 6σ has evolved into a statistically oriented approach to process improvement. It is deployed throughout an organization using an army of champions and experts called "black belts," a title borrowed from their martial arts counterparts. They command a rank-and-file made up of teams focusing on the improvement of the organization's processes. Just search the internet for "six sigma" and you'll come up with several informative descriptions of its history and current practice. The Six Sigma Academy, a Motorola spin-off, provides consulting services to many of the leading practitioners of this approach. What I want to focus on here, though, is the 6σ metric itself, not the concept or the approach.

I don't like the 6σ metric. As you'll see, it fails many of the tests that I've previously established for "good" metrics and described in Part 1 of Metrics for the Order Fulfillment Process. In particular, it's neither simple to understand nor, in most applications, an effective proxy for customer satisfaction. It does not have an optimum value of zero. And its definition is ambiguous, and therefore easily gamed, because there is no accepted test for what to include as an "opportunity" for a defect.

What is an "opportunity"?

I've trained improvement teams, team leaders, and black belts for one of the aforementioned companies in their 6σ metrics module. Once they get through the distinctions between defects vs. defectives and attribute vs. variable data, the greatest difficulty that the trainees encounter is determining what constitutes an opportunity for a defect. Obviously, by increasing the number of opportunities (the denominator of dpmo), you can improve the metric, particularly if you include opportunities that are not important to customers and consequently are not routinely checked for conformance, thereby allowing their defects to go uncounted.
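The denominator effect is easy to see numerically. In this sketch (the helper and all counts are hypothetical), declaring extra opportunities that are never inspected leaves the defect count unchanged and so cuts the reported dpmo five-fold:

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities (hypothetical helper)."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


units = 1_000
defects_found = 20  # only defects on inspected opportunities are ever counted

# Hypothetical unit with 10 opportunities that are actually inspected.
honest = dpmo(defects_found, units, 10)
# Declare 40 more "opportunities" that nobody inspects. Their defects go
# uncounted, so the numerator stays put while the denominator grows 5x.
padded = dpmo(defects_found, units, 10 + 40)

print(honest, padded)  # 2000.0 400.0
```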

This weakness can be overcome (but seldom is in practice) by applying an objective weighting for defect severity in counting both opportunities and actual defects. For example, critical defects, ones that make the output unusable by the customer, get a weighting of one, while inconsequential defects get a weighting of zero. Cosmetic defects, or ones that can be corrected or compensated for, have values in between, depending on the relative cost of correction or their likely impact on the customer's repurchase decision. A similar approach is taken in Failure Mode and Effects Analysis (FMEA), where improvement priorities are set based on a combination of frequency of occurrence, severity and detectability of candidate failure modes. I understand that the TI flavor of 6σ does include this type of logic. Where should the weightings come from? The customer of the process, of course (but more about this in a future installment in this series, if there's sufficient interest). Current practice usually leaves the choice of what constitutes an opportunity for a defect as a subjective rather than objective decision, and subjectivity has proven to be a poor foundation for good metrics.
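One way the severity weighting might be applied, sketched under stated assumptions: the weight scale, the `OpportunityType` record, and all counts are hypothetical, and the weights are simply applied to both defects and opportunities. A useful property falls out: zero-weight opportunities no longer inflate the denominator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OpportunityType:
    name: str
    weight: float   # assumed scale: 1.0 critical ... 0.0 inconsequential
    per_unit: int   # opportunities of this type on each unit
    defects: int    # defects of this type found in the sample


def weighted_dpmo(types: list[OpportunityType], units: int) -> float:
    """Severity-weighted dpmo: weights apply to both defects and opportunities."""
    weighted_defects = sum(t.weight * t.defects for t in types)
    weighted_opportunities = units * sum(t.weight * t.per_unit for t in types)
    if weighted_opportunities == 0:
        raise ValueError("no opportunity carries any weight")
    return weighted_defects * 1_000_000 / weighted_opportunities


# Hypothetical weights and counts for a sample of 1,000 units.
types = [
    OpportunityType("critical", 1.0, 2, 3),
    OpportunityType("cosmetic", 0.2, 10, 20),
]
base = weighted_dpmo(types, 1_000)
# Padding with zero-weight opportunities leaves the weighted metric unchanged.
padded = weighted_dpmo(types + [OpportunityType("inconsequential", 0.0, 40, 0)], 1_000)
print(base, padded)  # 1750.0 1750.0
```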

Is it really "six" σ?

Let's return to the metric itself. Once we've identified all of the appropriate opportunities for defects and counted the actual number of them that fail to meet specification, we're ready to calculate the metric. It's trivial to determine the dpmo value, but what is the corresponding sigma value? First, you'll have to find a table of values for the "one-sided tail of a normal distribution." That should be easy, right?

Well, they're not that easy to find. Most textbooks or statistics tables end at values of three or four sigma. Why? My guess is that until recently there was little need for knowing values above these levels. Practical applications simply did not exist in our world. There's probably a profound message for us there, if we look carefully. I have found such a table, though, in the 1992 Motorola publication "Six Sigma Producibility Analysis and Process Characterization" by Mikel J. Harry and J. Ronald Lawson. Other, more recent 6σ sources always seem to reference this one. Its Appendix C gives a value of 1.248x10^-9 for 6σ.
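Such tables are no longer necessary on a modern computer. As a minimal sketch, Python's standard `math.erfc` gives the one-sided upper tail of the standard normal directly, P(Z > z) = ½·erfc(z/√2), and stays accurate far into the tail. The helper name `upper_tail` is hypothetical:

```python
import math


def upper_tail(z: float) -> float:
    """One-sided upper tail of the standard normal, P(Z > z)."""
    # erfc keeps full relative precision far out in the tail,
    # where computing 1 - cdf(z) would lose it to cancellation.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (3.0, 4.0, 4.5, 5.0, 6.0):
    print(f"{z:4.1f} sigma: {upper_tail(z):.4e}")
```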

But wait, what happened to the 3.4x10^-6? Forgive my cynicism, but here comes what looks to me like a little sleight of hand. We are told that there is a typical 1.5σ long-term drift in most process means. To adjust for it, we subtract out this 1.5σ, so that we actually use the table entry at 4.5σ to get the adjusted value: that's 3.451x10^-6. In other words, if we measure 3451 defects in a billion opportunities, only about one of them was caused by short-term process variability. The other 3450 were caused by this mysterious long-term drift in the mean, so we're not going to count them. We'll report that our process is operating at 6σ. Got it? To be honest, though, in small print we will admit to the 1.5σ adjustment, whether it's justifiable or not. To make it easier for us, tables are provided that incorporate this adjustment, with the obligatory footnote.
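The effect of the 1.5σ adjustment can be sketched the same way. The helper names are hypothetical; the only inputs taken from the text are the 6σ level and the 1.5σ shift. The unshifted tail at 6σ and the shifted tail at 4.5σ differ by a factor of roughly 3,400:

```python
import math

ASSUMED_SHIFT = 1.5  # the conventional long-term drift allowance, in sigma units


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def dpmo_for_sigma_level(level: float, shift: float = ASSUMED_SHIFT) -> float:
    """dpmo implied by a quoted sigma level after subtracting the assumed shift."""
    return upper_tail(level - shift) * 1_000_000


with_shift = dpmo_for_sigma_level(6.0)          # about 3.4 dpmo
without_shift = dpmo_for_sigma_level(6.0, 0.0)  # about 0.001 dpmo
print(f"{with_shift:.3f} dpmo vs {without_shift:.6f} dpmo")
print(f"ratio: {with_shift / without_shift:.0f}")  # roughly 3,400 to 1
```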

Well, I am aware of situations where there is a drift in the mean, caused for example by tool wear or component aging, but I also know of processes in which this phenomenon simply does not occur. And why forgive this long-term drift anyway, even when it does exist? Laser machining eliminates tool wear, compensation circuits can adjust for component aging, and there's a whole science of adaptive feedback systems that can sense and compensate for both deterministic (like tool wear) and random forms of "non-stationarity," as statisticians like to call this drift. In a previous work-life, I spent many an evening atop beautiful Mt. Haleakala in Hawaii peering through a large telescope at satellites streaking across the sky. It was guided by a computerized tracking system that effectively compensated for significant random image wander created by the intervening atmospheric turbulence. So I know firsthand that it can be done.

Furthermore, there is a conceptual problem created by the assumption that there is a constant relationship between long-term drift in the mean and short-term process variation. It implies that they both have a common root cause. I can think of no theoretical reason why that should be true in any given case, let alone in general. If instead it's based on empirical observation, then I'd like to see the supporting data so I can draw my own conclusion as to its general validity. It seems to me that this largely undocumented long-term drift in the mean is as worthy a target for process improvement as is reduction in short-term variation. And I don't buy the argument that it's too complicated to analyze in general, so we'll just use a universal approximation. Too much valuable information is buried by that concession, not to mention the undesirable behavior that it all too often encourages.

My cynical symbiont would have loved to have been a fly on the wall when this convenient "discovery" was made. Why convenient? Well, think about it. If each unit produced has 100 opportunities for independent defects, then without this 1.5σ adjustment 6σ quality would mean only about one defective unit in 10 million units produced! Banks would never make an error in processing loan applications, semiconductor manufacturers would produce many products that never have even a single defect throughout the product's entire lifecycle, and call centers would correctly transfer each and every call the first time and maintain this perfect performance over many decades. For nearly all processes, that would be indistinguishable from the already unsellable concept of zero defects as a reasonably achievable goal.
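The arithmetic behind the "one in 10 million" figure can be sketched under the paragraph's own assumption of 100 independent opportunities per unit (the function names are hypothetical). Without the shift, the 6σ label means roughly one defective unit in 10 million; with it, roughly one in 2,900:

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_unit_defective(p_opportunity: float, opportunities: int) -> float:
    """P(at least one defect) on a unit with independent opportunities."""
    # 1 - (1 - p)**n, computed with log1p/expm1 to avoid cancellation for tiny p.
    return -math.expm1(opportunities * math.log1p(-p_opportunity))


n = 100  # opportunities per unit, as in the paragraph above
for label, z in (("6 sigma, no shift", 6.0), ("6 sigma with 1.5 shift", 4.5)):
    p = p_unit_defective(upper_tail(z), n)
    print(f"{label}: about 1 defective unit in {1 / p:,.0f}")
```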

Is 6σ a good goal for ALL processes?

So I for one don't buy this 1.5σ "free bonus," even in cases where the drift may exist. But there are other critical problems with the 6σ goal. I've argued repeatedly that each metric has a limiting value determined by the process's enabling technology and organizational structure. Absent process redesign, nothing can be done to push the sigma level beyond this limiting value, or entitlement, on a permanent basis. Individual heroics can create short-term gains beyond this limit (as evidenced by the well-known Hawthorne Effect), but they are not sustainable in the long term.

The goal of 6σ for all processes requires an organizational commitment to continuously redesign every one of them before their limit is approached. Not only must the financial commitment be there, but also the required new enabling technology and organizational flexibility. In many situations, these commitments are unrealistic, unreasonable and/or unsound. My personal bias is to focus on metrics that address the gap between current and potential performance and on the rate at which that gap is closing (see my publications on the half-life method, for example).
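This is not the published half-life method, only a rough sketch of the idea behind it: assuming the gap between the current defect rate and the process's limiting value closes exponentially, the rate of closure can be summarized as a half-life. The functions, the 0.5% limit, and the data are all hypothetical:

```python
import math


def half_life(gap_start: float, gap_end: float, elapsed: float) -> float:
    """Time for the gap to halve, assuming it shrinks exponentially."""
    if not 0.0 < gap_end < gap_start:
        raise ValueError("gap must be positive and shrinking")
    return elapsed * math.log(2.0) / math.log(gap_start / gap_end)


def projected_gap(gap_now: float, h: float, t: float) -> float:
    """Gap remaining after time t under the same exponential assumption."""
    return gap_now * 0.5 ** (t / h)


# Hypothetical process: defect rate fell from 8% to 2% over 12 months,
# against an assumed limiting (entitlement) value of 0.5%.
limit = 0.005
h = half_life(0.08 - limit, 0.02 - limit, 12.0)
print(f"half-life: {h:.1f} months")
print(f"defect rate in 6 more months: {limit + projected_gap(0.02 - limit, h, 6.0):.2%}")
```

Measuring progress as a gap to an assumed limit, rather than to zero defects, is the design choice here; a different limit changes the half-life.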

Consider also an old saying that we have in the System Dynamics world: "things get worse before they get better." Its origins lie in the observation that major changes usually create short-term disruptions that adversely affect current performance. Process redesign almost always displays this dynamic. If you are being rewarded on your 6σ performance, past experience will discourage you from initiating a process redesign yourself, since there is a good chance that it will initially blow your 6σ performance. A short-term special dispensation from the 6σ goal may be a prerequisite for justifiable process redesign.

Furthermore, increasing technical and organizational complexity slows the rate of process improvement. Combine this with the observation that complex processes tend to have long cycle times compared to the time it takes for unpredictable changes to occur in their environment, and you're quickly led to the conclusion that many important processes can never achieve 6σ performance unless they are dysfunctionally oversimplified. This is how chaos theory enters the picture. My view is that only routine, mature, and very high unit volume processes should even be considered as candidates for a 6σ goal.

Set a goal of 6σ to drive desired changes in the wrong processes and you will only stifle innovation and encourage conservatism and suboptimization. Innovation and uncertainty are inexorable partners. I've seen new product development efforts seriously undermined as a result of this type of phenomenon. Instead, if you must, set a process goal of xσ, where x depends on process complexity and maturity. I would speculate that x=3 might be closer to the right number for many important processes.

Another related perspective on this issue is in terms of process learning. As a process approaches its limiting performance, learning declines in absolute terms. An organization that has achieved 6σ in all of its processes is an organization that has, in this sense, stopped learning. In all cases that I can think of, when you stop learning, you stop competing, and we all know where that feedback loop leads.

What is the real effect on the bottom line?

Six Sigma Quality is often touted on the basis of its significant bottom-line impact. Some claim typical cost savings of more than $1M per year per Black Belt. For example, according to one Motorola Six Sigma presentation, in 1996 they achieved 5.6σ performance (up from 4.2σ in 1986), $16B in cumulative manufacturing cost savings, and a reduction in Cost of Poor Quality from 15% of sales in 1986 to a little over 5% in 1996. I'm not sure where those numbers come from, nor where the billions of dollars in claimed savings went, but I'd really like to see an independent audit so that I could understand the basic assumptions used.

I would hope that the calculated savings net out the component of traditional cost reduction, as captured, for example, by the historical cost experience curve, so that the resulting number truly reflects the incremental savings directly assignable to the 6σ initiatives. It is always very tempting to attribute all benefits to the current program, regardless of their true origins.

All too often, these "cost savings" estimates fail to recognize that many apparently variable costs are in fact fixed or semi-fixed. They don't really go away, but instead move elsewhere in the organization, at least for the short term. Another common practice is the inclusion of profit from new revenue which will be generated by the resources (people, equipment and facilities, for example) freed-up by the process improvement. Unfortunately, these estimates seldom consider total market potential or competitive dynamics. Furthermore, there is rarely a closing of the loop to assure that the predicted savings were actually achieved. I've heard more than one improvement team query their sponsor with: "What level of savings are you looking for?" Not surprisingly the chosen assumptions yield that desired answer.

I would not be surprised at all to find that Darwinian rules develop over time for the calculation of sigma levels in many organizations in order to assure survival of only the fittest opportunities for inclusion. I've been told of more than one case where a persistent defect has been dropped from the calculation with the justification that "we can't be measured on what we don't control." Try selling that argument to the customer.

What is also perplexing is that over the last five years Motorola's stock has not outperformed the aggregate Electronic Equipment Industry of which it is a member. One senior quality executive at Motorola told me that the bulk of the 6σ savings had to be passed on to customers in the form of price reductions, so they do not appear on the bottom line. These two observations suggest that Motorola's competitors have realized similar performance improvements, with or without the benefit of the Six Sigma approach.

Also keep in mind that cost reduction by itself does not create significant societal wealth. Its principal effect is to move wealth from one place to another. The improvement in labor productivity only benefits society if there are value-creating alternatives available for the surplus capital and labor. Reduce equipment and raw materials usage and you reduce the wealth of the equipment and raw materials suppliers. Societal wealth is mostly created on the revenue side of the equation: by the creation of new outputs that are of value to people. But 6σ is of little use there. Just try applying it to processes having a significant amount of creative content, like product development or R&D.

So my bottom line is that the claimed financial benefits of improvement in the 6σ metric are also unsubstantiated. This undermines the assertion of its proponents that the results prove that the metric really works. The true benefits of 6σ are shared in common with the other flavors of TQM.

The hidden danger of the 6σ metric.

Why is all of this important? You could argue that I'm nitpicking and that the real value of 6σ is in the concept and approach, not the actual metric. But non-financial performance measures are increasingly becoming an important consideration in individuals' compensation and promotion. Past performance along these dimensions even enters into resource allocation decisions. This arises from the overriding objective of metrics: to drive positive changes in individual and group behavior. But if the non-financial measures are inherently unsound, so too will be the decisions to which they contribute. In my view, the 6σ metric falls into this category of noise-generating metrics.

The 6σ metric does have some redeeming characteristics, though:

	It is defect oriented.
	With the exception of identification of opportunities for a defect, it is reasonably well documented.

However, it has an overwhelming number of weaknesses as a metric. Let me summarize them:

	Unless the opportunities are weighted by importance to the customer, it can be a poor surrogate for customer satisfaction because the metric can get better while customer satisfaction gets worse. How? By improvement of one type of defect at the numerical expense of a more important one (e.g. eliminate 10 unimportant defects while creating only 5 more important ones: net result, an apparent improvement of 5, with an obvious reduction in customer satisfaction). Note though that this refinement adversely affects the metric's simplicity requirement.
	Anyone who has taught the 6σ metric can testify to its complexity, even when the students are soon-to-be Black Belts. This complexity also violates the KISS principle of good metrics.
	The 1.5σ adjustment is unsupported and is clearly case-dependent at best, making the metric inherently biased (it systematically overstates actual performance).
	Because of its ambiguity, it is easily gamed unless complemented by other, more valuable metrics. As a test, give two groups of knowledgeable people the independent job of identifying the opportunities for defects. It is likely that their lists will look very different. Although it is often touted as a universal metric that allows cross-process comparisons, this weakness significantly undermines that potential.
	Although it looks like variable data, it is based on attribute data (number of defects) which masks the degree to which the individual specifications fail to meet customer requirements. This breaks the link of the metric to its underlying root causes, unless the associated variable data is also measured and reviewed.
	It is based on the gap between current performance and zero defects rather than the process's limiting value. In doing this it fails to accommodate strategic decisions about process re-design priorities.
	As a goal, it fails to differentiate between processes of different complexity and maturity. It fails to recognize the role of chaos or exogenous unpredictability in some very important processes, for example forecasting, product development, resource allocation and strategic planning.
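The first weakness in the list can be checked with the example it gives: remove 10 unimportant defects while creating 5 important ones. The severity weights below (0.1 and 1.0) are hypothetical; any weighting that makes an important defect count more than twice an unimportant one gives the same qualitative result:

```python
from __future__ import annotations

# Hypothetical severity weights; in practice they would come from the customer.
WEIGHTS = {"unimportant": 0.1, "important": 1.0}

before = {"unimportant": 10, "important": 0}
after = {"unimportant": 0, "important": 5}  # 10 fixed, 5 new important defects


def raw_count(defects: dict[str, int]) -> int:
    """Unweighted defect count, as the plain metric sees it."""
    return sum(defects.values())


def weighted_count(defects: dict[str, int]) -> float:
    """Defect count weighted by (assumed) importance to the customer."""
    return sum(WEIGHTS[kind] * n for kind, n in defects.items())


print(raw_count(before), "->", raw_count(after))            # 10 -> 5: looks better
print(weighted_count(before), "->", weighted_count(after))  # 1.0 -> 5.0: actually worse
```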

Bottom line: as far as the 6σ metric is concerned, forget it. Calculate defects and defect rates, along with their underlying variable data, but don't bother trying to convert them to an arbitrary sigma value.

Even the tables are wrong

I can't leave this subject without sharing with you a little twist of irony for this metric.

The tail of the normal distribution cannot be evaluated in what mathematicians call "closed form." That means that you can't write an equation where you plug in dpmo and out comes the σ value. That's why we need those tables. And they can only be evaluated using numerical integration techniques or finite series approximations. When doing this, mathematicians know that it's important to estimate the residual error so that you know the accuracy of your estimate. But this is not always done.
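Since there is no closed form, the inverse has to be found numerically. A minimal sketch using bisection (a deliberately simple choice; the function names are hypothetical) exploits the fact that the tail probability decreases strictly as z grows:

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sigma_from_dpmo(dpmo: float, tol: float = 1e-10) -> float:
    """Solve upper_tail(z) = dpmo / 1e6 for z by bisection."""
    p = dpmo / 1_000_000
    if not 0.0 < p < 0.5:
        raise ValueError("dpmo must lie strictly between 0 and 500,000")
    lo, hi = 0.0, 40.0  # upper_tail(0) = 0.5; upper_tail(40) underflows to 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # upper_tail is strictly decreasing, so a tail larger than p means z > mid.
        if upper_tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z = sigma_from_dpmo(3.4)
print(f"3.4 dpmo -> {z:.3f} sigma short-term, {z + 1.5:.3f} with the 1.5 shift added")
```

Python's standard library also offers `statistics.NormalDist` (Python 3.8+); for a tail probability p, `-NormalDist().inv_cdf(p)` should agree with this bisection.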

Under the circumstances, it is understandable that Motorola's numbers for large σ values are not exactly correct: precise, yes; accurate, no. Today, the wonders of modern personal computers make these calculations accessible to everyone, including me*. For example, the correct 6σ value is actually 0.987x10^-9, not 1.248x10^-9. That's a mere 26% error! Although the error decreases with decreasing σ, the correct value at 4.5σ is actually 3.397x10^-6, not 3.451x10^-6 as published in the Motorola table. That error is 1.59%, which, read as a one-sided tail probability, corresponds to 2.15σ (now should I add that 1.5σ or not?)! Yes, I know that there is no practical difference between these two values. But remember, my point is irony, not significance. The quoted 10σ value of 6.216x10^-21 is in fact 7.62x10^-24, more than 800 times smaller!! I certainly hope nobody's career depended on the accuracy of that one.
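The comparison is easy to reproduce. This sketch recomputes the three table entries quoted above with `math.erfc` and reports the ratio of each quoted value to the computed one. It is one way to check the numbers, not necessarily the method behind the footnoted formula:

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Table entries as quoted in the paragraph above.
QUOTED = {4.5: 3.451e-6, 6.0: 1.248e-9, 10.0: 6.216e-21}

for z, quoted in QUOTED.items():
    computed = upper_tail(z)
    print(
        f"{z:4.1f} sigma: quoted {quoted:.3e}, computed {computed:.3e}, "
        f"quoted/computed = {quoted / computed:.4g}"
    )
```

Small differences from the percentages in the text (about 1.57% rather than 1.59% at 4.5σ, for example) come only from how many digits of the computed value are carried.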

Conclusion

In closing, don't get me wrong: I'm not saying that numerical goals, variation reduction or DfX (aka Design for 6σ), where X stands for the "abilities" (producibility, testability, maintainability, serviceability, recyclability, etc.), are unimportant. I have always been a big fan of Armand Feigenbaum, who described most of the 6σ statistical concepts in the 1950s. His classic book, Total Quality Control, became the bible and inspiration for the Japanese quality movement and the source for the name TQC. What I am saying is that 6σ is a poor metric. So my advice: use Six Sigma as the name for your version of TQM, but don't track its numerical value or put it on your balanced scorecard.

* If you're interested in how I determined the correct numbers, send me an e-mail. I'd be glad to send you the formula I derived for large sigma and how I checked the results.


Last modified: August 13, 2006