Design for Excellence in Electronics Manufacturing. Cheryl Tulkoff
knowledge cannot be obtained at a reasonable cost in a reasonable amount of time. In other words, when trying to determine a course of action, the best path is to acquire knowledge. Do not rely primarily on predictive statistics and probabilities. Use these tools only as a last‐resort strategy or in conjunction with other tools. Don't gamble with product reliability. Much like the stock market, past performance does not guarantee future results. Since excellent resource material on reliability statistics exists (Abernethy 2000; Wunderle and Michel 2006, 2007), this section intends to reintroduce and provide a brief refresher of some key concepts, basic recommendations, and common pitfalls.
Reliability statistics are used to describe samples of populations. If the population is small, every member can be tested, and representative statistics are not needed. For reliability statistics to be valid, samples must be chosen randomly, and every member must have an equal chance of being selected. For example, when testing to failure, if testing is ended before all items have failed, the sample is not random, and the data is referred to as censored data. Censoring may be Type I (time censored: the test stops at a predetermined time) or Type II (failure censored: the test stops after a predetermined number of failures). Censored data must be analyzed using special techniques.
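One such technique can be sketched under a simplifying assumption that the text does not make: that times to failure follow an exponential distribution (constant failure rate). In that case, the maximum‐likelihood estimate of the failure rate is the number of failures divided by the total time on test, where units still running when the test stops contribute the hours they accumulated. The function name and the test data below are hypothetical.

```python
def exponential_failure_rate(failure_times, censored_times):
    """Estimate a constant failure rate from right-censored test data.

    failure_times  -- hours at which failed units failed
    censored_times -- hours accumulated by units that had not failed
                      when the test was stopped
    Assumes exponentially distributed times to failure.
    """
    failures = len(failure_times)
    if failures == 0:
        raise ValueError("no failures observed; a point estimate is undefined")
    total_time_on_test = sum(failure_times) + sum(censored_times)
    return failures / total_time_on_test


# Hypothetical Type I (time-censored) test: 5 units, stopped at 1000 hours.
failed = [220.0, 480.0, 790.0]    # three units failed during the test
survivors = [1000.0, 1000.0]      # two units were still running at the end

rate = exponential_failure_rate(failed, survivors)
naive_rate = len(failed) / sum(failed)   # ignores the survivors' hours

print(f"censoring-aware rate: {rate:.6f} failures/hour")
print(f"naive rate:           {naive_rate:.6f} failures/hour")
```

Ignoring the surviving units overstates the failure rate, which is why censored data should not be analyzed as if it were a complete sample.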
Here are some views on statistics from masters of their fields:
A statistical relationship, however strong and however suggestive, can never establish a causal connection. Our ideas on causation must come from outside statistics, ultimately from some theory.
Kendall and Stuart, The Advanced Theory of Statistics
Ultimately, all our understanding should be based upon real knowledge (scientific, human, etc.). The statistical methods can provide tools to help us gain this knowledge.
Patrick O'Connor, Practical Reliability Engineering
When you can measure what you are speaking about and express it in numbers, you know something about it.
But when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.
It may be the beginning of knowledge, but you have scarcely in your thoughts, advanced it to the state of science, whatever the matter may be.
Lecture to the Institution of Civil Engineers, 3 May 1883, William Thomson – Lord Kelvin
Common statistical pitfalls include sample‐size errors, distribution or model errors, and failure‐to‐validate‐data errors. Using sample sizes too small to be statistically significant or valid (and using that data to make long‐term decisions!) is a common error in high‐reliability systems design because of the limited availability, time, complexity, and cost of hardware and testing. Distribution or model errors are also frequently seen. Normal distributions are often applied to non‐normal data because the calculations are easy and spreadsheet statistics are convenient. Using limited, scrubbed, or unvalidated data is also common. Repeating analyses and modeling at various intervals is highly recommended as more credible data is obtained. For example, consider repeating the analysis after product release, after a specific number of product builds, after a certain volume has been manufactured, and after some time in the field.
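The sample‐size pitfall can be made concrete with a short calculation. The sketch below assumes a simple binomial model (each unit fails independently with the same probability), which is an assumption added here for illustration; the function names are hypothetical. It computes the chance that a test sample reveals at least one failure, and the number of units that must pass with zero failures to demonstrate a given reliability at a given confidence.

```python
import math


def prob_at_least_one_failure(defect_rate, sample_size):
    """Chance that a sample of `sample_size` units shows one or more failures."""
    return 1.0 - (1.0 - defect_rate) ** sample_size


def zero_failure_sample_size(reliability, confidence):
    """Units that must all pass to demonstrate `reliability` at `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


# Hypothetical case: a failure mode affecting 1% of units.
detect_small = prob_at_least_one_failure(0.01, 10)    # roughly 0.10
detect_large = prob_at_least_one_failure(0.01, 230)   # roughly 0.90
units_needed = zero_failure_sample_size(0.99, 0.90)   # 230 units

print(f"10 units:  {detect_small:.1%} chance of seeing the failure mode")
print(f"230 units: {detect_large:.1%} chance of seeing the failure mode")
print(f"units needed for 99% reliability at 90% confidence: {units_needed}")
```

Under these assumptions, a ten‐unit test misses a 1% failure mode about nine times out of ten, so a clean result from a small sample says little about long‐term reliability.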
2.5.1 Reliability Probability in Electronics
Any event has a probability of occurrence between 0 and 100%, or between zero and one. A probability of zero means the event will never occur, while a probability of one means the event is guaranteed to occur. Calculating probabilities requires data that is unbiased, random, drawn from a representative sample, and based on a valid sample size.
Probability event types may be simple or compound. Simple events cannot be broken down, while compound events are composed of two or more events. Compound events can be additive or multiplicative.
Additive events are composed of independent events, any one of which is enough to produce the outcome. Example: Two cars are started on a cold day, and each has a probability of starting of 0.90. What is the probability that at least one will start?
– Probability = P(1) + P(2) − P(1 and 2) = 0.90 + 0.90 − (0.90 × 0.90) = 1.80 − 0.81 = 0.99
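Multiplicative events are those in which the outcome depends on every component event occurring. Example: One IC in a circuit has a probability of working of 0.99; another IC in the circuit has a probability of working of 0.90. What is the probability that the circuit will work, given that both ICs must work and that they fail independently of each other?
– Probability = P(1) × P(2) = 0.99 × 0.90 = 0.891 ≈ 0.89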
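The two compound‐event calculations can be checked with a few lines of Python. This is a minimal sketch that assumes the individual events are statistically independent; the function names are hypothetical.

```python
def prob_at_least_one(p_a, p_b):
    """Probability that at least one of two independent events occurs."""
    return p_a + p_b - p_a * p_b


def prob_all(*probabilities):
    """Probability that every one of several independent events occurs."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


# Two cars, each with a 0.90 chance of starting: at least one starts.
cars = prob_at_least_one(0.90, 0.90)

# Two ICs that must both work for the circuit to work.
circuit = prob_all(0.99, 0.90)

print(f"at least one car starts: {cars:.2f}")
print(f"circuit works:           {circuit:.3f}")
```

The additive result can also be obtained as one minus the probability that neither car starts, 1 − (0.10 × 0.10), which is a useful cross‐check.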
2.5.2 Reliability Statistics in Electronics
First, identify what type of characterization is needed for the application. Questions to consider include:
Is a rate being modeled?
Is there a specific number of trials?
Is the probability of success the same for all trials?
Are there discrete functions requiring analysis: pass/fail, working/non‐working, on/off?
Are there continuous functions requiring analysis: controlled by a continuous variable like time?
Are there point functions requiring analysis: repairable systems where more than one failure or type may occur over time?
Frequently used reliability statistics for electronics include:
Failure rate = failures per unit of time for a population. Examples:
– 4 failures (population) per million operating hours = 4 × 10⁻⁶ failures per hour.
– 1000 items operating for a year before failure = 1000 × 24 hours × 365 days = 8,760,000 operating hours. One failure in that period corresponds to a failure rate of about 1.14 × 10⁻⁷ failures per hour.
– Note: In some texts, the term failure rate is reserved for repairable systems (more than one failure per device is possible) and the term hazard rate (H) is used for non‐repairable systems (only one failure per device is possible). In practice, the terms are used interchangeably, with the term failure rate commonly used in the US.
Mean time between failure (MTBF). Example:
– MTBF = 1 / failure rate
– Using a failure rate of 4 failures per million operating hours (4 × 10⁻⁶ per hour), the MTBF is 1 / (4 × 10⁻⁶ per hour) = 250,000 hours.
Percent or probability of survival or failure at a point along a timeline. Examples:
– Reliability = 99.7% (0.9970) at 1 year, 98.9% at 3 years (warranty), 96.5% at 10 years (design life).
– Failure = 1 − reliability, or 100% − reliability when expressed as a percentage. So, failure is equal to 100 − 99.7 = 0.3% (0.0030) at 1 year, 100 − 98.9 = 1.1% at 3 years (warranty), and 100 − 96.5 = 3.5% at 10 years (design life).
– For a population of 1000 units, this means 3 failed out of 1000, 11 failed out of 1000, and 35 failed out of 1000, respectively.
The FIT (failure in time) rate is defined as the expected number of component failures per billion (10⁹) hours:
– The FIT rate can be easily converted to the MTBF in hours, where MTBF = 10⁹ hours / FIT.
– The annualized failure rate (AFR) is calculated as AFR = (FIT × 8760 hours) / 10⁹ hours.
– The advantage of using FIT rates rather than MTBF is that FIT rates are additive. For example, if the FIT rate of Part A is 125, and the FIT rate of all other components is 75, the FIT rate of the system is 125 + 75 = 200.
– A FIT rate of 200 corresponds to an MTBF of (10⁹ hours / 200) = 5 × 10⁶ hours. (An MTBF of 5 × 10⁶ hours represents a very low failure rate.)
– A FIT rate of 200 also corresponds to an AFR of (200 × 8760 hours) / 10⁹ hours = 0.00175 = 0.175%.
DPPM = Defective parts per million. Example:
– 20 pieces are defective in a lot of 1000 pieces. The defective fraction is 20 / 1000 = 0.02, or 2.0% defective, and 0.020 × 1,000,000 = 20,000 DPPM.
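The conversions among these measures are simple enough to script. The following is a minimal sketch with hypothetical function names; it assumes a constant failure rate and continuous operation of 8760 hours per year.

```python
HOURS_PER_YEAR = 8760          # 24 hours x 365 days of continuous operation
BILLION_HOURS = 1_000_000_000  # FIT is defined per billion device-hours


def mtbf_from_failure_rate(failures_per_hour):
    """MTBF in hours for a constant failure rate."""
    return 1.0 / failures_per_hour


def fit_to_mtbf_hours(fit):
    """Convert a FIT rate to MTBF in hours."""
    return BILLION_HOURS / fit


def fit_to_afr(fit):
    """Convert a FIT rate to an annualized failure rate (as a fraction)."""
    return fit * HOURS_PER_YEAR / BILLION_HOURS


def system_fit(*component_fits):
    """FIT rates are additive across the components of a system."""
    return sum(component_fits)


def dppm(defective, lot_size):
    """Defective parts per million for a lot."""
    return defective / lot_size * 1_000_000


total_fit = system_fit(125, 75)                      # Part A plus all others
print("MTBF at 4 failures per million hours:", mtbf_from_failure_rate(4e-6))
print("System FIT:", total_fit)
print("System MTBF (hours):", fit_to_mtbf_hours(total_fit))
print(f"System AFR: {fit_to_afr(total_fit):.3%}")
print("DPPM for 20 defective in 1000:", dppm(20, 1000))
```

Keeping the conversions in one place makes it harder to mix units, for example by quoting an MTBF in hours alongside an AFR that assumes a different number of operating hours per year.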
2.5.2.1 Basic Statistics Assumptions and Caveats
Statistics are used to describe samples of populations. If the population is small, every member can be tested, and statistics are not needed. Consider ahead of time whether the tool or technique being applied to a data set is appropriate. Many statistical tools are valid only if the data is random or independent. In most cases, samples must be chosen randomly: every member must have an equal chance of being selected. When testing to failure, if the testing is ended before all items have failed, the sample is not random. In this case, the data is censored. Type I censoring is time‐based. Type II censoring is failure‐based. Censored data can be analyzed using special techniques. Testing of a small lot of parts is typically not effective for detecting issues occurring at a rate below 5% of