System Reliability Theory. Marvin Rausand
Figure 3.6 Relationship between failure cause, failure mode, and failure effect.
3.6 Classification of Failures and Failure Modes
It is important to realize that a failure mode is a manifestation of the failure as seen from the outside, that is, the nonfulfillment of one or more functions. “Internal leakage” is thus a failure mode of a shutdown valve because the valve loses its required function to “close flow,” whereas wear of the valve seal represents a cause of failure and is hence not a failure mode of the valve.
Failures and failure modes may be classified according to many different criteria. We briefly mention some of these classifications.
3.6.1 Classification According to Local Consequence
Blache and Shrivastava (1994) classify failures according to the completeness of the failure.
1 Intermittent failure. Failure that results in the loss of a required function only for a very short period of time. The item reverts to its fully operational standard immediately after the failure.
2 Extended failure. Failure that results in the loss of a required function that will continue until some part of the item is replaced or repaired. An extended failure may be further classified as:
    Complete failure. Failure that causes complete loss of a required function.
    Partial failure. Failure that leads to a deviation from accepted item performance but does not cause a complete loss of the required function.
  Both the complete failures and the partial failures may be further classified as:
    Sudden failure. Failure that could not be forecast by prior testing or examination.
    Gradual failure. Failure that could be forecast by testing or examination. A gradual failure represents a gradual “drifting out” of the specified range of performance values. The recognition of a gradual failure requires comparison of actual item performance with a performance requirement, and may in some cases be a difficult task.
  Extended failures may be split into four categories; two of these are given specific names:
    Catastrophic failure. A failure that is both sudden and complete.
    Degraded failure. A failure that is both partial and gradual (such as the wear of the tires on a car).
The failure classification described above is shown in Figure 3.7, which is adapted from Blache and Shrivastava (1994).
Figure 3.7 Failure classification.
Source: Adapted from Blache and Shrivastava (1994).
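The two-way split of extended failures (complete or partial, sudden or gradual) can be written as a small lookup. The following Python sketch is illustrative only: the names `Extent`, `Onset`, and `classify_extended_failure` are hypothetical and do not come from Blache and Shrivastava (1994). Only the two named categories are taken from the text; the other two combinations are simply described by their attributes.

```python
from enum import Enum


class Extent(Enum):
    """How much of the required function is lost (hypothetical name)."""
    COMPLETE = "complete"
    PARTIAL = "partial"


class Onset(Enum):
    """Whether the failure could be forecast by testing or examination (hypothetical name)."""
    SUDDEN = "sudden"
    GRADUAL = "gradual"


def classify_extended_failure(extent: Extent, onset: Onset) -> str:
    """Return the category of an extended failure.

    Only two of the four combinations have specific names in the text.
    """
    if extent is Extent.COMPLETE and onset is Onset.SUDDEN:
        return "catastrophic failure"
    if extent is Extent.PARTIAL and onset is Onset.GRADUAL:
        return "degraded failure"
    # The remaining two combinations carry no specific name in the text,
    # so they are described by their two attributes.
    return f"{onset.value} and {extent.value} failure"
```

For instance, the tire wear mentioned above is partial and gradual, so `classify_extended_failure(Extent.PARTIAL, Onset.GRADUAL)` gives the degraded category.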
3.6.2 Classification According to Cause
Failures may be classified according to their causes as follows.
Primary Failures
A primary failure, also called a random hardware failure in IEC 61508, occurs when the item is used in its intended operating context. In most cases, the primary failure results in an item fault and a repair action is usually necessary to return the item to a functioning state. Primary failures are generally random failures, where the cause of failure can be attributed to aging and the properties of the item itself. A primary failure is illustrated in Figure 3.8. Primary failures are the only category of failures for which we can justifiably claim compensation under warranty. Primary failures are not relevant for software.
Figure 3.8 A primary failure leading to an item fault.
Secondary Failures
A secondary failure, also called overstress or overload failure, is a failure caused by excessive stresses outside the intended operating context of the item. Typical stresses include shocks from thermal, mechanical, electrical, chemical, magnetic, or radioactive energy sources, or erroneous operating procedures. The stresses may be caused by neighboring items, the environment, or by users/system operators/plant personnel. Environmental stresses, such as lightning, earthquakes, and falling objects, are sometimes called threats to the item. We may, for example, say that lightning is a threat to a computer system and that heavy snowfall and storms are threats to an electric power grid. The overstress event leads to a secondary failure with some probability.
A secondary failure usually leads to an item fault, and a repair action is usually necessary to return the item to a functioning state. The structure of a secondary failure is shown in Figure 3.9. Secondary failures are generally random events, but it is the overstress event that is the main contributor to the randomness.
Figure 3.9 A secondary failure, caused by an overstress event, leading to an item fault.
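The structure of a secondary failure, an overstress event followed by a failure with some probability, can be sketched as a short simulation. This is a minimal sketch under stated assumptions: the function names, the single scalar `design_limit` standing for the intended operating context, and the fixed conditional probability `p_fail` are all hypothetical simplifications and are not taken from the text.

```python
import random


def secondary_failure_occurs(stress: float, design_limit: float,
                             p_fail: float, rng: random.Random) -> bool:
    """Return True if a stress event causes a secondary failure.

    Assumptions (hypothetical): the intended operating context is
    summarized by one number, design_limit, and an overstress event
    causes failure with a fixed probability p_fail.
    """
    if stress <= design_limit:
        # Within the intended operating context: not an overstress event,
        # so no secondary failure can result from it.
        return False
    # Overstress event: the item fails with probability p_fail.
    return rng.random() < p_fail


def fraction_failed(n_events: int, stress: float, design_limit: float,
                    p_fail: float, seed: int = 0) -> float:
    """Estimate the share of n_events identical stress events that cause failure."""
    rng = random.Random(seed)
    failures = sum(
        secondary_failure_occurs(stress, design_limit, p_fail, rng)
        for _ in range(n_events)
    )
    return failures / n_events
```

In this sketch the randomness enters through the occurrence and outcome of the overstress event, in line with the remark that the overstress event is the main contributor to the randomness.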
Systematic Failures
A systematic failure is a failure due to a systematic cause that may be attributed to a human error or misjudgment in the specification, design, manufacture, installation, operation, or maintenance of the item. A software bug is a typical example of a systematic fault. After the error is made, the systematic cause remains dormant and hidden in the item. Examples of systematic causes are given in Example 3.12.
A systematic failure occurs when a certain trigger or activation condition occurs. The trigger can be a transient event that activates the systematic cause, but can also be a long‐lasting state such as environmental conditions, as illustrated in Example 3.14. The trigger event is often a random event, but may also be deterministic.
A systematic failure can be reproduced by deliberately applying the same trigger. The term systematic means that the same failure will occur whenever the identified trigger or activation condition is present and for all identical copies of the item. A systematic cause can only be eliminated by a modification of the design or of the manufacturing process, operational procedures, or other relevant factors (IEC 61508 2010). A systematic fault leading to a systematic failure with the “help” of a trigger is shown in Figure 3.10. Systematic failures are often, but not always, random events; it is the trigger that is random, whereas the item failure is a consequence of the trigger event.
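The reproducibility of a systematic failure can be illustrated with a deliberately simple model. The sketch below is an assumption-laden illustration, not a model from IEC 61508: the class `Item` and its attribute `has_systematic_fault` are hypothetical names, and the trigger is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An item that may carry a dormant systematic fault (hypothetical model)."""
    has_systematic_fault: bool

    def fails(self, trigger_present: bool) -> bool:
        # The failure is a deterministic consequence of the trigger:
        # it occurs exactly when the dormant fault meets its trigger.
        return self.has_systematic_fault and trigger_present


# Identical copies share the same systematic fault.
copies = [Item(has_systematic_fault=True) for _ in range(3)]

# Applying the trigger reproduces the failure in every copy.
all_fail_with_trigger = all(item.fails(trigger_present=True) for item in copies)

# Without the trigger, the fault stays dormant and hidden.
none_fail_without_trigger = not any(item.fails(trigger_present=False) for item in copies)

# After a modification that removes the systematic cause,
# the trigger no longer leads to a failure.
modified = Item(has_systematic_fault=False)
modified_fails = modified.fails(trigger_present=True)
```

Any randomness would enter only through when the trigger occurs, which this sketch leaves outside the model.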
Figure 3.10 A systematic fault leading to a systematic