System Reliability Theory. Marvin Rausand
Example 3.9 shows that a failure mode sometimes describes the “manner by which a failure occurs” and sometimes the “manner by which a fault is present.”
Example 3.9 (Electric doorbell)
A simple doorbell system is shown in Figure 3.4. The pushbutton activates a switch that closes a circuit from a battery to a solenoid that activates a clapper, which in turn makes sound by hammering on a bell. When your finger is lifted from the pushbutton, the switch should open, cut the circuit, and thereby stop the doorbell sound. The following failure modes may be defined:
Figure 3.4 Doorbell and associated circuitry.
1 No sound when the pushbutton is activated (by a finger).
2 Doorbell sound does not stop when finger is lifted from pushbutton.
3 Doorbell sounds without activating the pushbutton.
A similar doorbell system is analyzed in NASA (2002).
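The three failure modes can be told apart from a time-ordered record of what the doorbell was observed to do. The following is a minimal sketch, assuming a hypothetical observation format of (pressed, sounding) pairs; the names FailureMode and observed_failure_modes are illustrative and do not come from the example.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class FailureMode(Enum):
    NO_SOUND = 1             # no sound when the pushbutton is activated
    SOUND_DOES_NOT_STOP = 2  # sound continues after the finger is lifted
    SPURIOUS_SOUND = 3       # sound without the pushbutton being activated


def observed_failure_modes(
    samples: Iterable[Tuple[bool, bool]]
) -> List[FailureMode]:
    """Classify time-ordered (pressed, sounding) observations.

    Assumption: the samples are frequent enough that no press or
    release of the pushbutton is missed between two observations.
    """
    found: List[FailureMode] = []
    ringing_since_press = False  # bell has sounded continuously since a press
    for pressed, sounding in samples:
        mode = None
        if pressed:
            ringing_since_press = sounding
            if not sounding:
                mode = FailureMode.NO_SOUND
        elif sounding:
            # Released but sounding: mode 2 if the sound started with a
            # press and never stopped, otherwise mode 3.
            mode = (FailureMode.SOUND_DOES_NOT_STOP if ringing_since_press
                    else FailureMode.SPURIOUS_SOUND)
        else:
            ringing_since_press = False
        if mode is not None and mode not in found:
            found.append(mode)
    return found
```

For instance, calling observed_failure_modes on the sequence released and silent, pressed and sounding, released and still sounding reports failure mode 2 only. Modes 2 and 3 look identical in a single observation, which is why the sketch keeps track of the history.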
3.5 Failure Causes and Effects
A failure mode is generally caused by one or more failure causes and may result in a failure effect, as shown in Figure 3.5.
Figure 3.5 Relation between failure causes, failure modes, and failure effects.
3.5.1 Failure Causes
All failures have at least one cause. We define failure cause as follows.
Definition 3.4 (Failure cause)
Set of circumstances that leads to failure.
The failure cause may originate during specification, design, manufacture, installation, operation, or maintenance of an item (IEV 192‐03‐11). The failure cause may be an action, an event, a condition, a factor, a state, or a process that is – at least partly – responsible for the occurrence of a failure. To be responsible for a failure, the cause must be present before the failure occurs, and the presence of the cause should increase the likelihood of the failure.
When studying several similar failures, we should see a positive correlation between the presence of the cause and the occurrence of the failure(s), but positive correlation is not a sufficient condition for claiming that something is a cause of a failure. It is easy to find correlated factors that are causally unrelated. Two factors may, for example, be correlated because both are caused by the same third factor. Causality is a complicated philosophical subject, and much more information is readily available online. The authors especially recommend Pearl (2009).
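The effect of a common third factor can be illustrated with a small simulation. The sketch below is not taken from the book: the factor names and all probabilities are assumptions chosen for illustration. A third factor z (say, a harsh operating environment) makes both a factor a and the failure f more likely, while a has no influence on f.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n: int = 20000, seed: int = 1) -> List[Tuple[int, int, int]]:
    """Return n rows (z, a, f); a and f both depend on z, but not on each other."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                   # third factor present?
        a = rng.random() < (0.8 if z else 0.2)   # factor a, driven by z only
        f = rng.random() < (0.6 if z else 0.1)   # failure, driven by z only
        rows.append((int(z), int(a), int(f)))
    return rows


def correlations(rows: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Correlation of a and f overall, and within z = 0 and z = 1."""
    overall = pearson([a for _, a, _ in rows], [f for _, _, f in rows])
    within = []
    for level in (0, 1):
        sub = [(a, f) for z, a, f in rows if z == level]
        within.append(pearson([a for a, _ in sub], [f for _, f in sub]))
    return overall, within[0], within[1]


if __name__ == "__main__":
    overall, within0, within1 = correlations(simulate())
    print(f"overall: {overall:.3f}, z=0: {within0:.3f}, z=1: {within1:.3f}")
```

Under these assumptions, a and f are clearly positively correlated in the pooled data, but the correlation largely disappears when the data are split by z. Splitting by a suspected common factor is only a first check and does not in itself establish causality.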
Several failure analysis techniques have been developed to identify the causes of a failure that has occurred. Among these are cause and effect analysis and root cause analysis, which are described in Section 3.7.
3.5.2 Proximate Causes and Root Causes
The term root cause is often used in analyses of failures that have occurred. The term is defined in several standards, and each standard seems to have its own particular definition. Before giving our preferred definition, we define the term proximate cause, which is an immediate and (often) readily seen cause of a failure.
Definition 3.5 (Proximate cause)
An event that occurred, or a condition that existed, immediately before the failure occurred and that, if eliminated or modified, would have prevented the failure.
A proximate cause is also known as a direct cause. A proximate cause is often not the real (or root) cause of a failure, as illustrated in Example 3.10.
Example 3.10 (Flashlight)
A flashlight is part of the safety equipment in a plant. During an emergency, the flashlight is switched on, but does not give any light. A proximate (or direct) cause is that the battery is dead. If we have access to the flashlight and the battery after the emergency is over, it is straightforward to verify whether or not this was the true proximate cause.
Any battery will sooner or later go dead, and if the flashlight is an essential piece of safety equipment, the maintenance duties include testing and, if necessary, replacing batteries at regular intervals. “The battery has not been tested/replaced at prescribed intervals” is therefore a cause of the proximate cause. By repeatedly asking “why?” this happened, we may get to the root cause of the failure.
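The repeated “why?” questioning can be recorded as a chain of causes that starts at the proximate cause. The following is a minimal sketch with hypothetical names (Cause, why_chain); it contains only the two causes stated in the example and makes no claim about what further questioning would reveal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cause:
    description: str
    caused_by: Optional["Cause"] = None  # answer to the next "why?", if known


def why_chain(proximate: Cause) -> List[str]:
    """Follow the recorded answers to "why?" from the proximate cause down."""
    chain = []
    node: Optional[Cause] = proximate
    while node is not None:
        chain.append(node.description)
        node = node.caused_by
    return chain


# Causes stated in the flashlight example; deeper causes are left open.
not_tested = Cause("Battery has not been tested/replaced at prescribed intervals")
dead_battery = Cause("Battery is dead", caused_by=not_tested)

if __name__ == "__main__":
    for depth, description in enumerate(why_chain(dead_battery)):
        print("  " * depth + description)
```

The last entry of the chain is only the deepest cause recorded so far. Whether it qualifies as a root cause depends on whether asking “why?” once more would uncover a further contributing factor.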
For the purpose of this book, we define a root cause as:
Definition 3.6 (Root cause)
One of multiple factors (events, conditions, or organizational factors) that contributed to or created the proximate cause and subsequent failure and, if eliminated or modified, would have prevented the failure.
For some failure modes, it may be possible to identify a single root cause, but most failure modes will have several contributing causes. All too often, failures are attributed to a proximate cause, such as human error or technical failure. These are often merely symptoms, and not the root causes of the failure. Very often, the root causes turn out to be more fundamental, such as (i) process or program deficiencies, (ii) system or organization deficiencies, (iii) inadequate or ambiguous work instructions, and/or (iv) inadequate training.
Identifying and rectifying the root causes of failures is important for any system in the operational phase. When a failure has occurred, it is not sufficient to correct only the proximate causes (such as replacing the battery of the flashlight in Example 3.10), because the same failure may then recur many times. If, on the other hand, the root cause is rectified, the failure may never recur. Root cause analysis is briefly discussed in Section 3.7.
3.5.3 Hierarchy of Causes
The functions of a system may usually be split into subfunctions. Failure modes at one level in the hierarchy may be caused by failure modes on the next lower level. It is important to link failure modes on lower levels to the main top-level responses, in order to provide traceability to the essential system responses as the functional structure is refined. This is shown in Figure 3.6 for a hardware structure breakdown. Figure 3.6 is further discussed in Section 3.6.5.
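The traceability between levels can be sketched as a mapping from each failure mode to the lower-level failure modes that may cause it. The example below reuses the doorbell of Example 3.9; the intermediate and component-level failure modes are hypothetical illustrations and are not taken from Figure 3.6.

```python
from typing import Dict, List

# Each failure mode maps to the lower-level failure modes that may cause it.
# Only the top-level entry comes from Example 3.9; the rest are hypothetical.
CAUSED_BY: Dict[str, List[str]] = {
    "Doorbell: no sound when pushbutton is activated": [
        "Circuit: no current to solenoid",
        "Clapper: does not strike bell",
    ],
    "Circuit: no current to solenoid": [
        "Battery: no voltage",
        "Switch: fails to close",
    ],
}


def paths_to_top(top: str) -> List[List[str]]:
    """Return every path from a lowest-level failure mode up to `top`."""
    children = CAUSED_BY.get(top, [])
    if not children:
        return [[top]]  # lowest level reached
    paths = []
    for child in children:
        for path in paths_to_top(child):
            paths.append(path + [top])
    return paths


if __name__ == "__main__":
    for path in paths_to_top("Doorbell: no sound when pushbutton is activated"):
        print(" -> ".join(path))
```

Each printed path links a component-level failure mode to the top-level failure mode it may lead to. The sketch assumes that the mapping contains no cycles, which holds for a strict hierarchical breakdown.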