Risk Assessment. Marvin Rausand
events. Examples of active failures are errors and violations by field operators, pilots, and control room operators. These are the people in the operation, at what Reason calls the sharp end of the system. Latent conditions do not trigger an accident immediately, but they lie dormant in the system and may contribute to a future accident. Examples of latent conditions are poor design, maintenance failures, and poor or unworkable procedures. Latent conditions can increase the probability of active failures.
There are clear similarities between the way Reason uses these terms and the way we use enabling events and conditions.
2.3.8 Technical Failures and Faults
Failures and malfunctions of technical items may be relevant as both hazards and enabling events. A failure is defined as follows:
Definition 2.12 (Failure of an item)
The termination of the ability of an item to perform as required.
A failure is always linked to an item function and occurs when the item is no longer able to perform the function according to the specified performance criteria. A failure is an event that takes place at a certain time.
Some failures can be observed immediately when they occur; these are called evident failures. Other failures cannot be observed without testing the item; these are called hidden failures. Hidden failures are a particular problem for many safety systems, such as fire or gas detection systems and airbag systems in cars.
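The practical consequence of a hidden failure can be illustrated with a small Python sketch. It assumes, hypothetically, that the item is tested at fixed intervals and that every test reveals a hidden failure that is present; the function names and the 730-hour test interval are illustrative only. An evident failure is known as soon as it occurs, whereas a hidden failure stays unknown until the next test.

```python
import math


def reveal_time(failure_time: float, test_interval: float, hidden: bool) -> float:
    """Return the time at which a failure becomes known.

    Hypothetical assumption: tests are performed at test_interval,
    2 * test_interval, ... and each test reveals a hidden failure.
    """
    if failure_time < 0 or test_interval <= 0:
        raise ValueError("require failure_time >= 0 and test_interval > 0")
    if not hidden:
        # Evident failure: observed immediately when it occurs.
        return failure_time
    # Hidden failure: revealed by the first test after the failure.
    # A failure exactly at a test instant is assumed to be caught by the next test.
    n = math.floor(failure_time / test_interval) + 1
    return n * test_interval


def undetected_time(failure_time: float, test_interval: float, hidden: bool) -> float:
    """Time the item spends in a fault state that nobody knows about."""
    return reveal_time(failure_time, test_interval, hidden) - failure_time


# A gas detector fails at t = 1000 h; tests are assumed every 730 h.
print(reveal_time(1000, 730, hidden=True))       # 1460
print(undetected_time(1000, 730, hidden=True))   # 460
print(undetected_time(1000, 730, hidden=False))  # 0
```

With the assumed schedule, the hidden failure goes unnoticed for 460 hours, a period during which the safety system cannot respond to a real demand.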
After a failure, the item enters a failed state, or fault, and remains in this state for a shorter or longer time. After many failures, a repair action is required to bring the item back to a functioning state. Some items, especially software items, may spend a negligible time in the failed state.
A fault of a technical item is defined as follows:
Definition 2.13 (Fault of an item)
A state of an item, where the item is not able to perform as required.
Many faults are caused by a preceding failure, but there is also another important category of faults: systematic faults. A systematic fault is caused by a human error or misjudgment made at an earlier stage of the item's life cycle, such as specification, design, manufacture, installation, or maintenance. A systematic fault remains in, or is related to, an item until it is detected during an inspection or test, or until it generates an item failure. Systematic faults are important causes of safety system failures and include software bugs, calibration errors of detectors, erroneously installed detectors, insufficient capacity of fire-fighting systems, and so forth.
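A calibration error of a detector shows how a systematic fault differs from a fault caused by a failure. The following Python sketch assumes a gas detector with a hypothetical alarm setpoint of 50 ppm and a constant calibration offset introduced at installation; all names and numbers are illustrative. The fault is present from the start, no failure event precedes it, and it is revealed only when a test, or a real demand, exposes the detector to a gas concentration above the setpoint.

```python
# Hypothetical alarm setpoint for a gas detector (ppm); not taken from the text.
SETPOINT_PPM = 50.0


def reading(true_ppm: float, calibration_offset: float) -> float:
    """Detector reading with a constant calibration error (a systematic fault)."""
    return true_ppm + calibration_offset


def raises_alarm(true_ppm: float, calibration_offset: float) -> bool:
    """True if the detector alarms at the given true gas concentration."""
    return reading(true_ppm, calibration_offset) >= SETPOINT_PPM


def passes_proof_test(calibration_offset: float, test_gas_ppm: float = 60.0) -> bool:
    """Expose the detector to a known test gas above the setpoint and check for an alarm."""
    return raises_alarm(test_gas_ppm, calibration_offset)


# Offset of -15 ppm introduced at installation: the fault exists from day one,
# without any preceding failure event.
offset = -15.0
print(raises_alarm(60.0, 0.0))     # True: correctly calibrated detector
print(raises_alarm(60.0, offset))  # False: the fault prevents the required alarm
print(passes_proof_test(offset))   # False: the test reveals the systematic fault
```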
Remark 2.4 (Analogy to death and being dead)
If we compare a human being and a technical item, the terms “death” and “failure” are analogous. In most cases, we can record the time of death of a person, and we can calculate the frequency of deaths in a certain population. When a person dies, she enters the state of being dead and remains in this state. Just as for the fault state of a technical item, it is not possible to calculate a frequency of being dead, because being dead is a state and not an event. The main difference between the terms is that technical items can often be repaired and continue to function, whereas a dead person cannot.
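The distinction between an event (failure) and a state (fault) also determines which quantities make sense. In the minimal Python sketch below, which uses a hypothetical operating history, failures are counted to give a frequency, whereas the fault state is described by the fraction of time spent in it, a dimensionless quantity rather than a frequency. The function names are illustrative.

```python
from typing import List, Tuple


def failure_frequency(events: List[Tuple[float, float]], horizon: float) -> float:
    """Number of failures (events) per unit time over [0, horizon]."""
    return len(events) / horizon


def fraction_in_fault_state(events: List[Tuple[float, float]], horizon: float) -> float:
    """Share of [0, horizon] spent in the failed state.

    Each event is (failure_time, restoration_time); the intervals are
    assumed not to overlap and to lie within [0, horizon].
    """
    downtime = sum(restored - failed for failed, restored in events)
    return downtime / horizon


# Hypothetical operating history of one item over 10 000 hours.
history = [(1200.0, 1210.0), (4800.0, 4850.0), (9000.0, 9040.0)]
print(failure_frequency(history, 10_000.0))        # about 3e-4 failures per hour
print(fraction_in_fault_state(history, 10_000.0))  # about 0.01 (dimensionless)
```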
Example 2.5 (Pump failure)
Consider a pump that is installed to supply water to a process. To function as required, the pump must supply between 60 and 65 l/min of water. If the output from the pump falls outside this interval, the required function is terminated and a failure occurs. The failure is often the result of gradual degradation, as shown in Figure 2.4.
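The failure criterion in Example 2.5 can be written as a check against the required interval. The Python sketch below assumes, purely for illustration, that the output degrades linearly from 63 l/min by 0.25 l/min per day (these numbers are not taken from Figure 2.4), and it scans a time grid for the first point where the pump no longer performs as required. The names `first_failure_time` and `degrading_output` are hypothetical.

```python
from typing import Callable, Optional

# Required output interval from Example 2.5, in l/min.
LOWER, UPPER = 60.0, 65.0


def performs_as_required(output: float) -> bool:
    return LOWER <= output <= UPPER


def first_failure_time(output_at: Callable[[float], float],
                       t_max: float, dt: float) -> Optional[float]:
    """Scan t = 0, dt, 2*dt, ... up to t_max and return the first time the
    output is outside [LOWER, UPPER], or None if it never leaves the interval.
    The true crossing lies between the returned time and the previous grid point."""
    steps = int(t_max / dt)
    for k in range(steps + 1):
        t = k * dt
        if not performs_as_required(output_at(t)):
            return t
    return None


# Hypothetical linear degradation: 63 l/min when new, losing 0.25 l/min per day.
def degrading_output(t_days: float) -> float:
    return 63.0 - 0.25 * t_days


print(first_failure_time(degrading_output, t_max=100.0, dt=1.0))  # 13.0
```

With a 1-day grid, the output is exactly 60 l/min on day 12 and 59.75 l/min on day 13, so the failure is registered at the first grid point outside the interval.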
A failure may occur in many different ways, and these are referred to as failure modes.
Definition 2.14 (Failure mode)
The manner in which a failure occurs, independent of the cause of the failure.
Example 2.6 (Pump failure modes)
Reconsider the pump in Example 2.5. The following failure modes may occur:
No output (the pump does not supply any water)
Too low output (i.e. the output is less than 60 l/min)
Too high output (i.e. the output is more than 65 l/min)
Pump does not start when required
Pump does not stop when required
Pump starts when not required
…
Further failure modes may be relevant, depending on other functional requirements, for example, requirements related to power consumption or noise.
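The failure modes listed above can be seen as categories into which an observed pump condition is sorted. The following Python sketch is a hypothetical simplification: it assumes that an observation consists of whether the pump is required, whether it was running before and is running now, and the measured output, and it uses the 60 to 65 l/min interval from Example 2.5. It covers only the six listed modes, and the function `classify` is illustrative.

```python
from enum import Enum
from typing import Optional

LOWER, UPPER = 60.0, 65.0  # required output interval, l/min (Example 2.5)


class PumpFailureMode(Enum):
    NO_OUTPUT = "No output"
    TOO_LOW_OUTPUT = "Too low output"
    TOO_HIGH_OUTPUT = "Too high output"
    FAILS_TO_START = "Pump does not start when required"
    FAILS_TO_STOP = "Pump does not stop when required"
    SPURIOUS_START = "Pump starts when not required"


def classify(required_on: bool, was_running: bool, is_running: bool,
             output: float) -> Optional[PumpFailureMode]:
    """Map an observed pump condition to a failure mode (None means no failure)."""
    if required_on:
        if not is_running:
            # Never started on demand, versus stopped while still required.
            return (PumpFailureMode.FAILS_TO_START if not was_running
                    else PumpFailureMode.NO_OUTPUT)
        if output <= 0.0:
            return PumpFailureMode.NO_OUTPUT
        if output < LOWER:
            return PumpFailureMode.TOO_LOW_OUTPUT
        if output > UPPER:
            return PumpFailureMode.TOO_HIGH_OUTPUT
        return None
    if is_running:
        # Assumption: a pump still running after the demand ended should have stopped;
        # a pump that was idle and is now running started without demand.
        return (PumpFailureMode.FAILS_TO_STOP if was_running
                else PumpFailureMode.SPURIOUS_START)
    return None


print(classify(True, True, True, 58.0))    # PumpFailureMode.TOO_LOW_OUTPUT
print(classify(False, False, True, 62.0))  # PumpFailureMode.SPURIOUS_START
```

Note that the classification uses only what is observed, not why it happened, in line with Definition 2.14.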
Failure mode is a very important concept in risk and reliability analyses and is discussed further in Section 10.5.
Technical failures do not occur without a failure cause, defined as:
Definition 2.15 (Failure cause)
Set of circumstances that leads to failure (IEV 192‐03‐11).
A failure cause may originate during specification, design, manufacturing, installation, operation, or maintenance of an item.
Some failure causes are classified as failure mechanisms, which are defined as follows:
Definition 2.16 (Failure mechanism)
Physical, chemical, or other process that leads to failure.
The pump in Example 2.5 may, for example, fail due to the failure mechanisms corrosion, erosion, and/or fatigue. Failure may also occur due to causes that are not failure mechanisms. Among such causes are operational errors, inadequate maintenance, overloading, and so on.
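Definitions 2.14 to 2.16 keep the failure mode, that is, how the failure manifests itself, separate from its cause. The Python sketch below illustrates this with hypothetical failure records for the pump: the same failure mode, too low output, appears with several causes, and only some of these causes are failure mechanisms. The record contents and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FailureCause:
    description: str
    is_mechanism: bool  # True for a physical, chemical, or other process (Definition 2.16)


EROSION = FailureCause("erosion", True)
FATIGUE = FailureCause("fatigue", True)
OPERATIONAL_ERROR = FailureCause("operational error", False)
INADEQUATE_MAINTENANCE = FailureCause("inadequate maintenance", False)

# Hypothetical failure records for the pump: (failure mode, failure cause).
RECORDS: List[Tuple[str, FailureCause]] = [
    ("Too low output", EROSION),
    ("Too low output", INADEQUATE_MAINTENANCE),
    ("No output", FATIGUE),
    ("Too low output", OPERATIONAL_ERROR),
]


def causes_of(mode: str) -> List[str]:
    """All recorded causes of a given failure mode."""
    return [cause.description for m, cause in RECORDS if m == mode]


def mechanisms_of(mode: str) -> List[str]:
    """Only those recorded causes that are failure mechanisms."""
    return [cause.description for m, cause in RECORDS
            if m == mode and cause.is_mechanism]


print(causes_of("Too low output"))      # ['erosion', 'inadequate maintenance', 'operational error']
print(mechanisms_of("Too low output"))  # ['erosion']
```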
2.3.8.1 Failure Classification
Failures of an item can be classified in several ways. Here, we mention only one classification, which is related to a specified function of the item and not to the hardware as such. To illustrate the different types of failure, we consider the function “wash clothes” of a washing machine.
Primary failure. These failures occur in the normal operating context of the item and are typically hardware failures caused by some deterioration, such as wear. Primary failures are random failures whose probability distribution is determined by the properties of the item. In some applications, primary failures are called random hardware failures.
Secondary failure. These failures are also called overload failures. A secondary failure of a washing machine may, for example, be caused by a lightning strike or a far too heavy load. Secondary failures are