Root Cause Failure Analysis. Trinath Sahoo
Review
It is essential to clearly understand the design parameters and specifications of the systems associated with an event or equipment failure. Unless the investigator understands precisely what the machine or production system was designed to do and its inherent limitations, it is impossible to isolate the root cause of a problem or event. The data obtained from a design review provide a baseline or reference, which is needed to fully investigate and resolve plant problems.
The objective of the design review is to determine whether the machine is running within its acceptable operating envelope. Both the condition of the machine and the process conditions are investigated. For example, a centrifugal pump may be designed to deliver 1000 m3/h of water at a discharge pressure of 20 kg/cm2. If it is operated beyond this design point, the power drawn will increase and vibration may rise. The review should establish the acceptable operating envelope, or range, that the machine or system can tolerate without a measurable deviation from design performance. Evaluating variations in process parameters, such as pressure, flow rate, and temperature, is an effective means of confirming their impact on the production system.
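As a rough illustration, the envelope check described above can be sketched in Python. Only the design point (1000 m3/h at 20 kg/cm2) comes from the text. The band limits, the parameter names, and the helper envelope_deviations are assumptions made for this sketch; real limits would come from the pump data sheet and performance curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Inclusive acceptable range for one process parameter."""
    low: float
    high: float


# Hypothetical envelope. Only the design point (1000 m3/h at 20 kg/cm2)
# comes from the text; the band limits are assumed for illustration.
PUMP_ENVELOPE = {
    "flow_m3_per_h": Limit(low=700.0, high=1000.0),
    "discharge_pressure_kg_per_cm2": Limit(low=18.0, high=22.0),
}


def envelope_deviations(reading, envelope):
    """Return {parameter: (value, limit)} for each reading outside its limit."""
    deviations = {}
    for name, limit in envelope.items():
        value = reading.get(name)
        if value is None:
            continue  # parameter was not logged in this reading
        if not (limit.low <= value <= limit.high):
            deviations[name] = (value, limit)
    return deviations


# Example: the pump is being run beyond its design flow.
reading = {"flow_m3_per_h": 1150.0, "discharge_pressure_kg_per_cm2": 20.0}
for name, (value, limit) in envelope_deviations(reading, PUMP_ENVELOPE).items():
    print(f"{name}: {value} is outside {limit.low} to {limit.high}")
```

Running the same check over logged readings from before and during the event shows whether the machine was inside its envelope when the problem occurred.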
Operating and Maintenance Manuals
O&M manuals are one of the best sources of information. In most cases, these documents provide specific recommendations for proper operation and maintenance of the machine, equipment, or system. In addition, most of these manuals provide specific troubleshooting guides that point out many of the common problems that may occur. A thorough review of these documents is essential before beginning the RCA. The information provided in these manuals is essential to effective resolution of plant problems.
Operating Procedures and Practices
This part of the application and maintenance review consists of evaluating the standard operating procedures (SOPs) and the actual operating practices. Most production areas maintain some historical data that track their performance and practices. These records may consist of log books, reports, or computer data. These data should be reviewed to determine the actual production practices that are used to operate the machine or system being investigated.
This part of the evaluation should determine if the SOPs were understood and followed before and during the incident or event. The normal tendency of operators is to shortcut procedures, which is a common reason for many problems. In addition, unclear procedures lead to misunderstanding and misuse. Therefore, the investigation must fully evaluate the actual practices that the production team uses to operate the machine or system.
Maintenance History
A thorough review of the maintenance history associated with the machine or system is essential to the RCFA process. The primary details that are needed include frequency and types of repair, frequency and types of preventive maintenance, failure history, and any other facts that will help in the investigation.
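A minimal sketch of how the frequency figures might be pulled from a maintenance log is shown below. The record layout (date, work type), the sample entries, and the function names are hypothetical; a real history would come from the plant's maintenance management system.

```python
from collections import Counter
from datetime import date

# Hypothetical maintenance log: (date of work, type of work).
RECORDS = [
    (date(2023, 1, 10), "repair"),
    (date(2023, 2, 1), "preventive"),
    (date(2023, 3, 11), "repair"),
    (date(2023, 5, 2), "preventive"),
    (date(2023, 5, 10), "repair"),
]


def work_type_counts(records):
    """Count how many times each type of work was performed."""
    return Counter(kind for _, kind in records)


def mean_days_between(records, kind):
    """Average number of days between consecutive jobs of one type.

    Returns None when fewer than two jobs of that type are on record.
    """
    dates = sorted(d for d, k in records if k == kind)
    if len(dates) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return sum(gaps) / len(gaps)


print(work_type_counts(RECORDS))
print(mean_days_between(RECORDS, "repair"))
print(mean_days_between(RECORDS, "preventive"))
```

A repair interval that is short compared with the preventive maintenance interval is one of the facts worth carrying forward into the investigation.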
Operating Envelope
Evaluating the actual operating envelope of the production system associated with the investigated event is more difficult. The best approach is to determine all variables and limits used in normal production. For example, define the full range of operating speeds, flow rates, incoming product variations, and the like normally associated with the system. In variable‐speed applications, determine the minimum and maximum ramp rates used by the operators.
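The range and ramp-rate determination described above can be sketched as follows, assuming the speed history is available as (time in seconds, speed in rpm) samples. The sample values and function names are hypothetical and serve only to show the calculation.

```python
def operating_range(samples):
    """Return (minimum, maximum) of the logged values."""
    values = [value for _, value in samples]
    return min(values), max(values)


def ramp_rates(samples):
    """Rate of change between consecutive (time_s, value) samples, per second."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((v1 - v0) / dt)
    return rates


def ramp_rate_limits(samples):
    """Return (slowest, fastest) ramp magnitude, ignoring steady periods.

    Returns None when the log contains no speed change at all.
    """
    magnitudes = [abs(rate) for rate in ramp_rates(samples) if rate != 0]
    if not magnitudes:
        return None
    return min(magnitudes), max(magnitudes)


# Hypothetical speed log for a variable-speed drive: (time in s, speed in rpm).
SPEED_LOG = [(0, 600), (10, 900), (20, 900), (40, 1500), (50, 1000)]

print(operating_range(SPEED_LOG))   # speed range in rpm
print(ramp_rate_limits(SPEED_LOG))  # ramp magnitudes in rpm per second
```

The same two functions apply to flow rate or any other logged variable, which keeps the envelope definition consistent across parameters.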
Maintenance Procedures and Practices
A complete evaluation of the standard maintenance procedures and actual practices should be conducted. The procedures should be compared with maintenance requirements defined by both the design review and the vendor’s O&M manuals. Actual maintenance practices can be determined in the same manner as described earlier or by visual observation of similar repairs. This task should determine if the SMPs are followed consistently by all maintenance personnel assigned to or involved with the area being investigated. Special attention should be given to the routine tasks, such as lubrication, adjustments, and other preventive tasks. Determine if these procedures are being performed in a timely manner and if proper techniques are being used.
Misapplication
Misapplication of critical process equipment is one of the most common causes of equipment‐related problems. In some cases, the reason for misapplication is poor design, but more often it results from uncontrolled modifications or changes in the operating requirements of the machine.
Management Systems
The common root causes of management system problems are policies and procedures, standards not used, employee relations, inadequate training, inadequate supervision, and wrong worker selection. Most of these potential root causes deal with plant culture and management philosophy. While hard to isolate, the categories that fall within this group of causes contribute to many of the problems that will be investigated. Many SOPs used to operate critical plant production systems are out of date or inadequate. This often is a major contributor to reliability and equipment‐related problems. Inadequate training or employee skills commonly contribute to problems that affect plant performance and equipment reliability. The reasons underlying inadequate skills vary depending on the plant culture, workforce, and a variety of other issues.
Identify Possible Causal Factors
What Is a Causal Factor?
A causal factor can be defined as any “major unplanned, unintended contributor to an incident (a negative event or undesirable condition), that if eliminated would have either prevented the occurrence of the incident or reduced its severity or frequency. Also known as a critical contributing cause.”
What Is a Root Cause?
A root cause is “a fundamental reason for the occurrence of a problem or event.” Analysts can look for the root cause of an event in order to prevent it from happening again in the future. The root cause is the primary driver of a process.
What Is the Difference Between a Causal Factor and a Root Cause?
A causal factor isn’t the single factor that drove the event. Instead, a causal factor is one of several influences. The event could still occur again or would have happened without the causal factor. In fact, during a root cause analysis, analysts often use techniques such as the “5 Whys,” fishbone diagrams, and fault tree analysis to identify multiple causal factors until they find the root cause of an event. Put simply, the root cause is the primary driver of the event, and causal factors are secondary or tertiary drivers.
During this stage, identify as many causal factors as possible. Too often, people identify one or two factors and then stop, but that's not sufficient. With RCA, you don't want to simply treat the most obvious causes; you want to dig deeper. Useful questions to ask include:
What sequence of events leads to the problem?
What conditions allow the problem to occur?
What other problems surround the occurrence of the central problem?
The Five Whys
The Five Whys is a simple problem‐solving technique that helps to get to the root of a problem quickly. Invented in the 1930s by Sakichi Toyoda, father of Toyota founder Kiichiro Toyoda, and made popular in the 1970s by the Toyota Production System, the Five Whys strategy involves looking at any problem and drilling down by asking “Why?” or “What caused this problem?”
The idea is