Root Cause Failure Analysis. Trinath Sahoo
of Fault Trees
A fault tree creates a visual record of a system that shows the logical relationships between events and causes that lead to failure. It helps others quickly understand the results of your analysis and pinpoint weaknesses in the design and identify errors. A fault tree diagram will help prioritize issues to fix that contribute to a failure. In many ways, the fault tree diagram creates the foundation for any further analysis and evaluation. For example, when changes or upgrades are made to the system, you already have a set of steps to evaluate for possible effects and changes. You can use a fault tree diagram to help design quality tests and maintenance procedures.
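The logical relationships a fault tree records can be sketched in a few lines of code. The example below is a minimal illustration, not a method taken from this chapter: the pump scenario, the event names, and the probabilities are all hypothetical, and the calculation assumes the basic events are independent of one another.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    """A lowest-level cause in the tree."""
    name: str
    probability: float  # assumed value, for illustration only


@dataclass
class Gate:
    """An intermediate or top event linked to its causes by AND/OR logic."""
    name: str
    kind: str  # "AND": every input must occur; "OR": any one input is enough
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)


def probability(node):
    """Probability that a node occurs, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(child_probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


def outline(node, depth=0):
    """Return the tree as indented text lines, one line per event."""
    is_gate = isinstance(node, Gate)
    label = f"{node.name} [{node.kind}]" if is_gate else node.name
    lines = ["  " * depth + label]
    if is_gate:
        for child in node.inputs:
            lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical tree: the names and numbers below are invented for the example.
top = Gate("Pump fails to deliver flow", "OR", [
    BasicEvent("Loss of electrical power", 0.02),
    Gate("Seal failure goes undetected", "AND", [
        BasicEvent("Seal wears out", 0.10),
        BasicEvent("Inspection is missed", 0.50),
    ]),
])

print("\n".join(outline(top)))
print(f"Top event probability: {probability(top):.3f}")
```

Comparing the probability of each branch under the top event is one simple way to see which contributing cause deserves attention first.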
Identify the Root Cause
Look over your list of potential causal factors and determine the real reason this problem or issue occurred in the first place. The data gathered should have provided enough insight into the failure for the investigator to develop a list of potential or probable reasons for it. Dig deep to examine each level of cause and effect and the events that led to the unfavorable outcomes. The difficulty is that in the real world it is never possible to prove that a single event solely initiates a whole chain of other events, because there are always other events before the so‐called “root cause event.” This may seem like semantics, but for problem‐solvers, it is important to keep in mind that there is never a silver‐bullet answer.
It is essential to analyze the short list of potential root causes and verify each suspect cause. In almost all cases, a relatively simple, inexpensive test series can be developed to confirm or eliminate the suspected cause of equipment failure.
Most equipment problems can be traced to misapplication or to operating or maintenance practices and procedures. Other causes discussed include training, supervision, communications, human engineering, management systems, and quality control. These causes are the most common reasons for poor plant performance and poor equipment reliability. In addition, human error may contribute to, or be the sole reason for, the problem.
Recommend and Implement Solution
When working on solutions, keep your Root Cause Analysis aim in view. You don’t just want to solve the immediate problem. You want to prevent the same problem from recurring.
Ask the following questions when looking for a solution:
What can you do to prevent the problem from happening again?
How will the solution be implemented?
Who will be responsible for it?
What are the risks of implementing the solution?
A short list of potential corrective actions is then generated. Each potential corrective action should be carefully scrutinized to determine whether it will actually correct the problem, because analysts often try to fix the symptoms of a problem rather than its true root cause. Care should therefore be taken to evaluate each potential corrective action so that the right one is implemented and the real problem is eliminated. Not all corrective actions are financially justifiable. In some cases, the impact of the incident or event is lower than the cost of the corrective action. In these cases, the RCA should document the incident for future reference but recommend that no corrective action be taken. On other occasions, implementing a temporary solution that corrects only the symptoms is the only financially justifiable course of action. In these instances, the recommendation should clearly state the reason for the decision, its limitations, and the impact it will have on plant performance.
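The financial screening described above can be illustrated with a small sketch. It is only one possible reading of the rule: the class, the field names, and the figures are hypothetical, and a real evaluation would also weigh safety, environmental, and regulatory factors that a simple cost comparison ignores.

```python
from dataclasses import dataclass


@dataclass
class CorrectiveAction:
    name: str
    cost: float             # assumed cost of implementing the action
    incident_impact: float  # assumed cost of the incident if it recurs


def recommend(action):
    """Apply the rule of thumb: act only when the impact exceeds the cost."""
    if action.incident_impact > action.cost:
        return "implement"
    # Impact is lower than (or equal to) the cost: record the incident only.
    return "document only"


# Hypothetical candidates; the figures are invented for the example.
candidates = [
    CorrectiveAction("Replace coupling with flexible type", 4_000, 25_000),
    CorrectiveAction("Redesign entire drive train", 120_000, 25_000),
]

for action in candidates:
    print(f"{action.name}: {recommend(action)}")
```

Whatever the outcome, the recommendation still needs to record why the decision was taken so that it can be revisited if the incident recurs.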
Also, consider whether the changes you plan to make will impact other areas of your business. Changes to processes can have knock‐on effects. Be sure you aren’t setting yourself up for a new set of problems when you implement the solution. To do this, you need to look at your process flows and how they relate to one another.
The final part of the solution design process is to decide on checks and balances that will tell you whether your business is implementing the solution you’ve devised and whether it works as planned.
Implementation means change, and change must be carefully managed. Everyone concerned needs to know about your solution and the reasoning that led you to believe that you can solve the problem.
So, explain the root cause analysis process and how you arrived at your conclusion. Explain your solution and how you want it to be implemented. Ensure that everyone involved has the knowledge and resources they need to follow through, and devise a method for testing your new system.
Keep in mind, though, that it’s always better to first apply the solution on a small scale. You can never know what could go wrong. Once you’re certain that the new solution brings results, you can start applying it company‐wide.
Conclusion
When you designed the solution, you decided on key indicators that would allow you to see whether the solution works. Use these indicators to follow up. In this instance, you’re going to see whether the symptoms are gone. The presence or absence of the issues that launched you on your root cause analysis and problem‐solving initiative will tell you whether you have successfully solved the problem. Remember to watch out for new issues that may arise elsewhere as a result of the changes you made.
4 Managing Human Error and Latent Error to Overcome Failure
Everyone can make errors, no matter their level of skill or experience or how well trained and motivated they are. Commonly cited statistics claim that human error is responsible for anywhere between 70 and 100% of failures. Human failure contributed to many major failures, e.g. Texas City, Piper Alpha, and Chernobyl. To enhance reliability, companies need to manage human failure as robustly as they manage technical and engineering failures. It is important to be aware that human failure is not random; understanding why errors occur and the different factors that make them worse will help you develop more effective controls.
Human error was a factor in many highly publicized accidents in recent memory. The costs in terms of human life and money are high. Placing emphasis on reducing human error may help to reduce these costs. This chapter provides insight into the causes of human errors and suggests ways to reduce them.
Review of Some of the Accidents
Over the last few decades, we have learnt much more about the origins of human failures. Industries and organizations must consider human factors as a distinct element to be assessed and managed effectively in order to control risks. The accidents listed in Table 4.1, drawn from different sectors, provide clues to understanding failures.
Table 4.1 illustrates how the failure of people at many levels within an organization can contribute to a major disaster. For many of these major accidents, the human failure was not the sole cause but one of a number of causes, including technical and organizational failures, which led to the final outcome. Remember that many “everyday” minor accidents and near misses also involve human failures. All major disasters lead to huge human, property, and environmental losses.
All this evidence shows that human error is a major cause of unreliability and of accidents.
Types of Human Failure:
What Types of Errors Do Humans Make?
The consequences of human failures can be immediate or delayed and the failures can be grouped into the following categories: