Root Cause Failure Analysis. Trinath Sahoo
circumstances or to perform tasks that the designer cannot automate. This generally means that during normal operation there is less to be done. Automation can be useful but must be designed correctly. The information presented should match the operator's own mental picture of what is happening, and it should be useful. Important information should be given priority, cross‐checking should be possible to validate information, and alarm analysis and decision aids should be included. Operators probably need more training to operate an automated plant, even though it would appear they have less to do than on a manually operated plant.
Improved Training
Training is very important in the effort to reduce human errors and hence accidents. Safety training is vital for everybody involved in the system. Operator training will not, however, improve reliability when the root cause is bad design or poor management. The training given should be well planned and appropriate to the job. Realistic simulation and role‐play exercises are some of the best ways to train people. Everybody must be familiar with the system and made aware of the risks involved and how their actions affect reliability. Training should cover the use of all job aids, including procedures and other ancillary and emergency equipment. Recovery procedures should be explained for use after errors have been made. Personnel performance checks and evaluations should be used, and good, constructive feedback should be given at regular intervals. Refresher training should also be used to prevent behavior patterns from building up to the point where variations in equipment and procedures cannot be handled.
Motivational Campaigns
This is a system where some sort of reward is offered for operating in a reliable manner. This usually involves analyzing failure rates. There is little doubt that advertising and campaigns can significantly affect how people think and act. The guidance here is not a set of fixed rules; it is purely a collection of recommendations. Pick and choose what you want, and adapt them to your individual circumstances. People are inundated with information every day. To grab their attention, your message must be short, simple and relevant to your target audience. They must immediately understand its importance, what you are asking them to do and why. Try to distil your main message down to a jargon‐free statement, ideally no longer than two sentences. One way to do this is to think in terms of ‘problem and solution’. Other campaign methods are to raise awareness of reliability problems, including their causes, among managers, employees and the people who advise them on these issues; to provide practical solutions through ‘good practice models’; and to alert people to new risks and possible solutions.
Conclusion
Human error does account for a large number of accidents; however, it is the latent errors that are the real root cause. It is far too easy to blame operators for causing accidents, but it must be appreciated that all humans will make errors. It is the job of management to ensure that systems and procedures are in place to avoid such incidents. The responsibility starts at the very top, with the managing director, and must work its way to all levels in the company. That way the company culture will improve to support reliability first.
5 Metallurgical Failure
Unanticipated equipment failures occur for a variety of reasons. These events are often costly and disruptive to plant operations, and they may also have safety implications. To minimize the frequency and severity of such failures, personnel who have equipment responsibility need to understand the failures and confront their causes. The potential causes of component failure, and their mechanisms, are numerous. The failure analysis procedure will therefore differ for each failed component. It should be developed after giving proper thought to the possible sequence of events before failure, along with proper evaluation of the situation and consideration of the material, manufacturing process, service history, actual working conditions, etc. Since failure analysis involves a lot of effort, time and resources, at the end of the analysis the failure analyst should be in a position to identify the few most likely causes of the failure, so that suitable recommendations can be made to avoid recurrence of similar failures.
It may sound a little far‐fetched, but experts say that the causes of more than 90% of all plant failures can be detected with a careful physical examination using low‐power magnification and some basic physical testing. Inspection of the failed component will show the forces involved, whether the load was applied cyclically or as a single overload, the direction of the critical load, and the influence of outside forces such as residual stresses or corrosion. Then, accurately knowing the physical roots of the failure, you can pursue both the human errors and the latent causes of these physical roots.
In this chapter, an overview of the processes involved in a typical metallurgical failure analysis is provided. The discussion describes various failure mechanisms in metals that can be examined and some of the tests and processes that are used in the analysis of failed components.
Metallurgical failure analysis can be defined as a scientifically based, systematic laboratory examination of metallurgical evidence, together with the gathering of background information related to an equipment failure. This analysis helps in establishing the cause of the failure. Because the approach to a failure analysis is usually determined by the nature of the failure, not all analyses require the same procedure. Laboratory procedures focus on the failed equipment itself and most commonly consist of general and detailed macrophotography, metallographic examination, chemical analysis of the failed part and of any extraneous or foreign materials present, mechanical property determinations, fractographic examination, and others.
Understanding the Basics
Before explaining how to diagnose a failure, we should review the effects of stress on a component. When a load is put on a part, it distorts. In a sound design the load isn’t excessive, the stress doesn’t exceed the “yield point,” and the part deforms elastically, i.e., when the load is released the part returns to its original shape. This is shown in Figure 5.1, a “stress–strain” diagram that shows the relationship between load and deformation. In a good design, the part operates in the elastic range, the area between the origin and the yield strength. Beyond this point, the part will be permanently deformed, and still greater increases in load will cause the part to break.
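The three regimes described above can be sketched as a simple check. This is only an illustration and is not taken from the original text: the function name classify_stress is hypothetical, and the strength values in the usage lines (250 and 400, in MPa) are assumed purely to exercise each case. Real limits must come from material data for the actual part.

```python
def classify_stress(stress, yield_strength, ultimate_strength):
    """Classify an applied stress against a material's strength limits.

    All three arguments must use the same units (for example MPa).
    The function name and the thresholds used are illustrative assumptions.
    """
    if stress < 0:
        raise ValueError("stress must be non-negative")
    if not 0 < yield_strength <= ultimate_strength:
        raise ValueError("expected 0 < yield_strength <= ultimate_strength")

    if stress <= yield_strength:
        # Elastic range: the part returns to its original shape on unloading.
        return "elastic"
    if stress < ultimate_strength:
        # Past the yield strength: the part is permanently deformed.
        return "plastic"
    # At or beyond the ultimate strength the part is expected to break.
    return "fracture"


# Hypothetical values in MPa, chosen only to reach each branch.
print(classify_stress(100, 250, 400))  # elastic
print(classify_stress(300, 250, 400))  # plastic
print(classify_stress(450, 250, 400))  # fracture
```

A sound design, in the sense used above, is one where the highest expected service stress stays in the "elastic" result with some margin.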
Elastic Limit
The elastic limit is the stress beyond which the material will no longer return to its original shape when the load is removed. Equivalently, it is the maximum stress that may be developed such that there is no permanent or residual deformation when the load is entirely removed.
Elastic and Plastic Ranges
The region of the stress–strain diagram from O to P is called the elastic range. The region from P to R is called the plastic range.
Yield Point
The yield point is the point at which the material undergoes appreciable elongation, or yielding, without any increase in load.
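Read literally, this definition suggests a way to locate the yield point in tabulated test data: look for the first reading after which strain keeps increasing while stress does not. The sketch below assumes the data are (strain, stress) pairs in test order. The function name find_yield_point, the tolerance parameter and the sample readings are all hypothetical. Materials that show no distinct plateau are normally handled with an offset method, which this sketch does not cover.

```python
def find_yield_point(curve, tolerance=0.0):
    """Return the first (strain, stress) reading after which strain grows
    but stress rises by no more than `tolerance`; None if there is none.

    `curve` is a list of (strain, stress) pairs in test order.
    """
    for (strain0, stress0), (strain1, stress1) in zip(curve, curve[1:]):
        if strain1 > strain0 and stress1 - stress0 <= tolerance:
            return (strain0, stress0)
    return None


# Hypothetical readings (strain, stress in MPa); not measured data.
SAMPLE_CURVE = [
    (0.0, 0.0),
    (0.001, 200.0),
    (0.00125, 250.0),  # stress stops rising after this reading
    (0.01, 250.0),
    (0.05, 330.0),
    (0.15, 400.0),
    (0.2, 360.0),
]

print(find_yield_point(SAMPLE_CURVE))  # (0.00125, 250.0)
```

With noisy measurements a small positive tolerance would be needed, since stress rarely stays exactly constant between readings.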
Ultimate Strength
The maximum ordinate in the stress–strain diagram is the ultimate strength or tensile strength.
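Because the ultimate strength is defined as the maximum ordinate, it can be read from tabulated stress–strain data by taking the largest stress value. A minimal sketch follows, assuming (strain, stress) pairs; the function name ultimate_strength and the sample readings are hypothetical and are not measured data.

```python
def ultimate_strength(curve):
    """Return (stress, strain) at the highest stress on the curve.

    `curve` is a non-empty list of (strain, stress) pairs.
    """
    if not curve:
        raise ValueError("curve must contain at least one reading")
    # The ultimate strength is the maximum ordinate (largest stress value).
    strain, stress = max(curve, key=lambda reading: reading[1])
    return stress, strain


# Hypothetical readings (strain, stress in MPa); not measured data.
READINGS = [
    (0.0, 0.0),
    (0.00125, 250.0),
    (0.05, 330.0),
    (0.15, 400.0),  # highest stress on the curve
    (0.2, 360.0),   # stress falls off before the specimen breaks
]

print(ultimate_strength(READINGS))  # (400.0, 0.15)
```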