Reliability Assessment: A Guide to Aligning Expectations, Practices, and Performance. Daniel Daley
sacrifice an employee by asking him to perform the evaluation and deliver the bad news.
Let’s discuss the elements that should be included in a WDYHARTE (What do you have a right to expect?) analysis. As our fictional reliability engineer explained, every point in the life of a system affords us opportunities to make choices that will affect reliability. In some cases, the individuals involved are aware that they are making choices that affect reliability; in other cases, they are not. Sometimes they make sound choices that improve reliability, but sometimes they make choices that compromise it. They then often rationalize that current savings are more important to the business than the added costs of poor reliability that will be experienced much later (or by someone else).
Let’s go back and review, one by one, the elements that determine “what you have a right to expect.” Expectations for performance are often not based on any comprehensive analysis or assessment. Instead, they are based on a “gut feel” or “hoping for the best.” Expectations formed without the information needed for an informed opinion are misinformed and ultimately lead to disappointment. When expectations are aligned with reality, people and businesses are more likely to get what they expect and expect what they get.
Inherent reliability is probably the single most important characteristic of any system or piece of equipment in terms of determining overall reliability performance. The inherent reliability of a system or device is determined by its configuration and component selection. For instance, if a plant has redundant feed pumps or recycle compressors, that fact will profoundly affect the inherent reliability. Also, if the components were chosen based on lifecycle cost rather than just first cost, the inherent reliability will be enhanced. For this purpose, lifecycle cost includes first cost, all forms of maintenance costs, the costs associated with unreliability (e.g., lost profit associated with unplanned outages), and the costs associated with unavailability (e.g., lost profit associated with necessary planned outages).
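The four lifecycle cost elements listed above can be tallied in a few lines. The sketch below is a minimal illustration in Python; the class name `LifecycleCost` and every dollar figure are hypothetical, chosen only to show how a component with the lower first cost can still lose on lifecycle cost.

```python
from dataclasses import dataclass


@dataclass
class LifecycleCost:
    """Hypothetical breakdown of the four lifecycle cost elements."""
    first_cost: float           # purchase and installation
    maintenance_cost: float     # all forms of maintenance over the life
    unreliability_cost: float   # e.g., lost profit from unplanned outages
    unavailability_cost: float  # e.g., lost profit from necessary planned outages

    def total(self) -> float:
        return (self.first_cost + self.maintenance_cost
                + self.unreliability_cost + self.unavailability_cost)


# Illustrative, made-up figures for two candidate components.
cheap = LifecycleCost(first_cost=100_000, maintenance_cost=250_000,
                      unreliability_cost=400_000, unavailability_cost=50_000)
robust = LifecycleCost(first_cost=180_000, maintenance_cost=150_000,
                       unreliability_cost=100_000, unavailability_cost=50_000)

# Select on lifecycle cost rather than first cost alone.
best = min(("cheap", cheap), ("robust", robust), key=lambda item: item[1].total())
print(best[0], best[1].total())  # robust 480000
```

Judged on first cost alone, the cheaper component would be chosen; judged on lifecycle cost, it is the more expensive choice by a wide margin.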
The inherent reliability is a measure of the overall “robustness” of a system or piece of equipment. It provides an upper limit to the reliability and availability that can be achieved. In other words, no matter how much inspection or maintenance you perform, you will never exceed the inherent reliability. If you operate, maintain, and inspect a device as well as possible, you will be able to harvest all of the inherent reliability. On the other hand, if there are gaps in your operating, maintenance, or inspection practices, you will harvest only some portion of the inherent reliability.
If you wish to improve the inherent reliability of an existing system or device, you will need to change the current configuration or component choices and you will need to do so in a manner that improves reliability rather than detracts from it.
Because most systems and devices spend their lives with much the same inherent reliability that was set by the original design, it is critical that the initial design take reliability and availability requirements into consideration. Adding a redundant component after the original system has been built is both difficult and expensive. In the case of a plant, piping often has to be run a great distance to a spot where space is available, and the resulting awkward configuration is confusing for operators. Redundancy in printed electronic circuits is less expensive than in large physical systems, but changing the software that controls the circuits so that it takes advantage of the redundancy is complicated, and it is difficult to ensure that no new defects have been introduced.
It is best to apply one or more of the Design-For-Reliability (DFR) techniques to ensure that long-term reliability requirements are addressed concurrently with the physical design of any system. One example of a DFR technique is the Reliability Block Diagram (RBD). In an RBD, each element of a system is represented by a block and connected to other blocks in a way that closely mirrors how the elements interact in the actual system. Characteristics assigned to each block cause it to behave mathematically in the same way as the actual component.
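For diagrams built only from series and parallel connections, the hand calculation reduces to two rules, sketched below in Python. The sketch assumes that block failures are independent and that each block is characterized by a single reliability value (its probability of surviving a given period); the function names and example values are hypothetical.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """Blocks in series: the system survives only if every block survives."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """Redundant blocks in parallel: the system fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical one-year reliabilities for a feed pump and a recycle compressor.
PUMP, COMPRESSOR = 0.90, 0.95

single = series(PUMP, COMPRESSOR)                     # one pump feeding the compressor
redundant = series(parallel(PUMP, PUMP), COMPRESSOR)  # two redundant pumps
print(f"single pump: {single:.4f}, redundant pumps: {redundant:.4f}")
```

With these values the single-pump arrangement gives 0.855 and the redundant pair gives about 0.94; nested calls to `series` and `parallel` handle any diagram that can be reduced this way.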
If the actual component has poor reliability, its block will fail frequently. If the component has poor availability, its block will be down for maintenance a large portion of the time. For manually constructed RBDs, there are techniques that allow the composite reliability to be calculated by hand. RBDs can also be built in software that simulates the performance of real systems. These programs simulate the planned and unplanned outages of each component based on characteristics that accurately represent the real-life components that have been chosen.
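Simulation software of the kind described above is far more elaborate, but the core idea can be sketched as a simple hour-by-hour Monte Carlo model. This Python sketch assumes identical, independent units with separate repair crews, models only unplanned outages (planned outages are omitted for brevity), and approximates exponential failure and repair times with fixed hourly probabilities; the function `simulate_availability` and its parameters are hypothetical.

```python
import random


def simulate_availability(mtbf_h: float, mttr_h: float, n_units: int,
                          horizon_h: int, seed: int = 0) -> float:
    """Fraction of hours in which at least one of n redundant units is running.

    Each hour a running unit fails with probability 1/mtbf_h, and a failed
    unit is returned to service with probability 1/mttr_h.
    """
    rng = random.Random(seed)
    p_fail = 1.0 / mtbf_h
    p_repair = 1.0 / mttr_h
    up = [True] * n_units
    up_hours = 0
    for _ in range(horizon_h):
        for i in range(n_units):
            if up[i]:
                if rng.random() < p_fail:
                    up[i] = False  # unplanned outage begins
            elif rng.random() < p_repair:
                up[i] = True       # repair completed
        if any(up):                # parallel (redundant) arrangement
            up_hours += 1
    return up_hours / horizon_h


single = simulate_availability(2000, 100, n_units=1, horizon_h=500_000)
pair = simulate_availability(2000, 100, n_units=2, horizon_h=500_000)
print(f"one unit: {single:.4f}, two redundant units: {pair:.4f}")
```

As a sanity check, one unit should land near MTBF/(MTBF + MTTR) = 2000/2100, or about 0.952, and two independent units near 1 - (1 - 0.952)², or about 0.998; the simulated values will differ slightly from run to run with a different seed.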
After the RBDs have been assembled and the calculations completed, you will have an initial estimate of the inherent reliability that it is reasonable to expect. If the calculated reliability does not meet requirements or expectations, either the configuration can be changed (e.g., by adding redundancy) or different (more reliable) components can be selected. By entering the new configuration or the characteristics of the new components into the model and re-running the calculations or simulation, you can estimate the improvement.
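The change-and-re-run loop can be sketched for a simple case: redundant pumps in parallel, feeding a single compressor in series. This Python sketch assumes independent failures, and the names `system_reliability` and `pumps_needed`, the target values, and the component reliabilities are all hypothetical.

```python
from typing import Optional


def system_reliability(n_pumps: int, r_pump: float, r_compressor: float) -> float:
    """n identical redundant pumps in parallel, in series with one compressor."""
    pumps_ok = 1.0 - (1.0 - r_pump) ** n_pumps
    return pumps_ok * r_compressor


def pumps_needed(target: float, r_pump: float, r_compressor: float,
                 max_pumps: int = 5) -> Optional[int]:
    """Re-run the model with one more redundant pump until the target is met."""
    for n in range(1, max_pumps + 1):
        if system_reliability(n, r_pump, r_compressor) >= target:
            return n
    # Redundant pumps cannot lift the system above the compressor's own
    # reliability; a more reliable component has to be selected instead.
    return None


print(pumps_needed(0.95, r_pump=0.90, r_compressor=0.98))   # 2
print(pumps_needed(0.985, r_pump=0.90, r_compressor=0.98))  # None
```

The second case illustrates why both remedies matter: once the weakest series block sets the ceiling, adding more redundancy elsewhere cannot help, and only a better component choice will.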
Once a configuration and a list of component choices have been finalized, you can perform lifecycle cost comparisons to evaluate whether the cost of the changes is justified by the reduction in lifecycle costs (resulting from fewer or shorter outages, or from lower maintenance costs).
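A basic version of that comparison weighs the added cost of a change against the expected reduction in lost profit from unplanned outages. The Python sketch below is undiscounted (a fuller comparison would bring future costs to present value), assumes a constant failure rate so that expected failures equal life divided by MTBF, and uses hypothetical function names and figures.

```python
def unplanned_outage_cost(mtbf_years: float, hours_per_outage: float,
                          lost_profit_per_hour: float, life_years: float) -> float:
    """Expected lost profit from unplanned outages over the life (undiscounted)."""
    expected_failures = life_years / mtbf_years
    return expected_failures * hours_per_outage * lost_profit_per_hour


def change_is_justified(added_first_cost: float, added_maintenance_cost: float,
                        outage_cost_before: float, outage_cost_after: float) -> bool:
    """True if the savings from fewer outages exceed the added costs."""
    savings = outage_cost_before - outage_cost_after
    return savings > added_first_cost + added_maintenance_cost


# Hypothetical case: a change raises the MTBF from 2 to 10 years over a 20-year life.
before = unplanned_outage_cost(mtbf_years=2, hours_per_outage=48,
                               lost_profit_per_hour=10_000, life_years=20)
after = unplanned_outage_cost(mtbf_years=10, hours_per_outage=48,
                              lost_profit_per_hour=10_000, life_years=20)
print(before, after, change_is_justified(1_500_000, 500_000, before, after))
# 4800000.0 960000.0 True
```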
If initial project design procedures account only for system integrity (e.g., structural or pressure retaining capability) and not for reliability and availability performance, the owner will have to “take what he gets” for those two performance areas.
Another element of reliability mentioned in the fictional account above is initial construction or assembly. It is possible to design a system to be reliable and then lose a portion of the benefit of all that cost and effort when the system is constructed. Inherent reliability depends on things being assembled in a manner that does not introduce additional defects. All too often, shortcuts taken to meet schedule, or misunderstandings about how things should be assembled, introduce defects. Pipe stress on the nozzles of rotating equipment is an example that many reliability engineers have faced. Inadequate door seals that allow liquid intrusion and ultimately cause corrosion are another common example. The list is endless, but the solution is strict control during construction.
Harvesting All the Inherent Reliability
As mentioned earlier, the inherent reliability is the maximum possible reliability performance, but it is possible to perform much worse. The portion of the inherent reliability that is actually harvested or achieved is a result of:
•How well the system is operated
•How well it is maintained
•How well it is inspected
An automobile is a good example of a device whose usable life is determined by how it is operated. Some vehicles last several hundred thousand miles with their original owner, yet the exact same models frequently last only tens of thousands of miles when they are traded from hand to hand. If the owner drives conservatively, sees that the vehicle is regularly maintained, and is sensitive to unusual noises or behaviors, it is possible to achieve a long and reliable life. If the owner accelerates too quickly, rides the brakes, and is insensitive to minor problems until they turn into major problems, the car is likely to be less reliable and to have a shorter life.
Although failures caused by poor operation are typically charged to the equipment rather than to the operator, a significant portion of the reduced reliability is not the fault of the equipment. For instance, if the MTBF (Mean Time Between Failures) for a device is two years and every other failure is caused by mis-operation, then the equipment’s own MTBF should be four years. If the MTBF of a device is two years, every second failure is due to mis-operation, and every third failure is due to a power failure or an upstream instrument failure, the MTBF of the device should be six years, because only two of every six failures are attributable to the device itself. If you are blaming the device and, as a result, you are