Maintenance and Reliability Best Practices. Ramesh Gulati
of assets. Additionally, regulatory requirements may force some level of PM to be performed (e.g., crane inspections).
Preventive maintenance requires that maintenance or production/operations personnel pay regular visits to monitor the condition of an asset in a facility. The basic objective of PM visits is to inspect the asset for telltale signs of failure or imminent failure. Depending on the type of asset, a checklist or a procedure may be used that details what to check or what data to take; e.g., change the filter, adjust the drive belts, and take the bearing clearance data. The observers also document abnormalities and other findings. For a PM program to add value, these abnormalities must be corrected before they turn into failures.
These PM inspections can be based on either calendar time or asset runtime. If CBM is not being performed on a particular piece of equipment, or if CBM cannot detect a particular failure, then the next best approach is a runtime-based PM program, but only for equipment and failure modes that have a time basis. If a calendar time-based PM program is all that really adds value, then that approach is still better than a run-to-failure (RTF) strategy. The exception is when an analysis indicates that run-to-failure is the most cost-effective strategy because the total cost of the corrective maintenance it requires is less than the cost of the maintenance needed to prevent the failure (assuming that run-to-failure has no safety impact).
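The selection logic described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not a procedure from the text: the function name, its parameters, and the cost figures in the usage lines are hypothetical, and the run-to-failure check assumes the analysis compares the annual cost of preventing the failure with the annual cost of repairing it after it occurs.

```python
def select_strategy(cbm_detects_failure, failure_is_time_based,
                    runtime_is_tracked, annual_prevention_cost,
                    annual_rtf_cost, safety_impact):
    """Return a strategy label for one asset and failure mode.

    All inputs are hypothetical values supplied by the analyst.
    """
    # Exception case: run-to-failure is chosen only when it is cheaper
    # than preventing the failure and has no safety impact.
    if not safety_impact and annual_rtf_cost < annual_prevention_cost:
        return "run-to-failure"
    # CBM is preferred wherever it can detect the failure mode.
    if cbm_detects_failure:
        return "CBM"
    # Next best: runtime-based PM, but only for time-based failure modes
    # on assets whose runtime is actually recorded.
    if failure_is_time_based and runtime_is_tracked:
        return "runtime-based PM"
    # Otherwise fall back to calendar-based PM.
    return "calendar-based PM"


# Made-up figures for two assets.
print(select_strategy(False, True, True, 4000, 12000, False))
print(select_strategy(False, False, False, 4000, 1500, False))
```

With these made-up figures, the first asset is assigned runtime-based PM and the second is run to failure because repairing it costs less than preventing the failure.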
The objective of preventive maintenance can be summarized as follows:
• Maintain assets and facilities in satisfactory operating condition by providing for systematic inspection, detection, and correction of incipient failures before they develop into a major failure.
• Perform maintenance, including tests, measurements, adjustments, and parts replacement, specifically to prevent failure from occurring.
• Record asset health condition for analysis, which leads to the development of corrective tasks.
Reliability-Centered Maintenance
Reliability-centered maintenance (RCM) is a structured process to develop an efficient and effective maintenance plan for assets to minimize the probability of failures. This process ensures that assets continue to do what the users want them to do in their present operating context cost-effectively.
Four principles define RCM and set it apart from any other maintenance planning approach:
Principle 1: To preserve system function. This is the primary objective of RCM.
Principle 2: To identify failure modes that can defeat the functions.
Principle 3: To prioritize function needs and failure modes.
Principle 4: To select applicable and effective tasks to mitigate failures.
Failure mode and effects analysis (FMEA) is a key tool used in RCM analysis. Ultimately, by performing RCM, organizations are looking to develop a unique maintenance plan for all of their assets or minimally for each critical asset within a facility or organization. The detailed application of the RCM process will be discussed in Chapter 8.
Risk-Based Maintenance
Risk-based maintenance (RBM) prioritizes maintenance resources toward assets that carry the most risk if they were to fail. The risk is based on the probability of failure and consequences—the impact of failures on facility assets. This analysis methodology helps with the most economical use of maintenance resources so that the maintenance effort across a facility is optimized to minimize any risk of failure.
A risk-based maintenance strategy is based on two main phases:
• Risk assessment—assessing the probability of failure and consequences
• Maintenance inspections (tasks), which are developed based on the risk
Assets that have a greater risk and consequence of failure are maintained and monitored more frequently. Assets that carry a lower risk are subjected to less stringent maintenance programs. Implementing a risk-based maintenance process means that the total risk of failure is minimized across the facility most economically. A risk matrix is used to analyze the data.
Figure 3.1 provides an example of a risk matrix. As shown, assets A and B carry more risk than asset C; therefore, they need a more stringent maintenance plan.
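A risk ranking of this kind can be sketched in a few lines of code. The sketch below assumes a common convention in which risk is scored as the product of a probability rating and a consequence rating, each on a 1 to 5 scale. The scales, the band thresholds, and the ratings given to assets A, B, and C are illustrative assumptions and are not taken from Figure 3.1.

```python
def risk_score(probability, consequence):
    """Score risk as probability rating x consequence rating (1-5 each).

    The multiplicative scoring and the 1-5 scales are assumptions.
    """
    if not (1 <= probability <= 5 and 1 <= consequence <= 5):
        raise ValueError("ratings must be between 1 and 5")
    return probability * consequence


def risk_level(score):
    """Map a score to a band; the thresholds are illustrative only."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Made-up (probability, consequence) ratings for three assets.
assets = {"A": (4, 5), "B": (5, 3), "C": (2, 2)}

# Rank assets from highest to lowest risk score.
ranked = sorted(assets, key=lambda name: risk_score(*assets[name]),
                reverse=True)
for name in ranked:
    score = risk_score(*assets[name])
    print(name, score, risk_level(score))
```

Under these assumed ratings, assets A and B fall in the high band and asset C in the low band, so A and B would receive the more stringent maintenance plan.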
As with RCM analysis, FMEA is a key tool used to analyze the data. With RBM, the risk matrix is used to analyze the data and make appropriate decisions. FMEA and risk analysis methodology will be discussed further in Chapter 11, “Problem Solving and Improvement Tools.”
RBM methodology is generally applied to pressure vessels, piping, and chemical-/energy-intensive assets.
FIGURE 3.1 Risk Matrix
Proactive Maintenance
Proactive maintenance refers to different maintenance approaches. Some consider CBM and PM approaches to be proactive because they take a hands-on approach rather than simply reacting to equipment failure. In some organizations, proactive maintenance is calculated as:
One category of work that differs from this mindset is proactive maintenance in which tasks are generated based on what is found during CBM and PM tasks, including work identified as a result of root cause and failure analysis. Another definition is that anything on the maintenance schedule is proactive—that is, any maintenance work that has been identified in advance and is planned and scheduled.
Corrective Maintenance
Corrective maintenance (CM) is another term used in different ways. CM is an action initiated as a result of an asset’s observed or measured condition before or after functional failure. CM work can be further classified into:
• Scheduled—planned repairs.
• Major repairs/projects. (This work is also planned and scheduled.)
• Reactive—breakdowns, failure fixing.
When an asset breaks down, it fails to perform its intended function and disrupts scheduled operation. This functional loss, partial or total, may result in defective parts, speed reduction, reduced output, and unsafe conditions. For example, wear or slight damage on a pump impeller that reduces output is a function-reduction failure. A full functional failure, which may shut down the asset, is called a function-disruption failure. Function-reduction or function-disruption failures that are not given due attention will soon develop into asset stoppage.
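The distinction between the two failure types can be expressed as a simple check of delivered output against required output. This is a minimal sketch: the function name, the use of a single output figure to represent the asset's function, and the pump flow values are assumptions made for illustration.

```python
def classify_functional_loss(required_output, actual_output):
    """Classify an asset's state from its required and actual output.

    Both arguments must use the same unit (e.g., flow rate for a pump).
    """
    if required_output <= 0:
        raise ValueError("required_output must be positive")
    # No output at all: the function is lost completely.
    if actual_output <= 0:
        return "function-disruption failure"
    # Some output, but less than required: partial loss of function.
    if actual_output < required_output:
        return "function-reduction failure"
    return "no functional failure"


# Hypothetical pump required to deliver 100 units of flow.
print(classify_functional_loss(100, 80))
print(classify_functional_loss(100, 0))
```

In the first call the worn pump still delivers part of its required flow, so the loss is a function-reduction failure; in the second it delivers nothing, which is a function-disruption failure.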
Many abnormalities, such as cracks, deformations, slack, leakage, corrosion, erosion, scratches, excessive heat, noise, and vibration, are indicators of imminent trouble. Sometimes these abnormalities are neglected because they seem insignificant or because they are perceived as unlikely to contribute to a major breakdown. If the tendency to overlook such minor abnormalities continues, they may soon grow and contribute to serious catastrophic failures. It is not uncommon to receive queries from production staff in response to a “high temperature or vibration condition” about how long “we can continue running.”
It has been observed that a high percentage of the failures occur during start-up and shutdown. However, asset failure could also be due to poor maintenance. Causes that go unnoticed are “hidden abnormalities.” The key to achieving zero failures is to uncover and rectify these hidden abnormalities before failure actually occurs.
In many organizations, CM is considered repair maintenance; it is conducted to correct deficiencies and to make the asset work again after it has failed or stopped working. In some organizations, all work performed on an asset after it has failed is treated as only CM work. But in other organizations, problems found during PM/CBM