Maintenance and Reliability Best Practices. Ramesh Gulati
uptime.
• Assets are functioning and producing as designed.
• Maintenance costs are reasonable (at an optimum level).
• The plant/factory operates safely and reliably.
The importance of reliability and of implementing best M&R practices is discussed at the highest level of the organization. Many organizations talk about RCM/reliability but treat it as the “program of the month,” and it loses emphasis over time. Changing an existing culture of “run to failure,” or one with little or no PM program, into a sustainable reliability culture takes many years, along with consistent management support and resources.
In a reliability culture, preventing failures becomes an emphasis at every organizational level. The entire workforce is focused on asset reliability. Everyone in the workforce (operators, maintainers, engineers) thinks and acts to ensure:
• Assets are available to produce when needed.
• Assets are maintained at a reasonable cost.
• An optimized maintenance plan (FMEA-, RCM-, or CBM-based) is in place that includes:
  • All assets identified and assigned a criticality, each with a documented maintenance plan
  • A defect elimination program
  • 80/20 principles applied to prioritize the work
  • Most of the work planned and scheduled
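To make the 80/20 principle concrete, here is a minimal Python sketch, not a method prescribed in this book. It assumes downtime hours per asset are available (for example, exported from the CMMS/EAM system) and returns the smallest group of assets, worst first, that accounts for about 80% of total downtime. The function name, asset IDs, and numbers are hypothetical.

```python
from typing import Dict, List


def pareto_assets(downtime_hours: Dict[str, float], cutoff: float = 0.8) -> List[str]:
    """Return the smallest set of assets, worst first, whose combined
    downtime reaches `cutoff` (0.8 = 80%) of total downtime."""
    total = sum(downtime_hours.values())
    if total <= 0:
        return []  # no recorded downtime, nothing to prioritize
    ranked = sorted(downtime_hours.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for asset, hours in ranked:
        selected.append(asset)
        cumulative += hours
        if cumulative / total >= cutoff:
            break  # this group already covers the cutoff share
    return selected


# Hypothetical downtime hours per asset, for illustration only.
history = {"A1": 42.0, "A2": 25.0, "A3": 15.0, "A4": 8.0, "A5": 6.0, "A6": 4.0}
print(pareto_assets(history))  # ['A1', 'A2', 'A3']
```

In this made-up data, three of six assets account for 82% of downtime, so they would be the first candidates for PM review and defect elimination.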
If an asset fails, it gets fixed quickly, the root cause is determined, and action is taken to prevent future failures. Plant/asset reliability analysis is performed regularly to improve uptime. The workforce is trained to apply reliability concepts and best practices on a continuous basis.
Reliability Culture—Creating and Sustaining
Let us look at a real plant scenario in which an asset breakdown/failure has occurred:
Operations reported that Valve P-139 would not close. An operations workaround was used to divert the process temporarily. The breakdown was reported to the maintenance department with an urgent request in the CMMS/EAM system to fix the valve.
The following events happened:
• Maintenance dispatched a mechanic to evaluate and fix the valve.
• The mechanic noticed “a burning smell” upon arrival and suspected that the electric motor on a hydraulic pump had burned out. He called an electrician to help.
• The electrician determined that the motor had failed. He asked his supervisor to find a replacement motor.
• The supervisor called the storekeeper, who found that no spare motor was available.
• The supervisor called operations to report that the motor had failed and would take a couple of days to repair. Operations demanded the repair immediately, so the supervisor called the plant engineer to help locate a spare motor.
• The plant engineer and supervisor found the same type of motor on a similar system that was not in use. The supervisor sent another crew to remove this motor while the first crew removed the failed motor.
• Maintenance replaced the motor and adjusted the linkages because of sluggish operation. The valve was released to operations.
• The work order was closed with the comment “Valve was fixed.”
• The folks in operations were so happy with a 4-hour repair time (rather than 2 to 3 days) that they sent an e-mail thanking the maintenance crew for a job well done. The maintenance manager also recognized and thanked the crew.
What kind of culture does this plant have? What kind of message is being delivered to the workforce? This organization appears to have a reactive culture, one in which fixing things is what gets recognized and appreciated.
Now, let us look into another plant with the same breakdown scenario, but where the sequence of events happened a little differently:
• After receiving the information/call, the maintenance supervisor/scheduler visited the site, assessed the failure, and found that the valve linkage was tight and dry. The electric motor on the hydraulic system had probably burned out, causing the failure.
• The supervisor/scheduler assigned a mechanic and an electrician, requested both a 6-month chronological history report and a recommended parts list, and alerted the plant engineer to the problem.
• The electrician determined that the motor had failed (burned out) and that the overload relays had not functioned properly. The mechanic found that the linkage was tight due to inadequate lubrication.
• The repair history (attached to the work order) showed that a related problem had arisen a few months earlier: “Problem with valve closing. The mechanic had adjusted and greased the linkage. The hydraulic pressure on the system had been raised from 1,500 psi to 1,800 psi to make the actuator and linkage work smoothly.”
• The scheduler/supervisor quickly put together a repair plan that included replacing the motor and overload relays, restoring the hydraulic pressure to the system design value, and greasing/adjusting the linkage. A spare motor was available as part of the repairable program.
• Work was completed as planned. The operator supported the repair and helped in testing the system. The valve returned to operation.
• The WO was closed, and repair details were documented.
• The operations people were pleased with a 2-hour repair. After personally thanking the maintenance crew for a job well done and for finding the root cause, the maintenance manager asked the crew to prepare, for review in 10 days, a plan of further actions to improve reliability and prevent future failures.
Now let’s review what happened in this plant. It seems to be doing fairly well: a lot went right during this repair and in the follow-up actions requested by the maintenance manager. But is everything going as well as it could? Is the CMMS/EAM system providing the data we need to make the right decisions? What kind of message is being delivered to the workforce? What kind of culture does this plant have?
In this plant, the CMMS/EAM system has provided the information to help make the right decisions. The maintenance manager is emphasizing failure prevention. It’s a proactive culture—a step in the right direction.
Finally, let us look at a third plant facing a similar situation, where events unfolded differently again. In this case, the plant operator noted the following on Valve P-139:
• Motor current data on the operations panel showed a higher-than-normal current. A site visit and visual inspection indicated that the valve actuator was running sluggishly. The operator alerted maintenance.
• Maintenance evaluated the situation with the help of the operator and planned the repair for a scheduled downtime period.
• The repair was completed, and there was no unscheduled downtime. All repairs were documented in the CMMS/EAM system for asset history.
• PM tasks were reviewed, and root cause analysis was performed. Based on this analysis, PM tasks were updated. A work order to redesign the linkage based on root cause analysis was also issued to design/engineering.
• Operators were thanked for watching the asset/system closely.
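The operator’s early warning in this third plant can be sketched as a simple condition-monitoring rule. The Python below is an illustrative assumption, not the plant’s actual logic: it raises an alert only when the last few motor-current readings all exceed the motor’s normal running current by more than a set margin, which filters out one-off spikes. The 10% margin, the three-reading window, the function name, and the readings are all hypothetical.

```python
from typing import Sequence


def high_current_alert(readings: Sequence[float], normal_amps: float,
                       tolerance_pct: float = 10.0, sustained: int = 3) -> bool:
    """Return True when the last `sustained` readings all exceed the motor's
    normal running current by more than `tolerance_pct` percent."""
    if sustained <= 0 or len(readings) < sustained:
        return False  # not enough data to judge a sustained trend
    limit = normal_amps * (1 + tolerance_pct / 100.0)
    # Requiring several consecutive high readings filters out one-off spikes.
    return all(amps > limit for amps in readings[-sustained:])


# Hypothetical trend (amps) for a motor with an assumed 20 A normal current.
trend = [19.8, 20.1, 20.4, 22.6, 23.0, 23.4]
print(high_current_alert(trend, normal_amps=20.0))  # True: last three exceed 22 A
```

A True result plays the role of the “warning” data: it gives maintenance time to plan the repair for scheduled downtime instead of reacting to a burned-out motor.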
What happened in this plant? What kind of culture does this organization have? In this plant, “failure” was identified and addressed before it happened. Additionally:
• Operations and maintenance worked together as a team.
• The system provided the “warning” data.
• The process was designed to make it happen.
In this organization, the reliability/maintenance leaders have done their work. They provided the right tools, trained both operators and maintenance, and created the right culture—a