Device Diagnostics and Asset Management
Is effective asset management essentially a technical issue, or a people process? The second word in “asset management” may be the larger problem. During my years at Shell Global Solutions and through connections with various technical organizations, I have been concerned that there seem to be relatively few companies that use the tools effectively, in spite of the benefits a well-designed program can bring.
There is no question that field devices have gotten smarter. The amount of information a flowmeter or valve actuator can communicate has grown by leaps and bounds over the last few years. But simply adding these diagnostic functions to the hardware does not make for an effective management program. One contributing factor is that there are no standards or expectations in most industries for what management should do or be accountable for as part of an asset management program. Some heavily regulated industries (such as the transportation sector) provide exceptions, but they do not necessarily provide a useful example for the rest of the process world to follow.
Effective use of automated diagnostic capabilities in devices in a larger context of an asset management program can provide huge benefits in manufacturing at many levels. The first and most obvious effect is an increase in the efficiency and effectiveness of maintenance activities. Here’s a typical situation: One of the most common responses from instrument maintenance activity is “no problem found.” This is often the result after several things have happened:
• A maintenance crew was dispatched to the field with test gear;
• The crew spent significant time and cost troubleshooting;
• They may have experienced some risk performing the troubleshooting activity; and
• The activity at best only delayed resolution of the original problem.
Effective diagnostic tools can make these activities quick and easy to perform from a central control facility or even a remote location. Taking the data to an expert instead of taking the expert to the problem is a powerful and underused concept, but it depends on proper implementation of a good set of diagnostic tools and effective work processes.
A second and potentially much larger benefit to automated diagnostics comes from the ability to detect and repair failures before they have significant operating impact. In this case, “repair” has two meanings. One meaning is automated handling of the failure through control system configuration. This is a technique that is widely used, but far from universal. The second meaning involves human intervention to deal with incipient failure in real time. This second meaning is possible and practical, but rarely implemented. Repair of incipient failures before something bad happens requires new work practices that are foreign to most maintenance organizations. Implementing the capability to repair failures before they have operating impact can provide significant improvement in profitability, safety, and environmental performance with minimal investment in physical infrastructure.
Overview of barriers
Given the large incentives and low capital cost for asset management, it would seem that this technology would be widely used. The reality is far different. Let’s take a look at those barriers and suggest some possible resolutions.
The first impediment has been the technology barrier. The tools available for asset management have traditionally been incomplete, hard to use, poorly integrated (multiple overlapping tools), rapidly evolving, and poorly supported by vendors. Frequently users have not had the engineering resources necessary to overcome these barriers and integrate a solution that is practical and usable by field organizations. The good news is that the tools are improving and users have been able to demonstrate effective ways to use them. The bad news is that effective use of the tools, and the engineering skills necessary for effective deployment, are rare.
The second problem for asset management is management. All of the technical problems have solutions given management support. However, management support is not a given. For management to support a sustainable asset management program, it has to be rewarded for doing so. If management’s only reward is for quarterly returns, maintenance activities will be deferred until the asset is unsustainable. The short-term effect of deferring maintenance is often nothing. The long-term effect of indefinite deferral is always bad for the asset.
Management rewards for asset management require good metrics, good reporting, and effective audits. There must be an effective scorecard for effective rewards. This scorecard does not necessarily exist for many operations.
In the absence of a good management system, poor performance is often treated as a technology problem, or just the normal state of things. Who’s supposed to be acting on the diagnostic information? Maintenance? For a maintenance technician, looking at an incomprehensible mass of unprioritized diagnostic data and deciding what to do based on that information is very difficult or impossible in most organizations. Properly implemented systems can provide order and priority to make the data useful to engineering and maintenance organizations.
Implementing an asset management program is a project in itself. It can be a stand-alone project for an existing facility, or an add-on to new construction or control system modernization. For an existing facility, a lot of stranded (unusable) diagnostics often exist in smart field devices that are not connected to a smart system. The cost of system tool implementation in an existing facility can be an issue, but it is manageable. For new construction, the added cost of asset management tools is almost negligible. But this is just the hardware side. Either way, the work processes, training, metrics, and management processes have to be implemented just the same.
The first phase of any project is called front-end design (FED) or some equivalent name, and it is the phase where the owner decides what he wants to do and how he wants to do it. Asset management implementation requires planning at this stage of the project. A recipe for disaster is to buy some asset management software, use it in its native state for commissioning, and turn it over to maintenance as a troubleshooting tool. Once the project staff leaves the plant with a tool set in this condition, in all likelihood it will never be used again.
One of the first activities in an asset management program is creating a criticality ranking for each piece of major equipment and each device. This is often a painful process, but if you don’t have a good system for it, everything will be ranked critical. That’s because people know that if something isn’t ranked critical, it won’t get fixed. You need to know the impact severity and the likelihood of a problem to do a good criticality ranking. Of course, when you actually get a device alert, you’re only interested in the impact, because the likelihood is one. It just happened. The impact is the only thing you’re interested in at that time, and that will set the priority configuration of the failure alert.
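The two uses of criticality described above can be sketched in a few lines. This is only an illustration under my own assumptions: the 1-to-5 scales, the simple impact-times-likelihood product, and the device tags are all hypothetical, and a real program would use whatever risk matrix the site has agreed on.

```python
from dataclasses import dataclass

# Hypothetical 1-5 scales; the ranking scheme itself is an assumption.
@dataclass(frozen=True)
class Device:
    tag: str
    impact: int      # severity of a failure, 1 (minor) to 5 (severe)
    likelihood: int  # chance of a failure, 1 (rare) to 5 (frequent)

def criticality(device: Device) -> int:
    """Design-time ranking: combine impact severity and likelihood."""
    return device.impact * device.likelihood

def alert_priority(device: Device) -> int:
    """Once an alert fires the failure has happened, so only impact counts."""
    return device.impact

# Illustrative device tags, not taken from any real plant.
devices = [
    Device("FT-101", impact=5, likelihood=1),
    Device("PT-202", impact=2, likelihood=4),
    Device("LV-303", impact=3, likelihood=3),
]

ranked = sorted(devices, key=criticality, reverse=True)
by_alert = sorted(devices, key=alert_priority, reverse=True)
```

Note how the order changes: a device that rarely fails but has severe consequences can sit at the bottom of the design-time ranking and still deserve the highest alert priority when it actually reports a problem.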
You will use this criticality information throughout the design process, while implementing other maintenance activities, and during planning. But many people don’t do it during FED, often waiting until the system is built, installed, and operating. Then they think, “We need to do maintenance now, so let’s do some criticality ranking.” It has to be done during the design phase if you want an effective project.
You also need to manage your vendor list during FED. You need qualified vendors, not just open bids. And those that you choose need to set up a lot of templates because smart devices with diagnostics have gazillions of default settings. There’s a lot of database work that has to start early on so that you can import all those templates and go right into your design and build process.
If you begin early enough and do the work systematically, the plant construction and start-up process will go faster and much more smoothly. During the design and factory acceptance testing, you need to do the building, create your tools, and train your people. You want to use all those diagnostics through those phases, during installation, commissioning, and loop checking. Typically, the system will pay for itself right there. You’ve covered the investment by the time you get the plant started up. History says that in most plants where we’ve done that, we get the system debugged, start up the plant, back-check and verify everything, correct all the mistakes, and then we turn the system off and never look at it again. Even though the system paid for itself during construction and commissioning, the facility owner misses out on the big payback during plant operation.
Ignoring the system once the plant gets going doesn’t have to be the normal chain of events. Unfortunately, the reality is that engineering and maintenance staffs generally do not have the resources to create an asset management system if it was not implemented by a project organization.
Feeding the black hole
One of the statements I have made while discussing this topic is that most diagnostic information goes into a “black hole.” That often results in some shocked looks. It’s not for a lack of available data; it’s that nobody seems to know what to do with what’s being collected. The stranded diagnostic information isn’t the only problem. The larger problem is the tools and work processes. It’s not hard to hook up modems to wired HART. Those wires come to the house and you can hook up modems to those wires cheaper than you can add WirelessHART communicators. But that’s not the problem. The problem is now that you’ve brought the data into the house, what is the next step? What kind of system are you going to use to manage the data, generate reports, do the alerting, and keep the configuration properly hosted and up to date? You can move the data stream into the house, but it’s just going to go into the bit bucket unless you do something with it. If all you do is put in wireless THUM adapters, you’re feeding the black hole. It’s not a technical issue. It’s a system problem and a management problem. That’s what makes this whole discussion so complex and challenging.
Establishing those work processes is a key step in creating a workable system, and the system vendors don’t always help. Their tools may not fit a given user’s work processes. While the situation is improving, users have had to create asset management system tools and work processes in-house because they can’t find what they want commercially. Some of the vendors have elaborate programs and work processes for predictive procedures where you do optimized scheduled maintenance based on history, failure rate, criticality, and things like that, but nothing that makes use of diagnostics as a work process. You need to cover all system nodes, networks, and modules, but typically the asset management systems concentrate on field devices. You can’t just write off the nodes and networks as belonging to the systems guys. We need to manage the whole asset in one system.
It’s important to understand that the black hole not only eats diagnostic information, it also eats device configuration. Configuration accuracy is volatile unless it is carefully managed. The solution is to use a system-wide configuration database that is always kept in sync with field devices. This level of configuration accuracy can happen with carefully integrated systems but will not happen with handheld configuration tools.
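To make the idea of keeping the database in step with the field concrete, here is a minimal sketch of a drift check. It assumes, purely for illustration, that both the database and the devices can be read into plain dictionaries keyed by tag; the function name, parameter names, and values are hypothetical and do not correspond to any vendor's tool.

```python
def find_config_drift(database: dict, field: dict) -> dict:
    """Return {tag: {param: (database_value, field_value)}} for each mismatch.

    A device or parameter missing on one side shows up with None on that side.
    """
    drift = {}
    for tag in sorted(database.keys() | field.keys()):
        db_cfg = database.get(tag, {})
        dev_cfg = field.get(tag, {})
        diffs = {
            param: (db_cfg.get(param), dev_cfg.get(param))
            for param in sorted(db_cfg.keys() | dev_cfg.keys())
            if db_cfg.get(param) != dev_cfg.get(param)
        }
        if diffs:
            drift[tag] = diffs
    return drift

# Hypothetical snapshot: someone re-ranged FT-101 in the field with a handheld.
database = {
    "FT-101": {"range_max": 100.0, "units": "m3/h"},
    "PT-202": {"range_max": 10.0, "units": "bar"},
}
field = {
    "FT-101": {"range_max": 150.0, "units": "m3/h"},
    "PT-202": {"range_max": 10.0, "units": "bar"},
}

drift = find_config_drift(database, field)
```

The comparison itself is trivial. The hard part, as the text says, is the integration and work process that make sure a check like this runs routinely and that somebody acts on the result.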
Traditional maintenance technology for managing maintenance priority is a reliability matrix. We’ve done the risk assessment, and we’ve determined that this device is medium- to low-priority. We look at the list of what we have to do, and how many critical things are on the plate today, and all of the low priority stuff gets deferred, sometimes forever. All of it. It may not even get addressed during turnarounds because of budget. So the low priority stuff accumulates failures. That’s fine, as long as you don’t have too many failures. Enough low priority failures can cause a larger-scale system failure, because the operators can’t tell what’s going on. There aren’t enough measurements. There aren’t enough controls. You can’t run the plant. System failures have greater impact than low priority device failures, but treating devices individually can lead to system effects that are not modeled or managed by the simple decision matrix.
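One simple way to catch the accumulation described above is to watch the backlog per unit rather than per device. The sketch below is an assumption on my part, not an established method: the limit of three open failures, the unit names, and the tags are all hypothetical, and a real site would have to decide what level of backlog leaves operators blind.

```python
from collections import Counter

# Hypothetical limit on open low-priority failures a single unit can carry.
UNIT_BACKLOG_LIMIT = 3

def units_at_risk(open_failures, limit=UNIT_BACKLOG_LIMIT):
    """Flag units whose deferred low-priority failures have piled up.

    open_failures: iterable of (unit, tag, priority) for unrepaired devices.
    """
    backlog = Counter(
        unit for unit, _tag, priority in open_failures if priority == "low"
    )
    return sorted(unit for unit, count in backlog.items() if count >= limit)

# Illustrative backlog; none of these would be escalated individually.
open_failures = [
    ("Unit 1", "TT-110", "low"),
    ("Unit 1", "PT-112", "low"),
    ("Unit 1", "FT-115", "low"),
    ("Unit 2", "LT-210", "low"),
    ("Unit 2", "FV-220", "high"),
]

at_risk = units_at_risk(open_failures)
```

A count is a crude stand-in for a real model of combined failures, but it at least makes the backlog visible as a system-level item instead of a list of individually deferrable ones.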
Often after a major failure or an operational disaster, an investigation discovers that there were many signs of the growing problem, but nobody was able to see them or correctly interpret what they were seeing. Field device diagnostics were trying to warn of a growing problem, but nobody was able to connect the dots. Often it’s an accumulation of small things adding up until they reach a critical mass. An accumulation of small (low priority) problems is common today among operating companies. You’ll see this accumulation of problems if you look at a catastrophe analysis. They had this little thing fail, and this little thing failed, and this little thing failed, and nobody raised a flag because they were all little things. After a while, enough little things line up in series and become a big thing. If you line up all the holes in a Swiss cheese, there’s a hole all the way through it. It’s another management failure.
It’s also a technology inadequacy because we don’t have a way to model collections of failures. When you look at our safety system models, you’re looking at individual failures and the model assumes that everything is working except this one failure. In the basic process control field, you can’t make that assumption. You can generally assume that there are a lot of failures out there at one time that haven’t been repaired, and that they are slowly accumulating over time. Audits of plants often reveal runaway conditions of this kind. They’re usually the facilities that are on the borderline of losing money, and the owners can’t decide if they should shut them down. You can turn such plants around, but it doesn’t happen in maintenance. It happens as a project.
Getting data to the right place
Once an appropriate data collection system is in place and you are working on setting up your work processes, you need to determine where the data stream goes. The traditional question is whether it goes to maintenance or to operators. This isn’t a difficult decision if you follow some simple principles: Send alerts to operators as well as maintenance if immediate operator action is required. The alert philosophy for operators is that they deal with individual events as they come up. You want a limited number of alerts that the operator can take some unique action on in real time.
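The routing principle above is simple enough to write down directly. In this sketch the field name `operator_action_required` and the two destination names are hypothetical; the only rule taken from the text is that everything goes to maintenance and only actionable alerts also go to the operator.

```python
def route_alert(alert: dict) -> list:
    """Decide where a device alert goes.

    Every alert is logged for maintenance. Operators see it only when
    they can take some immediate action on it.
    """
    destinations = ["maintenance_log"]
    if alert.get("operator_action_required"):
        destinations.append("operator_console")
    return destinations

# Illustrative alerts with hypothetical tags.
valve_stuck = {"tag": "FV-220", "operator_action_required": True}
sensor_drift = {"tag": "TT-110", "operator_action_required": False}

valve_route = route_alert(valve_stuck)
drift_route = route_alert(sensor_drift)
```

Deciding which alerts set that flag is the real engineering work, and it is where the impact ranking done during design pays off.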
On the maintenance side, you don’t want to deal with individual events. You want to log every little thing that happens, and then you want to use reports to sort through all of those logged events and make some sense out of them. You’re looking at history and analyzing what’s happened once, what’s happened a bunch of times, how high the priority was, and whether it happened to multiple devices. You can see trends from reports that get lost if you’re looking at individual events. A single problem with your air system can cause hundreds of events per day. You need a reporting format that can bring all that together and identify a common source. Clearly, an effective asset management program using diagnostics from smart devices can pay major dividends.
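As a rough illustration of that kind of report, the sketch below rolls logged events up by a shared source. It assumes each logged event already carries a `source` field (for example, the air header a device is fed from); that field, the tags, and the report layout are hypothetical, and in practice establishing that link is a large part of the database work.

```python
from collections import defaultdict

def summarize_events(events):
    """Roll up logged events by common source.

    Returns (source, event_count, device_count) rows, busiest source first.
    """
    summary = defaultdict(lambda: {"count": 0, "devices": set()})
    for event in events:
        entry = summary[event["source"]]
        entry["count"] += 1
        entry["devices"].add(event["tag"])
    rows = [
        (source, data["count"], len(data["devices"]))
        for source, data in summary.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Illustrative log: several valve alerts trace back to one air supply.
events = [
    {"tag": "FV-220", "source": "instrument_air"},
    {"tag": "FV-221", "source": "instrument_air"},
    {"tag": "FV-220", "source": "instrument_air"},
    {"tag": "PT-202", "source": "power_supply"},
]

report = summarize_events(events)
```

Seen one event at a time, these look like separate valve problems. Grouped in a report, they point at a single air system issue.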
When is it baked?
It is really quite easy to tell. If you routinely use diagnostics to find out how something failed after the plant has had an unexpected shutdown, you haven’t finished the job. If you have a large database of saves where diagnostics were used to prevent an unplanned plant shutdown, you are on the right track.
Herman Storey is chief technology officer of Herman Storey Consulting, and a frequent presenter on asset management programs and other topics. Reach him at firstname.lastname@example.org.