Figure Out When Enough is Enough!
Throughout the 1980s, control system redundancy existed mostly because user experiences with microprocessor-based devices came from personal computers, and because control system salespeople made statements like, 'It will fail; we just don't know when.'
By the 1990s many end-users had learned microprocessor-based control systems were more reliable than expected and had begun re-deploying backup controllers to primary control roles, much to the chagrin of many suppliers.
Also in the '90s, 'hybrid' and 'open' control systems began appearing. By this time end-users were more experienced with redundancy issues and were asking suppliers tougher questions about control system reliability expectations.
Further clouding the redundancy issue, the term 'control-system redundancy' had come to refer to the distributed control system or programmable logic controller and generally did not include field instrumentation.
Production domains don't often appear in risk-assessment matrixes, but there's really no reason they shouldn't. This illustration expands the traditional hazard domain matrix to include operational domains.
Unfortunately, throughout these many years, no widely accepted, robust, easy-to-apply methodology for determining when to apply control system redundancy emerged. Even so, many risk-reduction techniques have been developed and successfully applied. While each technique has developed its own distinctive features, most resemble the following.
Apply a methodology
Today, the quality movement remains a major focus among producer companies, and considerable resources are committed to ensuring quality measures are met for the products produced.
Working with customers and suppliers, producer companies establish and focus on measures expressed as numbers that define production goals of yield, rate, and/or quality.
Such numbers remain inconsistent across industries or even within the same company, but they do appear on daily, weekly, and monthly production reports. Operations personnel 'live and die by' numerical measures, so it makes sense that numbers should be considered when defining production availability requirements, including determining control system redundancy.
Study the process
All production processes undergo reviews (at least they should) to evaluate opportunities to improve production efficiencies and/or reduce safety, hazard, and environmental risks.
Frequently such reviews have been informal and focused on only one element, efficiency for example. However, significant benefits can be achieved by formalizing the review process and looking at several at-risk domains simultaneously. (See 'Sample-Domain impact risk assessment matrix' diagram.)
Most companies have well-established event-likelihood metrics for safety and hazard risk analysis, but seldom apply any metric to production risk analysis. With only a bit of effort, metrics can be expanded to address both types of risk analysis.
For example, many companies engage in the hazards and operability (HAZOP) study process. HAZOP studies are intended to examine each production unit, on a loop-by-loop basis, and identify, document, and eventually improve process operability while mitigating hazards.
In practice, hazards often become the HAZOP focus at the expense of operability-related issues.
Extraordinary improvements in operational efficiencies can result when HAZOP participants balance discussions and documentation between understanding hazardous situations and understanding how each loop contributes to defined production goals.
Dr. Angela Summers, president and HAZOP consultant for SIS-Tech Solutions (Houston, Tex.), puts it this way: 'If companies don't conduct operability analysis alongside hazard analysis, they are not getting maximum business value from the HAZOP study.'
Identify critical loops
Once a control system is installed, the number of wired input variables frequently 'grows' with little explanation to operators of the new variables' contribution toward meeting production goals.
Likewise, alarms are often arbitrarily added to these variables, adding to operator confusion and information overload.
Conducting operability reviews, with knowledgeable participants in attendance, provides an opportunity to identify the control loops critical to achieving defined production goals.
Surprisingly, the number of critical-to-production control loops usually turns out to be only a handful for any given unit, which tightens the study's focus and more closely defines where critical alarms should exist.
Anyone who's participated in the hazard analysis portion of a HAZOP study understands how scenarios/events (such as no flow, too much flow, high or low temperature, etc.) are discussed, documented, and ranked.
For each critical-to-production control loop, the same scenario/event discussions need to occur, with each scenario developed, documented, and ranked. Questions include: What's the impact to yield, quality, or production if critical levels, flows, temperatures, etc., are not maintained within defined operational ranges? What are those ranges? Have warning alerts and alarms been appropriately assigned? Should loop modes and setpoints be programmatically changed to 'safe' operational values when/if alarms are ignored?
Operational scenarios differ from hazardous scenarios/events, and from application to application (e.g., continuous versus batch). Preplanning will produce a set of operational scenarios that meets most of what the application requires; the remainder will emerge during the actual study.
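As a minimal sketch of the 'What are those ranges?' question, the Python below classifies a measured value against warning and alarm limits for one loop variable. The class name, field names, and the temperature limits are hypothetical, chosen only for illustration; a real study would take them from the documented operational ranges.

```python
from dataclasses import dataclass


@dataclass
class OperatingRange:
    """Hypothetical limits for one critical loop variable (illustrative values only)."""
    warn_low: float
    warn_high: float
    alarm_low: float
    alarm_high: float


def classify(value: float, limits: OperatingRange) -> str:
    """Return 'normal', 'warning', or 'alarm' for a measured value."""
    # Alarm limits are checked first because they lie outside the warning band.
    if value <= limits.alarm_low or value >= limits.alarm_high:
        return "alarm"
    if value <= limits.warn_low or value >= limits.warn_high:
        return "warning"
    return "normal"


# Assumed reactor temperature limits in degrees C, not taken from any real unit.
reactor_temp = OperatingRange(warn_low=140.0, warn_high=160.0,
                              alarm_low=130.0, alarm_high=170.0)

for reading in (150.0, 162.5, 171.0):
    print(reading, classify(reading, reactor_temp))
```

Writing the limits down in one place, even this simply, makes it easier to ask during the review whether each warning and alarm actually maps to a production impact.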
Determine likelihood, impact
During design and operational reviews, each unit and/or loop control scenario needs to be assigned a likelihood value. (See 'Sample-Event likelihood assessment matrix' diagram.)
Many companies have developed such value matrixes but tend to use them only during hazard-analysis studies. Few matrixes exist for assigning likelihood values to production-impacting events.
For example, what's the likelihood of an equipment failure or human error that impacts yield, production, or quality?
The reality is that scenario/event likelihoods are probably about the same for production as they are for safety and hazards. That means existing likelihood assessment matrixes could be universally adapted or at least serve as good starting points for establishing production impact values.
Analyze to reduce risk
Domain-impact and event-frequency risk assessments produce index numbers applicable to a risk-ranking model, thus prioritizing equipment, instrumentation, and control system risk-reduction opportunities.
Everyone understands the importance of analyzing various situations to determine and reduce risk. In fact, we subconsciously analyze risk when the light turns from green to yellow as we approach a busy intersection. Likewise, we consciously and subconsciously analyze risk on the job: 'Should I shut down that noisy pump and impact production, or hope it holds together till the end of this production run?'
Whether we realize it or not, such risk assessments make use of some form of domain and likelihood matrix similar to the ones described above. Each establishes impact values for domain and likelihood, and plots results on a risk-ranking chart. (See 'Sample-Risk ranking model' diagram.)
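One way to picture the risk-ranking step is the short Python sketch below. It assumes 1-to-5 scales for impact and likelihood and multiplies them to get an index; the scale labels, the multiplication rule, and the three scenarios are all assumptions made for illustration, and a company's own matrixes may combine the values differently.

```python
# Hypothetical 1-5 scales; a real study would use the company's own matrixes.
IMPACT = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "severe": 5}
LIKELIHOOD = {"remote": 1, "unlikely": 2, "possible": 3, "likely": 4, "frequent": 5}


def risk_index(impact: str, likelihood: str) -> int:
    """Combine impact and likelihood into one index (assumed rule: multiply)."""
    return IMPACT[impact] * LIKELIHOOD[likelihood]


# Illustrative production-impacting scenarios: (description, impact, likelihood).
scenarios = [
    ("Level indicator drifts", "minor", "likely"),
    ("Feed flow transmitter fails", "major", "possible"),
    ("Cooling water pump trips", "severe", "unlikely"),
]

# Highest index first, so the review spends time where the risk is greatest.
ranked = sorted(scenarios, key=lambda s: risk_index(s[1], s[2]), reverse=True)

for description, impact, likelihood in ranked:
    print(risk_index(impact, likelihood), description)
```

The same function works whether the impact value came from a safety, environmental, or production domain, which is the point of expanding the matrix.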
Conducting analysis during operational reviews provides time to discuss, analyze, quantify, and document the risk of various production-impacting scenarios. It also provides time to assess ways and means of reducing those risks, such as adding a second transmitter, installing a redundant fieldbus segment, and/or specifying a redundant controller.
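To put a rough number on what a second transmitter or redundant controller buys, the sketch below uses the textbook parallel-availability formula, 1 - (1 - A)^n. It assumes the redundant devices are identical, fail independently, and that any one of them is sufficient; the 99% single-device availability is an illustrative figure, not data from any product.

```python
HOURS_PER_YEAR = 8760


def parallel_availability(single: float, n: int = 2) -> float:
    """Availability of n identical, independent devices where any one suffices."""
    return 1.0 - (1.0 - single) ** n


def downtime_hours(availability: float) -> float:
    """Expected unavailable hours per year at the given availability."""
    return HOURS_PER_YEAR * (1.0 - availability)


single = 0.99  # assumed availability of one transmitter (illustrative)
redundant = parallel_availability(single, n=2)

print(f"single:    {single:.4f}  ~{downtime_hours(single):.1f} h/yr down")
print(f"redundant: {redundant:.4f}  ~{downtime_hours(redundant):.2f} h/yr down")
```

Common-cause failures (shared power, shared impulse lines, shared software) break the independence assumption, so real gains are usually smaller than this calculation suggests.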
It's during the relative quiet time of operational reviews that emotions and adrenaline are in check and decisions are more rational.
This is one of the few times so much knowledge can be focused on reducing production impacts without incurring the pressures of having just missed the delivery date to a key customer. According to Dr. Summers, 'Often, further investigation leads to process improvements that offset study and engineering costs.'
Document design requirements
Once each unit's critical loops are identified, assigned impact values, and matched with risk-reduction methods, the next action is to document each loop's performance criteria. Details should include what the loop is supposed to do and why; what the loop's quality targets are (e.g., accuracy, repeatability, response times, and reliability expectations); and what risk-mitigation decisions have been determined and why.
In short, this is the time to make 'tracks' so the next time operational reviews are conducted, it's easier to recall what, how, and why things are the way they are.
Documenting each critical loop provides a current perspective and helps future reviews improve on what's already been done. It also helps establish loop design criteria. For example, do loop performance targets require the accuracy, repeatability, and costs of a Coriolis meter or does another, less expensive, flow-measurement device meet documented performance requirements?
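A loop's documented criteria can be as simple as a structured record. The Python sketch below is one hypothetical layout; the field names, the tag 'FIC-101', and every value in the example are invented for illustration and do not come from any standard or real plant.

```python
from dataclasses import dataclass, field


@dataclass
class LoopRecord:
    """Hypothetical record of one critical loop's documented design criteria."""
    tag: str
    purpose: str                 # what the loop is supposed to do
    rationale: str               # why it matters to production goals
    accuracy_pct: float          # quality targets
    repeatability_pct: float
    response_time_s: float
    mitigations: list[str] = field(default_factory=list)  # risk-mitigation decisions

    def summary(self) -> str:
        """Render the record as plain text for a review package."""
        lines = [
            f"Loop {self.tag}: {self.purpose}",
            f"  Why: {self.rationale}",
            f"  Targets: accuracy +/-{self.accuracy_pct}%, "
            f"repeatability +/-{self.repeatability_pct}%, "
            f"response {self.response_time_s} s",
        ]
        lines += [f"  Mitigation: {m}" for m in self.mitigations]
        return "\n".join(lines)


# Illustrative entry only.
record = LoopRecord(
    tag="FIC-101",
    purpose="Control reactor feed flow",
    rationale="Feed ratio drives product yield",
    accuracy_pct=0.5,
    repeatability_pct=0.1,
    response_time_s=2.0,
    mitigations=["Second flow transmitter", "Redundant controller"],
)
print(record.summary())
```

Keeping the targets next to the reasons makes it easier to challenge a later equipment substitution against what the loop was actually designed to deliver.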
Too frequently 'as-built' documentation for a process facility doesn't reflect the 'approved for construction' documentation. The reasons are many and often justifiable, but when equipment substitutions are made, such as when someone approves replacing a globe valve with a butterfly valve to save a few hundred dollars, the established production risk assessments are unknowingly jeopardized.
Instrumentation and controls are generally about 5-7% of a project's total cost. Despite the project manager's ranting and raving, saving a few hundred dollars on instrumentation and controls isn't going to break the project's budget. However, shortsighted decisions can significantly impact lifecycle/maintenance costs further down the road.
Having completed and documented operational studies provides excellent grounds for justifying and insisting that instrumentation and controls are specified, purchased, and installed as designed.
Test and tune
One of the key intangible benefits of completing and documenting operational reviews is knowing where to spend time in loop testing and tuning.
Most control engineers would like to test control solutions against robust models of the process. Unfortunately, there is seldom enough time, resources, and/or justification for modeling the entire process.
However, targeting resources to the critical units and loops helps focus what to model versus where to apply simple tieback simulations.
This approach permits tuning of critical loops against robust models, thus improving the likelihood critical loops provide decent control from the very start.
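For the non-critical loops, a tieback simulation can be very small. The Python sketch below assumes the simplest form: the controller output is fed back as the process variable through a first-order lag, with a textbook PI controller on a 0-100% output. The function name, gains, lag time constant, and step size are all illustrative assumptions; a critical loop would be tested against a proper process model instead.

```python
def simulate_tieback(setpoint: float, kp: float, ki: float, tau: float,
                     dt: float = 0.1, steps: int = 600) -> list[float]:
    """Run a PI controller against a first-order-lag tieback; return the PV history."""
    pv = 0.0          # process variable, starts at zero
    integral = 0.0    # accumulated error for the integral term
    history = []
    for _ in range(steps):
        error = setpoint - pv
        integral += error * dt
        output = kp * error + ki * integral
        output = max(0.0, min(100.0, output))   # clamp to 0-100% output
        # Tieback: the PV simply follows the controller output through a lag.
        pv += (output - pv) * dt / tau
        history.append(pv)
    return history


# Illustrative run: 50% setpoint, assumed gains and a 2-second lag.
history = simulate_tieback(setpoint=50.0, kp=1.0, ki=0.5, tau=2.0)
print(f"PV after {len(history)} steps: {history[-1]:.2f}")
```

A tieback like this confirms wiring, scaling, and controller direction, but says little about real process dynamics, which is why the robust models are reserved for the critical loops.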
Production-impacting scenarios and the associated documentation produced during operational reviews become extremely valuable for training operators and maintenance personnel.
Oftentimes, using simulations and scenario-based training makes it practical to verify that alarm assignments actually receive the required attention.
Graphic screen content and navigation can be validated during scenario-based training. Also, the information helps maintenance staff understand the importance of a 'noisy' pump, why a backup pump is hardpiped in, and why it's important to keep both pumps in tip-top working order.
In scenario-based training, everyone gains knowledge and is better armed to make informed decisions when minutes count. It's a natural extension of the popular asset-management story.
Most companies are required to comply with a variety of regulations. Most regulating agencies require documenting and approving changes affecting regulated portions of the process and/or control system. For example, having and enforcing change-management procedures is a key element of U.S. Food and Drug Administration, Occupational Safety & Health Administration, and Environmental Protection Agency regulations.
And even if change management is not mandatory, it still makes good sense to have formalized change-management procedures in place to ensure everyone is aware of changes affecting production goals. Implementing and abiding by change-control procedures helps an organization achieve continuous improvement.
Figuring out when, where, and how much redundancy to apply to the instrumentation and control system to improve production uptime isn't rocket science; it's simply applying a methodology that follows good engineering practices.
Until you start doing what you already know is the right thing to do, the next time production objectives aren't met, you will find yourself wondering, 'Did we overlook something?'