Open Systems Reliability
The flexibility promises of open systems have been around for over 20 years, but with flexibility comes new responsibility. Understanding these responsibilities helps avoid support-related issues later.
Dr. William M. Goble, Exida.com -- Control Engineering, 10/1/2001
Downtime is the last thing operations people want their control systems to experience, especially where system utilization and/or finished goods value is high.
Good reliability and safety in process equipment and related controls are near the top of every control engineer's buying list, followed closely by budget constraints and the need to install the system rapidly and get it operational.
Many control engineers consider "open systems" to be at least part of the answer to cost issues, but do open systems present a higher safety or availability risk than proprietary or closed systems?
Most people understand that purchase price shouldn't be the sole criterion for selecting a system, but system reliability and availability are seldom factored into the decision-making process to produce hard numbers for an acceptable amount of unplanned downtime. Fewer still produce hard numbers for the associated costs of lost production, wasted energy and raw materials, safety and environmental consequences, customer impact, and so on.
To avoid later surprises, both open and proprietary control systems should be evaluated to ensure everyone understands the sources, possibilities, and consequences of unplanned downtime resulting from system failures.
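As a rough illustration of what "hard numbers" for unplanned downtime might look like, the sketch below converts an availability figure into expected downtime hours per year and an associated cost. The availability values, the cost per hour, and the function names are hypothetical placeholders chosen for illustration, not figures from any real system.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def expected_downtime_hours(availability: float) -> float:
    """Expected unplanned downtime per year for a steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def expected_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of unplanned downtime."""
    return expected_downtime_hours(availability) * cost_per_hour


if __name__ == "__main__":
    # Hypothetical candidates and a hypothetical cost of lost production.
    candidates = {"System A": 0.999, "System B": 0.9999}
    cost_per_hour = 20_000.0
    for name, availability in candidates.items():
        hours = expected_downtime_hours(availability)
        cost = expected_downtime_cost(availability, cost_per_hour)
        print(f"{name}: {hours:.2f} h/year, about ${cost:,.0f}/year")
```

A fuller estimate would add the other consequences listed above, such as wasted raw materials, safety and environmental consequences, and customer impact, as extra cost terms.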
Promise of open systems
It has been more than 20 years since we first started hearing about "open systems." Early presentations showed industrial controllers of various manufacturers connected using standard networks with computer systems and human-machine interface (HMI) consoles. Each piece of equipment contained standardized connectors, a standardized communications protocol, and onboard intelligence to handle device identification and data needs. When connected and powered, a new piece of equipment would automatically identify itself and begin sharing data with other connected equipment.
The promise was that control engineers would spend most of their time working on process improvements and very little time worrying about the system and its communication needs. The promise was a highly flexible control system environment, where "best of breed" products could be selected, deployed, and used. During these visionary presentations, nobody explained that control engineers would need to understand operating system nuances, dynamic link libraries (DLLs), and how to make multiple communication protocols share the same wire.
Although earlier attempts were made, most agree the first big push toward open systems came with the General Motors MAP (Manufacturing Automation Protocol) effort of the 1980s. Untold hours were spent defining protocols followed by a massive education effort. A groundswell was initiated when operating companies sent letters to equipment manufacturers stating that in the future, supplier systems would only be considered if they supported MAP. The vision was about to become reality.
Unfortunately, the communications standard evolved into an "everything for everybody" protocol and became quite complex. In fact, the underlying software became so complex it often required twice the computing power and several times the memory of the controller software it was supposed to support. Even when that level of computational power was made available, data rates seldom exceeded 500 variables per second, about the same performance as a simple serial bus.
Complexity of the software introduced higher system failure rates and unexplained computer crashes. Some were traced to undefined (proprietary) message content, others remained untraceable and never explained. A complex communications mechanism just did not work, even in the name of "open."
By the early 1990s, once again it appeared the goal of "open" would be met, or at least that's what the advertising wars implied.
Most major DCS (distributed control system) and PLC (programmable logic controller) vendors were touting open as the main system attribute. While the offering was far from the ideal vision, control equipment vendors actually began to deliver on some of the promises. Using Ethernet technology and the de facto TCP/IP standard, third-party operator stations actually connected with some DCSs and PLCs. But again, strange, nonrepeatable, and unexpected system failures appeared, oftentimes long after the initial startup. When these sorts of problems appeared, which vendor was responsible? Finger pointing became blatant, and control engineers began to realize they did need to learn about operating system nuances, DLLs, and how to make multiple communication protocols share the same wire. In a few instances, the vision became a nightmare.
Approaches to open systems
Even today the entire open system vision remains elusive, with two fundamentally different approaches being pursued by different groups.
One approach involves the concept of "open source" and adoption of de facto standards from the personal computer (PC) environment using either the Microsoft or the open source Linux operating systems. This leverages lower-cost or free software constructed on Microsoft or Linux foundations with commodity PCs to deliver the ultimate in low-purchase-cost solutions.
The second approach is the fieldbus approach, where boundaries are established around a piece of equipment or an operational unit and a standardized communication protocol is established within the boundary, much the same as the original MAP concept.
Despite years of fieldbus wars (and skirmishes still exist) as various factions pushed their version of a protocol, two versions are gaining widening acceptance in the marketplace: Profibus and FOUNDATION fieldbus. Both have a wide range of capabilities including data transfer, programming support, and network management. And both have a wide and growing range of available certified products.
While methods vary for each organization, certification testing helps users confidently select and install different manufacturers' products on the same fieldbus network, reducing the uncertainty and finger pointing of previous open-solution attempts.
Open system reliability and safety
Open system offerings are starting to show promise. But is such a system reliable and safe enough? Reliability is defined as "the probability of successful completion of intended functions during an interval of time" and all sources of hardware and software failures count.
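To make the definition concrete, the following minimal sketch computes reliability over a time interval. It assumes a constant failure rate and independent failures, a common textbook simplification that the article itself does not specify. The failure rates are hypothetical values chosen only to show how hardware and software failure sources combine when all of them count.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate (an assumption)."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and interval must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


def series_reliability(failure_rates_per_hour, hours: float) -> float:
    """Reliability when a failure of any one element fails the whole system.

    With independent elements and constant rates, the rates simply add.
    """
    return reliability(sum(failure_rates_per_hour), hours)


if __name__ == "__main__":
    # Hypothetical failure rates (failures per hour), for illustration only.
    rates = {
        "controller hardware": 1e-5,
        "network hardware": 5e-6,
        "software": 2e-5,
    }
    one_year = 8760.0
    for name, rate in rates.items():
        print(f"{name}: R(1 year) = {reliability(rate, one_year):.3f}")
    total = series_reliability(rates.values(), one_year)
    print(f"whole system: R(1 year) = {total:.3f}")
```

Because every element's failure rate adds to the total, the system as a whole is never more reliable than its weakest contributor.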
When considering all possibilities for failure in an open system, it is no wonder there have been transient unexplained problems. From a reliability and safety perspective, the world of open systems is nothing like the world of proprietary DCS/PLC systems. Compared to the proprietary world, things are more flexible, more complex, and relatively uncontrolled, at least in terms of ensuring all the parts and pieces work in harmony (a goal of certification).
System failures typically occur when some unanticipated input enters the system. The rogue input can be thought of as "stress" on a software system. The software's ability to respond to that stress without failure represents its "strength." (See "What makes software reliable?" sidebar.) For software to be strong, the developers must spend a lot of time thinking about all the things that can go wrong and then do something to prevent the wrong from happening. This is difficult, since most software developers have enough trouble getting things working with expected input conditions. In many cases, unanticipated input conditions cause unanticipated output responses, such as writing to the wrong place in memory or starting a chain of events that can crash a computer, wipe out memory, or even transmit a wrong output.
In the single-vendor proprietary system, communication protocols were designed by one team, perhaps even one person. Specific messages needed for interaction between an operator station and a controller were clearly defined. Even though it was unlikely that a bad message would be sent, error-checking routines knew what the incoming messages would be and rejected everything else. All software from device driver to message handler was written and understood by one design team working from a single functional specification. In such an environment, it is relatively simple to build "strong" software.
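The "accept what is defined, reject everything else" style of error checking described above can be sketched as follows. The message names, payload lengths, and the `Message` type are hypothetical and stand in for whatever a real functional specification would define.

```python
from dataclasses import dataclass

# Hypothetical message catalogue: message type -> exact payload length in bytes.
EXPECTED_MESSAGES = {
    "READ_TAG": 4,
    "WRITE_TAG": 8,
    "ACK": 0,
}


@dataclass(frozen=True)
class Message:
    msg_type: str
    payload: bytes


def accept(message: Message) -> bool:
    """Return True only for messages that match the known catalogue."""
    expected_length = EXPECTED_MESSAGES.get(message.msg_type)
    if expected_length is None:
        return False  # unknown message type: reject
    return len(message.payload) == expected_length  # wrong size: reject


if __name__ == "__main__":
    incoming = [
        Message("READ_TAG", b"\x00\x01\x02\x03"),  # well formed
        Message("READ_TAG", b"\x00"),              # payload too short
        Message("UNKNOWN_CMD", b""),               # not in the catalogue
    ]
    for msg in incoming:
        verdict = "accepted" if accept(msg) else "rejected"
        print(f"{msg.msg_type}: {verdict}")
```

The check stays simple only because the full set of valid messages is known in advance.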
Even so, many software failures have occurred, but when that happens, it's clear who's responsible, and the failures become the responsibility of the developers. In the proprietary DCS/PLC environment, one design team has clear responsibility for reliable and safe system operation. This is neither probable nor possible in an open-system environment.
In the open environment, it is more likely that unexpected data can be communicated. General-purpose communication protocols have become more complex, making it harder for software developers to filter unexpected data messages. It is unlikely that one design team even understands the whole design, and it's very murky who is responsible when a failure occurs. As a result, systems get rebooted, failure events go unreported, and control and automation systems become less reliable.
Moving into an open-source PC environment, there is more flexibility and more opportunity for trouble. In this less-controlled environment, dynamic link libraries (DLLs) developed by different companies are often incompatible, and installation procedures aren't always elegant enough to prevent one from overwriting another. (See "How Linux addresses open systems" and "DLL hell finally exercised" sidebars.)
Versions of the operating system written for different countries do not always react identically. What happens when the input "1.000" (the number one with four significant figures) in the U.S. version of the operating system is interpreted as one thousand by the European version, where the period is a thousands separator? What happens when language dependency is not accounted for in the software design?
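The ambiguity is easy to reproduce. In the minimal sketch below, the function `parse_number` and its parameters are hypothetical helpers written for illustration, not an operating system API. The same text yields two different values depending on which separator convention the software assumes.

```python
def parse_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Parse a number using explicitly stated separator conventions."""
    if decimal_sep == group_sep:
        raise ValueError("decimal and grouping separators must differ")
    # Drop grouping separators first, then normalize the decimal separator.
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


if __name__ == "__main__":
    raw = "1.000"
    us_value = parse_number(raw, decimal_sep=".", group_sep=",")
    european_value = parse_number(raw, decimal_sep=",", group_sep=".")
    print(us_value)        # 1.0
    print(european_value)  # 1000.0
```

Stating the convention explicitly at every interface, rather than inheriting it from the host operating system, is one way to keep a setpoint from changing by a factor of a thousand.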
While many groups are working diligently to improve the quality of the software they produce, independent software quality audits have shown that many organizations still produce software at a level called "chaos." Audits have revealed that software safety and reliability techniques are often not understood by software developers, and while many developers are good at what they do, variability is considerable. That alone introduces risk of control system failure.
Abandon open systems?
Is the risk of unsafe operation or downtime with open systems too great? With all this potential trouble, shouldn't we abandon the idea of open systems? Of course not; the benefits will someday be spectacular. But until then, tread very carefully. Understand where such equipment can be safely used. Understand the cost of downtime and consider the entire life-cycle costs. Consider getting the details of each vendor's software quality, safety, and reliability procedures, and if that information is unavailable, buyers should beware.
Avoid complexity by not allowing general-purpose use of personal computers where control is performed. Do not install anything that is not required: no games, no office tools, no flight simulators. Perform system testing only after all hardware and software are installed. When upgrades or new hardware or software are installed, retest everything.
The promise of open systems will someday arrive, and the promise can be achieved without sacrificing safety or reliability. Eventually more and more developers will conduct system and software hazard analysis to identify potential problems. System developers and integrators will conduct failure modes and effects analysis on software and system designs. Standard interfaces will include message filtering and error checking to the level sufficient for process control.
Yes, the day is approaching when control engineers will spend their day improving the manufacturing process. Until that day arrives, ask questions and apply common sense in where and how you deploy open systems.
For more information, circle 200 online at www.controleng.com/freeinfo or visit www.exida.com. For software suppliers, go to www.controleng.com/buyersguide.
Author Information
Dr. William Goble is co-founder and president of Exida.com.