If one's good, is two better, and three the best?
Intuitively, most people think, ‘If one’s good, two should be better, and three must be the best.’ Unfortunately, safety system performance is not as intuitively obvious as it may seem. When properly designed, dual systems can have higher on-line availability and higher safety performance than triplicated systems.
Failure modes
Safety systems can fail in two ways. First, they may suffer, or initiate, a nuisance trip (shutting the plant down when nothing is actually wrong). For example, a circuit designed so relays are energized and contacts are closed will fail with contacts open (causing a shutdown). Some people call these ‘safe’ failures. Most PLC (programmable logic controller) and DCS (distributed control system) vendors use the term ‘availability’ for this measure of performance. Unfortunately, this causes some confusion, since it is not the same as the term ‘safety availability’ used throughout safety standards. Better terms would be ‘nuisance trip rate’ or ‘mean time between nuisance trips,’ measured in years.
Second, safety systems may fail to function: the failure inhibits the system so it cannot respond to an actual shutdown demand. An example would be energized relays with closed contacts that ‘weld’ shut and cannot open to cause a shutdown. Industry calls these ‘dangerous’ failures, and the terms used to measure performance are ‘safety availability,’ ‘probability of failure on demand,’ and ‘risk reduction factor.’
Architecture comparisons
In the ‘real impact of redundancy’ diagram, all example systems use energized relays with normally closed contacts. The 1oo1 (one-out-of-one) example shows that a safe failure occurs anytime the relay is de-energized and the relay contacts open, causing a safety trip. Assuming a failure probability of 0.04 in this mode means that during a defined time period (e.g., 1 year) the system has a 4% probability of suffering a nuisance trip. Another way to view it: out of 100 systems, four will cause a nuisance trip each year. (The numbers are only for comparison purposes. Real failure probability numbers are established from experience and hardware testing.)
In the 1oo1 example, a dangerous failure occurs if the contacts are welded shut. Assuming welded contacts represent a failure probability of 0.02, during a defined time period (e.g., 1 year) the system has a 2% probability of not operating properly when there is a demand (i.e., a hazardous condition). Another way to view it: out of 100 systems, two are unable to respond when needed each year. Swapping the numbers around doesn’t matter; the point is to illustrate the impact redundancy provides.
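The performance terms introduced earlier can be tied to these two illustrative numbers with a short sketch. It assumes the usual definitions (safety availability is one minus the probability of failure on demand, and risk reduction factor is its reciprocal) and treats the yearly nuisance trip probability as a constant yearly rate, which is only a rough estimate. The function names are made up for this example.

```python
# Illustrative 1oo1 figures from the text (comparison values, not field data).
P_SAFE = 0.04       # probability of a nuisance trip in one year
P_DANGEROUS = 0.02  # probability of failure on demand (PFD)


def safety_availability(pfd: float) -> float:
    """Probability that the system responds when a demand occurs."""
    return 1.0 - pfd


def risk_reduction_factor(pfd: float) -> float:
    """Reciprocal of the probability of failure on demand."""
    return 1.0 / pfd


def mean_years_between_nuisance_trips(p_safe_per_year: float) -> float:
    """Rough estimate: treats the yearly probability as a constant rate."""
    return 1.0 / p_safe_per_year


print(f"Safety availability: {safety_availability(P_DANGEROUS):.0%}")    # 98%
print(f"Risk reduction factor: {risk_reduction_factor(P_DANGEROUS):.0f}")  # 50
print(f"Years between nuisance trips: "
      f"{mean_years_between_nuisance_trips(P_SAFE):.0f}")                  # 25
```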
Using a dual 1oo2 (one-out-of-two) system with two normally closed outputs wired in series means a safety shutdown occurs if either relay output opens. The downside of 1oo2 systems is that there is twice as much hardware involved, so roughly twice as many nuisance trips can occur during the defined period, causing the 0.04 failure probability to double to 0.08.
In the dangerous mode (contacts welded), the 1oo2 system fails to function only when both channels fail simultaneously. (If one were welded closed, the other would de-energize and cause a shutdown.) What’s the probability of two simultaneous, independent failures? Actually, it’s rather simple. What’s the probability of tossing a coin and having it come up heads? It’s 50%. What’s the probability of two coins landing heads? It’s 25%, or the probability of one, squared (0.5 x 0.5 = 0.25). So the probability of two channels failing at the same time is remote (0.02 x 0.02 = 0.0004).
The 1oo2 system is very safe (the probability of dangerous failures is very small), but the system suffers twice as many nuisance trips as simplex (which is not desirable from a lost production viewpoint).
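The 1oo2 arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes the two channels fail independently; the function name is invented for illustration. It also shows that ‘doubling’ the nuisance trip probability is a convenient rounding of the exact either-channel-fails figure.

```python
# Illustrative single-channel figures from the text.
P_SAFE = 0.04       # nuisance trip probability per channel
P_DANGEROUS = 0.02  # dangerous failure probability per channel


def one_oo_two(p_safe: float, p_dangerous: float) -> dict:
    """Series-wired dual system, assuming independent channel failures."""
    return {
        # Either channel opening trips the plant: the text's rule of thumb.
        "nuisance_approx": 2 * p_safe,
        # Exact probability that at least one of two channels fails safe.
        "nuisance_exact": 1 - (1 - p_safe) ** 2,
        # Both contacts must weld for the system to miss a demand.
        "dangerous": p_dangerous ** 2,
    }


result = one_oo_two(P_SAFE, P_DANGEROUS)
print(f"Nuisance (doubling rule): {result['nuisance_approx']:.4f}")  # 0.0800
print(f"Nuisance (exact):         {result['nuisance_exact']:.4f}")   # 0.0784
print(f"Dangerous:                {result['dangerous']:.4f}")        # 0.0004
```

The gap between 0.08 and 0.0784 is small at these probabilities, which is why the simpler doubling rule is adequate for comparison purposes.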
A dual 2oo2 (two-out-of-two) system places two sets of normally closed relay contacts in parallel, requiring both contacts to open in order to perform a shutdown. A dangerous failure (welded contacts) in either channel would prevent a 2oo2 system from initiating a shutdown. Since the 2oo2 system has twice as much hardware as a simplex system, it has twice as many dangerous failures; therefore the 0.02 doubles to 0.04.
For this system to have a nuisance trip, both channels would have to suffer safe failures. As before, the probability of two simultaneous failures is the probability of one, squared. Therefore, nuisance trip failures in the 2oo2 system are unlikely (0.04 x 0.04 = 0.0016) but the system is less safe than simplex. This does not imply 2oo2 systems should never be designed. If the probability of failure on demand (PFD) value meets the overall performance requirements, the 2oo2 design is acceptable.
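The difference between the two dual wirings can also be seen at the logic level, without any probabilities. The sketch below models each contact as a boolean and enumerates every two-channel combination; `circuit_opens` and the wiring labels are hypothetical names chosen for this example.

```python
from itertools import product


def circuit_opens(contacts_open, wiring: str) -> bool:
    """Return True if the output circuit is broken, i.e. the plant shuts down."""
    if wiring == "series":    # 1oo2: any open contact breaks the circuit
        return any(contacts_open)
    if wiring == "parallel":  # 2oo2: every contact must open
        return all(contacts_open)
    raise ValueError(f"unknown wiring: {wiring}")


# Demand case: a healthy channel opens its contact, a welded one cannot.
print("welded (A, B) -> shuts down on demand? series / parallel")
for welded in product([False, True], repeat=2):
    contacts_open = [not w for w in welded]
    print(welded,
          circuit_opens(contacts_open, "series"),
          circuit_opens(contacts_open, "parallel"))

# No-demand case: a safe failure opens a contact when nothing is wrong.
print("failed open (A, B) -> nuisance trip? series / parallel")
for failed_open in product([False, True], repeat=2):
    print(failed_open,
          circuit_opens(failed_open, "series"),
          circuit_opens(failed_open, "parallel"))
```

In the demand table, the series circuit misses the shutdown only when both contacts are welded, while the parallel circuit misses it when either one is. The nuisance trip table is the mirror image, which is the tradeoff between the two dual designs.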
A triplicated 2oo3 (two-out-of-three) system is a majority voting system: whichever way two or more channels ‘vote,’ that is what the system does. The math used to derive voting in 2oo3 systems appears in the ISA TR84 technical report, the IEC 61508 draft standard, and a book written by D.J. Smith, titled Reliability, Maintainability and Risk, Practical Methods for Engineers, published by Butterworth-Heinemann, ISBN 0-7506-0854-4, 1993.
Most people are surprised to learn that a 2oo3 system has a higher nuisance trip rate than a 2oo2 system, and a greater probability of incurring a dangerous failure than a 1oo2 system. But the revelation comes after answering these questions:
How many simultaneous failures does a 1oo2 system need to have a dangerous failure? Answer: two.
How many simultaneous failures does a 2oo3 system need to have a dangerous failure? Answer: two.
How many simultaneous failures does a 2oo2 system need to suffer a nuisance trip? Answer: two.
How many simultaneous failures does a 2oo3 system need to suffer a nuisance trip? Answer: two.
Therein lies the revelation: triplicated (2oo3) systems have the most hardware, hence more failure combinations (A+B, A+C, B+C). Triplicated systems are actually a tradeoff. Overall, they’re pretty good, but not as safe as 1oo2 systems and not as resistant to nuisance trips as 2oo2 systems.
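Counting the channel pairs gives a rough feel for where 2oo3 lands. The sketch below applies the same simplification used for the dual systems (independent channels, one term per failing pair, triple failures ignored) to the illustrative 0.04 and 0.02 figures. The resulting values are derived here for comparison only and are not quoted from the diagram or from the referenced standards.

```python
from itertools import combinations

# Illustrative single-channel figures from the text.
P_SAFE = 0.04
P_DANGEROUS = 0.02

CHANNELS = ("A", "B", "C")
# Any two channels failing the same way outvote the third.
PAIRS = list(combinations(CHANNELS, 2))  # A+B, A+C, B+C


def two_oo_three(p: float) -> float:
    """Approximate 2oo3 failure probability: one p*p term per channel pair."""
    return len(PAIRS) * p * p


print("Failing pairs:", ["+".join(pair) for pair in PAIRS])
print(f"2oo3 nuisance:  {two_oo_three(P_SAFE):.4f}")       # 0.0048
print(f"2oo2 nuisance:  {P_SAFE ** 2:.4f}")                # 0.0016
print(f"2oo3 dangerous: {two_oo_three(P_DANGEROUS):.4f}")  # 0.0012
print(f"1oo2 dangerous: {P_DANGEROUS ** 2:.4f}")           # 0.0004
```

Under these assumptions the triplicated system comes out about three times worse than 2oo2 on nuisance trips and about three times worse than 1oo2 on dangerous failures, while still beating simplex on both counts.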
Closer look at dual systems
A closer examination of the numbers included in the Simplex system diagram indicates the 1oo2 system is safer than 2oo3, and the 2oo2 system offers a better nuisance trip rate than 2oo3. It has been known for years that a dual system designed to combine the best performance of both dual architectures would outperform a triplicated system. Only within the last 10 years, however, has technology reached the point where such a design can be properly implemented. These systems are commonly referred to as 1oo2D.
Dual is not always better than single, and triple is not always better than dual. When properly designed, dual redundant systems can offer equal or higher on-line availability and safety than triplicated systems. One thing users have working in their favor when evaluating safety instrumented systems is independent third-party certifications from organizations like the Automatic Software Information (ASI) Technology department of TÜV Rheinland (Cologne, Germany).
For example, among the standards used by ASI to certify programmable electronic systems (PES) are DIN V 19250 ‘Fundamental safety aspects for measurements and control equipment,’ and IEC 61508 ‘Functional safety of electrical/electronic/programmable electronic systems (PES).’ One caution: when reviewing ASI’s certifications, be sure to get familiar with the ranking methods. Rankings once preceded by AK (application class) are now preceded by RC (requirements class), and RC values and SIL values are not directly correlated.
For more information about TÜV/ASI safety system certifications, visit www.isep.de.