Network redundancy reduces risk, downtime
What happens if a failure occurs within the networked automation and control infrastructure, resulting in a sudden and unexpected production stoppage? How much product would be lost? How much productivity? For most companies, the time, effort, and costs associated with recovery, repairs, and restarting the line after a sudden outage are significant—and, in some businesses, the costs can be astronomical. Whether a plant is involved in discrete or process operations, ensuring that production runs smoothly and uninterrupted is critical to the bottom line.
Network redundancy is like an insurance policy for industrial networks. Acting as a quick-response backup system, it mitigates the risk of unplanned outages and ensures continuity of operation by responding quickly to a point of failure anywhere along the critical data path and limiting its effects. When you consider the direct and indirect costs of unplanned downtime, it becomes clear that investing in network redundancy is a smart strategy.
Who needs redundancy?
Any network that inherently requires high availability or is involved in mission-critical operations will benefit from data path redundancy. Certainly the use of industrial-grade network components, such as ruggedized switches and hardened or armored cables, alleviates the potential for damage or breakage of parts. However, if a switch should fail or a cable break, a redundant system ensures continuity and avoids disruption of critical communication and data flow.
Network redundancy works by creating multiple data paths within a network, between any and all locations. If a cable, switch, or router suddenly fails, another pathway will be available to maintain the communication flow. Redundant systems deliver significant value in a host of industrial applications and are especially essential to:
- Process industries operating 24/7 such as metals, pulp and paper, water/wastewater
- Food and beverage and pharmaceutical plants and some manufacturing in which regulations demand constant data monitoring and precise process control recordkeeping
- Industries where one process depends on the output of another and production stoppage directly impacts downstream and sometimes upstream operations
- Any plant where an interruption or outage may lead to significant product damage, scrap, spoilage, or waste
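The idea of multiple data paths described above can be illustrated with a small sketch. It assumes a hypothetical four-switch ring (A, B, C, D) and a plain breadth-first search. It does not model any particular redundancy protocol; it only shows that a ring still offers a route after one link fails, whereas a simple line of switches does not.

```python
from collections import deque

def find_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical topologies: a ring (redundant) and a line (not redundant).
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
line = [("A", "B"), ("B", "C"), ("C", "D")]
failed = ("A", "B")  # simulate a broken cable between switches A and B

primary = find_path(ring, "A", "B")
backup = find_path([l for l in ring if l != failed], "A", "B")
no_backup = find_path([l for l in line if l != failed], "A", "B")

print(primary)    # direct path while the link is healthy
print(backup)     # traffic goes the long way around the ring
print(no_backup)  # None: the line topology has no alternative path
```

In the ring, traffic between A and B can still travel through D and C once the direct link breaks, which is the behavior a redundancy protocol is there to manage.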
Overview of redundancy protocols
Implementing a network redundancy strategy will depend on many factors, largely dictated by the application and the existing network topology—the physical layout, location of systems, processes and devices, and the way the cabling infrastructure is run. Certain redundancy methodologies are more suited for one system configuration than another.
Redundancy protocols may be standards-based or proprietary. In general, standards-based redundancy protocols provide outstanding interoperability but slower recovery times, whereas proprietary protocols in most cases offer faster recovery speeds and are designed specifically for industrial recovery applications. There is some overlap in the features and functionality of redundancy protocols, and in many applications, the use of hybrid protocols (a mix of methodologies) is quite common.
Two of the most popular standards-based redundancy protocols are:
- Rapid spanning tree protocol (RSTP) uses an algorithm to determine which paths are used for the primary communication, which are redundant, and which are the most reliable. This option is best suited for complex mesh network topologies that have multiple redundant links, but it can also be used in a ring topology. However, RSTP does not scale well, and its recovery time is one second or more, which may be far too slow for some automated industrial processes.
- Link aggregation is not truly a redundancy methodology but provides a way to group multiple links into one virtual link. Consider a situation in which a number of links (up to eight) have been established between two locations. If eight 100 Mbps connections are in place, link aggregation will make them one 800 Mbps connection. If one link fails or breaks, capacity drops to 700 Mbps but the connection remains intact. Link aggregation is often used in conjunction with RSTP but, like RSTP, its recovery time is one second or more.
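The link aggregation arithmetic above can be written out as a short sketch. The function name and parameters are illustrative only, and the model assumes identical links whose capacities simply add up.

```python
def aggregate_capacity_mbps(link_count, link_speed_mbps, failed_links=0):
    """Capacity of an aggregated virtual link, assuming identical member links."""
    if not 0 <= failed_links <= link_count:
        raise ValueError("failed_links must be between 0 and link_count")
    return (link_count - failed_links) * link_speed_mbps

healthy = aggregate_capacity_mbps(8, 100)                   # all eight links up
degraded = aggregate_capacity_mbps(8, 100, failed_links=1)  # one link broken

print(healthy)   # 800
print(degraded)  # 700
```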
Commonly used proprietary redundancy protocols include:
- Ring protocols offer high availability, reliability, and predictability. Because they are ring protocols, they will not work with mesh topologies. However, they are highly scalable—ring protocols have been tested in single rings of up to 200 switches and recovery times were fast and consistent. Leading vendors of proprietary ring protocols are now offering recovery times of 300 ms or less, with newer versions recovering at 10 ms or less.
- Dual-homing, or redundant coupling, is another variation, which typically offers recovery times in the 200 ms range. This approach can be used to add redundancy to a ring topology (either proprietary or standards-based) or to connect one to other networks, providing redundant links between rings or between lower-level networks and a higher-level network.
Networks employing redundancy must be equipped with managed switches as opposed to unmanaged switches. This is because unmanaged switches are basically plug-and-play devices without built-in intelligence. Managed switches, however, are intelligent devices that provide visibility into the network.
Many managed switches incorporate a link-loss-learn capability that simplifies and speeds recovery from a link fault or failure. If a link breaks, the managed switch immediately recognizes the break. Internal mechanisms prompt it to automatically flush its MAC (media access control) address table, alert the other switches within the redundant topology of the change, and then force the other managed switches to do likewise, reducing recovery time to the sub-second level.
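The link-loss-learn sequence described above can be sketched as a toy model. The class and method names below are hypothetical and do not correspond to any vendor's firmware or API. The sketch only mirrors the three steps in the text: detect the break, flush the local MAC address table, and prompt the other switches to flush theirs.

```python
class ManagedSwitch:
    """Toy model of a managed switch with a MAC address table (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}  # MAC address -> port on which it was learned
        self.peers = []      # other managed switches in the redundant topology

    def learn(self, mac, port):
        self.mac_table[mac] = port

    def flush(self):
        # Forget all learned addresses so they are relearned on the new path.
        self.mac_table.clear()

    def on_link_down(self, port):
        # Step 1: the break on `port` has been detected.
        # Step 2: flush the local table.
        self.flush()
        # Step 3: alert the peers, which flush their tables as well.
        for peer in self.peers:
            peer.flush()

sw1, sw2, sw3 = ManagedSwitch("sw1"), ManagedSwitch("sw2"), ManagedSwitch("sw3")
sw1.peers = [sw2, sw3]
sw1.learn("00:11:22:33:44:55", port=1)
sw2.learn("00:11:22:33:44:55", port=2)
sw3.learn("66:77:88:99:aa:bb", port=4)

sw1.on_link_down(port=1)
print(sw1.mac_table, sw2.mac_table, sw3.mac_table)  # all three tables are empty
```

Flushing the tables matters because stale entries would keep sending frames toward the broken link until they aged out on their own.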
Selecting the right protocol and configuration of redundancy protection requires careful evaluation of the application requirements and a review of available redundancy options.
Three elements need to be factored into the decision-making process:
- Required protocol speed. How fast a recovery time does the application demand?
- Physical layout. How will the cabling be configured? Does the system cover a large geographic area or a more compact one? A mesh network, for example, uses more cabling than a ring configuration. And, with the addition of each redundant switch and connection, an associated cable must be installed. Here, especially, is where cost factors enter into the equation.
- Probability of failure. Some applications and infrastructure environments pose a greater risk than others. In mining, for example, buried cabling might be at risk of breakage. Such circumstances may indicate a need for more than one redundant path and suggest selection of a rapid spanning tree solution.
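As a rough illustration of the first two factors, the sketch below shortlists protocols by required recovery time and topology. The recovery figures are the nominal ones quoted in this article (RSTP at one second or more, ring protocols at 300 ms or 10 ms or less). The table layout and function name are assumptions made for this example, and real figures vary by vendor and network size.

```python
# Nominal figures from the article. RSTP's 1000 ms is a best case,
# since its recovery time is quoted as "one second or more".
PROTOCOLS = [
    # (name, nominal recovery time in ms, supported topologies)
    ("RSTP", 1000, {"mesh", "ring"}),
    ("proprietary ring protocol", 300, {"ring"}),
    ("newer proprietary ring protocol", 10, {"ring"}),
]

def shortlist(required_recovery_ms, topology):
    """Return protocols that fit the topology and meet the recovery-time target."""
    return [
        name
        for name, recovery_ms, topologies in PROTOCOLS
        if recovery_ms <= required_recovery_ms and topology in topologies
    ]

ring_fast = shortlist(500, "ring")
mesh_fast = shortlist(500, "mesh")
mesh_slow = shortlist(2000, "mesh")

print(ring_fast)  # both ring protocols qualify
print(mesh_fast)  # empty: nothing listed here recovers that fast on a mesh
print(mesh_slow)  # RSTP qualifies when a slower recovery is acceptable
```

A real evaluation would also weigh cabling cost and probability of failure, which this sketch leaves out.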
Let's look at how redundancy needs can differ within a single plant. In a discrete manufacturing plant, the network is divided into smaller segments, some of which need more redundancy than others and can be configured accordingly. An automated painting area involving a lot of motion, for example, may require a faster and more complex redundancy protocol to prevent moving parts from colliding, damaging other components, or injuring workers. Conversely, a less critical area where damage or breakage from an outage is unlikely can rely on a single-point redundancy protocol.
Weigh risk vs. value
Clearly, since network redundancy requires an additional investment, each industrial company needs to evaluate its specific application and network infrastructure, and then weigh the potential risks versus the benefits.
In general, redundant networks offer significant value: they need little or no maintenance, they are self-healing and, thanks to fast recovery times, they can save money over the long term, even if only a relatively small number of breaks ever disrupt communication flow. Even with redundancy, breaks still need to be repaired. However, the indirect costs of production outages, which are likely far higher than the cost of repairs, can be averted. This is the core of the network redundancy value proposition.
As with every effective insurance policy, the goal is to reduce the risk and mitigate its effects, both operationally and financially. It comes down to simple mathematics: the cost of downtime versus cost of a redundant system. Many industrial firms have already determined that the initial investment is a small price to pay for the peace of mind the insurance of redundancy can bring.
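The "simple mathematics" can be sketched as a break-even calculation. Every figure below is a hypothetical placeholder rather than data from any plant, and the function name is illustrative. The point is only the shape of the comparison.

```python
def break_even_outage_hours(redundancy_cost, downtime_cost_per_hour):
    """Hours of avoided downtime at which a redundant system pays for itself."""
    if downtime_cost_per_hour <= 0:
        raise ValueError("downtime_cost_per_hour must be positive")
    return redundancy_cost / downtime_cost_per_hour

# Hypothetical figures for illustration only.
redundancy_cost = 50_000         # one-time investment in redundant hardware and cabling
downtime_cost_per_hour = 25_000  # lost production, scrap, and restart effort

hours = break_even_outage_hours(redundancy_cost, downtime_cost_per_hour)
print(hours)  # 2.0
```

With these placeholder numbers, the investment is recovered once redundancy has prevented two hours of outage over the life of the system.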
Mike Miclot is vice president of marketing for Belden Industrial Solutions Division. John Mower is industrial Ethernet technical support engineer for Hirschmann, a Belden brand. Belden designs and manufactures signal transmission solutions for enterprise, industrial, wireless networking, and specialty markets.