Applying network best practices via Ethernet network redundancy
Implementing redundancy in an industrial network and choosing an appropriate application method can ensure continuing reliability and operational success.
Among the most relevant, yet most often overlooked, issues to address when planning an industrial Ethernet network is the use of an appropriate redundancy mechanism. A communication failure in a production network can cause costly downtime, the loss of important company data, or even conditions that lead to serious damage to production equipment or, worse, injury to personnel. A redundant physical-layer network structure protects against production downtime by keeping communication available even when faults occur.
The following questions should be asked and carefully discussed: Is redundancy required? If so, which redundancy method is best for the application? Network planners and end users often regard redundancy as extremely costly or as a technology that is too daunting to deploy. Choosing redundant connectivity does entail the higher cost of managed switches, as well as the added time and effort of initial configuration and ongoing management. However, the additional equipment cost and resources can be more than offset by continuous system uptime, superior network monitoring and management capabilities, and reduced troubleshooting effort through extended diagnostics (see Figure 1).
Network redundancy involves the integration of hardware with software so that the network remains available if a single component fails. The communication system, the industrial network, is the core of every modern automation project. To handle network errors, a protocol can be selected from various options and integrated into the infrastructure elements. Redundancy methods can be categorized into four groups: IEEE open standard, proprietary sub-second, standardized high-availability network redundancy protocol, and zero-interruption (bumpless) redundancy (see "Redundancy methods and examples"). Each category has distinguishing characteristics and is particularly suited to certain applications and requirements (see "Table: Redundancy methods and choices").
During the planning stage, if it is determined that the industrial network must be resilient and able to recover automatically from a Layer 1 failure, a redundancy mechanism must be employed. After that determination is made, attention turns to selecting an appropriate mechanism. If the process that the network supports can withstand up to a couple of seconds of delay while the network reconverges, then the tried-and-true rapid spanning tree protocol (RSTP) can be used, and there is no need to look further.
Conversely, when the application is less tolerant of an extended outage and a communication gap of seconds may cause system alarms or input/output (I/O) faults, a high-speed mechanism should be deployed. These mechanisms are often proprietary, but they can provide sub-second recovery times, faster than IEEE standards-based redundancy, and can accommodate hundreds of switches in single or multiple rings.
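The selection rule described above can be sketched as a small helper. This is only an illustration: the function name, the 2-second cut-off standing in for "a couple of seconds," and the mapping of zero tolerated outage to bumpless redundancy are assumptions, not values taken from any standard.

```python
def suggest_redundancy(tolerable_outage_s: float) -> str:
    """Map the outage a process can tolerate to a redundancy category."""
    # Hypothetical cut-off standing in for "a couple of seconds".
    rstp_threshold_s = 2.0
    if tolerable_outage_s <= 0:
        # No interruption can be tolerated at all.
        return "zero-interruption (bumpless) redundancy"
    if tolerable_outage_s < rstp_threshold_s:
        # A gap of seconds would trip alarms or I/O faults.
        return "sub-second ring redundancy (often proprietary)"
    # The process can ride through a reconvergence of a few seconds.
    return "RSTP"


for outage in (5.0, 0.5, 0.0):
    print(f"{outage} s tolerated -> {suggest_redundancy(outage)}")
```

In practice the cut-off would come from the watchdog and I/O timeout settings of the controllers on the network rather than from a fixed number.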
Spanning tree protocol (STP)
Ethernet networks with redundant data paths form a meshed topology with impermissible loops. Because of these loops, data packets can circulate endlessly within the network and can also be duplicated. STP is an open protocol defined in the IEEE 802.1D standard, IEEE Standard for Local and Metropolitan Area Networks: Media Access Control (MAC) Bridges. It is an Open Systems Interconnection (OSI) Layer 2 protocol that guarantees a loop-free local area network (LAN). It is based on an algorithm developed by renowned software designer and network engineer Radia Perlman while she was employed at Digital Equipment Corp. STP made it possible to extend a network with redundant links. In this way, an automated backup path is available if an active link drops out for whatever reason, without creating loops in the network.
To apply this protocol and gain the maximum benefit, as with the other redundancy methods, the switches used must support the protocol. After the interruption of a segment, it can easily take 30 to 50 seconds before the alternative path becomes available. This timer-based delay is unacceptable for controls, and 30 seconds is extremely long for any monitoring application. Generally, the standard STP recovery delay is too long to be acceptable in an industrial application. Unfortunately, the strengths of STP are what make it inherently unsuitable for redundant ring structures.
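The 30- to 50-second figure follows from the protocol's timers. The sketch below is a rough worked calculation that assumes the commonly cited IEEE 802.1D default values (a maximum age of 20 seconds and a forward delay of 15 seconds); the function name is hypothetical, and real switches allow these timers to be tuned.

```python
# Default IEEE 802.1D timer values in seconds (assumed here; configurable
# on real switches).
MAX_AGE = 20        # how long a port keeps stale root information
FORWARD_DELAY = 15  # time spent in each of the listening and learning states


def stp_recovery_seconds(direct_failure: bool,
                         max_age: int = MAX_AGE,
                         forward_delay: int = FORWARD_DELAY) -> int:
    """Rough worst-case time before a blocked port starts forwarding."""
    # Listening and learning each last one forward-delay interval.
    transition = 2 * forward_delay
    # A link loss detected directly on the switch skips the max-age wait;
    # an indirect failure is noticed only after the stored information
    # ages out.
    return transition if direct_failure else max_age + transition


print(stp_recovery_seconds(direct_failure=True))   # 30
print(stp_recovery_seconds(direct_failure=False))  # 50
```

Because the delay is built from fixed timers rather than from the size of the fault, even a simple ring with a single broken link waits out the full interval.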
The complexity that allows STP to support a variety of topologies limits its performance in a relatively simple redundant ring. When a fault occurs in a ring, the obvious solution is to treat the interrupted ring as two separate network segments until the link layer break is remedied. Given that a ring has only one fault recovery solution, the time standard STP typically takes to collect fault data and process the messaging needed to analyze that fault is likely unacceptable.
To deal with the shortcomings of STP, IEEE established RSTP in 2001. RSTP is a standardized, open redundancy method (IEEE 802.1D-2004) supported by a wide range of managed switches regardless of manufacturer. The protocol supports ring and tree topologies, as well as meshed networks, and can easily be enabled in any managed network. It was initially described in IEEE 802.1w-2001: Rapid Reconfiguration of Spanning Tree. The 2004 revision of IEEE 802.1D then declared the original STP superfluous and recommended using RSTP instead whenever possible. IEEE 802.1w is therefore incorporated into the 802.1D standard.
The RSTP algorithm calculates the network tree structure with one switch acting as the root (see Figure 2). Different redundant physical connections can be created within the network. Without a redundancy mechanism, these connections would form unacceptable loops that would quickly congest the entire network and cause failures. RSTP converts this topology into a tree structure, albeit inverted, by blocking the ports whose paths the algorithm ranks as less favorable. This creates the necessary logical, loop-free environment. With an infrastructure device configured as the network root and logical blocks created from that root, every other switch can be reached via exactly one path. If a network error does occur, such as a broken or disconnected cable, a new active path is created automatically.
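The idea of a root, blocked links, and an automatically created new path can be illustrated with a short sketch. It is not the RSTP algorithm itself: real switches exchange bridge protocol data units and compare bridge IDs and path costs, whereas this simplified stand-in runs a breadth-first search from the root. The switch names and the function name are hypothetical.

```python
from collections import deque


def active_links(links, root):
    """Return the links left forwarding: a breadth-first tree from the root."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors.get(node, [])):
            if nxt not in visited:
                visited.add(nxt)
                tree.add(frozenset((node, nxt)))
                queue.append(nxt)
    return tree


# Four switches cabled as a ring; SW1 acts as the root.
ring = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW4"), ("SW4", "SW1")]
all_links = {frozenset(link) for link in ring}

forwarding = active_links(ring, root="SW1")
blocked = all_links - forwarding
print("blocked:", [sorted(link) for link in blocked])

# Simulate a broken cable between SW1 and SW2 and recompute the tree.
surviving = [link for link in ring if set(link) != {"SW1", "SW2"}]
recovered = active_links(surviving, root="SW1")
print("forwarding after fault:", sorted(sorted(link) for link in recovered))
```

With the ring intact, one link (between SW3 and SW4 in this run) is held back to break the loop. Once the SW1 to SW2 cable fails, that link returns to service and all three remaining links carry traffic.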
The recovery times experienced with RSTP are significantly lower than those of the original spanning tree (hence the name): typically 1 second to a few seconds, instead of the 30 to 50 seconds of the original iteration. Depending on the application, the recovery time of RSTP may already be fast enough to ensure dependability.
RSTP has had a lengthy tenure as the redundancy method of choice in many IT and industrial network installations. Although faster redundancy schemes have been developed in the intervening years, RSTP remains a viable choice for the average application, especially where device cost is a concern.