Ensuring industrial Ethernet performance, reliability
During the past 10 years, Internet and networked devices have proliferated beyond what anyone could have imagined, and there is every indication that growth will continue to be strong in the coming years (see Figure 1). Bloomberg reports that Internet traffic is expected to grow 42% in 2012 and should reach 1.3 zettabytes, or 1.3 trillion gigabytes, by 2016. While this growth has primarily been fueled by consumer and business use, it was inevitable that the automation and control industry would take part in this transformation.
The early challenges of adapting Ethernet for use in industrial networks have been addressed with improvements in technology and reliability, leading to strong continued growth in the use of industrial Ethernet. This article explores what it takes to build a network that’s reliable enough for the stringent requirements of automation and control and serves as a guide to technologies that are available today as well as to best practices for ensuring top reliability and performance in industrial Ethernet networks.
Defining Ethernet performance
Broadly speaking, the three major performance areas that matter most to automation engineers when working with industrial Ethernet are reliability, bandwidth, and determinism. Reliability refers to network and device uptime and is generally the most critical performance element for automation and control networks. The need for high reliability is also the primary reason that the automation and control space historically lags behind the consumer space when it comes to adopting new technology. Business users can generally withstand momentary failures in e-mail or network communication, and they see more immediate benefits when adopting the latest, time-saving technology. The situation is different for industrial users, where interruptions in operation or communication can result in immediate and substantial losses. Industrial users are therefore unlikely to adopt new communication technology until it can guarantee exceptionally stable and reliable performance without interruption under actual operating conditions.
Another measure of performance is bandwidth, which refers to the volume of network traffic that can be supported. Bandwidth is becoming an important issue for many industrial users and is driven by several trends: the growing number of devices connecting to the network, the growing use of wireless and cellular technology, and the growing role of IP video in monitoring and surveillance. Although there are still many applications where 10/100 Mbps is more than enough bandwidth to handle the expected network load, Gigabit and even 10GbE networks are fast becoming a standard requirement for industrial network planning.
Finally, for many automation and control applications, it is important for industrial Ethernet networks to achieve a level of determinism, where data can be reliably delivered within a predictable time frame. In the past, this was an inherently problematic requirement for Ethernet, which by design allows for great variability in packet delivery time. However, for applications that require highly precise timing and coordination, such as machine control, uncertainty in packet delivery time cannot be tolerated. Fortunately, developments in Ethernet technology and standards have made it possible to achieve the level of determinism required by many industrial users.
Hardware reliability, equipment failure
The task of ensuring Ethernet network uptime for industrial users starts with hardware ruggedness. The environmental conditions faced by Ethernet equipment in the automation control space are often much harsher than the conditions faced in commercial and business settings. Ethernet equipment installed in outdoor cabinets or on the factory floor may have to contend with elevated levels of vibration, electromagnetic interference, heat, and airborne particulates. If the equipment is not properly hardened or protected, the Ethernet network may suffer from frequent equipment failures and unreliable data transmission.
Forced-air cooling is a method that manufacturers commonly use to prevent equipment from overheating. Fans drive air circulation within the networking equipment to draw in cooler air and drive out hot air (see Figure 2). This is an adequate measure when the equipment is kept within a clean and temperature-controlled room, such as in the IT closets found in commercial facilities. However, the commercial-grade approach tends to be inadequate for many automation control applications, where an environmentally controlled room is simply not available or feasible. The prevalence of environmental particulates at many industrial sites can negatively affect forced-air cooling efforts as particulate matter accumulates in the air filters or within the device itself. In addition, the fans themselves are more prone to failure than other components, so regular maintenance and downtime must be factored into the cost and planning of the network.
For this reason, fanless operation has become a compelling feature to ensure high hardware reliability in industrial Ethernet equipment. Since there are no moving parts, passively cooled hardware can achieve much higher mean time between failures and stands up well to industrial use. The amount of engineering required to develop a reliable fanless Ethernet switch means that the initial purchase cost will be significantly higher than for commercial-grade fan-cooled switches. However, over the lifetime of the device, fanless switches often show a superior return on investment when factoring in reduced maintenance, network downtime, and equipment replacement costs.
Power, media redundancy
Power and media redundancy can help minimize the occurrence and impact of network downtime. Industrial networks are often deployed in the vicinity of induction motors, generators, welders, or other high-power machinery. This exposes the Ethernet equipment to fluctuations in power quality that aren’t typically seen in commercial applications. For this and other reasons, basic control-system redundancy requires every part of the communication network to have a redundant backup power supply in case of a power interruption or outage. The backup power supply takes over when the electricity fails, minimizing the possibility of damage or loss of critical data caused by the system shutting down. To meet the needs of automation and control systems, the hardware should be compatible with unregulated dc and have reverse power protection and isolation of the redundant power inputs.
Media redundancy is also a common requirement for automation networks and involves establishing a backup communication path when part of the network becomes unavailable. Because redundant paths on Ethernet networks create network loops, a method or protocol must be used to block this redundant path during normal operation. IEEE 802.1D Spanning Tree Protocol (STP) was developed in the IT space for networks to deal with redundant paths. With IEEE 802.1D, one switch on the network is elected the “root switch,” and the participating switches automatically block packets from traveling through any of the network’s redundant paths. In the event that one of the paths in the network is disconnected from the rest of the network, STP automatically readjusts the network to use the redundant path.
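The loop-blocking idea can be illustrated with a simplified sketch in Python. It assumes a hypothetical ring of four switches with equal-cost links and uses a breadth-first search from the root in place of the BPDU exchange, bridge IDs, and path costs that real STP relies on. The function name and topology are illustrative only.

```python
from collections import deque

def blocked_links(links, root):
    """Return the links left out of a breadth-first tree rooted at `root`.

    Simplified stand-in for STP: real bridges exchange BPDUs and compare
    bridge IDs and path costs; here every link is assumed to cost the same.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):
            if nxt not in visited:
                visited.add(nxt)
                tree.add(frozenset((node, nxt)))
                queue.append(nxt)

    # Any link not in the tree is a redundant path and stays blocked.
    return [link for link in links if frozenset(link) not in tree]

# Hypothetical ring of four switches: 1-2-3-4-1
ring = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(blocked_links(ring, root=1))  # [(3, 4)]

# If the link between switches 1 and 2 fails, nothing remains blocked.
print(blocked_links([(2, 3), (3, 4), (4, 1)], root=1))  # []
```

In this toy model, the failure of one ring segment simply means the previously blocked link joins the active tree, which is the behavior the protocol is designed to produce.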
The major limitation of IEEE 802.1D STP is the high performance cost associated with each network convergence, which refers to the process by which participating switches agree on the root bridge and the ports/network paths to block. When a network segment fails, the STP network will re-converge, which suspends all network traffic except for the control messages used in the convergence operation itself. This suspension of service can last up to 50 sec, which could result in substantial and unacceptable losses in the world of industrial automation.
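The 50 sec figure follows from the protocol's timers. A quick check, assuming the standard 802.1D default values of 20 sec for max age and 15 sec for forward delay (administrators can tune both):

```python
# Default IEEE 802.1D timer values, in seconds.
MAX_AGE = 20        # time before stale root information is discarded
FORWARD_DELAY = 15  # time spent in each of the listening and learning states

def worst_case_convergence(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Worst case: wait out max age, then pass through listening and learning."""
    return max_age + 2 * forward_delay

print(worst_case_convergence())  # 50
```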
IEEE 802.1w Rapid Spanning Tree Protocol (RSTP) was introduced to overcome the limitations of IEEE 802.1D and boasts greatly improved performance, substantially reduced convergence time (under 5 sec), and correct behavior for mis-ordering and duplication in RSTP bridges. The improvement in convergence time is achieved by reducing the number of port states from five to three and by allowing ports to be designated as “edge” ports, whose attached devices can power up or down without requiring the network to reconverge (because they are not passing frames to another switch). These enhancements make it possible to achieve media redundancy with higher performance, although some of the advantages are lost when RSTP is interoperating with STP.
Although RSTP is a substantial improvement over STP, many control applications require even better network recovery times. A number of Ethernet device manufacturers have developed proprietary redundancy protocols based on 802.1w and have been able to achieve near-instant network recovery times. Proprietary protocols are available that enable a 250-switch network to recover from a failed network segment within 20 msec. It is important to note that network recovery time can be susceptible to the network load and number of switches that are connected, so real-world testing and verification are an additional and important step to ensure the desired network performance.
In addition to protocol-oriented redundancy such as rings and chains, a number of hardware measures have been developed that help minimize downtime for industrial networks. Relay bypass ports can be employed as a way to localize network outages in linear topologies. In such linear topologies, the failure of a single switch could potentially take a major portion of the network offline (see Figure 3). Relay bypass ports forward traffic between adjacent switches even in the event of a power failure, ensuring that as much of the network remains online as possible. Hot-swap Ethernet modules are another method that can help minimize network downtime and localize outages. In the event of a module failure, a replacement can be swapped in without powering down the entire switch.
Managed devices, networks
For industrial automation, strong hardware with strong redundant and backup mechanisms is the fundamental starting point for ensuring Ethernet performance. The next major factor then becomes management of the Ethernet traffic itself. Simple unmanaged switches can be used to segment networks and eliminate network collisions, but managed switches offer far more flexibility and power to influence performance of the whole network. For example, managed switches make it possible to implement redundant topologies, quality of service (QoS), IGMP snooping, data logs, and other functions that ensure efficient, secure, and reliable handling of network traffic.
A managed network infrastructure also enables administrators to anticipate and deal with unpredictable and potentially damaging events. Users, whether through ignorance or malice, may connect an unauthorized device that severely affects network performance, such as a computer with a virus. For example, if a user added a router running a dynamic routing protocol to the network, traffic on the network could be redirected through that router, creating congestion and allowing for a man-in-the-middle attack. Other unpredictable events that can adversely impact the network include sudden device failure and accidental cable severing or disconnection.
Industrial network planners also need to ensure that traffic traveling between different networks is properly managed. In the past, this could only be handled by dedicated routers, with a corresponding trade-off in network performance due to the processing overhead. One of the developments in the industrial Ethernet space has been the use of high-performance Layer 3 switches, which can handle essential routing functions with far higher speed and flexibility than traditional routers while offering many more options for Ethernet connections (see Figure 4). Manufacturers have also worked at coaxing better performance from traditional industrial routers themselves. Dual-core 64-bit processors running at 500 MHz, for example, are being used successfully to achieve 500 Mbps throughput on dedicated security routers.
Ensuring network performance and security requires an ongoing understanding of actual network traffic and behavior. Reviewing and analyzing network data logs is the most basic way to achieve this, and syslog servers can be used to upload the logs for every switch on the network to a single location for easy access. Simple Network Management Protocol (SNMP) is also widely used to monitor and manage network devices and can be easily integrated into dedicated network management software (see Figure 5). This makes it easy for users to obtain live data for any network device using SNMP, such as current traffic levels on any link and which devices are up and down. In addition, automatic alerts can be sent by email or SMS if an alarm trigger is met (for example, if a port goes down or device goes down). This provides network administrators with both the knowledge and tools to address performance or security issues on the network, both preemptively and as they happen.
A number of other developments in Ethernet networking have proven useful for driving better performance for industrial users. Network management software is now available with a built-in OPC server to make it easy to integrate into a SCADA system. Digital input and output connections are being built into industrial managed switches, allowing for easy integration with a PLC. Industrial managed switches are also including support for field protocols such as Modbus, EtherNet/IP, and PROFINET, so a SCADA server can communicate directly with them in the field. Each vendor offers a different degree of support for the various capabilities and protocols, but basics such as port status are usually supported. These and other features enable closer interoperation between Ethernet devices and industrial systems, and make it easier to tailor network management efforts to the specific and immediate needs of the application.
Proper network engineering means building for the given application and assessing the level and type of performance that the network needs to exhibit. Jitter—not latency—is the enemy of a deterministic network. If latency is predictable, then it can be accounted for and the network can still be deterministic. With proper bandwidth management, Ethernet networks can achieve the determinism that is often required for industrial networking applications.
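The distinction can be made concrete with a short sketch. The latency samples below are made-up values for two hypothetical links, and jitter is taken simply as the spread between the fastest and slowest delivery. Other definitions, such as mean deviation, are also in use.

```python
from statistics import mean

def latency_and_jitter(samples_ms):
    """Return (mean latency, peak-to-peak jitter) for latency samples in ms."""
    return mean(samples_ms), max(samples_ms) - min(samples_ms)

# Hypothetical measurements, in milliseconds.
slow_but_steady = [10.0, 10.0, 10.0, 10.0]
fast_but_erratic = [1.0, 6.0, 2.0, 7.0]

print(latency_and_jitter(slow_but_steady))   # (10.0, 0.0)
print(latency_and_jitter(fast_but_erratic))  # (4.0, 6.0)
```

The first link is slower on average but perfectly predictable, so a control loop can compensate for it. The second is faster on average, yet its 6 msec spread is what would undermine deterministic behavior.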
Over-provisioning is one of the easiest and most cost-effective methodologies used to achieve the desired quality parameters. This means that the network is designed with enough surplus bandwidth to prevent network congestion. Besides simply upgrading to Gigabit and 10Gb switches, bandwidth can be maximized by using faster interconnection. Switch-to-switch links on 1 Gbps or 10 Gbps ports will reduce latency and provide higher throughput. Link Aggregation Control Protocol is also an effective method for scaling up bandwidth by grouping physical links so they act as a single link with increased throughput. The downside is that, as network demand increases, so must the network bandwidth, and thus equipment may have to be upgraded on a regular basis.
In the commercial space, the 70/30 rule is used to provision network capacity, where a network is designed so that 70% of the available bandwidth is able to accommodate all of the known traffic loads. The remaining 30% is reserved for unknown traffic that is not accounted for and provides a safety margin. In the control and automation world, the required margin for safety and future growth is much greater than what is seen in commercial applications. It is very typical to see 10% rules used in these environments, sufficiently over-provisioning the network to ensure deterministic behavior and allow for future growth. It is not uncommon for networks with even higher safety margins to be used, where only 5% of the network is provisioned.
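As a rough illustration of how these rules translate into link sizing, the sketch below computes the minimum capacity needed to keep a known load within a target share of the link. The 40 Mbps load and the function name are invented for the example.

```python
def required_capacity_mbps(known_load_mbps, target_utilization):
    """Smallest link capacity that keeps known traffic at or below the target share."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return known_load_mbps / target_utilization

known_load = 40.0  # Mbps, hypothetical aggregate of known traffic

for label, share in [("70/30 rule", 0.70), ("10% rule", 0.10), ("5% rule", 0.05)]:
    print(f"{label}: {required_capacity_mbps(known_load, share):.0f} Mbps")
```

Under these assumptions, the same 40 Mbps load fits comfortably on a 100 Mbps link by the 70/30 rule but calls for Gigabit links under the 10% or 5% rules.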
A deterministic network can also be achieved by using specialized Ethernet-based fieldbus protocols such as EtherCAT or PROFINET. The IEEE 1588v2 Precision Time Protocol can also be used to achieve highly precise clock synchronization across the entire network, reaching the sub-microsecond range when hardware timestamping is supported. These technologies have proven effective at meeting specific application or system requirements for highly deterministic behavior.
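The core arithmetic behind the Precision Time Protocol's delay request-response exchange is compact enough to sketch. The four timestamps below are invented values in nanoseconds, and the calculation assumes the path delay is the same in both directions, which is the protocol's own working assumption.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Compute slave clock offset and one-way path delay.

    t1: master sends Sync          (master clock)
    t2: slave receives Sync        (slave clock)
    t3: slave sends Delay_Req      (slave clock)
    t4: master receives Delay_Req  (master clock)
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Hypothetical timestamps in nanoseconds: the slave clock runs 500 ns ahead
# of the master, and the one-way path delay is 200 ns.
offset, delay = ptp_offset_and_delay(t1=1_000, t2=1_700, t3=2_000, t4=1_700)
print(offset, delay)  # 500.0 200.0
```

The slave then subtracts the computed offset from its own clock. Any asymmetry between the two directions shows up directly as synchronization error, which is one reason switch support for the protocol matters.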
Quality of service
QoS is not a single protocol or methodology. Rather, it involves categorizing network traffic in order to ensure appropriate performance under network congestion. By default, Ethernet switches and routers handle network data in a first-in, first-out fashion, where data is transmitted in the order received. If more data arrives than the outgoing bandwidth can support, the network device is forced to buffer the data and network congestion occurs. QoS manages this congestion so that the highest-priority traffic streams are transmitted first or have reserved bandwidth.
With flow-based QoS, streams of packets or frames with the same quality requirements are grouped into a flow. Unfortunately, flow-based QoS never achieved widespread adoption due to high overhead requirements and inherent limitations in scalability. Class-based quality takes a different approach than flow-based quality and has become the QoS method of choice in many sectors. With class-based quality, packets with similar quality requirements are clustered together. Each cluster is prioritized differently and receives a different level of access to network resources. Higher priority traffic will be placed into a queue that will more readily send out traffic. Lower priority traffic will be placed into a queue where it will be forced to wait for the higher priority traffic.
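A minimal sketch of the queuing behavior described above, assuming strict-priority scheduling with two hypothetical traffic classes. Real switches typically offer several hardware queues and may use weighted schemes instead, and the class and frame names here are illustrative.

```python
from collections import deque

class StrictPriorityScheduler:
    """Always transmit from the highest-priority non-empty queue."""

    def __init__(self, num_classes):
        # Index 0 is the highest priority.
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, frame, priority):
        self.queues[priority].append(frame)

    def dequeue(self):
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing waiting

sched = StrictPriorityScheduler(num_classes=2)
sched.enqueue("video-1", priority=1)
sched.enqueue("control-1", priority=0)
sched.enqueue("video-2", priority=1)
sched.enqueue("control-2", priority=0)

order = [sched.dequeue() for _ in range(4)]
print(order)  # ['control-1', 'control-2', 'video-1', 'video-2']
```

Both control frames leave before either video frame, even though a video frame arrived first. The trade-off with strict priority is that low-priority traffic can be starved if high-priority traffic never lets up.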
Class-based quality requires the marking of network traffic so that the network device can define and execute prioritization rules on that traffic. Differentiated services (DiffServ), type of service (ToS), and class of service (CoS) are the most widely used protocols that use class-based quality. Each of these protocols uses markings or tags to determine priority and can be used to fulfill specific QoS requirements.
Many devices and protocols in industrial Ethernet applications make use of multicasting, which allows one device or protocol to reach a number of end devices or applications with a single data stream. This can be a much more efficient use of bandwidth than creating individual unicast data streams for every receiving device. Multicast group membership is managed using Internet Group Management Protocol (IGMP), a Layer 3 protocol that depends on IP. In a basic Layer 2 switch, all ports will forward the multicast traffic regardless of whether the device on the receiving end wants it or not.
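As one illustration of marking, the differentiated services code point (DSCP) occupies the upper six bits of the byte in the IP header that was formerly the type-of-service field. The sketch below packs and unpacks that byte. The helper names are hypothetical, while 46 is the standard code point for Expedited Forwarding.

```python
DSCP_EF = 46  # Expedited Forwarding, commonly used for latency-sensitive traffic

def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper bits of the 8-bit ToS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2

def tos_to_dscp(tos):
    """Recover the DSCP value, discarding the two low-order bits."""
    return (tos & 0xFF) >> 2

tos = dscp_to_tos(DSCP_EF)
print(tos, hex(tos))  # 184 0xb8
```

On Linux, the resulting byte can typically be applied to a socket with `setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)`, though whether the marking is honored depends on how the switches and routers along the path are configured.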
The heavy use of multicast traffic in industrial control systems, and by PLCs in particular, can result in high network congestion. Multicast snooping was developed as a solution to this. With multicast snooping, Layer 2 switches monitor IGMP queries and join/leave reports sent by Layer 3 devices on each port. The switch can then make decisions on which ports have connected devices that want the traffic. Multicast traffic is only sent to those ports, greatly reducing the load on the switch and freeing network bandwidth for the ports that do not need the multicast traffic. IGMP snooping has therefore become an essential feature to ensure optimal performance in industrial Ethernet networks.
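The forwarding decision can be sketched as a table that maps each multicast group to the ports that have reported interest in it. This is a simplified model with hypothetical names and addresses. It ignores querier election, membership timers, and router ports.

```python
from collections import defaultdict

class SnoopingSwitch:
    """Toy model of a Layer 2 switch that tracks multicast group membership."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.members = defaultdict(set)  # group address -> interested ports

    def on_join(self, port, group):
        self.members[group].add(port)

    def on_leave(self, port, group):
        self.members[group].discard(port)

    def egress_ports(self, ingress_port, group, snooping=True):
        """Ports a multicast frame is sent to, excluding the one it arrived on."""
        targets = self.members[group] if snooping else self.ports
        return sorted(targets - {ingress_port})

switch = SnoopingSwitch(ports=range(1, 9))
switch.on_join(port=3, group="239.1.1.1")
switch.on_join(port=5, group="239.1.1.1")

print(switch.egress_ports(1, "239.1.1.1", snooping=False))  # [2, 3, 4, 5, 6, 7, 8]
print(switch.egress_ports(1, "239.1.1.1"))                  # [3, 5]
```

On this hypothetical eight-port switch, snooping cuts the number of ports carrying the stream from seven to two.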
The performance demands and characteristics of automation and control networks are fundamentally different from those of the commercial world where Ethernet technology originated. Reliability, bandwidth, and determinism continue to be the critical performance concerns for industrial users, and a number of technologies, standards, and practices have proven effective at meeting each of these demands. As industrial Ethernet manufacturers and users both continue to develop greater experience and expertise, there is no doubt that even greater heights of performance will be reached.
Sandoval is a field applications engineer at Moxa Americas Inc., Brea, Calif., where he has worked for five years. He was the first field applications engineer at the company’s U.S. headquarters and helped launch the company’s Professional Industrial Network Services. He has bachelor’s degrees in electrical and computer engineering, as well as a master’s degree in electrical engineering, all from California Polytechnic University. Sandoval is a Cisco Certified Network Associate and has Level 1 Linux Professional Institute Certification.
Werning is a field applications engineer at Moxa with 16 years of experience in embedded communications and computing. He also has experience in control systems engineering. He has a bachelor’s degree in computer science from the University of Wisconsin-Platteville with a minor in electronics technology.