Data center design-build upgrade is easier with process controls
A project to replace an out-of-date data center began in 2012; M.C. Dean focused on building the new LEED Gold facility to support a 6 MW data floor, which would eventually be expanded to 10 MW. The project included 6 control panels (2 PLC and 4 remote I/O), 2 redundant automation stations, 16 remote I/O racks with more than 1,800 hard I/O points, 5 operator workstations, 2 server racks, 10 network cabinets, and 45 network switches for collecting more than 35,000 soft I/O points.
The largest issue with the current data center was its age. Over the years, network and power cables had become entangled under the data center floor, making it extremely difficult to troubleshoot connection problems. The heating, ventilation, air conditioning (HVAC) system was being pushed beyond its design limits, and the uninterruptible power supply (UPS) system was so old it lacked the backup power needed to keep the facility online during system outages.
The problem threatened to shut down the facility at any moment for days at a time. Just one day of potential IT service outage costs the company $25 million in lost productivity. Building the new data center presented a number of challenges and design requirements, including integration of facility systems into one operator platform, meeting Uptime Tier III certification, and integrating physical and virtual security into the design.
Three main disciplines contributed to the project. The electrical portion was done by M.C. Dean and included the electrical switchgear, protective relays, electrical metering, lighting, and the fire alarm system. M.C. Dean also was responsible for the controls portion that included mechanical process control, supervisory control and data acquisition (SCADA) interface, designing and building the control panels, and designing the network infrastructure. Southland Industries was responsible for the mechanical portion of the project, including designing the mechanical process, sizing, and selecting equipment, such as chillers, pumps, and air handling units, with the industrial instrumentation to control the equipment. Southland was also responsible for providing the building automation system, which served the noncritical HVAC spaces in the facility.
Seamlessly integrating all facility systems into one workstation was the most important design requirement for the project. The customer was operating the facility with four workstations representing six systems to evaluate utility plant status. Having all the data exist in separate workstations and software packages made it hard to troubleshoot problems and track maintenance activities. For the new facility, the customer sought an integrated solution combining all separate hardware and software packages in a central workstation.
Three main systems needed integrating. The mechanical system consisted of several drives, chillers, pumps, and transmitters that represented more than 1,800 hard I/O points. There were 72 variable speed drives (VFDs), which needed to use hard I/O for control and Profinet for monitoring. There were also several mechanical skid systems, such as reverse osmosis, rainwater harvesting, refrigerant detection, and chemical treatment. These skid systems had soft and hard points that needed to come back to the central system for monitoring and control.
The electrical devices included 120 relays, 130 electrical meters, and more than 550 miscellaneous electrical devices, including power distribution units (PDUs), lighting inverters, UPS units, battery chargers, and smart circuit breakers. All of these devices communicate over soft I/O with many protocols to choose from.
The last big hurdle was the building automation system (BAS), designed to serve all noncritical spaces of the facility, including the office building and the facility access center. Because this system was provided by the mechanical contractor and was based on a different software and hardware package, we needed to figure out how to integrate the graphics on both systems into one operator workstation while still retaining the same overall "look and feel."
M.C. Dean had a lot of previous experience with the process control system in mission-critical process plants but never in a critical operating environment such as a data center. After reviewing the project with Siemens, and after comparing several different distributed control systems, we decided that control system was the best fit to integrate with the automation stations chosen for control over the mechanical equipment. At this point, we were unsure about how we would collect all information from the soft I/O points due to the complexity and number of soft communication devices.
With more than 550 devices spread over more than 20 device types, we needed to be careful in our software selection to make the integration as fast and efficient as possible. We found a control system with two add-ins designed for integrating communication protocols into the operating system (OS) tag servers.
The first add-in provides the ability to integrate hardware devices that use different protocols into the control system. Some of these protocols include DNP3, Modbus, and IEC-60870. Because we needed to integrate devices from the electrical and mechanical systems, this worked out as the solution to get them to co-exist.
The second add-in works in much the same way as the first, except that it includes a driver for the IEC-61850 protocol. This was imperative because we planned to use this protocol to communicate with the electrical meters and protection relays, which represented the majority of the hardware devices throughout the data center. Both add-ins included the control system framework to communicate to the soft I/O devices. They helped to bring together the idea of "seamless integration" because the operators would no longer have to go to separate workstations or use separate programs to operate the facility.
To use the add-ins in the most efficient manner, we used a database automation (DBA) tool to automatically generate the OS database with the display hierarchy, required variables, alarm signals, and faceplates. Using the database tool, we created one device "type" for each unique piece of hardware and then replicated that type to save engineering time. Instead of creating 550 individual devices and attaching tags to each one of them, we created 20 device types and then numbered them incrementally within their type to represent all the devices.
This saved us a substantial amount of engineering time that would have otherwise been lost to mindless data entry for each device. It eliminated potential data entry errors because the tags are linked with the operator faceplates so that if a change is made to a device type, that change will be reflected in every instance of that device in the system.
The same concept increased engineering efficiency when working with the hard I/O points through the use of a bulk engineering tool that uses Microsoft Excel spreadsheets to format the individual I/O points for import into the control system. We first created our control module types in the control system software editor. We then organized all of our individual I/O points in the Excel template file including rack, slot, and point locations; which control module type they belonged to; any interconnections they would have; and other relevant information that the control system would need. We then imported both items into engineering software, at which point the control system would auto-generate a control sheet for each device that was assigned in the Excel spreadsheet.
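The type-and-replicate idea behind both tools can be sketched in a few lines of Python. The device types, signal names, and counts below are illustrative stand-ins, not the project's actual tag database:

```python
# Hypothetical sketch of the bulk-engineering approach described above:
# define one template per device type, then replicate it with incrementing
# instance numbers instead of hand-entering hundreds of devices.
# All type and signal names here are illustrative.

DEVICE_TYPES = {
    "MTR": ["STATUS", "KW", "KWH"],           # electrical meter
    "VFD": ["RUN_CMD", "SPEED_FB", "FAULT"],  # variable speed drive
}

def build_tag_rows(type_counts):
    """Expand device-type templates into individual (device, signal)
    rows ready for spreadsheet import into the engineering tool."""
    rows = []
    for dtype, count in type_counts.items():
        for i in range(1, count + 1):
            device = f"{dtype}_{i:03d}"   # e.g., MTR_001, MTR_002, ...
            for signal in DEVICE_TYPES[dtype]:
                rows.append((device, signal))
    return rows

# Replicating 2 types covers 202 devices with no per-device data entry.
rows = build_tag_rows({"MTR": 130, "VFD": 72})
```

Because every instance points back to its type template, a change to the type propagates to all of its instances, which is the error-elimination property described above.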
With around 1,200 devices to integrate, the engineering software reduced the amount of engineering time required to manually create the symbol table, address IO, and make the interconnections between sheets. These tools helped us stay ahead of the already tight construction schedule and also assured us that the number of errors we would find during commissioning would be much smaller than if we had to do all programming via manual data entry.
Another design challenge we faced was vertically integrating all of the different project teams within M.C. Dean. Typically, projects are horizontal where each "team" finishes its part and passes it on to the next phase’s team and so on. At the end of the project when it’s time to bring everything together for commissioning, problems arise due to lack of process overlap, no common understanding of the problems or solutions, and things ultimately get worked out in a reactionary fashion. We were determined as a company not to let this happen and to work closely both within our own firm and with the other disciplines to ensure that when it came time for commissioning, there would be no holes left in the design and everything would come together as smoothly as possible.
Normally, our electrical design team chooses to use certain protection relays for all electrical designs. It is something they have become comfortable with and typically what is specified on most jobs. The difference in this project is that the M.C. Dean controls team had to design and integrate the network infrastructure to monitor and control all of these relays that the electrical design team was responsible for choosing. This was the first hurdle in the vertical integration approach that we would have to get over.
The electrical design team began to look at other protection relay manufacturers as possible alternatives to the standard relay used. Another line of relays was examined as a possible alternate solution.
We asked each manufacturer to design a test setup for the project that we could evaluate in our office. After reviewing the designs, we came up with a pre-test analysis of the similarities and differences between each manufacturer’s solutions. Each manufacturer then built a rack that contained all of the relays needed to replicate one of the main-tie-tie-main lineups we would have on site. One rack consisted of a redundant star topology that is "tried and true," while the other was an IEC-61850 relay platform. Representatives from each manufacturer joined us while we simultaneously tested and compared racks side by side over a 3-day period. We compared many different components, including network speed, ease of use, security, redundancy, and overall functionality.
One experiment we performed was to determine the speed at which the relay could process commands sent from an HMI and then send a response back to that same HMI. We used a global positioning system (GPS) time clock to timestamp the messages sent between the relay and HMI. One relay averaged about a 3.8-second delay, while the other averaged only a 0.7-second delay. The faster relay gives end users the flexibility to set the scan time at which logic is processed within the relay, much like a PLC using different scan times for time-critical process PID loops and other critical information. The slower relay had no option to adjust the scan time of its logic, which was determined to be the bottleneck in the test.
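The round-trip measurement itself is simple arithmetic on the GPS-stamped send and receive times; a minimal sketch, with made-up sample timestamps near the 0.7-second figure:

```python
# Sketch of the relay latency test described above: pair each
# GPS-timestamped command with its response and average the delays.
# The sample timestamps below are illustrative, not measured data.

def average_delay(sent, received):
    """Average round-trip delay in seconds between paired
    GPS-timestamped command and response messages."""
    delays = [rx - tx for tx, rx in zip(sent, received)]
    return sum(delays) / len(delays)

tx_times = [0.000, 10.000, 20.000]   # command sent from HMI
rx_times = [0.690, 10.710, 20.700]   # response received at HMI
avg = average_delay(tx_times, rx_times)  # about 0.7 seconds
```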
At the end of the day, the chosen technology was more cost-effective, cut down on the physical infrastructure needed to implement the design, simplified the programming effort within the controller, and provided a faster, more powerful relay. If we had not stepped back and considered a different solution, we would have wasted materials and engineering time for a less-optimized offering.
Another design requirement of the project was achieving the Uptime Institute’s Tier III certification. The Uptime Institute is a third-party company that represents an objective basis for comparing the functionality, capacity, and expected availability of a project’s infrastructure based on a tiered standard of guidelines.
This project was held to the third tier certification, which included guidelines such as concurrent maintainability for any equipment that serves the data center floor. In addition, each and every capacity component and element in the distribution paths must be able to be removed from service on a planned basis without impacting any of the IT equipment. There must also be sufficient capacity to meet the needs of the site when redundant components are removed from the system.
As with any data center, the sole purpose of the facility is to support server operation without any interruption of service. This requirement extends to the electrical equipment, the mechanical systems, network infrastructure, and control system that connects everything. This requirement forced us to think about the physical and virtual vulnerabilities of the system and how we could incorporate the necessary amount of redundant paths to meet the certification guidelines.
We started by designing two redundant operations rooms, which included the OS servers, GPS time clock, and operator workstations. These two rooms would serve as backups of one another in case a catastrophic event occurred at one location. Each of the OS servers, PLCs, and workstations had a redundant physical connection to the network, and in many cases we exceeded the N+1 requirement with little to no extra work.
From there we determined all electrical switchgear locations that would need to exist on the network and created a "node" for each location. Each "node" was represented by a network cabinet with two redundant switches inside, powered from two separate, redundant UPS sources. These nodes were then linked together via fiber to form a ring topology. This ring topology allowed us to meet the N + 1 network redundancy requirement because we could stand to lose any segment on the ring and still retain the same number of capacity components as before.
The majority of the devices we needed to collect information from were located within the switchgear itself. These devices included things like electrical meters, protection relays, and smart circuit breakers. To gather this data back to a centralized point, we added smaller switches inside the gear. These smaller switches formed a ring when connected, which was then coupled with the main network ring to form a sub-ring. Because we were able to again use a ring topology for the sub-rings, it allowed us to add yet another layer of redundancy so that if a sub-ring switch were to fail, it would not affect the network infrastructure on the main ring.
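The redundancy property the ring and sub-ring topology provides is that no single fiber segment failure can isolate a node. That property can be checked mechanically with a small connectivity test; the node names below are illustrative, not the actual switchgear locations:

```python
# Sketch of the single-failure survivability check implied by the ring
# design above: remove each link in turn and verify every node is still
# reachable. Node names are illustrative.

def connected(nodes, links):
    """Return True if all nodes are reachable over the given links."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [nodes[0]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

def survives_single_failure(nodes, links):
    """True if the network stays connected after losing any one link."""
    return all(
        connected(nodes, [l for l in links if l != broken])
        for broken in links
    )

nodes = ["NODE_A", "NODE_B", "NODE_C", "NODE_D"]
ring = [("NODE_A", "NODE_B"), ("NODE_B", "NODE_C"),
        ("NODE_C", "NODE_D"), ("NODE_D", "NODE_A")]
chain = ring[:3]  # open chain: one break isolates a node

ring_ok = survives_single_failure(nodes, ring)    # True
chain_ok = survives_single_failure(nodes, chain)  # False
```

The ring passes because the second path around the loop always remains; an open chain fails the same test, which is why the design closes every ring and sub-ring.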
One of the things we learned from the protection relay tests was how important network recovery time was after a failure. We explored several different redundancy protocol options, such as the Media Redundancy Protocol (MRP) and Rapid Spanning Tree Protocol (RSTP). The MRP protocol has a maximum recovery time of 200 ms between the time that a network link goes down and the redundant path takes over. This is true of any network that has 50 or fewer nodes connected in a ring topology. RSTP, on the other hand, typically works in a mesh topology and has an average recovery time of 50 ms for each node that exists on the network. If you have a network of 8 nodes, you should expect to see a 400 ms recovery time.
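The trade-off can be made explicit with the figures quoted above (a flat 200 ms ceiling for MRP rings of up to 50 nodes versus roughly 50 ms per node for RSTP); this sketch only restates that arithmetic:

```python
# Recovery-time comparison using the figures cited in the text:
# MRP guarantees a 200 ms maximum on rings of <= 50 nodes, while RSTP
# convergence scales at roughly 50 ms per node.

MRP_MAX_RECOVERY_MS = 200   # ring of 50 or fewer nodes
RSTP_PER_NODE_MS = 50       # typical per-node convergence cost

def mrp_recovery_ms(node_count):
    """Worst-case MRP recovery for a compliant ring."""
    if node_count > 50:
        raise ValueError("200 ms MRP figure applies to rings of <= 50 nodes")
    return MRP_MAX_RECOVERY_MS

def rstp_recovery_ms(node_count):
    """Expected RSTP recovery, scaling with network size."""
    return RSTP_PER_NODE_MS * node_count

rstp_8 = rstp_recovery_ms(8)  # 400 ms, as in the example above
mrp_8 = mrp_recovery_ms(8)    # 200 ms worst case, independent of size
```

The key design point is that MRP's bound is flat while RSTP's grows with node count, so the ring topology keeps recovery time predictable as the network expands.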
During the relay tests, we used a GPS clock to timestamp the messages that were sent and received after disconnecting one side of the redundant network connection on each manufacturer’s relay. One relay had a 900 ms recovery time using the RSTP protocol, while the other relay had a 5 ms recovery time using the MRP protocol. The slower relay's on-board network port was unmanaged and presented a single MAC address, so the managed network switches had to resolve where the failure had occurred in the network and then determine how to recover from it.
The new relays each have two managed network ports on-board with separate MAC addresses. When the ring breaks, the relays on each side of the failure point can reconfigure themselves and let the network know the location of the problem. This saves valuable time and reduces the chance of data being lost during switchover.
In addition to network redundancy, we also implemented redundancy on our system bus. The system bus consisted of two redundant PLC panels that were physically separated to increase the system availability in the event one rack was damaged. In addition to that, we had four remote I/O panels with two to three I/O racks in each cabinet.
All of these PLC and remote I/O (RIO) panels were connected via redundant fiber rings using optical link modules (OLMs). These modules allowed us to convert the Profibus drop between the I/O racks in each panel to fiber, carrying the signal over the long distances between panels. Because each of the redundant OLMs was connected to its own communication module within the I/O rack, the system could tolerate a failure down to the rack level without losing any data.
We also chose to use automation stations for redundant capabilities. High availability allows for two PLCs to operate in an active/standby configuration so that no loss of functionality or data occurs in the event of a failure. The panels also contained a hot-swappable backplane, which allowed us to swap I/O cards and communication modules without interruption of service. Both options were critical to achieving the redundancy requirements of the Uptime Institute’s Tier III standards.
Physical, virtual security
Cyber security has become an increasingly hot topic in the supervisory control and data acquisition (SCADA) and process control world. Hackers can gain control of SCADA systems and cause havoc for critical facilities. This project runs on an air-gapped system to prevent attacks from outside influences. However, an even bigger threat exists from individuals with regular access to the network hardware and control components within the facility. To combat this, we implemented several intrusion detection measures to help identify tampering with system components by building employees or vendors.
Each of the PLC and RIO cabinets was outfitted with a door switch that was wired to the digital input card inside the cabinet. Whenever a cabinet door was opened outside of authorized access or regular maintenance/troubleshooting activities, the operator would receive an alarm. One problem we experienced was how to implement this same security on the network cabinets spread throughout the facility. We could have wired that same door switch back to the closest PLC or RIO panel, but instead we decided to think outside of the box.
Our solution uses the concept of a loopback tester to determine the link status of the ports on the switches located inside the network cabinets. A loopback tester ties the "transmit" and "receive" pins together, which causes the port to become active and begin discarding packets. If both individual links are broken between the Tx and Rx pins, a link loss alarm occurs at that switch. That same methodology was used to determine when the network cabinet doors were open.
We installed a momentary door switch on each cabinet and tied the switch contacts into an RJ-45 to terminal block adapter. When the cabinet door opens, the door switch disconnects the Tx and Rx pins on the network port, causing the port to lose its link status. The operator would then receive an alarm via SNMP trap indicating that someone had opened a cabinet door in the field. If this was an unauthorized action outside of normal maintenance checks, the operator could alert security and direct them to the exact room location based on the alarm text to determine if a threat was present.
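The alarm-handling side of that scheme amounts to mapping a link-down event on a monitored "door" port to a cabinet location for the alarm text. A minimal sketch, with made-up switch names, port numbers, and room locations:

```python
# Illustrative sketch of the door-switch alarm handling described above:
# a link-down event on a monitored door-loop port is translated into
# operator alarm text with the cabinet's location. All switch names,
# port numbers, and rooms here are hypothetical.

DOOR_PORTS = {
    ("SW_NET_01", 8): "Network cabinet 1, Electrical Room 101",
    ("SW_NET_02", 8): "Network cabinet 2, Mechanical Room 203",
}

def door_alarm(switch, port):
    """Translate a link-down event on a door-loop port into alarm text,
    or return None if the port is not a monitored door loop."""
    location = DOOR_PORTS.get((switch, port))
    if location is None:
        return None
    return f"Cabinet door open: {location}"

alarm = door_alarm("SW_NET_01", 8)   # monitored door port -> alarm text
other = door_alarm("SW_NET_01", 3)   # ordinary port -> no door alarm
```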
There are many measures you can take to protect against physical threats to a control system, but many times that is not enough. An appropriate level of virtual security needs to be implemented to protect against threats in the event that someone gains physical access to the network. We used many security measures, such as VLANs, MAC address-based port security, and link alarms to keep the network safe from inside intruders.
We used VLANs to separate different groups of devices on the network. This allowed us to isolate the individual data packets between unlike devices and separate parts of the network for increased security. If someone were to access the network from a port located on VLAN 1, they would be unable to access any devices outside of VLAN 1.
We also used MAC address-based port security to further harden the network. This allowed us to assign a MAC address to each port on the network so that only the device with that exact MAC address would be allowed to talk when plugged into that port. In the event that a network cable was unplugged, the operator would get an alarm indicating that this specific device had been disconnected and take further action if necessary. If the intruder then tried to plug in a laptop computer, the intruder would not be able to communicate with the port because the MAC address of the laptop would be different than that of the original device. In addition to these security measures, we also disabled all spare network ports that did not have a device connected so that they would not function without software reprogramming.
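The port-security rule above — one known MAC address per port, spare ports disabled — can be sketched as a simple check; the addresses and port names are illustrative, not the switch vendor's actual configuration syntax:

```python
# Sketch of the MAC address-based port-security rule described above:
# each port accepts exactly one known MAC address; disabled ports and
# unknown MACs are refused and alarmed. All values are illustrative.

ALLOWED_MACS = {
    "port_1": "00:1B:1B:AA:00:01",  # electrical meter
    "port_2": "00:1B:1B:AA:00:02",  # protection relay
    "port_3": None,                 # spare port, administratively disabled
}

def check_port(port, mac):
    """Return (allowed, alarm_text). Disabled ports and unexpected
    MAC addresses are refused and raise an operator alarm."""
    expected = ALLOWED_MACS.get(port)
    if expected is None:
        return False, f"{port}: link on disabled or unknown port"
    if mac != expected:
        return False, f"{port}: unauthorized MAC {mac}"
    return True, None

ok, _ = check_port("port_1", "00:1B:1B:AA:00:01")       # authorized device
bad, alarm = check_port("port_1", "AA:BB:CC:DD:EE:FF")  # intruder's laptop
spare, _ = check_port("port_3", "AA:BB:CC:DD:EE:FF")    # disabled spare
```

An intruder swapping a laptop onto a meter's port trips the second case, and plugging into a spare port trips the first, matching the behavior described above.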
It is common practice for device manufacturers to assign a default username and password to the configuration interface. Ninety-nine percent of the time, these credentials are left at the factory defaults, and anyone can look them up in the manufacturer’s literature to gain access and change settings. We ensured that every device had its credentials changed to protect against this threat.
Invisible to end user
The control system is used in some of the most critical industrial plants around the world. Regardless of whether a volatile chemical is being produced or vital information is being protected, the capability and reliability of the controller made it well suited to the needs of this project. We had many design challenges to meet with regard to integrating all of the different systems into one cohesive package that would be invisible to the end user. The controller provided the framework and add-ons needed.
Redundancy was a major concern for us and something that we had to examine very closely. From a hardware level, the automation stations met requirements to keep the plant up and running in the event that one PLC was to fail. The controller software had many redundancy features to automatically switch over clients and servers to avoid any interruption of service. Software tools made it easy to configure thousands of points and devices without having to do manual entry of each occurrence. This saved us hundreds of engineering hours and also ensured that mistakes were not made when duplicating like devices. Overall, the controller chosen turned out to be the right choice for us, and we look forward to implementing it on future projects.
– Anthony Pannone is control engineer for M.C. Dean Inc., Washington, D.C.; edited by Mark T. Hoske, content manager, Control Engineering, firstname.lastname@example.org.
- Process control system provides central integration point for information.
- Data center upgraded design has redundancy and security.
- Project included six control panels, redundant automation stations, 16 remote I/O racks, and more.
In a time when the value of big data integration is increasingly encouraged, could a new control system provide an integrated funnel?
– Learn more about the Siemens products used in the data center project below.