Make the virtualized manufacturing environment visible
The world has changed. We no longer work in an environment where all computers, networks, and storage devices used for manufacturing sit on the manufacturing floor. In that environment it was easy to see when something was not working because the lights would be either out or blinking: we had indicator lights for power, CPU activity, disk activity, network activity, and router status. Now most elements are remote from the manufacturing floor and run in virtualized systems with no physical lights or external indications of problems. A problem is usually noticed only when a terminal or mobile device starts responding very slowly or stops responding entirely. Unfortunately, in facilities that have fully automated solutions for data historians, material track-and-trace systems, and advanced process control, there may be no one on the manufacturing floor to notice a problem until it has become mission critical. The first call is often to the IT support department, followed by a long list of instructions and checks to uncover the source of the problem. If the support organization is divided by technology (networks, servers, firewalls, Oracle databases, Microsoft databases, and disk servers), then it can take a long time to determine which element has actually failed, diagnose the real problem, and provide the fix. The lack of external indicators and the remote location mean that only the IT staff, not the users of the systems, can diagnose a problem.
Monitoring remote and virtualized devices is becoming an important part of every IT organization. IT departments rely on log files to diagnose problems, and every networked device generates log files. Even printers, firewalls, and intrusion detection systems generate log files. Usually there are more entries in the log files than anyone can easily review, so diagnosing a problem may require searching tens of thousands of log file entries in dozens of log files. It is often difficult to pinpoint the original problem and separate it from secondary failures that occurred because of the original problem. This method is inefficient and expensive if production is stopped until the problem is found and fixed.
It is time to treat manufacturing IT assets the same way we treat production equipment by monitoring for health and logging performance. Data historians are used to collect information on production equipment to diagnose failures or slowdowns, and they should also collect information on all manufacturing IT assets. This is especially important in virtualized environments where visibility is low and manufacturing IT assets have to share resources with other applications. When operators complain that the system seems to slow down at the same time every day, having historical data will provide the information needed to diagnose the problem.
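As a rough illustration of the "same time every day" diagnosis, the sketch below averages historized samples by hour of day. The function name, the sample layout of (timestamp, value) pairs, and the CPU-load numbers are all made up for illustration; a real data historian would supply the samples through its own query interface.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(samples):
    """Average (timestamp, value) samples by hour of day."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Illustrative CPU-load samples (percent) from two days; not real data.
samples = [
    (datetime(2013, 2, 4, 1, 0), 12.0),
    (datetime(2013, 2, 4, 2, 0), 95.0),
    (datetime(2013, 2, 5, 1, 0), 14.0),
    (datetime(2013, 2, 5, 2, 0), 97.0),
]

profile = hourly_profile(samples)
worst_hour = max(profile, key=profile.get)  # hour with the highest average load
```

With these invented samples the profile points at the 2 a.m. hour, which is the kind of pattern that would prompt a look at scheduled backups or other jobs sharing the virtualized host.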
One of the easiest ways to add manufacturing IT assets to your current data historian is through an OPC-UA service on each server (see www.opcfoundation.org). The service can monitor the server information and also collect information from the other network devices, such as disk arrays, network switches, firewalls, and printers. Any OPC-UA client, such as an HMI, would be able to browse and view the status of manufacturing IT assets. SCADA and HMI systems have alarm detection and management elements, and these can also be applied to information directly obtained through OPC-UA or from your data historian. Alarms on IT assets can be used to alert operations if there is a problem, or even an indication that there will be a problem in the future.
OPC-UA runs over standard TCP/IP networks such as Ethernet, supports Web-service transports, uses standard network ports, and has a built-in security model. These features enable an OPC-UA server application to operate in almost all existing IT infrastructures and make it especially suitable for virtualized environments. If your data historian does not yet have an OPC-UA interface, there are packages that convert the OPC-UA information into the traditional OPC format. Free OPC-UA tools are also available (see www.opcconnect.com/ua.php).
Create OPC-UA tags to monitor the load on each processor core, the available memory, network throughput, and available disk space for each server in your virtualized environment. Often problems occur because of normal IT changes that operations may not even be aware of, such as changing a network card or applying a hotfix. You can add OPC-UA tags that expose the latest patch version, the last boot time, the last hotfix version and date, the state of critical services (running or stopped), and the network card MAC address. These do not change very often, so they would take very little space on a data historian but may provide invaluable information for problem diagnostics.
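The sketch below shows one way such a tag set might be gathered on a server before it is exposed through an OPC-UA service. It uses only the Python standard library, so it covers free disk space, core count, OS version, and MAC address; per-core load and available memory would need a platform-specific source or an additional library. The tag naming scheme (host.Group.Name) and the function name are assumptions made for illustration, and publishing the values through an OPC-UA server is deliberately left out.

```python
import os
import platform
import shutil
import socket
import time
import uuid


def snapshot_tags(disk_path="."):
    """Return a flat {tag_name: value} dictionary for the local server."""
    host = socket.gethostname()
    usage = shutil.disk_usage(disk_path)
    # uuid.getnode() returns a 48-bit integer; it can be a random value
    # if no hardware address can be determined.
    node = uuid.getnode()
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    tags = {
        f"{host}.Disk.FreeBytes": usage.free,
        f"{host}.Disk.TotalBytes": usage.total,
        f"{host}.CPU.LogicalCores": os.cpu_count(),
        f"{host}.OS.Version": platform.platform(),
        f"{host}.Net.MacAddress": mac,
        f"{host}.SampleTime": time.time(),
    }
    # The 1-minute load average exists only on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            tags[f"{host}.CPU.LoadAvg1Min"] = os.getloadavg()[0]
        except OSError:
            pass
    return tags


if __name__ == "__main__":
    for name, value in snapshot_tags().items():
        print(name, value)
```

Slow-changing values such as the OS version and MAC address only need to be stored when they change, which is why they cost so little historian space.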
Other elements to monitor include network statistics for each VLAN, with information from physical switches, virtual switches, and routers. These statistics include throughput in packets per second, bit rates, and the number of dropped packets. You can monitor the read and write rates for disk servers, the read error rates, and even the temperature of the disk drives. You can also monitor the number of passed and blocked packets in firewalls to see if any element has had its security signature inadvertently modified. Significant changes in any of these values may indicate an impending problem.
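A simple way to act on "significant changes" is to turn cumulative counters (such as dropped packets) into rates and compare each rate with a known-good baseline. The sketch below assumes a fractional tolerance of 50 percent and invented counter readings; the function names and the threshold are illustrative, and a SCADA or HMI alarm system would normally provide this kind of deviation alarm itself.

```python
def rate_per_second(prev_count, curr_count, seconds):
    """Convert two readings of a cumulative counter into a per-second rate."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # A counter reset (current below previous) is clamped to a rate of zero.
    return max(curr_count - prev_count, 0) / seconds


def deviation_alarm(value, baseline, tolerance=0.5):
    """Return True if value differs from baseline by more than tolerance.

    tolerance is a fraction of the baseline (0.5 means 50 percent).
    """
    if baseline == 0:
        return value != 0
    return abs(value - baseline) / abs(baseline) > tolerance


# Illustrative readings: dropped-packet counter sampled 60 seconds apart.
dropped_rate = rate_per_second(1200, 1800, 60)
alarm = deviation_alarm(dropped_rate, baseline=2.0)
```

Here the invented readings give 10 dropped packets per second against a baseline of 2, so the alarm condition is met; a rate of 2.4 against the same baseline would not trigger it.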
In modern manufacturing environments it is critical that manufacturing IT assets are treated the same as other production equipment. When IT assets fail or show signs of pending failure, operations must be informed and be prepared to take immediate action. Adding IT assets to the data historian and alarm system allows faster reactions and can provide detailed information to the IT service organization to get your production back up and running. Use your existing data historian and OPC-UA to provide critical visibility into the virtualized IT environment and stop operating IT assets in the blind.
– Dennis Brandl is president of BR&L Consulting in Cary, N.C., www.brlconsulting.com. His firm focuses on manufacturing IT. Contact him at email@example.com. Edited by Mark T. Hoske, content manager, CFE Media, Control Engineering and Plant Engineering, firstname.lastname@example.org.
This posted version contains more information than the print / digital edition from February 2013 Control Engineering.