Design for a fallible world

Dennis Brandl, BR&L Consulting -- Control Engineering, 1/1/2004

As control engineers, we tend to think that keeping the process running is our only job, but just because the process is operating doesn't mean that the plant is operating. A larger issue also needs to be addressed when IT systems are used in manufacturing operations: designing the system architecture to tolerate IT failures.

The need for a robust system architecture was made obvious in 2003 when the MSBlast and LoveBug worms shut down IT systems around the world. These events revealed two types of manufacturing companies: those that continued production and those that had to shut down because of the worms or the IT response to the worms. Some companies were able to "raise the drawbridge" and continue production by disconnecting their operational systems from their business systems. Other companies could not separate their business and operational systems and had to shut down production. These responses illustrated the difference between tightly coupled systems and loosely coupled systems, and between global systems and local systems.

Many IT departments have an unspoken bias toward tightly coupled global systems because such systems are generally easier to build and maintain. For example, maintaining a single global instance of a document management system is easier than maintaining one instance per site. In addition, if connections are needed to other systems, it is easier to hardcode a tight synchronous connection than to design and implement a standards-based asynchronous connection.

When everything is working, tightly coupled global systems are fine, but when things go wrong, the effect can be catastrophic. The best approach to system design is to follow the well-known saying, "Expect the best, but plan for the worst."

Companies that require absolutely reliable systems, such as banks and other financial institutions, will often set up separate servers, networks, and support organizations for the critical systems. These are separate from the normal business systems supporting HR, purchasing, and logistics. This same approach is required for manufacturing companies when the IT systems are critical to maintaining plant operations.

One of the first steps in this approach is to identify the systems critical to operations. Every system in use by operations needs to be examined. For example, if a company uses a centralized configuration management server to control all changes to PLC code, then the maintenance group may be unable to make emergency changes to PLC code while that server is unavailable during an IT outage. Another example of a critical system may be a global license manager that grants right-to-use for software packages. If an IT outage occurs, users may be unable to access displays, reports, or recipe management systems because they can't obtain the right-to-use license.
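
As a rough illustration of that examination, the sketch below lists which operational tasks would be blocked if every system hosted outside the plant became unreachable. The inventory, the task names, and the "plant historian" entry are hypothetical placeholders; a real audit would be built from the site's own list of systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    hosted_locally: bool  # True if it runs at the plant and needs no WAN link


# Hypothetical inventory (assumption): replace with the site's real systems.
SYSTEMS = {
    "plc_config_mgmt": System("PLC configuration management server", False),
    "license_manager": System("Global license manager", False),
    "plant_historian": System("Plant historian", True),
}

# Hypothetical mapping of operational tasks to the systems they depend on.
TASK_DEPENDENCIES = {
    "emergency PLC code change": ["plc_config_mgmt"],
    "open operator displays": ["license_manager", "plant_historian"],
    "review local trend data": ["plant_historian"],
}


def tasks_lost_in_wan_outage(tasks, systems):
    """Return {task: [non-local systems it needs]} for every task at risk."""
    at_risk = {}
    for task, deps in tasks.items():
        remote = [systems[d].name for d in deps if not systems[d].hosted_locally]
        if remote:
            at_risk[task] = remote
    return at_risk


if __name__ == "__main__":
    report = tasks_lost_in_wan_outage(TASK_DEPENDENCIES, SYSTEMS)
    for task, blockers in report.items():
        print(f"{task}: blocked by {', '.join(blockers)}")
```

Any task that appears in the report depends on at least one system that a WAN or business-network outage would take away, which makes it a candidate for a local instance or a manual fallback.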

When designing manufacturing systems to withstand IT failure, a good approach is to use local systems instead of global systems. This eliminates outages caused by WAN failures, which were a major cause of plant shutdowns during the MSBlast attack. Multiple local systems are more expensive to maintain, but they are much more robust in the face of failures.

Another good approach for designing robust systems is to use system interfaces that are asynchronous and buffered. This approach tolerates temporary loss of communications or system failures without causing cascading system failures. Interfaces based on messaging systems are especially robust in the face of network and application failures. These interfaces should be the default choice for systems that do not need real-time synchronous communication.
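
A minimal sketch of an asynchronous, buffered interface follows, assuming a simple in-memory store-and-forward buffer. The class name, the `send` callable, and the message strings are hypothetical, and a production interface would normally rely on a messaging product with persistent queues rather than on this illustration.

```python
from collections import deque


class StoreAndForwardLink:
    """Buffers outbound messages locally and forwards them when the link is up."""

    def __init__(self, send, max_buffered=10_000):
        # `send` is a hypothetical callable that raises ConnectionError on failure.
        self._send = send
        # Bounded buffer: when full, the oldest message is dropped first.
        self._buffer = deque(maxlen=max_buffered)

    def publish(self, message):
        """Queue the message, then try to drain; a dead link never raises here."""
        self._buffer.append(message)
        self.flush()

    def flush(self):
        """Send buffered messages in order; stop at the first failure."""
        sent = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # keep this message and the rest for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._buffer)


if __name__ == "__main__":
    delivered = []
    link_state = {"up": False}  # simulated WAN link, initially down

    def send_to_business_system(msg):
        if not link_state["up"]:
            raise ConnectionError("WAN link down")
        delivered.append(msg)

    link = StoreAndForwardLink(send_to_business_system)
    link.publish("batch 1 complete")   # buffered, production continues
    link.publish("batch 2 complete")
    link_state["up"] = True            # link restored
    link.flush()                       # backlog is delivered in order
    print(delivered, link.pending)
```

The point of the design is that the operational side only ever writes to its own local buffer, so a failure on the business side delays the data instead of stopping production.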

Fortunately, not all solutions for robust operations in the face of IT outages need to be technology oriented. Phone lists and faxes can be used to collect critical decision information, and paper backup systems can be used to record critical information. The worst choice, however, is to design an operational system on the assumption that there will be no failures in network infrastructure or in IT applications.


Author Information
Dennis Brandl is the president of BR&L Consulting, a consulting firm focusing on manufacturing IT solutions, based in Cary, N.C. dbrandl@brlconsulting.com

© 2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.