How to effectively operate mission critical facilities

Achieving uninterrupted mission critical facility operation requires both standardized employee training and detailed documentation.

08/02/2012


Specialized technicians monitor the data center power and cooling systems, which provide alerts to any system issues. This type of technician is trained to respond to a wide variety of failure scenarios using emergency operating procedures.

Mission critical environments are just that: critical to business functions. Without exception, mission critical facilities cannot bear any shutdowns or interruptions. This applies even during planned maintenance, making proper preparation a vital factor in reducing human error, equipment failure, and downtime. The successful operation of mission critical data center facilities requires process standardization, especially in the important areas of training and documentation. Properly executing these functions in support of equipment maintenance activities can alleviate a primary root cause of downtime.

The goal of every mission critical facility is to operate safely, reliably, and efficiently at its design capacity. Most studies of downtime in mission critical environments come to the same conclusion: human error is a leading cause. While there is no way to completely eliminate human error and its negative effects on business productivity, there are a number of steps facility managers can take to greatly reduce its frequency and impact. The most reliable method is to invest in effective documentation and training programs, which will provide the basis for improving accuracy, consistency, and reliability. 

Documentation and reporting

Personnel conduct a vibration measurement on a rotary UPS system. This information is used to perform an analysis as part of the reliability centered maintenance program at Lee Technologies (part of Schneider Electric). Courtesy: Schneider Electric

Nearly all critical facility operations have some level of documentation in place; however, some documentation programs do not meet the needs of mission critical environments. Considering the importance of accurate and current documentation to the reliable operation of the facility, a strong program standard is warranted. Structured documentation programs have a cost that varies according to system complexity, the facility automation scheme, and the level of change management needed to achieve the reliability and uptime goals of the enterprise.

Mission critical facilities are delivered with a considerable volume of documentation, but effectively sustaining operations is dependent on the right type of documentation. Typically, the detailed procedures needed to perform important daily functions are missing or incomplete. 

Proper documentation requires the following: 

  • Detailed written procedures for all operations and maintenance activities including:
    • Emergency operating procedures (EOP)
    • Standard operating procedures (SOP)
    • Methods of procedure (MOP)
    • Administrative procedures (AP)
  • Site walk-through procedures
  • Facility work rules
  • Change management processes and procedures
  • Accurate and up-to-date drawings and schedules
  • Report templates
    • Weekly, monthly, quarterly reports on facility operations and system capacities
    • Incident reports
    • Failure analysis
    • Lessons learned
    • Near-misses.  
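
A periodic audit is one way to confirm that a documentation set like the one above stays complete and current. The following Python sketch is illustrative only: the category names mirror the list above, while the `Document` record, the `audit` function, and the 365-day review interval are assumptions made for the example, not part of any published standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Categories taken from the list above; the short labels are illustrative.
REQUIRED_CATEGORIES = {
    "EOP", "SOP", "MOP", "AP",
    "site walk-through", "work rules", "change management",
    "drawings and schedules", "report templates",
}


@dataclass
class Document:
    """Hypothetical record for one controlled document."""
    title: str
    category: str
    last_reviewed: date


def audit(documents, today, max_age_days=365):
    """Return (missing categories, titles overdue for review).

    max_age_days is an assumed review interval; each site sets its own.
    """
    present = {doc.category for doc in documents}
    missing = sorted(REQUIRED_CATEGORIES - present)
    cutoff = today - timedelta(days=max_age_days)
    stale = sorted(doc.title for doc in documents if doc.last_reviewed < cutoff)
    return missing, stale
```

Called with the site's document register and the current date, `audit` returns the categories with no document on file and the titles whose last review is older than the chosen interval.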

Training

This technician is performing an inspection on a diesel generator fuel system during a preventative maintenance event. Proper training allows for this inspection in lieu of subcontracting the service to an outside source. Courtesy: Schneider Electric

Employee training should be a priority when new staff is hired, and should be conducted at regular intervals to ensure all personnel are up to date on any changes in industry standards and organizational best practices. Properly trained employees understand how the plant works, how to safely operate and maintain the plant equipment, and what to do when equipment and systems don’t function as expected. Thorough, accurate, and readily accessible documentation is both the foundation of this knowledge and the means of implementing it. However, the establishment of a comprehensive documentation and training program is a crucial, but rarely achieved, goal in mission critical environments.

What constitutes “proper training”? A best practice approach is to implement a multilevel training program that aligns each site operating procedure to a specific level of certification. This ensures that all operating and maintenance procedures are conducted or supervised by fully qualified personnel. Certification is achieved through a rigorous evaluation program, with regular recertification required. Such a program requires a large variety of materials and methods, such as: 

  • Theory of operation for major equipment and systems
  • Training modules for EOPs, SOPs, and MOPs
  • Drills for EOPs
  • Exams for various training levels 

Personnel are performing a switching procedure using an approved method of procedure. Note that they are using the “pilot-copilot” method of stepping through the checklist. Courtesy: Schneider Electric

It all starts with the most difficult aspect of any training program: developing the training materials. However, this effort cannot begin without timely and accurate information from the design and construction teams on the equipment configuration, the basis of design, the sequence of operations, and the as-built configuration. While this information may seem readily available, it is often poorly documented and delivered late. This is a major issue for both the commissioning and operations teams.

The main reason for the lack of effective training programs is the time and expense of development and training activities. This is a short-sighted view, however, as the cost and effort are largely offset by the resulting increased uptime, lower maintenance costs, and decreased employee turnover. The fact is that a proper documentation and training program is as important to achieving the required facility performance, efficiency, and reliability goals as the quality of the system design itself.

An effective multilevel training program can be broken down into four certification levels: 

  • Level 1: Basic knowledge and emergency response
    • Level goal: Train an employee capable of properly responding to emergency situations.
    • Training covers:
      • Administrative functions
      • Theory of operation
      • Daily routines
      • Security policies
      • Emergency procedures.
  • Level 2: Intermediate knowledge and frequent procedures
    • Level goal: Provide focused teaching of critical systems in order for the employee to begin participating in routine work practices.
    • Training covers:
      • Technical critical systems equipment knowledge
      • Frequently performed and/or elementary operational procedures
      • Frequently performed maintenance procedures.
  • Level 3: Advanced knowledge and infrequent procedures
    • Level goal: Broaden training to include noncritical systems, and provide additional in-depth training on critical systems.
    • Training covers:
      • Technical noncritical systems equipment knowledge
      • Infrequently performed maintenance procedures
      • Infrequently performed and/or moderately difficult operational procedures.
  • Level 4: Subject matter expertise on specific systems
    • Level goal: Train employees to become subject matter experts so they in turn will be able to train new employees.
    • Training covers:
      • Select, technically difficult procedures throughout the facility
      • Specialized outside training
      • Training course development
      • Training delivery. 
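
The idea of aligning each site procedure to a certification level, so that work is conducted or supervised by qualified personnel, can be expressed as a simple lookup. The Python sketch below is a minimal illustration: the four levels follow the list above, but the procedure names, the `REQUIRED_LEVEL` table, and the `may_perform` function are hypothetical and would be replaced by a site's own procedure index.

```python
from enum import IntEnum


class CertLevel(IntEnum):
    """The four certification levels described above."""
    BASIC = 1         # Level 1: basic knowledge and emergency response
    INTERMEDIATE = 2  # Level 2: frequent procedures
    ADVANCED = 3      # Level 3: infrequent procedures
    EXPERT = 4        # Level 4: subject matter expertise


# Hypothetical procedure index; a real site would load this from its records.
REQUIRED_LEVEL = {
    "EOP-01 loss of utility power": CertLevel.BASIC,
    "SOP-12 weekly generator run": CertLevel.INTERMEDIATE,
    "MOP-07 switchgear maintenance": CertLevel.ADVANCED,
}


def may_perform(procedure, performer, supervisor=None):
    """Allow the work if the performer, or a supervisor, holds the required level."""
    required = REQUIRED_LEVEL[procedure]
    if performer >= required:
        return True
    return supervisor is not None and supervisor >= required
```

In this sketch, a Level 1 technician cannot perform the Level 3 switchgear procedure alone, but may do so under the supervision of someone certified at Level 3 or above.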

Personnel prepare for switchgear maintenance, which is performed three times per year. This is a potentially hazardous procedure that can only be performed with a detailed procedure and by personnel with thorough training in this maintenance operation.

Training doesn’t end after an employee has qualified and become certified at a certain level. It’s vital to continuously supplement that knowledge with lessons learned from all available sources, particularly the direct experience of the facility technical workforce. This new information is incorporated into the training program and formalized in the recertification process. To test skills and responsiveness, ongoing emergency response drills are conducted that keep employees at peak readiness to handle any emergent events in the mission critical environment.
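
Regular recertification is easier to enforce when due dates are tracked explicitly. The short Python sketch below shows one possible approach. The one-year interval, the `recertification_due` function, and the sample names are assumptions for illustration, since the appropriate interval depends on the site and the certification level.

```python
from datetime import date, timedelta

# Assumed interval; each program defines its own recertification period.
RECERT_INTERVAL = timedelta(days=365)


def recertification_due(certified_on, today, interval=RECERT_INTERVAL):
    """Return the names whose last certification is at least `interval` old."""
    return sorted(
        name for name, when in certified_on.items()
        if today - when >= interval
    )


# Hypothetical certification records: name -> date of last certification.
records = {
    "technician_a": date(2011, 3, 15),
    "technician_b": date(2012, 5, 1),
}
due = recertification_due(records, today=date(2012, 8, 2))
```

The same record structure could also carry the date of each employee's last emergency response drill, so that drills and recertification are scheduled from one source.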

Achieving uninterrupted mission critical facility operation requires more than an investment in redundant critical infrastructure systems. It also requires both a financial investment and time commitment in their sustained operation, which stems from properly documenting the environment and training staff in conducting regularly scheduled, standardized maintenance on all facility equipment. 

The cost of these programs should be considered necessary to fulfill the critical mission and to protect the original infrastructure investment. The cost of creating and consistently implementing high-quality employee training and conducting effective maintenance is offset by increased uptime, longer asset life, more efficient system operations, and less employee turnover. 


As senior vice president of critical environment services, Woolley oversees the operation of all on-site facility operations and maintenance programs at data center solutions provider Lee Technologies, a subsidiary of Schneider Electric. He also leads the quality system group, which establishes and continuously improves the company’s service offerings, and is responsible for the company’s environmental health and safety program. He has been involved in the mission critical facilities management field for more than 20 years and has extensive experience in building technical service programs in addition to managing operations for more than 50 data centers throughout his career.


