Best practices to capture and store system critical data

Collecting critical data poses challenges, but there are several ways companies can overcome them and achieve gains.

By Brian E. Bolton March 14, 2023


Learning Objectives

  • Understand how an interface node is set up on the network where a data source is located and how it uses interfaces or connectors to obtain the data and write it to a historian.
  • Know the data types coming from the assets or elements when deciding what data to capture and store.
  • Understand how data formats are stored and how different file formats with compression algorithms provide benefits for specific use cases.

Data insights

  • Software and hardware for collecting and storing data can be purchased from a multitude of different third-party vendors, creating challenges for integration.
  • Interface nodes and connectors can be used to aggregate data from diverse sources into a single historian.
  • Choosing the proper data format for your use case is also important. Common data formats include CSV, JSON, Avro and Parquet.

Manufacturers have continued to take on projects to automate processes, collect and historize data and report on results in the form of key performance indicators (KPIs) or metrics that add significant value to their business. In the process, however, they encountered many ways to collect and store data and had to choose from a huge list of software and hardware vendors.

Each third-party vendor specializes in specific application areas, making their service appealing to the companies that require them. After all, collecting system critical data from a boiler is different than collecting data from a process that creates tire treads.

Choosing from multiple third-party vendor data collection systems also has created challenges for manufacturers. Personnel must deal with multiple historians and multiple procedures to access company data for analysis and reporting.

To overcome these data acquisition challenges, it’s important to review the various data sources, types and formats. It’s also important to look at best practices to capture and store system critical data into a central location for ease of access.

Data sources from assets or elements

Data is generated at various locations and sources within a process. The data indicates what is currently happening with the equipment and process. Sources can also be referred to as assets or elements, which often present digital or analog data via a programmable logic controller (PLC), a supervisory control and data acquisition (SCADA) system, a distributed control system (DCS), a relational database, a laboratory information management system (LIMS) or even a manual logger. The data is then stored in a database or a historian.

A best practice would be to collect required or desired system data and use a historian to store it in one location. To accomplish this challenging task, an interface node is installed and configured. The interface node is often set up on the network where the data source is located and uses interfaces or connectors to obtain the data and write it to the historian. Here are some examples of interfaces and connectors.


  • OLE for process control – data access (OPC DA)

  • OPC Historical Data Access (OPC HDA)

  • Relational database management system (RDBMS) via open database connectivity (ODBC)

  • Universal File and Stream Loading (UFL)

  • AVEVA PI System to PI.

Connectors (for AVEVA PI Systems):

  • OPC UA

  • Wonderware Historian

  • PI SQL Connector

  • UFL.
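As a sketch of what an interface node does, the short loop below polls a data source and writes timestamped samples through to a historian. The read function, tag names and the SQLite stand-in historian are illustrative assumptions, not any vendor's API; a real interface node would use the connector library for the source in question.

```python
import sqlite3
import time

# Hypothetical read function standing in for an OPC DA/UA or similar
# connector call; a real interface node would use the vendor's client library.
def read_source(tag):
    # Return a (timestamp, value) sample for the given tag.
    return (time.time(), 72.5)

# A lightweight stand-in "historian": one table of tag/timestamp/value rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historian (tag TEXT, ts REAL, value REAL)")

# The interface node's core job: poll each configured tag and write it through.
for tag in ["BOILER_TEMP", "STEAM_PRESSURE"]:
    ts, value = read_source(tag)
    conn.execute("INSERT INTO historian VALUES (?, ?, ?)", (tag, ts, value))
conn.commit()

rows = conn.execute("SELECT tag, value FROM historian ORDER BY tag").fetchall()
print(rows)
```

The same pattern scales from two tags to thousands; only the connector behind `read_source` changes per source type.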

Most of today’s technology allows data from the source to be processed and presented in real time. While real-time data may not be necessary in all cases, having the option to react to the data somewhere other than at the asset or element level could reduce reaction time when things start heading in the wrong direction. Real-time data with notifications in place can help prevent a wide range of incidents, such as:

  • Product being pumped to a storage tank when there isn’t enough room in the tank

  • Product in a storage tank failing to cool to temperature

  • Thermal oxidizer temperature dropping out of permit range

  • Hot spot detection in catalytic converters

  • Loss of process air pressure.
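A notification layer of this kind amounts to checking incoming samples against configured limits. The sketch below shows the idea; the tag names and limit values are illustrative assumptions, not figures from any real process or permit.

```python
# Illustrative operating limits per tag: (low, high).
LIMITS = {
    "TANK_LEVEL_PCT":  (0.0, 95.0),       # stop pumping before the tank overfills
    "TANK_TEMP_F":     (35.0, 120.0),     # product must cool to temperature
    "OXIDIZER_TEMP_F": (1400.0, 1800.0),  # permit range for the thermal oxidizer
}

def check_sample(tag, value):
    """Return an alarm message if the value is outside its limits, else None."""
    low, high = LIMITS[tag]
    if not (low <= value <= high):
        return f"ALARM: {tag}={value} outside [{low}, {high}]"
    return None

# In a real system these messages would be routed to email, SMS or an operator console.
print(check_sample("TANK_LEVEL_PCT", 97.2))
print(check_sample("OXIDIZER_TEMP_F", 1550.0))
```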

Having the data from multiple sources collected, stored and analyzed from one database makes processing and communicating key data easier and more consistent.

Common types of data

When deciding what data to capture and store, it is important to know the data types coming from the assets or elements, what it is going to take to capture and store the data in the database and if there are any limitations. There are many data types to consider, including:

  • Integer – Numeric data type for numbers without fractions

  • Floating Point – Numeric data type for numbers with fractions

  • Character – Single letter, digit, punctuation mark, symbol or blank space

  • String – Sequence of characters, digits or symbols – always treated as text

  • Boolean – True or false values

  • Enumerated – Small set of predefined unique values (elements or enumerators) that can be text based or numerical

  • Array – List with a number of elements in a specific order – typically of the same type

  • Date – Date in the YYYY-MM-DD format (ISO 8601 syntax)

  • Time – Time in the hh:mm:ss format for the time of day, time since an event or time interval between events

  • Datetime – Date and time together in the YYYY-MM-DD hh:mm:ss format

  • Timestamp – Number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970 (Unix time).
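Several of these types, including the ISO 8601 date format and Unix timestamps, can be illustrated in a few lines of Python; the values are made up for the example.

```python
from datetime import datetime, timezone

# Illustrative values for several of the data types above.
integer_val = 42                 # Integer: number without a fraction
float_val = 98.6                 # Floating point: number with a fraction
char_val = "A"                   # Character (Python represents it as a 1-char string)
string_val = "Reactor 3"         # String: sequence of characters, treated as text
bool_val = True                  # Boolean: true or false
array_val = [4.1, 4.3, 4.2]      # Array: ordered list, typically of one type

# Datetime in YYYY-MM-DD hh:mm:ss form, pinned to UTC.
dt = datetime(2023, 3, 14, 8, 30, 0, tzinfo=timezone.utc)

# A Unix timestamp is the number of seconds elapsed since 1970-01-01 00:00:00 UTC.
unix_ts = dt.timestamp()
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # ISO 8601-style datetime
print(unix_ts)                           # seconds since the epoch
```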

Different data formats

Data formats (or file formats) are typically stored in deep storage systems as data files and in different file formats with compression algorithms that provide benefits for specific use cases. For example, the way data is stored in a data lake is critical, and the format, partitions, and compression drive success. Some of the data file formats include:

  • CSV – This type of data file is a great option when compatibility, spreadsheet processing and readable data are needed. The drawback is the data must be flat. A flat database is a basic column/row structure, whereas historians are relational, meaning the data can come from multiple places in the database based on a timestamp or other “key” criteria.
  • JSON – When a nested format is required (i.e., child records embedded within and kept in sync with their parent record), JavaScript Object Notation (JSON) is a great way to go and is used in many application programming interfaces (APIs). JSON can be a bit harder for people to read, especially if they are not familiar with structured query language (SQL) or other programming languages.
  • Avro – Stores row data in a binary format, making it compact and extremely efficient. It stores the schema in JSON format, making it easier to read and interpret by any program.
  • Parquet – A columnar storage file format with schema support that is known to work well with a Hive plugin. It is used to efficiently store large data sets.
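The flat-versus-nested distinction between CSV and JSON can be shown with Python's standard library. The tag names, timestamps and values below are illustrative, not from any real historian.

```python
import csv
import io
import json

# One flat process record: fits naturally as a CSV row under a header.
flat = {"tag": "BOILER_TEMP", "timestamp": "2023-03-14 08:30:00", "value": 72.5}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=flat.keys())
writer.writeheader()
writer.writerow(flat)
csv_text = buf.getvalue()
print(csv_text)

# The same asset with child samples nested under it: CSV cannot express
# this hierarchy directly, but JSON handles it naturally.
nested = {
    "asset": "Boiler-1",
    "samples": [
        {"timestamp": "2023-03-14 08:30:00", "value": 72.5},
        {"timestamp": "2023-03-14 08:31:00", "value": 72.9},
    ],
}
json_text = json.dumps(nested, indent=2)
print(json_text)
```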

The file format greatly depends on the system being used. Consider the following items when choosing the file format:

  • Data structure

  • Performance

  • Readability

  • Compression

  • Schema

  • Compatibility.

Developing a strategic data acquisition approach

To overcome data acquisition challenges, businesses need a well-planned and well-executed approach that takes data collection, analysis and communication to the next level while generating a great return on investment (ROI). While there are many options available for collecting, analyzing and communicating critical data, pulling the required data from a single location is the best approach.

In the event data cannot be stored in a central historian/database, reporting tools are available that can pull key data from multiple sources and present it in a single report. The ability to react to data conditions based on incidents or events, and to generate notifications and reports for the appropriate people, saves valuable time and money.

Most businesses will be hard-pressed to bring all their data to one location, especially if resources are limited. Tackling a project of this magnitude often requires consulting a third-party partner with the expertise and the tools to put this type of project together. The right partner can provide a holistic view of data acquisition systems and software, while helping review the various vendor system options, including historians and data analytic tools.

Understanding the various data sources, types and formats and using best practices can help manufacturers more easily access and analyze critical system data in a central data repository. In doing so, they can achieve the desired business gains and stay ahead of their competition.

Brian Bolton, consultant, Maverick Technologies, a CFE Media and Technology content partner. Edited by David Miller, content manager, Control Engineering, CFE Media and Technology.


Keywords: Data acquisition, data historians


What new insights could your facility attain if data were better integrated?

Author Bio: Brian E. Bolton is a consultant for MAVERICK Technologies, a CFE Media content partner. He has more than 35 years of experience in chemical manufacturing, including more than 20 years involved with the OSIsoft PI Suite of applications, quality assurance, continuous improvement and data analysis. Maverick Technologies is a member of the Control System Integrators Association (CSIA).