Best practices to capture and store system critical data

Collecting critical data presents several challenges, but there are ways companies can overcome them and achieve business gains.

By Brian E. Bolton March 14, 2023

Learning Objectives

  • Understand how an interface node is set up on a network where a data source is located and uses interfaces or connectors to obtain the data and write it to a historian.
  • Learn why it is important to know the data types coming from the assets or elements when deciding what data to capture and store.
  • Understand how data is often stored in different file formats with compression algorithms that provide benefits for specific use cases.

Data insights

  • Software and hardware for collecting and storing data can be purchased from a multitude of different third-party vendors, creating challenges for integration.
  • Interface nodes and connectors can be used to aggregate data from diverse sources into a single historian.
  • Choosing the proper data format for your use case is also important. Common data formats include CSV, JSON, Avro and Parquet.

Manufacturers have continued to take on projects to automate processes, collect and historize data and report on results in the form of key performance indicators (KPIs) or metrics that add significant value to their business. In the process, however, they encountered many ways to collect and store data and had to choose from a huge list of software and hardware vendors.

Each third-party vendor specializes in specific application areas, making their service appealing to the companies that require them. After all, collecting system critical data from a boiler is different than collecting data from a process that creates tire treads.

Choosing from multiple third-party vendor data collection systems also has created challenges for manufacturers. Personnel must deal with multiple historians and multiple procedures to access company data for analysis and reporting.

To overcome these data acquisition challenges, it’s important to review the various data sources, types and formats. It’s also important to look at best practices to capture and store system critical data into a central location for ease of access.

Data sources from assets or elements

Data is generated at various locations and sources within a process. The data indicates what is currently happening with the equipment and process. Sources can also be referred to as assets or elements, which often present digital or analog data via a programmable logic controller (PLC), a supervisory control and data acquisition (SCADA) system, a distributed control system (DCS), a relational database, a laboratory information management system (LIMS) or even a manual logger. The data is then stored in a database or a historian.

A best practice is to collect required or desired system data and use a historian to store it in one location. To accomplish this challenging task, an interface node is installed and configured. The interface node is often set up on the network where the data source is located and uses interfaces or connectors to obtain the data and write it to the historian. Here are some examples of interfaces and connectors (a brief collection sketch follows the lists):

Interfaces:

  • OLE for process control – data access (OPC DA)

  • OPC Historical Data Access (OPC HDA)

  • Relational database management system (RDBMS) via open database connectivity (ODBC)

  • Universal File and Stream Loading (UFL)

  • AVEVA PI System to PI.

Connectors (for AVEVA PI Systems):

  • OPC UA

  • Wonderware Historian

  • PI SQL Connector

  • UFL.
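
To illustrate the collection pattern, here is a minimal sketch, in Python, of an interface node polling one OPC UA tag and forwarding it to a historian's ingestion endpoint. It assumes the open-source python-opcua and requests packages; the server address, node ID and historian URL are hypothetical placeholders, and a commercial interface or connector would add buffering, compression and failover on top of this.

```python
# Minimal sketch: poll one OPC UA tag and forward it to a historian.
# Assumes the "opcua" (python-opcua) and "requests" packages; the endpoint,
# node ID and historian ingestion URL below are hypothetical placeholders.
import time
import requests
from opcua import Client

OPC_ENDPOINT = "opc.tcp://plc.example.com:4840"            # hypothetical PLC/SCADA server
NODE_ID = "ns=2;s=Boiler1.OutletTemperature"               # hypothetical tag
HISTORIAN_URL = "https://historian.example.com/api/write"  # hypothetical ingest API

client = Client(OPC_ENDPOINT)
client.connect()
try:
    node = client.get_node(NODE_ID)
    while True:
        value = node.get_value()          # read the current process value
        record = {
            "tag": NODE_ID,
            "value": value,
            "timestamp": time.time(),     # Unix time: seconds since 1970-01-01 UTC
        }
        requests.post(HISTORIAN_URL, json=record, timeout=5)
        time.sleep(10)                    # simple 10-second scan rate
finally:
    client.disconnect()
```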

Most of today’s technology allows data from the source to be processed and presented in real time. While real-time data may not be necessary in all cases, having the option to react to the data somewhere other than at the asset or element level can reduce reaction time when things start heading in the wrong direction. Real-time data with notifications in place can help prevent a wide range of incidents (a simple notification sketch follows the list), such as:

  • Product being pumped to a storage tank when there isn’t enough room in the tank

  • Product in a storage tank failing to cool to temperature

  • Thermal oxidizer temperature dropping out of permit range

  • Hot spot detection in catalytic converters

  • Loss of process air pressure.
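
To illustrate the notification idea, here is a minimal sketch that checks a real-time value against a high limit and emails an alert when it is exceeded. The tag name, limit and SMTP relay are hypothetical placeholders; in practice, a historian's built-in event or notification service would typically handle this.

```python
# Minimal sketch: check a real-time value against a limit and notify by email.
# Tag name, limit and SMTP settings are hypothetical placeholders.
import smtplib
from email.message import EmailMessage

TANK_LEVEL_LIMIT_PCT = 95.0   # assumed high-level limit for a storage tank

def notify(subject: str, body: str) -> None:
    """Send a plain-text alert email via a local SMTP relay (assumed)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "historian-alerts@example.com"
    msg["To"] = "operations@example.com"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

def check_tank_level(tag: str, level_pct: float) -> None:
    """Alert if the tank does not have enough room for incoming product."""
    if level_pct >= TANK_LEVEL_LIMIT_PCT:
        notify(
            subject=f"High level on {tag}",
            body=f"{tag} is at {level_pct:.1f}%. Stop transfer or divert product.",
        )

# Example: this value would normally come from the interface node's real-time read.
check_tank_level("Tank7.Level", 96.2)
```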

Having the data from multiple sources collected, stored and analyzed from one database makes processing and communicating key data easier and more consistent.

Common types of data

When deciding what data to capture and store, it is important to know the data types coming from the assets or elements, what it will take to capture and store that data in the database and whether there are any limitations. There are many data types to consider (a brief example combining several of them appears after the list), including:

  • Integer – Numeric data type for numbers without fractions

  • Floating Point – Numeric data type for numbers with fractions

  • Character – Single letter, digit, punctuation mark, symbol or blank space

  • String – Sequence of characters, digits or symbols – always treated as text

  • Boolean – True or false values

  • Enumerated – Small set of predefined unique values (elements or enumerators) that can be text-based or numerical

  • Array – List with a number of elements in a specific order – typically of the same type

  • Date – Date in the YYYY-MM-DD format (ISO 8601 syntax)

  • Time – Time in the hh:mm:ss format for the time of day, time since an event or time interval between events

  • Datetime – Date and time together in the YYYY-MM-DD hh:mm:ss format

  • Timestamp – Number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970 (Unix time).
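
To make these types concrete, here is a minimal sketch of a single record combining several of them, written with Python type hints. The field names and enumeration values are hypothetical and stand in for whatever the asset or element actually reports.

```python
# Minimal sketch: one record combining several of the data types above.
# Field names and enumeration values are hypothetical placeholders.
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List

class RunState(Enum):          # enumerated type: small set of predefined values
    STOPPED = 0
    RUNNING = 1
    FAULTED = 2

@dataclass
class ReactorSample:
    batch_id: int              # integer
    temperature_c: float       # floating point
    grade_code: str            # string
    in_spec: bool              # Boolean
    state: RunState            # enumerated
    zone_temps_c: List[float]  # array of floating-point values
    sampled_at: datetime       # datetime (date and time together)

sample = ReactorSample(
    batch_id=4812,
    temperature_c=187.4,
    grade_code="A-240",
    in_spec=True,
    state=RunState.RUNNING,
    zone_temps_c=[185.9, 187.4, 188.1],
    sampled_at=datetime(2023, 3, 14, 8, 30, 0, tzinfo=timezone.utc),
)

# Timestamp: seconds elapsed since 00:00:00 UTC on 1 January 1970 (Unix time).
unix_timestamp = sample.sampled_at.timestamp()
print(sample.sampled_at.isoformat(), unix_timestamp)
```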

Different data formats

Data is typically stored in deep storage systems as data files, using different file formats (or data formats) with compression algorithms that provide benefits for specific use cases. For example, the way data is stored in a data lake is critical, and the format, partitions and compression drive success. Some of the data file formats include the following (a short sketch writing the same records in several formats appears after the list):

  • CSV – This type of data file is a great option when compatibility, spreadsheet processing and readable data are needed. The drawback is the data must be flat. A flat database is a basic column/row database, whereas historians are relational, meaning the data can come from multiple places in the database based on a timestamp or other “key” criteria.
  • JSON – When a nested format is required (i.e., detail data sets that stay in sync with the current row of their parent data set), JavaScript Object Notation (JSON) is a great way to go and is used in many application programming interfaces (APIs). In some cases, JSON is a bit harder for people to read, especially if they are not familiar with structured query language (SQL) or other programming languages.
  • Avro – Avro stores row-based data in a binary format, making it compact and extremely efficient. It stores the schema in JSON format, which makes the schema easier to read and interpret by any program.
  • Parquet – Parquet is a columnar storage file format with schema support that is known to work well with Hive. It is used to efficiently store large data sets.
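
As a quick illustration, here is a minimal sketch that writes the same small record set to CSV, JSON and Parquet with pandas. The column names are hypothetical, and the Parquet call assumes a pyarrow or fastparquet engine is installed; Avro would require a separate library such as fastavro.

```python
# Minimal sketch: write the same small data set to CSV, JSON and Parquet.
# Column names are hypothetical; to_parquet() assumes pyarrow or fastparquet is installed.
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2023-03-14 08:30:00", "2023-03-14 08:30:10"], utc=True
        ),
        "tag": ["Boiler1.OutletTemperature", "Boiler1.OutletTemperature"],
        "value": [187.4, 187.9],
    }
)

df.to_csv("samples.csv", index=False)           # flat, human-readable, spreadsheet friendly
df.to_json("samples.json", orient="records", date_format="iso")  # nested, record oriented
df.to_parquet("samples.parquet", index=False)   # columnar, compressed, good for large sets
```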

The file format greatly depends on the system being used. Consider the following items when choosing the file format:

  • Data structure

  • Performance

  • Readability

  • Compression

  • Schema

  • Compatibility.

Developing a strategic data acquisition approach

To overcome data acquisition challenges, businesses need a well-planned and well-executed approach that takes data collection, analysis and communication to the next level while generating a strong return on investment (ROI). While there are many options available for collecting, analyzing and communicating critical data, pulling the required data from a single location is the best approach.

In the event data cannot be stored in a central historian/database, reporting tools are available that can pull key data from multiple sources and present it within a single report. The ability to react to data conditions based on incidents or events, and to generate notifications and reports for the appropriate people, saves valuable time and money.
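
To illustrate that multi-source reporting pattern, here is a minimal sketch that pulls process values from a CSV export and lab results from a SQLite database, then aligns them on timestamp for a single report. The file, table and column names are hypothetical placeholders.

```python
# Minimal sketch: combine two data sources into a single report keyed on timestamp.
# File, table and column names are hypothetical placeholders.
import sqlite3
import pandas as pd

# Source 1: process values exported from the historian as CSV.
process = pd.read_csv("historian_export.csv", parse_dates=["timestamp"])

# Source 2: lab/quality results stored in a relational database (LIMS-style).
with sqlite3.connect("lims.db") as conn:
    quality = pd.read_sql_query(
        "SELECT sample_time AS timestamp, batch_id, assay_pct FROM lab_results",
        conn,
        parse_dates=["timestamp"],
    )

# Align each process reading with the most recent lab result (within 5 minutes).
report = pd.merge_asof(
    process.sort_values("timestamp"),
    quality.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
    tolerance=pd.Timedelta("5min"),
)
report.to_csv("daily_report.csv", index=False)
```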

Most businesses will be hard-pressed to bring all their data to one location, especially if resources are limited. Tackling a project of this magnitude often requires consulting a third-party partner with the expertise and the tools to put this type of project together. The right partner can provide a holistic view of data acquisition systems and software, while helping review the various vendor system options, including historians and data analytic tools.

Understanding the various data sources, types and formats and using best practices can help manufacturers more easily access and analyze critical system data in a central data repository. In doing so, they can achieve the desired business gains and stay ahead of their competition.

Brian Bolton, consultant, Maverick Technologies, a CFE Media and Technology content partner. Edited by David Miller, content manager, Control Engineering, CFE Media and Technology, dmiller@cfemedia.com

MORE ANSWERS

Keywords: Data acquisition, data historians

CONSIDER THIS

What new insights could your facility attain if data were better integrated?


Author Bio: Brian E. Bolton (brian.bolton@mavtechglobal.com) is a consultant for MAVERICK Technologies, a CFE Media content partner. He has more than 35 years of experience in chemical manufacturing, including more than 20 years involved with the OSIsoft PI Suite of applications, quality assurance, continuous improvement and data analysis. Maverick Technologies is a member of the Control System Integrators Association (CSIA).