Improve industrial data integration with ETL software

Extract, transform, load (ETL) software can help improve data gathering for operational technology (OT) applications, but companies must first overcome major data integration challenges.
By John Harrington June 12, 2019
Most people are familiar with Industrie 4.0, Smart Manufacturing and the Industrial Internet of Things (IIoT), terms used to describe tremendous changes in operational technology (OT). These changes have been driven by a surge in underlying technologies, including the cloud, Big Data, smart sensors, single-board solid-state computers, wireless networks, analytics, application development platforms and mobile devices.

Some of these technologies are not new, but recent price drops and improved ease of use have increased their adoption. They are being combined with traditional OT, such as control systems and manufacturing execution systems (MES), to improve the operations and business functions of industrial companies by providing more data, along with tools to leverage that data.

Many of these technologies were first developed for information technology (IT) departments to interact with other business disciplines. Given the vast amount of data in manufacturing and the need to improve operations, these tools are being evaluated and adopted by IT. However, operations teams looking to leverage industrial data face unique challenges around data integration, which have increased the effort required to deploy such systems.

The IT industry solved its data integration challenges by creating extract, transform, load (ETL) software, which integrates business systems into analytics systems. These solutions extract data from other systems and databases, such as customer relationship management (CRM) and enterprise resource planning (ERP), and combine it in an intermediate data store, where it is transformed by cleaning, aligning and normalizing it. The transformed data is then loaded into the final data store to be used by analytics, trending and search tools.
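The classic business-system ETL flow can be sketched in a few lines. This is a minimal illustration, not any particular ETL product: the CRM and ERP records are hypothetical in-memory lists, the intermediate (staging) store is a plain Python list, and the final data store is an in-memory SQLite database.

```python
import sqlite3

# Hypothetical source records, standing in for CRM and ERP extracts.
crm_rows = [
    {"customer": " Acme Corp ", "region": "north"},
    {"customer": "Beta LLC", "region": "SOUTH"},
]
erp_rows = [
    {"customer": "ACME CORP", "order_total": "1200.50"},
    {"customer": "beta llc", "order_total": "310"},
]

def extract():
    """Extract: pull raw records from each source system."""
    return crm_rows, erp_rows

def stage(crm, erp):
    """Combine raw records from all sources in an intermediate store."""
    return [("crm", r) for r in crm] + [("erp", r) for r in erp]

def transform(staged):
    """Transform: clean, align and normalize records by customer name."""
    merged = {}
    for source, row in staged:
        key = row["customer"].strip().lower()  # align names across systems
        rec = merged.setdefault(key, {"customer": key, "region": None, "order_total": 0.0})
        if source == "crm":
            rec["region"] = row["region"].strip().lower()
        else:
            rec["order_total"] += float(row["order_total"])
    return list(merged.values())

def load(records, conn):
    """Load: write the cleaned records into the final data store."""
    conn.execute("CREATE TABLE customers (customer TEXT PRIMARY KEY, region TEXT, order_total REAL)")
    conn.executemany("INSERT INTO customers VALUES (:customer, :region, :order_total)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(stage(*extract())), conn)
rows = conn.execute("SELECT * FROM customers ORDER BY customer").fetchall()
print(rows)  # [('acme corp', 'north', 1200.5), ('beta llc', 'south', 310.0)]
```

Note that every record arrives as a complete, self-describing transaction; that assumption is what breaks down for factory data.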

So why can’t operations teams use ETL solutions to prepare industrial data? Because industrial data coming off a control system in a factory poses different challenges than transaction data coming from business systems. To maximize the benefits of their data, companies need to understand how each stage of ETL differs for industrial data.

Extract

Operational data is not all stored in a database as transactions waiting to be extracted. Rather, it is available in real time from programmable logic controllers (PLCs), machine controllers, supervisory control and data acquisition (SCADA) systems and time series databases throughout a factory. Instead of being extracted from a handful of large databases, data must be collected from hundreds of devices and systems.
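Collecting from hundreds of sources looks more like a polling problem than a database query. The sketch below assumes each source can be read with a simple asynchronous call; `read_device` is a hypothetical stand-in that simulates network latency and returns random values, not a real PLC or SCADA driver, and the concurrency limit of 50 is an arbitrary assumption.

```python
import asyncio
import random

# Hypothetical device list; real sources would be PLCs, machine controllers,
# SCADA systems or historians, each reached over its own protocol.
DEVICES = [f"plc-{i:03d}" for i in range(200)]

async def read_device(device_id: str) -> dict:
    """Simulated read standing in for a protocol-specific driver call."""
    await asyncio.sleep(random.uniform(0.001, 0.01))  # simulated latency
    return {"device": device_id, "value": random.random()}

async def poll_all(devices, max_concurrent=50):
    """Poll every device once, capping how many requests run at the same time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def bounded_read(dev):
        async with limit:
            return await read_device(dev)

    # gather() returns results in the same order as the devices list.
    return await asyncio.gather(*(bounded_read(d) for d in devices))

readings = asyncio.run(poll_all(DEVICES))
print(len(readings), readings[0]["device"])  # 200 plc-000
```

Bounding concurrency keeps a single collector from overwhelming the plant network or the controllers themselves, which a database-oriented extractor rarely has to consider.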

Transaction processing systems store a complete record for each transaction, but in factories, process data is not captured as “transactions.” A high-volume discrete manufacturer cannot store the complete data set for every component that comes off the line, while a batch manufacturer often needs to store more than a single value per batch. Industrial data also must be collected at a high rate to catch anomalies and stored at different rates depending on the use case, which makes extraction more complex (see Figure).
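A small numeric illustration of why collection rate matters, using made-up numbers: a five-second temperature excursion is obvious at one-second resolution but disappears in a per-minute or hourly average. The alarm limit and all values are assumptions chosen for illustration.

```python
from statistics import mean

# One hour of hypothetical 1-second temperature samples: steady at 70.0,
# with a 5-second excursion to 95.0 starting at second 1800.
samples = [70.0] * 3600
for t in range(1800, 1805):
    samples[t] = 95.0

LIMIT = 85.0  # assumed alarm limit, for illustration only

# High-rate view: every 1-second sample above the limit is visible.
anomalies = [t for t, v in enumerate(samples) if v > LIMIT]

# Lower-rate views that might suit trending or reporting.
minute_avgs = [mean(samples[i:i + 60]) for i in range(0, 3600, 60)]
hourly_avg = mean(samples)

print(anomalies)                   # [1800, 1801, 1802, 1803, 1804]
print(round(max(minute_avgs), 2))  # 72.08, below the limit
print(round(hourly_avg, 3))        # 70.035
```

Neither averaged series crosses the limit, which is why the same signal may need to be collected at one rate and stored at several.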

Figure: Comparison of transactional data extraction to the more sophisticated requirements of industrial data extraction across a number of factors. Courtesy: HighByte

Transform

Transforming operational data is less a matter of transformation than of conditioning.

Operational data is often stored periodically: every second, minute or hour. The stored data may be an actual value, such as the quantity produced, or statistics calculated from the raw data, such as the average, minimum and maximum of a temperature that is checked every second but recorded every hour.
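Recording a once-per-second temperature as hourly minimum, average and maximum amounts to a windowed aggregation. A minimal sketch, assuming evenly spaced samples with no gaps; real historians must also handle missing or irregular timestamps. The readings are synthetic.

```python
from statistics import mean

def summarize(samples, window):
    """Reduce fixed-rate samples to min/avg/max per window of `window` samples."""
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append({
            "start": start,  # index of the first sample in this window
            "min": min(chunk),
            "avg": mean(chunk),
            "max": max(chunk),
        })
    return summaries

# Hypothetical: two hours of 1-second readings, summarized per hour (3,600 samples).
readings = [20.0 + (t % 10) * 0.1 for t in range(7200)]
hourly = summarize(readings, window=3600)
for h in hourly:
    print(h["start"], h["min"], round(h["avg"], 2), round(h["max"], 2))
```

Storage drops from 3,600 values per hour to three, at the cost of losing when within the hour each extreme occurred.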

PLC data points generally consist of an address or name and a value. However, these data points provide only a process- or controls-centric view of the data. They carry no descriptions, units of measure, operating ranges or other descriptive information.
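One common way to add the missing context is to maintain a separate metadata model keyed by tag address and merge it with the raw values. A sketch under assumptions: the addresses, tag names, scaling factor and operating ranges below are hypothetical.

```python
from dataclasses import dataclass

# What a PLC typically exposes: an address and a raw value, nothing more.
raw_points = {"N7:12": 1875, "N7:13": 1}

@dataclass
class TagContext:
    """Descriptive information the PLC does not carry (hypothetical fields)."""
    name: str
    description: str
    units: str
    low: float
    high: float
    scale: float = 1.0  # raw counts -> engineering units

# Hypothetical metadata, maintained outside the controller.
CONTEXT = {
    "N7:12": TagContext("Press1.HydraulicPressure", "Main cylinder pressure", "bar", 0.0, 250.0, scale=0.1),
    "N7:13": TagContext("Press1.Running", "Press running state", "bool", 0, 1),
}

def enrich(address, raw):
    """Combine a raw PLC value with its context into a self-describing record."""
    ctx = CONTEXT[address]
    value = raw * ctx.scale
    return {
        "name": ctx.name,
        "description": ctx.description,
        "value": value,
        "units": ctx.units,
        "in_range": ctx.low <= value <= ctx.high,
    }

enriched = [enrich(addr, val) for addr, val in raw_points.items()]
```

Keeping the context outside the controller means maintenance, quality and analytics users can read "Press1.HydraulicPressure in bar" without knowing the PLC's memory map.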

This creates challenges when industrial data is used outside the controls environment for machine maintenance, process optimization, quality and traceability. In these cases, the data must be analyzed and aligned by machine for maintenance, by process for process optimization and by product for quality and traceability. The required data often is available, but it must be correlated and sometimes transposed into a usable format.
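Aligning data by product is often a matter of correlating two time-ordered streams. This sketch assumes a production log that records when each serial number started on a machine; the log, tag name and values are hypothetical. Each reading is assigned to the unit in production at its timestamp, transposing a time-series view into a per-product view.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical production log: (start time in seconds, serial number), sorted by time.
production_log = [(0, "SN-001"), (40, "SN-002"), (95, "SN-003")]

# Hypothetical time-series readings from one machine: (timestamp, tag, value).
readings = [
    (5, "spindle_temp", 61.2),
    (30, "spindle_temp", 63.0),
    (42, "spindle_temp", 64.1),
    (100, "spindle_temp", 62.7),
]

starts = [start for start, _ in production_log]

def serial_at(ts):
    """Return the serial number in production at timestamp `ts`, or None."""
    idx = bisect_right(starts, ts) - 1
    return production_log[idx][1] if idx >= 0 else None

# Transpose the time-ordered stream into a per-product view for traceability.
by_product = defaultdict(list)
for ts, tag, value in readings:
    by_product[serial_at(ts)].append((tag, value))

print(dict(by_product))
```

The same lookup keyed by machine or process step instead of serial number would produce the maintenance and process views.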

Typical factories also have machinery from many different vendors, purchased over a wide timespan, and this variety of equipment produces a wide variety of data. Some data points may have different names, while others may use different units of measure or different measurements. For analytics, trending or any other data analysis to be possible, the data points must be standardized, normalized and, in some cases, calculated from component measurements.
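Standardizing names, units and derived values is usually table-driven. A minimal sketch with hypothetical vendor point names: one vendor reports motor temperature in Fahrenheit and another in Celsius, and a derived value is calculated from voltage and current (simplified here to single-phase apparent power, ignoring power factor).

```python
# Hypothetical mapping from vendor-specific point names to a standard model.
# Each entry: (standard name, function converting the value to standard units).
NAME_MAP = {
    "VendorA.TempF": ("motor_temp_c", lambda f: (f - 32) * 5 / 9),
    "VendorB.MotorTemperature": ("motor_temp_c", lambda c: c),
    "VendorB.Volts": ("voltage_v", lambda v: v),
    "VendorB.Amps": ("current_a", lambda a: a),
}

def normalize(machine_id, points):
    """Rename and convert vendor points into one standard record per machine."""
    record = {"machine": machine_id}
    for name, value in points.items():
        if name in NAME_MAP:
            std_name, convert = NAME_MAP[name]
            record[std_name] = convert(value)
    # Derived value calculated from component measurements, when both exist.
    if "voltage_v" in record and "current_a" in record:
        record["apparent_power_va"] = record["voltage_v"] * record["current_a"]
    return record

a = normalize("press-1", {"VendorA.TempF": 176.0})
b = normalize("press-2", {"VendorB.MotorTemperature": 80.0, "VendorB.Volts": 480.0, "VendorB.Amps": 12.5})
print(a)  # {'machine': 'press-1', 'motor_temp_c': 80.0}
```

After normalization, both presses report `motor_temp_c` in the same units, so they can be trended and compared side by side.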

Analytics data generally is not as critical as controls data, so companies often use lower-cost sensors to collect data for noncritical analysis. However, these sensors can fail or drift, so redundant sensors with external data validation are important to ensure good data is being stored.
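One common validation approach, shown here as an illustration rather than a recommendation for any specific plant, is voting across three redundant sensors: the median is stored, and any sensor that strays too far from it is flagged as possibly failed or drifting. The sensor ids and tolerance are hypothetical.

```python
from statistics import median

TOLERANCE = 2.0  # hypothetical max deviation from the median, in sensor units

def validate(readings):
    """Vote across redundant sensors measuring the same point.

    `readings` maps sensor id -> value, or None if the sensor did not report.
    Returns (validated_value, suspect_ids). The validated value is the median of
    the reported values; it is None when fewer than three sensors reported,
    because two disagreeing sensors cannot be arbitrated.
    """
    reported = {sid: v for sid, v in readings.items() if v is not None}
    missing = sorted(sid for sid, v in readings.items() if v is None)
    if len(reported) < 3:
        return None, missing
    mid = median(reported.values())
    drifting = sorted(sid for sid, v in reported.items() if abs(v - mid) > TOLERANCE)
    return mid, missing + drifting

print(validate({"T1": 71.2, "T2": 70.8, "T3": 78.5}))  # (71.2, ['T3'])
print(validate({"T1": 71.2, "T2": None, "T3": 70.9}))  # (None, ['T2'])
```

The median is used rather than the mean so that a single failed sensor cannot pull the stored value away from the two healthy ones.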

Load

Many business users also want access to high-resolution, automated data feeds from operations. They use their own systems to analyze and act on the data, and their requirements differ. These users vary by company but often include manufacturing operations, maintenance, quality and value engineering. Machine vendors also have started to sell service contracts that require real-time data collection.
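Serving several consumers with different needs from one feed can be modeled as per-consumer subscriptions. In this sketch the consumer names, tags and intervals are hypothetical, and delivery is reduced to returning a list of recipients; a real system would add transport, authentication and access control.

```python
from dataclasses import dataclass, field

@dataclass
class Subscription:
    """What one consumer wants: which tags, and how often (hypothetical model)."""
    consumer: str
    tags: set
    min_interval_s: float  # deliver at most once per interval, per tag
    last_sent: dict = field(default_factory=dict)

    def wants(self, tag, ts):
        """Decide whether to deliver this tag now, and record the delivery if so."""
        if tag not in self.tags:
            return False
        last = self.last_sent.get(tag)
        if last is not None and ts - last < self.min_interval_s:
            return False
        self.last_sent[tag] = ts
        return True

def route(sample, subscriptions):
    """Return the consumers that should receive this (timestamp, tag, value) sample."""
    ts, tag, _value = sample
    return [s.consumer for s in subscriptions if s.wants(tag, ts)]

subs = [
    Subscription("maintenance", {"vibration", "motor_temp"}, min_interval_s=1),
    Subscription("quality", {"motor_temp"}, min_interval_s=60),
    Subscription("vendor_service", {"vibration"}, min_interval_s=10),
]

stream = [(0, "motor_temp", 70.1), (1, "vibration", 0.4), (5, "motor_temp", 70.3), (61, "motor_temp", 70.2)]
deliveries = [route(s, subs) for s in stream]
print(deliveries)
```

Each consumer sees only the tags and resolution it asked for, which limits both data volume and exposure.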

Managing the delivery of data is important: storing incorrect, corrupt or useless data creates security risks as well as significant costs.
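A simple way to avoid storing incorrect, corrupt or useless data is to gate samples before they are loaded. This sketch assumes each sample carries a source quality flag (many industrial protocols report one, though the plain string "good" here is a simplification); the range limits and deadband are hypothetical.

```python
import math

def gate(samples, low, high, deadband):
    """Filter (timestamp, value, quality) samples before loading.

    Drops samples flagged bad by the source, non-numeric or NaN values, values
    outside the plausible range [low, high], and values that changed by less
    than `deadband` since the last stored value (report-by-exception).
    """
    stored, last = [], None
    for ts, value, quality in samples:
        if quality != "good":
            continue  # source already marked it unreliable
        if not isinstance(value, (int, float)) or math.isnan(value):
            continue  # corrupt value
        if not (low <= value <= high):
            continue  # physically implausible for this sensor
        if last is not None and abs(value - last) < deadband:
            continue  # no meaningful change; storing it adds cost, not information
        stored.append((ts, value))
        last = value
    return stored

raw = [
    (0, 50.0, "good"),
    (1, 50.05, "good"),
    (2, float("nan"), "good"),
    (3, 999.0, "good"),
    (4, 51.0, "bad"),
    (5, 50.6, "good"),
]
kept = gate(raw, low=0.0, high=150.0, deadband=0.5)
print(kept)  # [(0, 50.0), (5, 50.6)]
```

Only two of six samples survive; the rest are either untrustworthy or carry no new information.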

Industrial data extraction and transformation must happen close to the production machinery. This allows local edge analytics to use the data, and allows it to be sent to on-premises data centers or to the cloud, whichever is more efficient.

Realizing data’s value

The need to extract, transform and load operational data is as great as, if not greater than, the need for ETL in a typical IT business system integration. This demands a rethink of data architecture and the creation of new industrial data infrastructure solutions. Data integration for industrial companies must be simplified and streamlined to achieve the expected value from Industrie 4.0, Smart Manufacturing and the IIoT.

John Harrington is co-founder and chief business officer at HighByte. Edited by Chris Vavra, production editor, Control Engineering, CFE Media, cvavra@cfemedia.com.

MORE INSIGHTS

Keywords: ETL, extract, transform, load, data integration

Extract, transform, load (ETL) solutions gather data and store it in a system for further analysis.

There is a great need for ETL of operational technology (OT) data, but getting that data poses different challenges than getting information technology (IT) data.

ETL can improve OT data, but the process must be streamlined and simplified to make the most of the data’s potential.

Consider this

What applications in your factory would benefit most from an ETL solution and why?

ONLINE extra

About the company

HighByte is an industrial software company on a mission to make complex industrial automation data simple by building intuitive, off-the-shelf software.
