Software readiness for data analytics and Big Data

Expand industrial data access and get more out of it with tools such as message queuing telemetry transport (MQTT), which can help manufacturers realize Industry 4.0’s benefits.

By Travis Cox March 10, 2020

A major aspect of Industry 4.0 is Big Data acquisition and analysis to turn data into actionable information and enable systems to make decisions on their own. Despite the presence of new technology, most organizations still use clipboards and paper to gather data and information. In many cases up to 90% of this data ends up stranded in the field in silos or islands. That presents challenges when trying to realize the benefits of Industry 4.0.

The good news is new technology can help, and there are several simple steps users can take to prepare for digital transformation, including:

  • Getting access to more data
  • Edge computing
  • Data cleansing
  • Contextualizing data
  • Standardizing common data structures.

The journey starts with getting access to more data — a vital component of Industry 4.0. The operational world is complex, involving hundreds of different protocols, communication mediums and legacy device knowledge. The reality of digital transformation is it must be implemented from the bottom up, with operations technology (OT) on board first. It requires a new mentality in keeping systems open, interoperable and secure. The first step is getting access to all the data in an efficient way — with the ability to easily tap into the data when needed, from any source.

One of the biggest barriers to data access is legacy software licensing models that charge per tag or per user. These models don’t scale, which inhibits growth. Furthermore, industrial applications have traditionally been closed and proprietary, with limited functionality and connectivity. Today, we require new models that are fundamentally unlimited and open. These new models can unlock new opportunities for expansion and greater scalability.

Another challenge is balancing the convergence of new smart sensors and devices with existing legacy devices. It’s important to have an infrastructure able to support both. It boils down to a single, crucial concept: an architecture change. We need to stop connecting legacy devices to applications with protocols and, instead, connect devices to infrastructure. We also need to provide an OT solution that meets operators’ needs by being plug-and-play, reliable and scalable.

The new architecture

This new architecture uses message queuing telemetry transport (MQTT), a publish/subscribe protocol that enables message-oriented middleware architectures. This is not a new concept in the information technology (IT) space; the enterprise service bus (ESB) has long been used to integrate applications over a bus-like infrastructure. With MQTT, device data is published by exception to an MQTT server, either on the premises or in the cloud. Applications subscribe to the MQTT server to get data, which means there’s no need to connect to the end device itself.
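The publish/subscribe and report-by-exception ideas can be illustrated with a small sketch. It is not a real MQTT client or server: `ToyBroker`, `ExceptionReporter` and the topic name are hypothetical stand-ins that run in memory, with exact-match topics only, to show how the publisher and the subscriber never reference each other.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[str, float], None]


class ToyBroker:
    """In-memory stand-in for an MQTT server (hypothetical, exact-match topics)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, value: float) -> None:
        # Deliver to every application subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(topic, value)


class ExceptionReporter:
    """Publishes a value only when it differs from the last value sent."""

    def __init__(self, broker: ToyBroker, topic: str) -> None:
        self._broker = broker
        self._topic = topic
        self._last: Optional[float] = None

    def update(self, value: float) -> bool:
        if self._last is not None and value == self._last:
            return False  # unchanged, nothing goes on the wire
        self._last = value
        self._broker.publish(self._topic, value)
        return True


# Hypothetical topic name; the application only knows the broker and the topic.
TOPIC = "plant1/line3/ambient_temperature"
received: List[Tuple[str, float]] = []

broker = ToyBroker()
broker.subscribe(TOPIC, lambda topic, value: received.append((topic, value)))

reporter = ExceptionReporter(broker, TOPIC)
for sample in [21.5, 21.5, 21.5, 22.0, 22.0]:
    reporter.update(sample)

print(received)  # five samples were read, but only the two changes were published
```

In a real deployment the broker role would be played by an MQTT server and the reporter by a device or gateway; the sketch only shows why the subscriber needs no connection to the end device.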

MQTT provides several benefits:

  • Open standard/interoperable (OASIS standard and Eclipse Tahu)
  • Decouples devices from applications
  • Reports by exception
  • Requires little bandwidth
  • Supports transport layer security (TLS)
  • Remote-originated connection (outbound only; no inbound firewall rules)
  • Stateful awareness
  • Single source of truth
  • Auto-discovery of tags
  • Data buffering (store and forward)
  • Plug-and-play functionality.

To get to a new architecture, the answer is edge computing and protocol conversion. Let’s say there are 10 Modbus devices connected to a supervisory control and data acquisition (SCADA) system. Users can deploy one edge gateway that supports both Modbus and MQTT to push the polling closer to the programmable logic controller (PLC). Users can poll more information, potentially at faster rates, and publish the values as they change to a central MQTT server. The SCADA system can also be changed to connect and subscribe to the MQTT server to get the data instead of connecting to the end devices.
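As a rough sketch of the gateway's job in the 10-device example, the loop below polls each device and publishes only the values that changed since the last cycle. The Modbus read and the MQTT publish are replaced by hypothetical stand-in functions (`fake_read`, `fake_publish`), and the topic layout is an assumption made for illustration.

```python
from typing import Callable, Dict, Iterable


def poll_cycle(
    read_register: Callable[[int], int],
    device_ids: Iterable[int],
    last_sent: Dict[int, int],
    publish: Callable[[str, int], None],
) -> int:
    """Poll every device once and publish only values that changed."""
    published = 0
    for device_id in device_ids:
        value = read_register(device_id)
        if last_sent.get(device_id) != value:
            last_sent[device_id] = value
            # Hypothetical topic layout, not taken from any specification.
            publish(f"site/modbus/{device_id}/register_40001", value)
            published += 1
    return published


# Simulated field side: 10 devices, each holding one register value.
device_ids = list(range(1, 11))
readings: Dict[int, int] = {device_id: 100 for device_id in device_ids}


def fake_read(device_id: int) -> int:
    return readings[device_id]


# Simulated SCADA side: it sees only topics and values, never the devices.
scada_view: Dict[str, int] = {}


def fake_publish(topic: str, value: int) -> None:
    scada_view[topic] = value


last_sent: Dict[int, int] = {}
first = poll_cycle(fake_read, device_ids, last_sent, fake_publish)   # all 10 are new
second = poll_cycle(fake_read, device_ids, last_sent, fake_publish)  # nothing changed
readings[4] = 105
third = poll_cycle(fake_read, device_ids, last_sent, fake_publish)   # only device 4

print(first, second, third)
```

Because the polling happens at the gateway, the poll rate can be raised without adding traffic between the gateway and the central server; only changes travel upstream.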

This is an important step for future-proofing a SCADA system. As users acquire sensors or upgrade equipment that supports MQTT, the SCADA will get access to the data without having to know about the end device.

Help systems understand the data

Not only do users need to get access to the data, but they also need to ensure the data is valid, has context and is part of a common structure, if applicable. This is an important step before using analytics and machine learning. These systems need to understand the data in order to properly use it. Typically, new sensors and devices already have these facilities. However, that is not true for legacy devices. There are hundreds of different polling protocols that require mapping and scaling. Most PLCs have addressing schemes that are not human-readable. These mappings commonly exist in SCADA, but the mapped data may still lack context, contain invalid values or fall outside any standard data structure.

The best place to handle this step is in the edge gateway that connects to the PLC. It requires software that has features in place to clean data, add context and support data structure.

Let’s start with cleaning data. Suppose there’s a sensor connected to the PLC and the signal drops out sometimes. When the signal drops out, the value in the PLC drops to 0. A value of 0 may be legitimate in general, but not when the previous value was 50. In this case, it’s important to look at the delta between consecutive values to determine whether or not we should ignore the current value. Setting up a calculated tag with that logic can solve this problem. It’s important to ensure the data is valid as close to the source as possible before using it with other systems.
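The delta check described above might look like the following sketch. The function name `cleanse` and the `max_delta` threshold are assumptions for illustration; a calculated tag in a real gateway would express the same logic in that product's own expression language.

```python
from typing import List, Optional


def cleanse(samples: List[float], max_delta: float) -> List[float]:
    """Hold the last good value whenever a sample jumps by more than max_delta."""
    cleaned: List[float] = []
    last_good: Optional[float] = None
    for value in samples:
        if last_good is not None and abs(value - last_good) > max_delta:
            # Implausible jump (for example, a dropout to 0): keep the last good value.
            cleaned.append(last_good)
        else:
            last_good = value
            cleaned.append(value)
    return cleaned


# A dropout from 51 to 0 is rejected, while a gradual fall to 0 is accepted.
print(cleanse([50, 51, 0, 52], max_delta=20))
print(cleanse([3, 0], max_delta=20))
```

One limitation of this simple version is that a genuine step change larger than `max_delta` would be held off indefinitely, so a production rule would likely add a timeout or a count of consecutive rejected samples.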

Another crucial step is providing context to the data. For example, a user can have a Modbus PLC with a tag referencing 40001. In SCADA, we would map that to a tag name like “Ambient Temperature.” If that’s the only data we have, we don’t know whether the temperature is in Celsius or Fahrenheit or what its low and high limits are. Analytics and machine learning models will produce incorrect results without the proper context.

Using edge gateways with the ability to provide name, scaling, engineering units, engineering low and high, documentation and tooltips will provide other systems crucial information to better understand the underlying data.
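To make this concrete, here is a minimal sketch of a tag definition that carries its own context. The `TagContext` class, the raw range of 0 to 1000 and the engineering range of -50 to 150 degrees Celsius are all assumed values chosen for illustration; only the register 40001 and the name "Ambient Temperature" come from the example above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TagContext:
    """Metadata a gateway could attach to a raw register (hypothetical layout)."""

    name: str
    source_address: int
    engineering_units: str
    raw_low: float
    raw_high: float
    eng_low: float
    eng_high: float
    documentation: str

    def scale(self, raw: float) -> float:
        """Linearly map a raw register value into engineering units."""
        fraction = (raw - self.raw_low) / (self.raw_high - self.raw_low)
        return self.eng_low + fraction * (self.eng_high - self.eng_low)

    def describe(self, raw: float) -> Dict[str, Any]:
        """Return the scaled value together with its context."""
        payload = asdict(self)
        payload["value"] = self.scale(raw)
        return payload


ambient = TagContext(
    name="Ambient Temperature",
    source_address=40001,
    engineering_units="degC",  # assumed units
    raw_low=0,                 # assumed raw range
    raw_high=1000,
    eng_low=-50.0,             # assumed engineering range
    eng_high=150.0,
    documentation="Outdoor sensor on the north wall (illustrative text).",
)

message = ambient.describe(500)
print(message["name"], message["value"], message["engineering_units"])
```

A consumer receiving `message` no longer has to guess the units or the valid range, which is the point of adding context at the edge.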

Increasing value to the enterprise

The last step is standardizing on common data structures across the enterprise. This step often gets skipped because data can be different at each site and it can be difficult to find a common data model. Analytic packages and machine learning models require data to be in the same structure for common objects. Users don’t want to have to create different analytics or machine learning models for each site. This goes beyond an individual data point to a collection of data points for a known object.

It’s important to survey each site to find a common model and use an edge gateway that supports user-defined types (UDTs). This means adapting the data at each site to fit the model, which can include scaling, calculated tags, conversions and more. The goal is for the data to appear in the same structure on the surface while the site-specific complexity stays hidden behind the scenes.
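A sketch of that adaptation is shown below, assuming two hypothetical sites that describe the same pump differently. The `PumpUDT` structure, the tag names and the choice of metric units are illustrative assumptions; the conversion factors are standard unit conversions.

```python
from dataclasses import dataclass
from typing import Dict

GPM_TO_M3H = 0.2271247  # US gallons per minute to cubic meters per hour
PSI_TO_KPA = 6.894757   # pounds per square inch to kilopascals


@dataclass
class PumpUDT:
    """Common enterprise-wide model for a pump (hypothetical)."""

    flow_m3h: float
    discharge_pressure_kpa: float
    running: bool


def from_site_a(tags: Dict[str, float]) -> PumpUDT:
    """Site A (hypothetical) reports gal/min, psi and a 0/1 status word."""
    return PumpUDT(
        flow_m3h=tags["P101_FLOW_GPM"] * GPM_TO_M3H,
        discharge_pressure_kpa=tags["P101_PRESS_PSI"] * PSI_TO_KPA,
        running=tags["P101_STATUS"] == 1,
    )


def from_site_b(tags: Dict[str, float]) -> PumpUDT:
    """Site B (hypothetical) already reports metric units under other names."""
    return PumpUDT(
        flow_m3h=tags["pump/flow"],
        discharge_pressure_kpa=tags["pump/pressure"],
        running=bool(tags["pump/run"]),
    )


pump_a = from_site_a({"P101_FLOW_GPM": 100.0, "P101_PRESS_PSI": 50.0, "P101_STATUS": 1})
pump_b = from_site_b({"pump/flow": 22.7, "pump/pressure": 345.0, "pump/run": 1})

# Both sites now expose the same fields, so one analytics model can serve both.
print(pump_a)
print(pump_b)
```

The per-site adapter functions hold the complexity; anything downstream works against `PumpUDT` alone.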

The journey begins with operational infrastructure and solving the problem of getting data into an infrastructure. Users can’t get to analytics and machine learning until they have access to data. This data needs to be valid and have context to be understood. Users can take small steps to realize the potential benefits of these technologies by adopting this new mentality and architecture.

Travis Cox is co-director of sales engineering at Inductive Automation. Edited by Chris Vavra, associate editor, Control Engineering, CFE Media and Technology.


Keywords: message queuing telemetry transport, Big Data, MQTT

Big Data acquisition and analysis to turn data into actionable information is a major aspect of Industry 4.0.

Current architectures use message queuing telemetry transport (MQTT) to help understand the data.

Users need constant access to the data being generated and it needs to have a valid context.

Consider this

What are you doing to streamline data acquisition to better prepare for Industry 4.0?

Author Bio: Co-director of sales engineering, Inductive Automation.