Resolve five Big Data, data acquisition challenges

Demystify the need for Big Data and five related challenges: data structure, scalability, integration, storage, and upgrades.

By Qasim Maqbool and Ahmed Habib March 10, 2019

The term Big Data has been widely used, from retail management systems focused on staple goods stores to enterprise data management solutions for multinational organizations. Everyone seems insistent on employing Big Data tools and techniques, including in industrial applications.

Find greater success by demystifying the need for Big Data and five main accompanying challenges.

1. Data types, structure

While just about everyone in the manufacturing industry today has heard the term “Big Data,” what exactly constitutes Big Data is a tad more ambiguous. At a glance, Big Data is the all-encompassing term for traditional data and data generated beyond those traditional data sources.

In a plant’s context, this traditional data can be split into two streams: Operational technology (OT) data and information technology (IT) data. OT data for a plant consists of alarms and events data, data historian collections, etc. IT data for the plant is enterprise resource planning (ERP) data, which primarily covers production, procurement, and access logs.

Big Data includes and goes beyond IT and OT structured and periodically stored data. While traditional IT and OT data is stored in its own unique systems and structures, Big Data is “multistructured,” meaning it has the necessary knowledge management tools to access different data from different origins and contextualize it for analyses and reports. A major milestone for an effective industrial Big Data system is integrating IT/OT data.
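As a rough illustration of what contextualizing IT and OT data can mean in practice, the sketch below attaches to each OT alarm the ERP production order that was running when the alarm fired. The record types, field names, tag names, and sample values are all hypothetical and chosen only for illustration; a real plant would pull these records from its historian and ERP system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class AlarmEvent:
    """Hypothetical OT record, e.g. from an alarms-and-events log."""
    timestamp: datetime
    tag: str
    message: str


@dataclass
class ProductionOrder:
    """Hypothetical IT record, e.g. from an ERP production table."""
    order_id: str
    product: str
    start: datetime
    end: datetime


def contextualize(
    alarms: List[AlarmEvent], orders: List[ProductionOrder]
) -> List[Tuple[AlarmEvent, Optional[ProductionOrder]]]:
    """Pair each alarm with the order active at its timestamp (or None)."""
    result = []
    for alarm in alarms:
        match = next(
            (o for o in orders if o.start <= alarm.timestamp < o.end), None
        )
        result.append((alarm, match))
    return result


# Made-up sample data for illustration only.
orders = [
    ProductionOrder("PO-1001", "Resin A",
                    datetime(2019, 3, 10, 6, 0), datetime(2019, 3, 10, 14, 0)),
    ProductionOrder("PO-1002", "Resin B",
                    datetime(2019, 3, 10, 14, 0), datetime(2019, 3, 10, 22, 0)),
]
alarms = [
    AlarmEvent(datetime(2019, 3, 10, 7, 30), "TT-101", "High temperature"),
    AlarmEvent(datetime(2019, 3, 10, 15, 45), "PT-204", "Low pressure"),
    AlarmEvent(datetime(2019, 3, 10, 23, 10), "TT-101", "High temperature"),
]

joined = contextualize(alarms, orders)
for alarm, order in joined:
    label = order.order_id if order else "no active order"
    print(f"{alarm.timestamp:%H:%M} {alarm.tag} {alarm.message} -> {label}")
```

Even a simple time-based join like this shows why integration matters: an alarm viewed alone says little, while the same alarm tied to a product and order can feed production analyses and reports.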

It doesn’t stop there, though. A plant has numerous other potential data points that aren’t monitored in any specific system such as shift logs, personnel reports, and audits. There’s also data such as machine vibrations, planning inefficiencies, and environmental variables that isn’t monitored. Addressing this unmonitored data and including its impact on potential decisions is another important consideration for Big Data systems.

2. Data scalability, context

That doesn’t mean Big Data is only concerned with data storage and acquisition. Big Data systems need to be able to quickly address and analyze data on demand without being affected by the scale and pace of data acquisition and querying. This is called Big Data scalability and it is one of the first concerns for Big Data systems. Other concerns include system reliability–the ability to always provide similar performance–and decision support for real-time analyses.

These analyses also include machine learning (ML) and artificial intelligence (AI), which can be beneficial in picking out data anomalies, predicting future behavior for production, equipment, and forecasts, and providing detailed scenarios for decision support.

The best part is Big Data systems are designed to perform most of these analyses on real-time data–using simpler algorithms to pick datasets that need more analyses–regardless of the scale and speed of data ingestion.
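One way to picture a simpler algorithm that picks out data needing more analysis is a rolling screen that flags readings far from the recent average. The sketch below uses a rolling z-score, which is an assumption made here for illustration and not a method the article prescribes; the function name, window size, threshold, and vibration values are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev
from typing import List, Sequence


def flag_for_review(
    readings: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices of readings that deviate strongly from the recent window.

    Each reading is compared with the mean and sample standard deviation of
    the previous `window` readings. Flagged readings are candidates for
    deeper (and more expensive) analysis, not confirmed faults.
    """
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip flat windows to avoid dividing the signal by zero spread.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(i)
        recent.append(value)
    return flagged


# Made-up vibration readings with one obvious spike at index 6.
vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 9.5, 2.0, 2.1, 1.9]
flagged = flag_for_review(vibration)
print(flagged)
```

Because the screen only keeps a small fixed window in memory, its cost per reading stays constant however fast data arrives, which is the property that lets a cheap first pass run on real-time data while heavier analyses are reserved for the flagged subsets.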

3. Data integration efficiency

Once the context for Big Data has been established, determining the need for it becomes a relatively simpler task. While most integrators and solution providers will say Big Data is needed, and this claim is generally true, when and how it’s needed is a more nuanced matter. Big Data integration isn’t something anyone can jump right into. It requires an extensive effort and commitment from the entire organization, not just the IT team implementing it.

At the granular level, there are information pockets that are either invisible to the organization or sometimes intentionally kept secret to avoid a “Hawthorne effect” (when observation can change behavior). Failing to take these pockets into account can sometimes mean the difference between simply investing a million dollars and saving millions more from that investment.

For a plant operator looking to upgrade a distributed control system (DCS), integrating a Big Data system for a more holistic view of the plant is a task beyond the plant team’s budget and the scope of one plant.

Where the current need is only to enhance the speed and capability of data collection with newer technology, such as a data historian with real-time tracking, trending, monitoring, and the ability to perform analyses, it makes little sense to employ a Big Data solution for such a limited scope.

4. Data storage

The primary consideration for all upgrades needs to be the desired results. Consider a scenario where the data needs to be stored locally, for security or privacy reasons, and the existing data is already stored in a structured form and only needs to be analyzed and contextualized with other data, such as through an OPC server.

In that case, an implementation based on remote analysis and visualization using dashboards and data connectors is more cost-effective and easier for the organization to adopt, while still delivering decision support similar to that of a Big Data implementation.

Many industries are at a stage where connecting disparate data sources and giving analyses and insights on that data is a means to achieve multifold boosts in productivity and efficiency.

That doesn’t mean Big Data solutions don’t fit into the picture. Early Big Data adopters will soon have a competitive edge over those relying on integrated traditional data analyses. Still, most industries, as they stand today, need traditional data analyses as much as Big Data.

5. Big Data upgrades

Some of the biggest challenges of Big Data come in the form of planning a Big Data upgrade. An extensive solution that can be continuously scaled to integrate newer data sources needs to be designed for future inclusions and upgrades without affecting any functionality and performance.

For most organizations, this means switching their services to the cloud, upgrading their systems across the board for better monitoring and logging of data, and almost always increasing the human capital that possesses the skill and capability to implement Big Data solutions across all departments and functions.

Organizations that employ on-premise solutions for security or other concerns also need to consider the higher costs of maintaining in-house data servers with a dedicated system support team. Even then, the scalability of these systems isn’t always as effective as cloud-based deployments.

Companies working with Industrial Internet of Things (IIoT) and Big Data solutions have a vested interest in pushing Big Data solutions. While it is always going to be more beneficial to have a Big Data solution in the long term, if the existing systems have gaps that can be filled with a better organized approach to traditional data management and analyses, it makes a lot more sense to implement traditional data acquisition, trending, and monitoring, which would have much greater cost-benefits as well.

If anything, thoroughly planned traditional data analysis technologies need to be a precursor to implementing a Big Data solution. Only then can an organization see what Big Data systems are capable of achieving.

Qasim Maqbool is principal platform engineer, industrial intelligence solutions, and Ahmed Habib is marketing manager, for Intech Process Automation. Intech Process Automation is a Control Engineering Content Partner. Edited by Mark T. Hoske, content manager, Control Engineering, CFE Media.

KEYWORDS: Big Data, data acquisition

Challenges exist with Big Data and data acquisition.

Data structure and scalability need to be addressed.

Data integration, storage, and upgrades are important.


Breaking down challenges related to Big Data can make implementations less daunting.
