Get ready for big data by getting the architecture right

Managing big data can be a challenge for manufacturers and for other companies whose strong online presence must be integrated with production facilities. Introducing multi-threaded applications to the system makes the task easier. Apply the architectures and open-source technologies used by Amazon and others to improve big data gathering and analysis.


To prepare manufacturing sites for new big data requirements, apply the same architectures and open-source technologies used by Amazon and others to create a big data system for gigabytes of unstructured production data. Big data is not just a concern for web-based companies. Additional product tracking, tracing, and investigation requirements are bringing big data to production facilities. Fortunately, manufacturing has already learned how to handle one kind of big data by using data historians. Data historians can keep years of tagged data and are a vital part of any production facility. However, the data in a historian is simple, usually just a tag ID, a value, and a status. The data required for tracking, tracing, and investigations is often more complex, comprising pictures, unformatted text, formatted text, and other types of unstructured data.

One example of a production big data project was discussed at a recent ISA FPID Symposium. It involved a new automated production line for a regulated device. The line collects 12 image files per assembled device per second at various stages of assembly. The image files must retain batch and lot context, be kept for multiple years, and be searchable, with access to any image file in less than 1 second, to support investigations, recalls, and audits. More than 1 billion image files must be managed for one production line, and more lines are planned. This is beyond the built-in capabilities of the file system, databases, and commercial tools, so an innovative solution was used.
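To put the volume in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the line produces about 12 image files per second and runs continuously; both are simplifying assumptions for illustration, and the function names and the uptime parameter are hypothetical, not details from the project.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap days


def images_per_year(images_per_second: float, uptime_fraction: float = 1.0) -> int:
    """Estimate how many image files one line produces in a year."""
    return int(images_per_second * SECONDS_PER_YEAR * uptime_fraction)


def years_to_reach(target_images: int, images_per_second: float,
                   uptime_fraction: float = 1.0) -> float:
    """Estimate how many years it takes to accumulate target_images files."""
    return target_images / (images_per_second * SECONDS_PER_YEAR * uptime_fraction)


if __name__ == "__main__":
    # Assumed rate: 12 image files per second, running around the clock.
    print(images_per_year(12))                           # 378432000
    print(round(years_to_reach(1_000_000_000, 12), 2))   # 2.64
```

Under these assumptions, a single line passes 1 billion files in under three years, which is consistent with a multi-year retention requirement. A lower uptime fraction stretches that time proportionally.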

Internet innovation has the answer

The Internet is known for innovation and, fortunately, that innovation can be applied to manufacturing big data problems. The same free and open-source systems and data handling architectures that web applications use to search through hundreds of thousands of products or millions of blog posts in less than a second can be used for manufacturing big data. A working prototype, built as a multi-threaded application, was developed in a week, and a full system was deployed within two months. General-purpose IT solutions can be used effectively in manufacturing, but one vital element cannot be missed: the system must have a robust and well-defined architecture. On any big data project in particular, the architecture is critical to success. Too many industrial applications are hacked together, resulting in systems that do not meet performance needs, are not scalable, and are not supportable.
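Fast web search generally rests on an inverted index: instead of scanning every item, the system keeps a map from each search term to the items that contain it. The article does not name the technology the project used, so the following is only a minimal in-memory Python sketch of the idea. The class name, field names, and file paths are hypothetical.

```python
from collections import defaultdict


class ImageIndex:
    """Toy inverted index mapping (field, value) pairs to image file paths."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, path, **fields):
        """Register an image path under each of its metadata fields."""
        for name, value in fields.items():
            self._postings[(name, str(value))].add(path)

    def search(self, **fields):
        """Return the sorted paths that match every given field value."""
        if not fields:
            return []
        sets = [self._postings.get((name, str(value)), set())
                for name, value in fields.items()]
        sets.sort(key=len)  # intersect starting from the smallest set
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return sorted(result)


if __name__ == "__main__":
    index = ImageIndex()
    # Hypothetical paths and metadata, for illustration only.
    index.add("/images/dev0001_st1.png", batch="B1", lot="L7", station=1)
    index.add("/images/dev0001_st2.png", batch="B1", lot="L7", station=2)
    index.add("/images/dev0002_st1.png", batch="B2", lot="L1", station=1)
    print(index.search(batch="B1", lot="L7"))
    print(index.search(batch="B1", station=2))
```

A lookup touches only the entries for the requested batch, lot, or station, so its cost depends on the number of matches rather than on the total number of files. A production system would keep the index on disk and store only metadata in it, leaving the images themselves on the file server.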

Multi-threaded applications are also a good architecture model when different parts of the system have varied performance characteristics. Multi-threaded systems take advantage of the fact that most CPUs spend a lot of time waiting for memory fetches. While one thread waits, the CPU can operate on other threads' pre-fetched data, increasing overall system performance. This big data project's architecture used one thread to identify new images, another to copy image files to a file server, and another set of threads to create the indexes of the files and associate them with the correct batch and lot. Each of these actions can run at a different rate, and a multi-threaded system with queues ensures that no data is lost.
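The three-stage design described above can be sketched with Python's standard threading and queue modules. This is an illustration under stated assumptions, not the project's actual code: the file naming scheme (BATCH_LOT_serial.png), the function names, and the queue sizes are all hypothetical, and the detector does a single directory scan where a real system would keep watching for new files.

```python
import queue
import shutil
import tempfile
import threading
from pathlib import Path

STOP = object()  # sentinel that tells a downstream stage to shut down


def detect_new_images(source_dir, to_copy):
    """Stage 1: identify image files (one scan here, for simplicity)."""
    for path in sorted(source_dir.glob("*.png")):
        to_copy.put(path)  # blocks while the queue is full; nothing is dropped
    to_copy.put(STOP)


def copy_images(to_copy, to_index, archive_dir, n_indexers):
    """Stage 2: copy each file to the archive (standing in for a file server)."""
    while True:
        path = to_copy.get()
        if path is STOP:
            break
        dest = archive_dir / path.name
        shutil.copy2(path, dest)
        to_index.put(dest)
    for _ in range(n_indexers):  # one sentinel per indexing thread
        to_index.put(STOP)


def index_images(to_index, index, lock):
    """Stage 3: associate each archived file with its batch and lot."""
    while True:
        path = to_index.get()
        if path is STOP:
            break
        # Assumed naming scheme: BATCH_LOT_serial.png (hypothetical).
        batch, lot, _serial = path.stem.split("_", 2)
        with lock:  # the shared dict is updated by several threads
            index.setdefault((batch, lot), []).append(path)


def run_pipeline(source_dir, archive_dir, n_indexers=2):
    """Run all stages to completion and return the (batch, lot) index."""
    to_copy = queue.Queue(maxsize=100)   # bounded queues apply backpressure
    to_index = queue.Queue(maxsize=100)
    index, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=detect_new_images, args=(source_dir, to_copy)),
        threading.Thread(target=copy_images,
                         args=(to_copy, to_index, archive_dir, n_indexers)),
    ]
    threads += [threading.Thread(target=index_images, args=(to_index, index, lock))
                for _ in range(n_indexers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "line", Path(tmp) / "archive"
        src.mkdir()
        dst.mkdir()
        for name in ("B1_L7_0001.png", "B1_L7_0002.png", "B2_L1_0001.png"):
            (src / name).write_bytes(b"placeholder image data")
        result = run_pipeline(src, dst)
        for key in sorted(result):
            print(key, sorted(p.name for p in result[key]))
```

The queues decouple the stages: a slow copy does not stall detection until the bounded queue fills, and when it does fill, the upstream thread waits rather than discarding work.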

Timing is everything

Multi-threading should be introduced only where brute-force computing cannot keep up with data throughput requirements. However, with careful planning and a robust, thread-safe architecture, a big data collection and indexing system can be built to handle gigabytes of unstructured records with sub-second retrieval times. The same architectures and open-source technologies used by Amazon and others make this possible for unstructured production data.

- Dennis Brandl is president of BR&L Consulting in Cary, N.C. His firm focuses on manufacturing IT. Edited by Eric R. Eissler, editor-in-chief, Oil & Gas Engineering.

