Get ready for big data by getting the architecture right

Managing big data can be a challenge for manufacturers and other companies whose strong online presence must be integrated with production facilities. Multi-threaded applications make this easier. Applying the architectures and open-source technologies used by Amazon and others can improve big data gathering and analysis.


To prepare manufacturing sites for new big data requirements, apply the same architectures and open-source technologies used by Amazon and others to create a big data system for gigabytes of unstructured production data. Big data is not just something that web-based companies need to deal with. Additional product tracking, tracing, and investigation requirements are adding big data requirements to production facilities. Fortunately, manufacturing has already learned how to handle big data by using data historians. Data historians can keep years of tagged data and are a vital part of any production facility. However, the data in data historians is simple, usually just a tag ID, a value, and a status. The big data architecture required for tracking, tracing, and investigations is often more complex because it must handle pictures, unformatted text, formatted text, and other types of unstructured data.

One example of a production big data project was discussed at a recent ISA FPID Symposium. It involved a new automated production line for a regulated device. The line collects 12 image files at various stages of assembly per assembled device per second. The image files must keep their batch and lot context, be retained for multiple years, and be searchable, with access to any image file in less than 1 second, to support investigations, recalls, and audits. More than 1 billion image files must be managed for one production line, and more lines are planned for the future. This situation is outside the scope of the built-in capabilities of the file system, databases, and commercial tools, so an innovative solution was used.
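A rough calculation shows how quickly a line like this reaches the billion-file mark. The sketch below reads the figure above as 12 image files per second and assumes the line runs continuously, which the symposium example does not state. The function names and the `uptime` parameter are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def images_per_year(images_per_second: float, uptime: float = 1.0, days: int = 365) -> float:
    """Estimate image files produced in one year.

    uptime is the assumed fraction of time the line is running
    (1.0 means around the clock); it is not a figure from the article.
    """
    return images_per_second * SECONDS_PER_DAY * days * uptime


def years_to_reach(total_images: float, images_per_second: float, uptime: float = 1.0) -> float:
    """Estimate how many years it takes to accumulate total_images files."""
    return total_images / images_per_year(images_per_second, uptime)


if __name__ == "__main__":
    yearly = images_per_year(12)
    print(f"Files per year at 12 per second, continuous: {yearly:,.0f}")
    print(f"Years to reach 1 billion files: {years_to_reach(1e9, 12):.2f}")
```

Under these assumptions the line produces roughly 378 million files a year, so a multi-year retention period passes 1 billion files in under three years. A lower uptime stretches that timeline proportionally.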

Internet innovation has the answer

The Internet is known for innovation and, fortunately, that innovation can be applied to manufacturing big data problems. The same free and open-source systems and data handling architectures used by web applications to search through hundreds of thousands of products or millions of blog posts in less than a second can be used for manufacturing big data. Using a multi-threaded application, a working prototype was developed in a week, and a full system was deployed within two months. General-purpose IT solutions can be used effectively in manufacturing, but one vital element cannot be missed: the system must have a robust and well-defined architecture. On any big data project, the architecture is critical to success. Too many industrial applications are hacked together, resulting in systems that do not meet performance needs, do not scale, and cannot be supported.

Multi-threaded applications are also a good architecture model when different parts of the system have varied performance characteristics. Multi-threaded systems take advantage of the fact that most CPUs spend a lot of time waiting for memory fetches. During the waiting time, the CPU can operate on other threads' pre-fetched data, increasing overall system performance. This big data project's architecture used one thread to identify new images, another to copy image files to a file server, and another set of threads to create the indexes of the files and associate them with the correct batch and lot. Each of these actions can run at a different rate, and a multi-threaded system with queues ensures that no data is lost.
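The three-stage arrangement described above can be sketched with Python's standard `threading` and `queue` modules. This is a minimal illustration, not the project's actual implementation: the `BATCH_LOT_SEQUENCE.img` file naming, the in-memory `catalog` dictionary, and all function names are assumptions made for the example, and a local directory stands in for the file server.

```python
import queue
import shutil
import threading
from pathlib import Path

STOP = None  # sentinel passed down the pipeline to shut each stage down


def detect(src: Path, to_copy: queue.Queue) -> None:
    """Stage 1: identify image files and hand them to the copier."""
    # A real detector would poll or watch the directory; this one scans once.
    for path in sorted(src.glob("*.img")):
        to_copy.put(path)  # blocks if the copier falls behind
    to_copy.put(STOP)


def copy_files(dst: Path, to_copy: queue.Queue, to_index: queue.Queue,
               n_indexers: int) -> None:
    """Stage 2: copy each file to the (stand-in) file server."""
    while (path := to_copy.get()) is not STOP:
        target = dst / path.name
        shutil.copy2(path, target)
        to_index.put(target)
    for _ in range(n_indexers):  # one stop signal per indexer thread
        to_index.put(STOP)


def index_files(to_index: queue.Queue, catalog: dict, lock: threading.Lock) -> None:
    """Stage 3: associate each stored file with its batch and lot."""
    while (path := to_index.get()) is not STOP:
        # Assumed naming convention: BATCH_LOT_SEQUENCE.img
        batch, lot, _ = path.stem.split("_", 2)
        with lock:  # the catalog is shared by all indexer threads
            catalog.setdefault((batch, lot), []).append(path)


def run_pipeline(src: Path, dst: Path, n_indexers: int = 2) -> dict:
    """Run all stages to completion and return the batch/lot catalog."""
    to_copy = queue.Queue(maxsize=100)   # bounded queues apply back-pressure
    to_index = queue.Queue(maxsize=100)  # instead of dropping work
    catalog, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=detect, args=(src, to_copy)),
        threading.Thread(target=copy_files, args=(dst, to_copy, to_index, n_indexers)),
    ]
    threads += [
        threading.Thread(target=index_files, args=(to_index, catalog, lock))
        for _ in range(n_indexers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return catalog
```

Calling `run_pipeline(Path("line_images"), Path("file_server"))` on two existing directories returns a dictionary keyed by `(batch, lot)`. Because each queue is bounded, a fast stage blocks when the next stage is busy, so the stages can run at different rates without losing files. A production system would also persist the index rather than keep it in memory.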

Timing is everything

Multi-threading concepts should be introduced only where brute-force computing cannot keep up with data throughput requirements. However, with careful planning, a robust thread-safe architecture, and the same open-source technologies used by Amazon and others, a big data collection and indexing system can be built to handle gigabytes of unstructured production data with sub-second retrieval times.

- Dennis Brandl is president of BR&L Consulting in Cary, N.C. His firm focuses on manufacturing IT. Edited by Eric R. Eissler, editor-in-chief, Oil & Gas Engineering.

