Get ready for big data by getting the architecture right

Managing big data can be a challenge for manufacturers and for other companies whose strong online presence must be integrated with production facilities. Multi-threaded applications make the job easier. Applying the architectures and open-source technologies used by Amazon and others can improve big data gathering and analysis.

By Dennis Brandl April 21, 2015

To prepare manufacturing sites for new big data requirements, apply the same architectures and open source technologies used by Amazon and others to build a big data system for gigabytes of unstructured production data. Big data is not just something that web-based companies need to deal with. Additional product tracking, tracing, and investigation requirements are bringing big data into production facilities. Fortunately, manufacturing has already learned how to handle one kind of big data by using data historians. Data historians can keep years of tagged data and are a vital part of any production facility. However, the data in data historians is simple, usually just a tag ID, a value, and a status. The data required for tracking, tracing, and investigations is often more complex, comprising pictures, unformatted text, formatted text, and other types of unstructured data.

One example of a production big data project was discussed at a recent ISA FPID Symposium. It involved a new automated production line for a regulated device. The line collects 12 image files per assembled device per second at various stages of assembly. The image files must keep their batch and lot context, be retained for multiple years, and be searchable, with access to any image file in less than 1 second, to support investigations, recalls, and audits. More than 1 billion image files must be managed for one production line, and more lines are planned for the future. This situation is outside the scope of the built-in capabilities of file systems, databases, and commercial tools, so an innovative solution was used.
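To get a feel for the scale, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the figure above as 12 image files arriving each second in total, and it assumes the line runs continuously. Neither assumption comes from the project itself.

```python
# Rough scale estimate. Both inputs below are assumptions for illustration:
# a steady 12 image files per second and round-the-clock operation.
IMAGES_PER_SECOND = 12
SECONDS_PER_DAY = 24 * 60 * 60
TARGET_FILES = 1_000_000_000  # "more than 1 billion image files"

images_per_day = IMAGES_PER_SECOND * SECONDS_PER_DAY
days_to_billion = TARGET_FILES / images_per_day
years_to_billion = days_to_billion / 365

print(f"Images per day: {images_per_day:,}")
print(f"Days to reach 1 billion files: {days_to_billion:.1f}")
print(f"Years to reach 1 billion files: {years_to_billion:.2f}")
```

Under these assumptions the line produces just over 1 million image files per day and passes 1 billion files in roughly 2.6 years, which is consistent with a multi-year retention requirement.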

Internet innovation has the answer

The Internet is known for innovation and, fortunately, that innovation can be applied to manufacturing big data problems. The same free and open source systems and data handling architectures that web applications use to search through hundreds of thousands of products or millions of blog posts in less than a second can be used for manufacturing big data. Using a multi-threaded application, a working prototype was developed in a week, and a full system was deployed within two months. General purpose IT solutions can be used effectively in manufacturing, but one vital element cannot be missed: the system must have a robust and well-defined architecture. On any big data project, the architecture is critical to success. Too many industrial applications are hacked together, resulting in systems that don’t meet performance needs, are not scalable, and are not supportable.

Multi-threaded applications are also a good architectural model to use when different parts of the system have varied performance characteristics. Multi-threaded systems take advantage of the fact that most CPUs spend a lot of time waiting for memory fetches. During the waiting time, the CPU can operate on other threads’ pre-fetched data, increasing overall system performance. This big data project’s architecture used one thread to identify new images, another to copy image files to a file server, and another set of threads to create the indexes of the files and associate them with the correct batch and lot. Each of these actions can run at a different rate, and a multi-threaded system with queues between the stages ensures that no data is lost.
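The thread layout described above can be sketched as a pipeline of stages connected by queues. The following is a minimal Python sketch, not the project's actual code. The function names, the (file name, batch, lot) tuple layout, the archive path, and the in-memory dictionaries standing in for the file server and the index are all assumptions made for illustration.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage the stream has ended


def discover(new_images, copy_q):
    """Stage 1: identify new images and pass them to the copy stage."""
    for image in new_images:
        copy_q.put(image)  # blocks if the copy stage falls behind
    copy_q.put(STOP)


def copy_files(copy_q, index_q, file_server, n_indexers):
    """Stage 2: 'copy' each image to a simulated file server (a dict)."""
    while True:
        item = copy_q.get()
        if item is STOP:
            for _ in range(n_indexers):  # one sentinel per indexer thread
                index_q.put(STOP)
            return
        name, batch, lot = item
        file_server[name] = f"/archive/{batch}/{lot}/{name}"  # hypothetical path
        index_q.put(item)


def index_files(index_q, index, lock):
    """Stage 3: associate each file with its batch and lot."""
    while True:
        item = index_q.get()
        if item is STOP:
            return
        name, batch, lot = item
        with lock:  # several indexer threads share one index
            index.setdefault((batch, lot), []).append(name)


def run_pipeline(new_images, n_indexers=2):
    """Run all stages to completion and return (file_server, index)."""
    copy_q = queue.Queue(maxsize=100)   # bounded queues apply back-pressure
    index_q = queue.Queue(maxsize=100)  # instead of dropping data
    file_server, index, lock = {}, {}, threading.Lock()
    threads = [
        threading.Thread(target=discover, args=(new_images, copy_q)),
        threading.Thread(target=copy_files,
                         args=(copy_q, index_q, file_server, n_indexers)),
    ]
    threads += [
        threading.Thread(target=index_files, args=(index_q, index, lock))
        for _ in range(n_indexers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return file_server, index


if __name__ == "__main__":
    images = [(f"img{i}.png", "B1", f"L{i % 3}") for i in range(30)]
    server, idx = run_pipeline(images)
    print(len(server), "files copied;", len(idx), "batch/lot groups indexed")
```

Because the queues are bounded, a fast stage waits for a slow one rather than discarding work, which is how each stage can run at its own rate without losing data. In Python specifically, threads mainly help with I/O-bound stages such as file copies; a production system might use a different language or process-based workers for CPU-heavy indexing.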

Timing is everything

Multi-threading concepts should be introduced only where brute force computing cannot keep up with data throughput requirements. However, with careful planning and a robust thread-safe architecture, a big data collection and indexing system can be built to handle gigabytes of unstructured records with sub-second retrieval times. By applying the same architectures and open source technologies used by Amazon and others, a big data system can handle gigabytes of unstructured production data.

– Dennis Brandl is president of BR&L Consulting in Cary, N.C. His firm focuses on manufacturing IT. Edited by Eric R. Eissler, editor-in-chief, Oil & Gas Engineering.
