Drowning in data, starved for information
“Big data” is one of the hottest topics in information technology. Companies are collecting enormous data sets and analyzing them with sophisticated analytical tools to find hidden patterns and correlations. The retail giant Target uses big data and analytics to detect when its customers experience life changes, such as weddings, babies, and grandchildren, and then sends them individually targeted advertisements. The America’s Cup-winning Oracle Team USA used big data, more than a gigabyte per typical training run, to continuously fine-tune its sailboat for changing wind and sea conditions during each race.
Wouldn’t it be great to detect when a manufacturing system undergoes a significant change, such as failing equipment, altered raw material properties, or shifts in energy and labor costs, and then to automatically determine the best operating conditions? That is the appeal of big data in manufacturing, and it is why manufacturing companies are starting big data projects for their production facilities.
Manufacturing companies have been collecting big data for years, usually in plant data historians that record time-stamped tag values every few seconds. A large company may collect more than 5 gigabytes per week from its production facilities. That is a lot of data, but the key to big data is pulling the nuggets of information out of it with data analysis tools. Five gigabytes of raw data per week has little value unless there are ways to analyze it easily and quickly and to use it to determine correlations, causes, and corrective actions.
If you are starting a big data project, there are a few important facts to remember.
1. Data needs context
The first fact is that data without context has limited value. In the manufacturing world, context is provided by the job or recipe step being executed. Each piece of data must also be associated with the job being executed or the product being produced, and with all of that job's quality measurements. This context enables job-to-job comparisons that detect significant changes, and correlation analyses that match new conditions to known situations or previously determined setpoints. The first step in using manufacturing big data is therefore to collect context or event information and link it to your plant historian. Fortunately, all major suppliers of plant historians provide event or context add-ons that can link manufacturing execution system (MES) workflows or recipe execution systems to the historian data.
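Linking historian data to job context is essentially a time-window join: each sample is assigned to the job that was running when it was recorded. The minimal sketch below assumes a hypothetical export format of (timestamp, tag, value) tuples and a hypothetical `JobEvent` record with start and end times; real historian event add-ons differ by vendor. It also assumes jobs on a single unit do not overlap. Once samples carry a job ID, per-job summaries make job-to-job comparison straightforward.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobEvent:
    """Hypothetical context record: one job (batch) run on a single unit."""
    job_id: str
    product: str
    start: datetime
    end: datetime  # exclusive end time


def attach_context(samples, jobs):
    """Link (timestamp, tag, value) historian samples to the job running then.

    Assumes jobs on the unit do not overlap. Samples that fall outside every
    job (for example, idle time between batches) are dropped.
    """
    jobs = sorted(jobs, key=lambda job: job.start)
    starts = [job.start for job in jobs]
    linked = []
    for ts, tag, value in samples:
        i = bisect_right(starts, ts) - 1  # last job that started at or before ts
        if i >= 0 and ts < jobs[i].end:
            job = jobs[i]
            linked.append((job.job_id, job.product, ts, tag, value))
    return linked


def job_means(linked, tag):
    """Average one tag per job, enabling job-to-job comparison."""
    totals = defaultdict(lambda: [0.0, 0])
    for job_id, _product, _ts, sample_tag, value in linked:
        if sample_tag == tag:
            totals[job_id][0] += value
            totals[job_id][1] += 1
    return {job_id: total / count for job_id, (total, count) in totals.items()}


if __name__ == "__main__":
    t0 = datetime(2013, 1, 7, 8, 0)
    jobs = [
        JobEvent("J1", "P100", t0, t0 + timedelta(hours=1)),
        JobEvent("J2", "P100", t0 + timedelta(hours=1), t0 + timedelta(hours=2)),
    ]
    samples = [
        (t0 + timedelta(minutes=10), "temp", 80.0),
        (t0 + timedelta(minutes=50), "temp", 82.0),
        (t0 + timedelta(minutes=70), "temp", 90.0),
        (t0 + timedelta(hours=3), "temp", 70.0),  # idle time, no job running
    ]
    print(job_means(attach_context(samples, jobs), "temp"))
```

Sorting the jobs once and using `bisect` keeps the lookup for each sample logarithmic in the number of jobs, which matters when a week of historian data holds millions of samples.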
2. Optimize for analysis
The second fact is that online historians are great tools for saving data but not necessarily for analyzing it. A good practice is to run analyses on off-line copies or in a data warehouse. Most plant data historians are optimized for writing data, so extracting the large amounts of data needed for big data analysis from a running system can take an extremely long time; even a simple report can take hours. A better strategy is to periodically back up the historian data to an off-line system used for analysis, or to consolidate the data into a data warehouse optimized for big data analysis. If you use a data warehouse, investigate the Hadoop framework for your analysis. Apache Hadoop is a free, open-source, Java-based programming framework that supports processing large data sets in a distributed computing environment. It lets you investigate large data sets across a distributed or clustered environment and can be used to combine data from multiple sites into a single data view. See hadoop.apache.org for more information.
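Hadoop's core processing model, MapReduce, splits a job into a map step that runs where each slice of data lives, a shuffle that groups intermediate results by key, and a reduce step that aggregates each group. The sketch below only mimics that flow in plain Python on one machine to show how data from several sites can be merged into one view; it does not use Hadoop's Java API, and the (site, product, tag, value) record layout is a hypothetical example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout exported from each site's off-line historian copy:
# (site, product, tag, value)


def map_records(records):
    """Map step: emit a (product, tag) key with each value, dropping the site."""
    for _site, product, tag, value in records:
        yield (product, tag), value


def shuffle(mapped_streams):
    """Shuffle step: group values from every site's map output by key."""
    grouped = defaultdict(list)
    for stream in mapped_streams:
        for key, value in stream:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: collapse each key's values into (count, mean)."""
    return {key: (len(values), mean(values)) for key, values in grouped.items()}


def combined_view(site_partitions):
    """Map each site separately (as cluster nodes would), then shuffle and reduce."""
    mapped = [map_records(partition) for partition in site_partitions]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    site_a = [("A", "P100", "temp", 80.0), ("A", "P100", "temp", 82.0)]
    site_b = [("B", "P100", "temp", 84.0), ("B", "P200", "temp", 90.0)]
    for key, (count, avg) in sorted(combined_view([site_a, site_b]).items()):
        print(key, count, round(avg, 2))
```

Keying on (product, tag) rather than site is what produces a single cross-site view. On a real cluster, map tasks would typically emit partial (sum, count) pairs so intermediate results can be combined before the shuffle instead of moving every raw value across the network.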
3. Consider sample size
The third fact to remember is that you can prove almost anything with the right statistics. To prove the right thing, make sure your sample size is large enough to support the correlations you find. For example, if you are looking for correlations that explain a failed production run, you probably need at least seven good runs for comparison. A smaller sample size may produce spurious correlations and send you down the wrong path for corrections. It is also important not to confuse correlation with causation. Just because a correlation was discovered does not mean the events are causally related. Data analysis can discover correlations, but an engineering task is still required to determine whether one causes the other. Involve engineers or scientists in a big data analysis project so that real causes are established through engineering analysis and so that the data is not cherry-picked to produce a correlation. Statistical analysis is a slowly acquired skill, so even if there is no obvious bias in the data, keep a healthy skepticism about the results until they are proven, and make sure they are not being used to score office-politics points. A common saying in big data analysis is that data should be used for story-telling, not story-selling.
4. Empower people
The last fact to remember is that detecting patterns is something people do naturally but is difficult to automate. You can use this to your advantage by empowering your people to search and explore the data warehouse. Data analysis tools with visualization capabilities let people discover correlations that automated searches cannot easily find. Often, just having a different person look at the data or review your analysis will reveal previously unseen correlations. Operations staff often have deep knowledge of the production systems and their relationships, and they can help uncover hidden or non-obvious correlations.
Adding context to saved data, working in a data warehouse optimized for analysis, objective storytelling, adequate sample sizes, appropriate conclusions about correlation and causation, and empowered personnel are key elements of a manufacturing big data project. Ensure your projects use these practices to bring the value of big data analysis to the production floor.
– Dennis Brandl is president of BR&L Consulting in Cary, N.C. His firm focuses on manufacturing IT. Edited by Mark T. Hoske, content manager, CFE Media, Control Engineering and Plant Engineering, email@example.com.
This posted version contains more information than the print / digital edition issue of Control Engineering.
At www.controleng.com, search Brandl for more on related topics.
See other articles for 2013 at www.controleng.com/digitaledition.