
The data historian’s history told

Momentum around time-series data storage suggests a new chapter for a legacy offering.
By Michael Risse April 27, 2019

The time-series data storage application market, by any name (data historians, process historians, enterprise historians), may seem an aging, and boring, software market. One IT analyst company places process historians so far to the right on its customer adoption curve that they are about to fall off into obsolescence. Yet this least glamorous component of most human-machine interface (HMI), distributed control system (DCS) and supervisory control and data acquisition (SCADA) deployments refuses to fade away. Recent market trends demonstrate a new generation of offerings and revived interest in historian products.

Process manufacturers are installing more historians daily, and this spring OSIsoft, the leading manufacturing data platform vendor, will welcome thousands of attendees to its 29th user conference, PI World. In 2018, industry analyst firms ARC Advisory Group and MarketsandMarkets both published studies to meet customer demand for insights into the historian market.

What’s next for historians, and what’s driving the interest in and need for data storage systems specific to time-series data? To find the answers to these and other related questions, start by examining historian history.

The first act

If an official history of the historian application market exists outside the tribal knowledge of those involved, it’s hard to find. But in the mid-1980s, first with Oil Systems Inc.’s Plant Information system (later renamed the OSIsoft PI System), the notion of storing data for regulatory, reporting, asset availability and diagnostic (root cause) analytics got its start (Figure 1). These early historians ran on DEC VAX/VMS minicomputers, the manufacturing system of choice in the 1980s.

Figure 1: The concept of storing time-series data for regulatory, reporting, asset availability and analytics got its start back in the 1980s with OSIsoft and its founder Pat Kennedy. Photo courtesy: OSIsoft

Coincidentally, introduction and development of historian software happened in parallel with that of its longtime partner in plant analytics, the spreadsheet. By the late 1980s, spreadsheets and time-series data storage both were established software categories, a connection that remains to this day, as spreadsheets are still the most commonly used analytics tool for historian users. The foundation of these early efforts was delivering the essential components of an enterprise software product, so they were evaluated on reliability, connectivity, scalability and compression or efficiency characteristics, much as they are today.

The second act

With demonstrated value from historians for data storage and analytics, the second stage for the historian market saw acquisition or development of a historian product by each of the leading automation vendors. A summary of all offerings would be an overwhelming exercise, but as examples, consider GE Fanuc acquiring Mountain Systems in 2003 (now GE Proficy), the early days of Wonderware in 1987 (now owned by AVEVA), Honeywell’s 2007 acquisition of InterPlant (now Professional Historian Database, or PHD), AspenTech’s InfoPlus.21, and too many others to count.

In fact, one of the challenges of the current historian market is trying to size it. Vendors like Siemens, ABB and AVEVA have multiple offerings from a history of acquisitions. AVEVA alone has Wonderware, eDNA and Citect historians, for example. Other vendors include a historian as a product feature, such as SCADA vendor Inductive Automation’s Ignition system. Adding to this, some companies have built their own historians, typically as a SQL Server application.

So, how many historian products are out there today? Counting just commercial offerings, a conservative estimate is at least in the dozens, certainly over 30. These include products you might not know unless you’ve been to Norway (Prediktor) or Ireland (AutomSoft), or if you’re a student using SCADA systems like CygNet, Elipse and COPA-DATA.

Another characteristic of second act offerings was the development of historian applications for easier trending, reporting, mobile access and other user experiences. No longer was a historian database product enough to meet customer needs.

Instead, vendors added full platform functionality like application programming interfaces (APIs), high-value components such as OSIsoft PI System Asset Framework and PI Notifications, or portfolios of historian applications such as the Honeywell Uniformance Suite.

The third act

With each industrial vendor offering a historian product, an application stack of functionality, and a Windows Server offering (the market had long since left the VAX/VMS platform), the market settled down for an extended period. As mentioned, historians were an important but quiet software category, and for large automation vendors, historians were a tiny part of a massive automation systems business.

So what has changed in the last two years? A transition is underway to a third stage of the time-series data storage application business. For example, two of Silicon Valley’s leading venture capital firms, Benchmark Capital and Battery Ventures, have invested more than $125 million in new time-series storage companies InfluxData (maker of InfluxDB) and Timescale over the last two years.

As context, Benchmark has nearly $3 billion under management and was an early-stage investor in companies ranging from Twitter to Dropbox to Instagram. Meanwhile, Battery Ventures has nearly $7 billion in assets. These firms now consider the time-series storage market big enough or important enough to capture their interest. Investor enthusiasm also lifted OSIsoft, which received an investment from SoftBank in 2017.

Further, Amazon announced a new time-series storage service, Timestream, on its Amazon Web Services (AWS) public cloud platform that will be available later in 2019. Siemens MindSphere and GE Predix offer similar functionality on their cloud platforms. What should end users expect as a roadmap?

Exploding data volumes

Many an industrial automation executive has said, “There is nothing new about the Industrial Internet of Things, we’ve been doing it for decades.” This is true, if you consider the generation, collection, storage and analysis of sensor data that has been a mainstay for the process manufacturing industry since the 1980s.

What has changed with IIoT, however, is cost: over the last two decades, every stage of the data lifecycle has become less expensive by an order of magnitude, if not two. This includes data generation by sensors, data collection with networking and connectivity, and data storage.

So, if previously only the most critical assets were worth monitoring, over time less important assets have been included either in the plant network or through a complementary IIoT platform. These include disconnected or standalone assets.

The change, therefore, is not in architecture, except perhaps for wireless connectivity replacing wired, but in economics. And the economics have absolutely changed; they are the principal driver behind a host of industry trends such as Big Data, the Industrial Internet of Things (IIoT), wireless networking and cloud data services. Computing is so cheap that users can justify generating, collecting and storing more data.

Figure 2: Time-series databases are experiencing explosive growth because they efficiently can store and provide access to large volumes of data. Figure courtesy: DB-Engines.com

All this data must be stored somewhere, and whether the storage system is called a historian or not, time-series data storage options are exploding. Open source time-series databases, cloud-based services (like the AWS offering mentioned earlier), data lake vendors and cloud platform startups are all rushing to be the vendor of record in the new time-series data storage business (see Figure 2).

The cloud wants in

Vast data volumes attract attention because “data has gravity,” meaning data attracts high-value add-on services such as management, security, analytics and consulting. The result? Manufacturing, which generates twice as much data as the next leading vertical, has caught the attention of companies wanting in.

Microsoft, Amazon and Google specifically have focused on the oil & gas sector as a starting point for their efforts. Like the Battery and Benchmark VC investments, this is clearly a sign of new market interest.

It is also a sign of maturity in the public cloud market. If historians are about 35 years old, it’s interesting to note how old the public cloud offerings are. For example, Amazon brought out AWS in 2002, and then introduced S3 (storage) and EC2 (virtual machines) in 2006. Then cloud computing competition got interesting with Microsoft’s and Google’s cloud platform introductions in 2008.

With years of experience to get used to the idea of cloud data storage, moving data to the cloud is increasingly a question of “when” and not “if.” Consequently, the big public-cloud platforms are paying more attention to the largest sources of data.

Amazon with AWS will offer Timestream, mentioned earlier, and Google offers time-series storage documentation for its Bigtable storage service. Microsoft also has multiple Azure services already on the market that can store time-series data.

Therefore, the three largest public cloud vendors, in addition to GE and Siemens, have announced time-series storage services. This is in addition to the many IIoT platform offerings such as PTC ThingWorx, industrial vendor offerings such as OSIsoft Cloud Services, and industrial data platforms for contextualization and aggregation such as Cognite. Like the list of historian vendors, the list of companies, and especially startup companies, with data collection and storage services on the cloud is long.

Of course, data services by themselves don’t make a historian or time-series database product successful. Historian applications, asset model support, data connectors and other features are critical requirements for a solution, as opposed to a piecemeal, custom offering. What is clear, though, is that data volumes have attracted new players, and the storage location for an increasing amount of sensor data will be in the cloud rather than on-premises.

Paths to insight

The relationship of historians and spreadsheets was mentioned earlier in the context of their parallel development and continuing codependency. What does every historian product offer as a feature or product? The answer is a connector to spreadsheets for data cleansing, contextualization, calculations and modeling.

But with growing data volumes, there is increasing demand for improved analytics offerings: descriptive, predictive, diagnostic, interactive and prescriptive, which go well beyond the scope of spreadsheets. These improvements are being delivered via what many now refer to as advanced analytics.

Specifically, advanced analytics refers to the inclusion of cognitive computing technologies in visualization and calculation applications. McKinsey defines advanced analytics solutions thus:

“[Advanced analytics solutions] … provide easier access to data from multiple data sources, along with advanced modeling algorithms and easy-to-use visualization approaches and could finally give manufacturers new ways to control and optimize all processes throughout their entire operations.”

The introduction of machine learning and other analytics techniques accelerates an engineer’s efforts when seeking correlations, clustering or finding any other needle within the haystack of process data. With these features built on multidimensional models and enabled by assembling data from different sources, engineers gain an order-of-magnitude improvement in analytics capabilities, akin to moving from pen and paper to the spreadsheet.
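As a rough illustration of the kind of correlation hunting described above, the following sketch computes a Pearson correlation coefficient between pairs of signals. The tag names and sample values are invented for illustration and are not taken from any historian; a real analysis would first align and cleanse data pulled from the historian.

```python
from math import sqrt

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical, already time-aligned samples for three tags.
feed_flow = [10, 12, 14, 16, 18]
outlet_temp = [50, 54, 58, 62, 66]
ambient_temp = [20, 21, 20, 19, 20]

r_flow_outlet = pearson(feed_flow, outlet_temp)    # strongly related signals
r_flow_ambient = pearson(feed_flow, ambient_temp)  # weakly related signals
```

A coefficient near 1 or -1 points to a relationship worth investigating, while a value near 0 suggests the two signals move independently. Advanced analytics software automates this sort of screening across many tags at once.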

For example, the ability to “search like Google” across all the tags in a historian or other Big Data storage system is now available in some advanced analytics software, with other capabilities delivered in a similar manner.
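To make the idea of searching across tags concrete, here is a minimal sketch that ranks tags by how many search terms appear in the tag name or description. It is only an assumption about how such a feature might work, not a description of any vendor's implementation, and the tag names and descriptions are hypothetical.

```python
def search_tags(query, tags):
    """Rank tags by the number of query terms found in the name or description."""
    terms = query.lower().split()
    scored = []
    for name, description in tags.items():
        haystack = f"{name} {description}".lower()
        score = sum(term in haystack for term in terms)
        if score:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically by tag name.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked]

# Hypothetical tag dictionary: tag name -> free-text description.
TAGS = {
    "FIC101.PV": "Reactor feed flow",
    "TIC205.PV": "Reactor jacket temperature",
    "PIC310.PV": "Compressor discharge pressure",
}

results = search_tags("reactor temperature", TAGS)
```

Here the jacket temperature tag ranks first because it matches both search terms, the feed flow tag follows with one match, and the compressor tag is excluded.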

There are two critical components to an advanced analytics approach.

First, it should be a self-service offering for the engineers who have the required experience, expertise and history with the plant and processes. This enables engineers to work at an application level with productivity, empowerment, interaction and ease-of-use benefits.

Figure 3: Advanced analytics software provides self-service capabilities for engineers to create various views of data. Image courtesy: Seeq Corp.

Second, the advanced analytics solution should include a connection between the analysis that is created and the underlying data set so users can simply click through and get to the underlying data. Advanced analytics offerings should be used to produce not just pictures of data in visualizations, but also to provide access to the analytics and sources that generated the outputs. Engineers, teams, managers and organizations can therefore use these new capabilities to enable the distribution of benefits throughout a plant and a company (see Figure 3).

Conclusion

With historian vendor incumbents and challengers, including open source and cloud services, the market for time-series data storage has taken a strong turn to the interesting and created a third act for the manufacturing time-series data storage market. Engineers with experience using historians will find their skills more marketable as more companies generate, collect and analyze sensor data. Advanced analytics solutions will be their key resource in a move away from their historical affinity with spreadsheets to more compelling software products.

This article appears in the IIoT for Engineers supplement for Control Engineering and Plant Engineering.


Author Bio: Michael Risse is a vice president at Seeq Corp. He has been a consultant with Big Data platform and application companies, and prior to that worked with Microsoft for 20 years. Michael is a graduate of the University of Wisconsin at Madison.