Integrating SQL databases and SCADA to maximize efficiency and reliability

Structured query language (SQL) can help SCADA software users improve their connectivity and ability to send and receive data.

Courtesy: Inductive Automation

Structured query language (SQL) insights

Structured query language (SQL) can help supervisory control and data acquisition (SCADA) users get a better handle on managing their data.
SQL database is simpler and easier to use than other databases and can give users more freedom in how they manage and handle their data.

Using structured query language (SQL) for automation is nothing new. SQL databases have been working in the background of many different systems for decades, but even today there are some who are hesitant to mix SQL with industrial automation software like supervisory control and data acquisition (SCADA). However, as more SCADA software users feel the pressure of keeping up in today’s connected, data-driven world, SQL has received more well-deserved attention.

A database can often be one of the most important components of any control system. Over many years of development, it has become increasingly clear that control systems integrated with databases that use open standards (such as SQL) have a major advantage over systems that use proprietary databases, which is why the combination of SQL and an industrial application platform is so valuable.

The SQL programming language

According to the American National Standards Institute (ANSI), SQL is the standard programming language for managing data in a relational database management system (RDBMS) and the one of the most widely used languages for relational databases. It was first developed for IBM in the 1970s and has been used consistently for almost 50 years to store and manage data in SQL databases.

Its main function is to quickly and efficiently ask questions of small to very large amounts of data using SQL statements, or “queries,” which allow users to store, update, retrieve, search or remove specific data in a database.

Software engineers construct SQL queries using various components of the SQL language, such as identifiers, variables, and search conditions, to form the correct statement. They can also save SQL queries for later use as “stored procedures” so users don’t have to rewrite frequently used queries. These capabilities are part of what makes SQL so efficient; even proprietary databases often use the same basic structure.

SQL relational databases

While all databases store data for future or continuous use, there are many types of databases, which are differentiated by their structure. The most widely used type are called relational databases. In a relational database, each data point is related to its surrounding data, and its function is to organize and retrieve relational data.

“SQL databases are one of the most important technologies that are used for data storage and access retrieval,” said Inductive Automation chief technology architect Kevin McClusky.

Users can think of a SQL database as having two main functions: access retrieval, which is performed using SQL; and storage, which is performed using the database engine. SQL databases organize data into files and specific storage areas on a physical hard disk or online servers in a cloud-based SQL database.

A SQL database is structured similarly to Excel tables with rows, columns, and cells, which makes organizing and retrieving any amount of data easy. This structure is especially useful for real-time and time-series (or historical) data. It helps to organize data that’s related to other data into tables by commonality, showing a picture of an event or data set.

One of the most powerful aspects of a SQL database is its ability to “query” or ask questions of data and establish relationships with it. SQL queries allow users to perform analysis, answer complex questions, and manipulate and aggregate data without many steps so the data can be retrieved from multiple tables, associate records with common field types, leverage the relationships within the data and control precisely where data goes and what data is excluded.

For example, a company could have two different tables for their employees — one for their personal data, such as name, address, and birth date, and another table that contains details about each employee’s salary. While the employees and salaries tables exist separately within the database, their data can be associated with one another, and potentially retrieved together. This association is accomplished by establishing a unique identifier (commonly known as a primary key) in one table that can be referenced in the other table. For example, the employee’s unique ID number could be a primary key in the employees table, and that ID could be used as a foreign identifier (commonly known as a foreign key) in the salaries table. This allows the data for an employee in the employees table to be associated with their salary data in the salaries table, all through the single value of their unique employee ID. The ability to create this association, or relation, is the reason these types of databases are referred to as relational databases.

By creating these relationships across tables, queries can then be written to summarize aggregate results from multiple tables such as employee salaries by department. If the SQL database is connected to the plant floor, users could find the average temperature of a room or several rooms, when or if the temperature control machines went down or how long they worked for, the standard deviation between datasets from multiple downtime events, and much more, in real time.

SQL queries can range from being simple with just one or two lines of code querying a single table, to extremely complex with hundreds of lines of code querying many tables together. Generally though, the shorter and succinct the query is, the better. “The queries are really only as complex as you need them to be,” McClusky said. The range and flexibility of SQL queries is part of what makes relational SQL databases a powerful and versatile tools for data retrieval, storage, and management.

Four SQL benefits

Many technology trends come and go every year. Yet after nearly five decades, SQL is still going strong. Here are four principal reasons behind SQL’s continued prominence: simplicity, scalability, accessibility, and its variety of options.

1. Simplicity

SQL is highly developed and very easy to learn. There are numerous free resources for learning SQL, and finding an IT person or engineer familiar with SQL is usually not a problem since so many use it. Besides, users only need to understand a few basic actions to retrieve, update, remove, or aggregate data.

2. Scalability

SQL databases are not overly expensive, unlike many proprietary databases, and can handle a large amount of data, making them easily scalable for any business. Some of the most popular SQL databases are even free to use, like MySQL and Postgres.

3. Accessibility

Open-standard technologies like SQL databases are available to everyone and easily accessible, unlike proprietary technologies, which are usually behind a high paywall, and often difficult to extract or migrate data from. Users also can integrate SQL databases with many other technologies, and don’t have to stick to a single vendor and use a limited custom-built database.

4. Many options

“One other reason why I think it’s still going so strong is that there are a lot of different options. SQL is a standard query language that multiple different databases provide for you, and you don’t have vendor lock-in with SQL databases. You’re not forced to use one type of SQL database. You have many options,” McClusky said.

What About the SQL Alternatives?

At the beginning of the 21st Century, combining SQL databases with a historian was uncommon, which led to issues when dealing with extremely large amounts of data. To help with this scalability problem while continuing to use some of the basic structures of the SQL database, software engineers developed NoSQL in 1998 and then NewSQL in 2011. However, the scalability issues that NoSQL and NewSQL set out to solve are much rarer now that storage space is more affordable and because you can couple a SQL database with a historian to create a high-performance database. NoSQL has grown in popularity in recent years. MongoDB and Amazon Web Services’ DynamoDB are currently two of the most popular NoSQL databases.

Collecting and storing time-series data

Another major benefit of SQL databases is that they help users get more value out of the organization’s time-series data. Time-series data is a sequence of data points that are time-stamped at successive intervals. These data points can be stored, typically in a database, and then retrieved to display in tables or charts that show any type of time-progressive data, such as temperatures, weights, volumes, and more.

Some think that time-series data is only useful for determining the status of individual data points at a particular point in time, tracking specific trends, and not much else. However, time-series data is actually one of the most valuable products that comes from monitoring machines and processes connected to a SCADA system. Time-series data can potentially help save millions of dollars in downtime, to make more detailed plans for improving energy efficiency, and a lot more.

However, many organizations don’t collect time-series data because they either don’t have the means to collect it, and/or an application to store and retrieve it. There are basically three options for storing time-series data: in programmable logic controllers (PLCs) in proprietary process historians or in SQL databases.

Storing data in PLCs is not a good solution. The more data a PLC stores, the less efficient it becomes, inevitably slowing down the overall operation. It is also difficult to retrieve data stored in a PLC if there is no connection to a database because users have to physically plug in to retrieve it.

Process historians are better than PLCs, but they can be very expensive. They also are not as versatile as SQL databases and don’t allow users to easily upgrade their database.

“You might find a great solution that’s out there, but that solution is just inside this locked-in ecosystem that is from a specific vendor, and that’s not going to do you any favors in the future when you come to your next upgrade cycle,” McClusky said.

SQL databases are typically the best choice for storing time-series data. They are well-equipped to handle large amounts of time-series data for SCADA systems, and they enrich time-series data by putting it into a simple format the entire organization can read.

Many large enterprise systems have used SQL databases for years. “Those databases are sometimes IT databases that are managed and have information from other sections of the company overall. Sometimes those are tied to enterprise resource planning (ERP) software,” McClusky said. Customer relationship management (CRM) and MES software also have a long history of using SQL databases. By connecting a SCADA system or application to a SQL database, users can share this information quickly and efficiently and make the data more accessible to other systems.

Once in an SQL database, time-series data can tell multi-layered stories about every aspect of their production, or reveal unseen trends that can save the organization a fortune. By overlaying this information, users can see how each system affects the other in real time and find new ways to improve efficiency and increase profitability.

For example, let’s say a sheet metal manufacturer notices one of the production lines is running out of material faster than another, even though both lines are working fine. After checking the SQL databases connected to the material inventories for each line and find that they receive the same amount of raw material. Then the user checks the SQL database connected to the SCADA system collecting time-series data from the machines’ PLCs and sensors.

After comparing the two production lines, the user notices one produces sheet metal a few micrometers thicker than the other. It could be a measurement problem with the machine, the machine could be worn out and in need of replacement, or maybe it wasn’t set up correctly. Even if the extra few micrometers are within an acceptable range, fixing the problem will save a large amount of inventory and capital for the future — all as a result of storing the time-series data in easily accessible SQL databases.

Connecting OT and IT

At this point, SQL may sound like a dream come true, but not everyone in the automation industry was originally on board with SQL databases. OT professionals, and SCADA users in particular, were skeptical of using SQL for time-series data, preferring proprietary databases specifically designed for the task. However, IT professionals typically view proprietary databases with their proprietary code and code translators as another hurdle to getting at valuable data and communicating effectively with other parts of the business.

It’s never easy to get experts on opposite sides of a problem to see eye to eye, but fortunately, tools like Ignition and SQL can get both sides of the OT-IT divide working together again. Connecting the SCADA system to a SQL database is a big step toward a mutual understanding between IT and OT departments.

“The SCADA system is going to acquire that data. It needs somewhere to put that data, and if you put it in a SQL database, you have all the advantages of the SQL database,” McClusky said.

By integrating these technologies, there would be no need to worry about connecting proprietary databases that could cause problems and hinder the rest of the system. In addition, most IT staff are familiar with SQL, so for those modernizing their SCADA systems with SQL databases and industrial automation platforms such as Ignition, it’s usually not necessary to teach staff how to use it.

Although some reluctance to pair SQL with SCADA remains, SQL and other open standard technologies are becoming much more popular.

“I’d say for folks who are modernizing their infrastructures and used to have an older setup with their manufacturing and they’re moving over to Ignition, absolutely SQL is increasing and it should be increasing,” McClusky said. This is because SQL databases are compatible with any new technology or application that uses these open standards. Integrating SQL databases and SCADA offers many advantages as technology is changing very quickly, and SQL can keep up with the demand because of its interoperability.

– Inductive Automation is a CFE Media and Technology content partner.

Structured query language (SQL) insights

The SQL programming language

SQL relational databases

Four SQL benefits

1. Simplicity

2. Scalability

3. Accessibility

4. Many options

What About the SQL Alternatives?

Collecting and storing time-series data

Connecting OT and IT

related topics

you might also like

Gain new insights on SCADA trends, applications

The future of automation lies beyond a system-of-systems approach

How to manage data collection and distribution on a unified platform

How to achieve flexibility with platform-agnostic SCADA

Get the newsletter