Mathematics of information
Information is the central element of manufacturing IT, and one skill that every well-rounded control engineer should have is knowledge of the mathematics of information, specifically the mathematics of relational databases. Most control engineers will eventually have to build or specify a database to hold instrument data, analysis data, or production reporting data. That is the point at which understanding the underlying structure and mathematics of databases becomes important.
The majority of databases today are “relational,” which means they are made up of multiple tables, and the relationships between the tables are based on common values within them, such as names or ID numbers. Well-structured databases are faster, smaller, and far easier to maintain than badly structured databases. Many IT consultants have found that they spend much more time fixing bad databases than creating new ones, which suggests that the problem of bad database design is widespread and can be costly.
Well-structured databases are “normalized,” but the rules of normalization are often not known or not consistently applied. The purpose of normalization is to eliminate redundant information, making it easier to create, read, update, and delete database information. The better normalized a database is, the less special application code and the fewer special use rules it requires.
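To make the cost of redundancy concrete, here is a minimal sketch in plain Python. The batch and product names are hypothetical and chosen only for illustration. When the same fact is stored in several rows, an update that misses one row leaves the data contradicting itself; when the fact is stored once, that cannot happen.

```python
# Hypothetical un-normalized production report: the product name is
# repeated in every batch row for the same product.
report = [
    {"batch": "B-001", "product": "P-10", "product_name": "Resin A"},
    {"batch": "B-002", "product": "P-10", "product_name": "Resin A"},
]

# A rename that touches only one row leaves two different names
# on record for the same product.
report[0]["product_name"] = "Resin A+"
names_for_p10 = {r["product_name"] for r in report if r["product"] == "P-10"}

# Normalized layout: the name is stored once, keyed by product,
# and the batch rows only reference the product.
product = {"P-10": "Resin A"}
batch = [
    {"batch": "B-001", "product": "P-10"},
    {"batch": "B-002", "product": "P-10"},
]

# The same rename is now a single change that every batch row sees.
product["P-10"] = "Resin A+"
normalized_names = {product[b["product"]] for b in batch}
```

In the first layout the rename needs a special rule ("update every row for this product"); in the second it is a single change.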
Un-normalized databases will often have special use rules, such as “make sure you delete all of the rows in table ‘A’ that have the specified key value.” Normalizing a database is also a key process in ensuring that the database accurately meets its requirements. Normalization cannot be performed when the requirements are fuzzy or uncertain.
Minding your tuples
Relational database tables are made up of rows and columns. The column names are called attributes. The individual rows in a table are called tuples, a mathematical term for an ordered list of objects. One or more attributes in a row make up the “key” to the tuple. The primary rule for relational databases is that no two rows can have the same key attribute values. This sometimes requires the creation of an additional attribute to hold a unique ID.
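The key rule can be seen in a short sketch using Python's built-in sqlite3 module. The instrument table, its columns, and the tag values are assumptions made up for this example; the extra instrument_id column plays the role of the added unique ID attribute.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table: instrument_id is an added attribute whose only
# job is to give every row (tuple) a unique key.
conn.execute(
    "CREATE TABLE instrument ("
    " instrument_id INTEGER PRIMARY KEY,"
    " tag TEXT NOT NULL,"
    " location TEXT NOT NULL)"
)
conn.execute("INSERT INTO instrument VALUES (1, 'FT-101', 'Line 1')")

# A second row with the same key value violates the primary rule,
# and the database refuses to store it.
try:
    conn.execute("INSERT INTO instrument VALUES (1, 'FT-102', 'Line 2')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM instrument").fetchone()[0]
```

Declaring the key in the table definition lets the database enforce uniqueness, so application code does not have to.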
The first rule of normalization is the removal of repeating attribute groups and the creation of a new table to hold the repeating data. Repeating groups are elements that can have multiple values, such as a book that can have multiple authors. In this case, a new table would be created for the repeating group, with a single row for each book/author pair. This is called “First Normal form.”
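The book/author case can be sketched in a few lines of Python. The titles, author names, and field names here are hypothetical; the point is only the shape of the data before and after the repeating group is moved out.

```python
# Hypothetical un-normalized data: "authors" is a repeating group,
# because one book can carry any number of author values.
books_unnormalized = [
    {"book_id": 1, "title": "Batch Control", "authors": ["A. Smith", "B. Jones"]},
    {"book_id": 2, "title": "Plant Data", "authors": ["B. Jones"]},
]

# The book table keeps only the single-valued attributes.
book = [
    {"book_id": b["book_id"], "title": b["title"]}
    for b in books_unnormalized
]

# The new table holds one row per book/author pair;
# (book_id, author) together form its key.
book_author = [
    {"book_id": b["book_id"], "author": a}
    for b in books_unnormalized
    for a in b["authors"]
]
```

Every row now has a fixed set of single-valued attributes, however many authors a book has.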
Second Normal form is obtained by finding attributes that depend on only part of a multi-field key and moving them to a separate table. For example, in a book/author table keyed on book and author, there could be an attribute for the book’s abstract. The abstract depends only on the book, so it would be the same in every row for the same book and belongs in a table keyed by book alone.
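A minimal sketch of that split, again in plain Python with made-up sample rows, might look like this. The consistency check is an assumption of this sketch, added to confirm that the abstract really depends on the book alone before it is moved.

```python
# Hypothetical book/author table keyed on (book_id, author).
# The abstract depends only on book_id, so it repeats per author.
book_author = [
    {"book_id": 1, "author": "A. Smith", "abstract": "Intro to batch control."},
    {"book_id": 1, "author": "B. Jones", "abstract": "Intro to batch control."},
    {"book_id": 2, "author": "B. Jones", "abstract": "Handling plant data."},
]

# Move the abstract to its own table keyed by book_id alone,
# checking that all copies for the same book agree.
book_abstract = {}
for row in book_author:
    stored = book_abstract.setdefault(row["book_id"], row["abstract"])
    if stored != row["abstract"]:
        raise ValueError(f"conflicting abstracts for book {row['book_id']}")

# What remains in the book/author table is just the key pair.
book_author_2nf = [
    {"book_id": r["book_id"], "author": r["author"]} for r in book_author
]
```

After the split, each abstract is stored exactly once, no matter how many authors the book has.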
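Third Normal form is obtained by eliminating columns that are not dependent on the key attributes, that is, columns that describe some other attribute rather than the row as a whole. For example, in a book/author table, there could be an attribute that contains the author’s company name, which is a fact about the author and not about the book/author pair. In Third Normal form, an author/company table would be created.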
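The author/company split can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample values are assumptions for illustration only. Once the company lives in its own table, a change of employer is a single-row update, and a join recovers the combined view when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the company is stored once per author
# instead of once per book/author row.
conn.executescript("""
CREATE TABLE author_company (
    author  TEXT PRIMARY KEY,
    company TEXT NOT NULL
);
CREATE TABLE book_author (
    book_id INTEGER NOT NULL,
    author  TEXT NOT NULL REFERENCES author_company(author),
    PRIMARY KEY (book_id, author)
);
INSERT INTO author_company VALUES
    ('A. Smith', 'Acme Controls'),
    ('B. Jones', 'Example Automation');
INSERT INTO book_author VALUES
    (1, 'A. Smith'), (1, 'B. Jones'), (2, 'B. Jones');
""")

# An author changing employers touches exactly one row.
conn.execute(
    "UPDATE author_company SET company = 'New Employer' WHERE author = 'B. Jones'"
)

# A join rebuilds the wide view without storing the company twice.
rows = conn.execute(
    "SELECT ba.book_id, ba.author, ac.company "
    "FROM book_author AS ba "
    "JOIN author_company AS ac ON ac.author = ba.author "
    "ORDER BY ba.book_id, ba.author"
).fetchall()
```

Both of B. Jones's book rows pick up the new company through the join, with no rule about updating every matching row.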
Most well-designed databases are in at least Third Normal form. There are additional rules and normal forms, but these are often not needed for small databases. When you help design or review manufacturing databases, check that the database is in at least Third Normal form.
Database structures and the rules for normalization are usually taught only in college-level Software Engineering or Computer Science courses, and most engineers never receive formal training. If you want to expand your skill set in the math of information, take a class at your local college or community college in database theory and structure. It will provide you with valuable IT skills.
Dennis Brandl is president of BR&L Consulting in Cary, NC, which focuses on manufacturing IT. Reach him at email@example.com.