How security by design might have stopped Oldsmar HMI attack
Oldsmar's cybersecurity attack was due to a lack of defense in depth in the human-machine interface (HMI). A good security by design protocol can help avert this, but it requires a fundamental rethink.
Now that the dust has settled a bit around the cyber attack at the Oldsmar water facility in Florida, and everyone has made their sales pitch, this is a good time to highlight some of the more fundamental issues that haven’t really been discussed to date.
The really scary thing about the Florida water attack is not that there was an external access point, nor that it was configured poorly and allowed someone in. Those are the obvious control failures, and the easiest to fix as well. No, the real issue at play in the Florida water hack is the lack of defense in depth with the human-machine interface (HMI) itself, which presented some fundamental flaws. These flaws open the door for a malicious insider, or even a really tired operator, to do something they shouldn’t. This is, of course, not a shock to the veterans of the field, but it is something we cannot stop saying until it changes. Security by design should be at the forefront in these situations.
The ICS security community frequently discusses how these things are designed without security in mind, but is usually in reference to protocols or the programmable logic controller (PLC). The criticism is less often pointed at the HMI design because that’s more of a project and less of a product point.
However, there is no reason for this. There are some really good HMI designers out there who take this seriously and know how to do this right. It has been so easy to spot these oversights based on simple observations. Nothing radical needs to be done here. All that’s required is HMI design gets the attention it’s due as critical infrastructure organizations commission new control systems or upgrade existing ones.
One of the easily referenceable frameworks that addresses some of the major issues with the design of the Florida HMI that was hacked, as described in publicly available information, is the OWASP Top 10 Proactive Controls. We won’t go through all 10 of these today, since I can only call out two based on the evidence in this particular case study, but this should shed some light on how picking fundamental proactive security by design controls can save everyone some major headaches down the road.
OWASP describes this as: “Input validation is a programming technique that ensures only properly formatted data may enter a software system component.”
This is further split into two types of validity checking: syntax and semantic. Syntax does not apply in this case as the attacker input a numerical value into a numerical value. Perhaps they tried other data types and were stopped, but we simply do not one way or the other. We do know semantically there was a likely failure. Semantics is about ensuring only valid values or ranges are accepted.
On the surface, there is no apparent reason to allow a 100x normal level increase in the HMI for a caustic chemical of this type. Some well-known experts in the field were quick to point out simple physics and downstream alarming systems would have prevented this from being an actual concern, so then why allow it all? A mistake like this points to the likelihood of other poorly designed input controls where maybe we would not have been so lucky.
Also, do those same physics apply in drought conditions or when the normal thermal assumptions no longer apply? Many engineering disasters have happened when engineers assumed something could never happen. It’s pretty easy to see how poor input control in the HMI design could lead to far worse incidents. Validation not only makes things harder on an attacker, but also saves the normal operators from human performance incidents.
OWASP describes this as: “Access control (or authorization) is the process of granting or denying specific requests from a user, program or process.”
This one requires a bit more assuming, but bear with me.
To understand why I find fault with this control in this particular case, we need to first explain a bit about how control rooms typically operate, along with layers of controls. First, let’s assume this is a typical 24-hour manned operations center, so the HMI in question is probably logged in all the time. It obviously was during the actual attack. In a best-case scenario, operators log out and in at shift changes, but that is not necessarily the norm. In a lot of cases, it’s a single shared account logged into every HMI at the OS level.
Second, one can assume the HMI is set up with a couple of levels. Let’s say a default read level, where screens can be seen but not manipulated, which should be the default authorization context. Then there is an operator level, where normal levels and inputs can be entered, and an engineer level where things can not only be manipulated, but also actual changes to the screen can be made or default boundaries can be overridden.
Finally, there is often an administrator role, although the system owner may or may not know the level exists. It is often the domain of the integrator who built the system and is used only for the most extreme cases. These roles are usually also shared, especially in anything not commissioned in the last 5-7 years.
The way this normally is supposed to work is the HMI screen is left in the default read level. This allows the operator to leave the PC logged in with no screen saver and other normal corporate controls in place. This is so the operator can do their job, which is to observe the system at all times. Only when action needs to be taken should the operator then elevate themselves into the actual operator level. Remember, operators live in an event horizon measured in minutes, anything requiring a quicker response, should be handled by the control system itself or the safety systems as a last resort. This means anything an operator needs to do can withstand a quick password entry. The HMI should then return to read-only promptly after inactivity, similar to what happens with a lock screen on the corporate side.
Now, it is entirely possible the attacker waited for the system to be elevated into operator mode, which still should have not been sufficient to elevate chemical levels that high (it ideally should have required an engineer role), or maybe the operator is continuously logged in because there are just a lot of alarm conditions that need constant clearing. However, it is just as likely to assume either proper role-based authorization was not implemented in this system at all, or protective controls like the timeout back to read were not in use.
If we could go back in time and apply these two controls, while it is unnerving that something as precious as a municipal water supply system was so easily accessed, the level of panic would have been much lower if proper system design had been a fundamental component from the start. Something as simple as validating all inputs would have prevented the attacker from even being able to set the value so high. Or, if the operator panel had been designed with appropriate enforcement of access controls, requiring any authorization prior to modification of any kind let alone to dangerous levels, then this seemingly unsophisticated hacker would have been prevented from their attempted attack.
Perhaps with more in-depth knowledge and interviews, we could glean even more lessons in this particular cybersecurity failure, but even with limited data, it’s pretty easy to see how picking a straightforward and well-designed set of principles to aid a security by design program can save you a lot of headaches down the road.