Creating a threat-informed defense for a facility
A threat scenario is shown to explain how an asset owner can benefit from leveraging diverse data collection using the MITRE ATT&CK for ICS Matrix.
The MITRE ATT&CK for ICS Matrix provides a common nomenclature that allows asset owners, security researchers and consultants, internal defenders and product vendors to communicate more clearly about adversary techniques. Nothing in it is new to experienced practitioners; what is revolutionary is the taxonomy of techniques and the accompanying guidance, such as data sources for detection. It helps normalize the discussion among these groups, which aids the automation of detection and response and, more generally, the overall risk discussion.
Using the threat scenario outlined below, we’ll demonstrate how an asset owner can benefit from leveraging diverse data collection methods to create a threat-informed defense using one of the MITRE ATT&CK for ICS techniques as an example.
A third party is onsite to update some control logic following an AC motor drive replacement in a plant cooling process. As part of this update, the third party brings in a piece of removable media. Policy calls for a virus scan, and a quick scan is settled on for expediency because the outage is almost over and the physical drive replacement took longer than anticipated. The files are mostly in a proprietary control system executable format that the AV engine does not understand, and they are exempted from actual scanning as per the original manufacturer’s recommendation, due to concerns about false positives causing outages. The results of the malware scan come back clean with no warning.
The technician logs in as a domain administrator on the primary engineering workstation where all the control code is kept, as is required by the control system operations manual. They then plug the USB drive in and begin to copy files from the USB drive into the primary code repository file share. Again, all directories and file extensions are exempted from normal malware scanning, and this process is exempted from any whitelisting, as it is the sole method of updating the system. This has been risk reviewed/accepted with all the proper authority as part of factory and site acceptance testing (FAT/SAT).
The technician then updates all the controllers via the system software, which in practice just uses FTP/SMB to copy files out to other workstations and the controllers. They then start system initialization and run through all the normal startup sequences. At this point, everyone is happy. The technician is off to the airport, and the plant starts up and comes online.
The plant maintenance was done a calendar quarter ahead of peak season. This gave the plant time to run and be vetted prior to being placed into a must-run condition by the load management group during critical weather periods. No issues were noted during this time.
Then, on the hottest day of the year, things go terribly wrong. The grid is already having trouble balancing generation with the ever-increasing load demand. The grid supervising authority has already used up its capacity management options under voluntary load-shedding agreements.
At the site where the plant is running, all four units are running smoothly. In fact, the lead plant operator, a seasoned veteran of hot summers, notices that the water plant is running cooler than expected given all the environmental stresses. That upgrade a few months ago must be working beautifully. Then, all four units go into emergency shutdown. A complete loss of cooling is alarmed, and safety systems engage to prevent catastrophic failure of the units.
Unfortunately, the sudden loss of generation and inertia causes a voltage collapse on an already stressed grid and results in a regional blackout. It will be days or weeks before the units can be brought back online because the cooling system needs repairs.
The third-party vendor has an undetected compromise of its network; more specifically, its central code vault system has been compromised. As a major value-added reseller (VAR) of the most popular control systems for this application, the vendor is a major target. The bad actors have been in there for months and have been going through the code vault looking for high-value customer files. They know the standard procedure is to release whole code branches when doing system updates, so many more files are copied than needed, and technicians never vet the actual files copied.
They have leveraged this to plant fake files that are really malware but masquerade as legitimate code. Again, no one ever counts or checks the actual file names. The advantage this gives is that all normal checks and balances are bypassed. There is no need to worry about malware checks because the files are always exempted. There is no reason to worry about privilege escalation because the code already runs as a domain administrator.
There is no reason to worry about being copied around, because the code is already copied out under a whitelisted network process that comes from a trusted source using a trusted protocol. The AI algorithms either raise no alert or are suppressed as part of the maintenance work. Again, this is normal, approved work. The malware files are tiny compared to the larger dataset, so no one will notice the couple of extra kilobytes on the transfer.
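One data source that would have exposed the planted files is a simple comparison of what is on the removable media against what the change actually requires. The following is a minimal sketch, not part of the original scenario: it assumes the asset owner and vendor have agreed, out of band, on a manifest of expected relative paths and SHA-256 hashes for the work order. The function names and manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_manifest(media_root: Path, manifest: dict) -> dict:
    """Compare files under media_root with an expected manifest.

    manifest is a hypothetical {relative_posix_path: sha256_hex} mapping
    agreed for this specific change.
    """
    found = {
        p.relative_to(media_root).as_posix(): sha256_of(p)
        for p in media_root.rglob("*")
        if p.is_file()
    }
    common = set(found) & set(manifest)
    return {
        # Files nobody asked for: the planted malware would land here.
        "unexpected": sorted(set(found) - set(manifest)),
        # Files the change needs that are absent from the media.
        "missing": sorted(set(manifest) - set(found)),
        # Files whose content differs from what was agreed.
        "modified": sorted(n for n in common if found[n] != manifest[n]),
    }
```

A check like this does not depend on the AV engine understanding the proprietary file format, so it still works when the files are exempted from scanning. It only helps if the manifest is produced independently of the compromised code vault, for example from the engineering change request.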
This malware runs on a timer. Once installed in the system, it waits to do its thing. The first thing it does is reach out to the control system’s Active Directory domain controller (DC), which just so happens to be running on the engineering server (and which isn’t monitored by IT because they’ve been kept out of these systems), and begins to query the groups. It automatically creates a couple of familiar-looking groups and accounts, all with highly privileged access.
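The newly created groups and accounts are detectable if someone periodically snapshots directory group membership and compares the snapshots. The sketch below assumes each snapshot has already been exported into a plain mapping of group name to member names; how that export is produced is outside this example, and the function name is hypothetical.

```python
def diff_group_snapshots(before: dict, after: dict):
    """Compare two {group_name: set_of_member_names} snapshots.

    Returns (new_groups, added_members), where added_members covers only
    groups that existed in both snapshots.
    """
    new_groups = sorted(set(after) - set(before))
    added_members = {}
    for group in set(after) & set(before):
        extra = after[group] - before[group]
        if extra:
            added_members[group] = sorted(extra)
    return new_groups, added_members


# Illustrative data only: a look-alike group and an extra privileged member.
before = {"Domain Admins": {"alice"}}
after = {
    "Domain Admins": {"alice", "svc_backup2"},
    "Domain Adm1ns": {"mallory"},
}
new_groups, added_members = diff_group_snapshots(before, after)
print(new_groups)      # ['Domain Adm1ns']
print(added_members)   # {'Domain Admins': ['svc_backup2']}
```

In a control system environment where accounts rarely change, any non-empty result outside an approved change window is worth a look.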
It also looks at the local firewall rules on the host and determines that port 53 (DNS) is allowed outbound. As this is a DC, it is also a forwarding domain name system (DNS) lookup server, and as such has access through DNS to the internet. Again, nothing too suspect here. Because this unfortunate firewall rule setting means no rule is violated, no alerts are sent to the security operations center.
Once the malware contacts the fake DNS host, it begins to leverage DNS to import new code and export critical system data. The first thing it exports is an encrypted packet of all the AD groups. The bad actors use this to determine that there is an admin account that hasn’t changed its password or logged on in 180 days.
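Data moved over DNS in this way tends to show up as unusually long or random-looking labels in query names, so DNS query logs from the DC are a useful data source even when the firewall sees nothing wrong. The following is a rough heuristic sketch; the thresholds are illustrative assumptions rather than tuned values, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_tunnel(query_name: str,
                      max_label_len: int = 40,
                      min_len_for_entropy: int = 16,
                      entropy_threshold: float = 3.8) -> bool:
    """Flag a DNS query name whose subdomain labels look like encoded data.

    The last two labels are skipped as a crude stand-in for the
    registered domain (an assumption; real code would use a suffix list).
    """
    labels = query_name.rstrip(".").split(".")
    subdomain_labels = labels[:-2]
    if not subdomain_labels:
        return False
    longest = max(subdomain_labels, key=len)
    if len(longest) > max_label_len:
        return True
    return (len(longest) >= min_len_for_entropy
            and shannon_entropy(longest) > entropy_threshold)
```

A per-query check like this is noisy on its own. It is more useful when combined with simple counts, such as how many distinct subdomains one host requests under a single parent domain, and on a control network the baseline of legitimate external lookups should be small.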
They reset the password on this account and use it to make a few more paths out. They also include access for the control protocol used on the network, as well as the controller administrative protocol used to load firmware for this cooling system.
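A dormant privileged account that suddenly has its password reset is a strong signal, and it can be found from ordinary account attributes. The sketch below assumes account records have been exported into dictionaries with the hypothetical keys `name`, `privileged`, `last_logon` and `password_last_set`; the 180-day figure comes from the scenario, while the 7-day reset window is an illustrative assumption.

```python
from datetime import datetime, timedelta


def flag_dormant_privileged(accounts, now,
                            dormant_days=180, recent_reset_days=7):
    """Return (name, reason) pairs for privileged accounts worth reviewing."""
    findings = []
    dormant_cutoff = now - timedelta(days=dormant_days)
    reset_window = timedelta(days=recent_reset_days)
    for account in accounts:
        if not account["privileged"]:
            continue
        dormant = account["last_logon"] < dormant_cutoff
        recently_reset = now - account["password_last_set"] <= reset_window
        if dormant and recently_reset:
            findings.append(
                (account["name"], "dormant account with recent password reset"))
        elif dormant:
            findings.append((account["name"], "dormant privileged account"))
    return findings
```

Run before the intrusion, the second finding type would have prompted someone to disable the unused account. Run afterward, the first would have highlighted the reset.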
During this time, the actors also discover that while each unit has its own cooling setup, they are all conveniently tied back into the same supervisory infrastructure from the same vendor. This allows the operators to manage all four units from a single human-machine interface (HMI), but also allows bad actors to easily replicate their attack across the entire site.
Over the next couple of months, the bad actors take their time. They replace executables on the engineering server with new ones, change the operating mode of the programmable logic controllers (PLCs) to accept new code and firmware, and slowly push new firmware that can be used to mask actual conditions. Routine operations do most of the heavy lifting by pushing the new code out under normal routines, so very little high-risk activity has to be carried out by the bad actors themselves.
The critical day hits, and the bad actors are ready. They’ve been watching the heat wave affect other utility grids as it migrates toward the region. Even without any extra help, older, less sophisticated equipment has been failing. The attack begins by commanding the controllers to start hiding temperature and pressure readings and to reduce cooling system performance. Fans on the mechanical cooling towers are told to operate at a fraction of the speed they should. Pumps are told to push harder and faster than they should. All of this contributes to a slow and steady rise in water temperature and pressure. Eventually, this pushes the weakest point on each system to fail in very short order.