Adapting XDR for OT cybersecurity

These five adaptations of traditional IT XDR allow IT security teams to achieve effective and efficient multi-telemetry detection and response in OT environments.

By John Livingston April 26, 2022
Courtesy: Brett Sayles

Chief information security officers (CISOs) and directors of cybersecurity at industrial organizations continue to be frustrated by the challenge of applying core information technology (IT) security principles to operational technology (OT) environments. The challenges run the gamut: gathering an accurate asset inventory that covers hardware as well as comprehensive software (OS, firmware, application); patch management; configuration management; user and account control; host intrusion detection; rapid incident response; and centralized, accurate reporting on vulnerabilities and risk status across all production locations and original equipment manufacturer (OEM) systems.

These frustrations are growing as the risks and security requirements continue to increase. Boards of directors, insurers and regulators are requiring much greater protection and proof of security maturity improvement in OT. This is in response to the ever-growing realities of cyber risks to these environments.

The primary reason for the CISO’s frustration is the continued reliance on network traffic data and network-based protections in OT. Because of operators’ fear and uncertainty (in part encouraged by the OEMs that supply the industrial controls hardware), security teams are managing OT security with one hand tied behind their backs.

A network-based approach does not capture robust information on endpoint risk, nor does it allow the actions needed to demonstrate improved security. Though OT security is not as mature as IT security, OT teams can still be more precise and proactive in securing OT systems (OT Systems Management) by protecting the endpoints themselves. Given the increasing connectivity of OT environments, perimeter detection alone cannot provide an accurate picture of security.

In a series of blog posts, we will describe an alternative to this limited model. The first installment focuses on the benefits of a comprehensive extended detection and response (XDR) approach to OT threat detection – very similar to that in IT.

Benefits of extended detection and response (XDR)

First, we need to define what we mean by extended detection and response because the term has so many interpretations. XDR brings together data across various telemetry sources (endpoints, network, users and accounts, etc.) to deliver specific response actions that can contain or stop threats. The data included varies based on the implementation and the vendor. But the concept is to bring together data from different “silos” of the security stack to improve the accuracy of detection and the speed and precision of response. Most IT security teams are already moving in this direction whether they officially call it XDR or not.

Many others have provided detailed accounts of the benefits of such an approach – see Gartner, Forrester, Mandiant and many other security articles on this. It is not our intent in this blog to restate all of those advantages, but the following list briefly summarizes them:

  • Broader sources of telemetry improve visibility into threats as they move across an environment – phishing moving to credential harvesting, accessing critical servers and eventually acting on endpoints.
  • Integrated analysis increases the accuracy of threat detection by bringing network, endpoint, user and other data together to reduce false positives as well as rank the criticality of the risk more accurately.
  • Integrated response shortens the time it takes to apply the appropriate defensive mechanisms to stop a threat. This covers both the speed of the response and its precision: greater insight into the threat and the systems it is targeting makes it possible to execute the “least disruptive response”.
  • Improved coordination of different security teams within the response timeline for improved reporting and efficiency.
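
To make the idea of integrated analysis concrete, here is a minimal sketch of cross-source correlation. It is not any vendor's detection logic: the `Event` fields, the `correlate` function and the five-minute window are all assumptions chosen for illustration. The sketch flags an asset only when at least two different telemetry sources report on it within the same time window, which is one simple way that combining silos can reduce false positives.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # telemetry silo, e.g. "network", "endpoint", "identity"
    asset: str        # hostname or asset ID the event refers to
    timestamp: float  # seconds since an arbitrary epoch
    detail: str


def correlate(events, window_seconds=300.0):
    """Return (asset, sources) for assets reported by >= 2 distinct sources
    within a sliding time window. Hypothetical rule, for illustration only."""
    by_asset = defaultdict(list)
    for ev in events:
        by_asset[ev.asset].append(ev)

    findings = []
    for asset, evs in by_asset.items():
        evs.sort(key=lambda e: e.timestamp)
        start = 0
        best = set()
        for end in range(len(evs)):
            # Shrink the window from the left until it spans <= window_seconds.
            while evs[end].timestamp - evs[start].timestamp > window_seconds:
                start += 1
            sources = {e.source for e in evs[start:end + 1]}
            if len(sources) > len(best):
                best = sources
        if len(best) >= 2:
            findings.append((asset, sorted(best)))
    return sorted(findings)


events = [
    Event("network", "hmi-01", 0.0, "unusual outbound connection"),
    Event("endpoint", "hmi-01", 120.0, "new service installed"),
    Event("identity", "hmi-01", 1000.0, "login from dormant account"),
    Event("network", "plc-07", 50.0, "port scan observed"),
]
findings = correlate(events)
```

In this made-up data set, only `hmi-01` is flagged, because its network and endpoint events fall within the same window; the lone network event on `plc-07` is not enough on its own.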

Specifically in OT, XDR (also referred to as “IDR” – integrated detection and response) has significant advantages versus the current narrow approaches.

  • It enables security teams to move beyond today’s network traffic limitations to see threats as they emerge at the endpoint, user, software as well as network level. As Dale Peterson wrote after the MITRE ATT&CK reviews of different OT network anomaly platforms:
    • Having access to and considering the Windows logs provides a lot of useful detection information. …
    • The scenario didn’t envision any of the solutions using OT firewall logs, which would have been another great data source for detecting activity in the early phases of the attack. Endpoint detection logs and other data sources with higher fidelity than monitoring a span port were not part of the evaluation. This is in line with the industry trend to leapfrog over simple detection to a more complex and time-intensive detection solution. 
    • None of the products actively queried the PLC…. Having access to PLC/controller information can be very helpful to engineers trying to determine the goal of an attack.
  • OT usually relies on a varied set of OEM-deployed AV solutions, backup tools, etc. This hodgepodge creates significant gaps in insight into threats as they move across OEM systems or across different production facilities. XDR integrates information from all of these underlying “approved” AV tools to create a single risk view.
  • Access to detailed asset status information is critical to “least disruptive response” actions. In OT, blanket, automated response is highly risky to the process. By integrating detailed endpoint status data (patch and vulnerability status, available and even dormant user and account information, detailed configuration settings, backup status, and mitigating security controls such as whitelisting or application firewalls), XDR supports more precise response actions, which limits the impact of any action to the “least disruptive” one.
  • Extended detection and response also enables actions to respond to threats. In OT, actions can be highly risky. As discussed, we need to take “least disruptive response” actions. But we do need to take action. An OT XDR platform allows actions to be taken within an OT-safe process that includes review with process engineers. This is a challenging step, but one that can significantly improve the security and resilience of the industrial control system.
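
The “least disruptive response” idea can be sketched as a small selection rule over asset status data. Everything below is an assumption made for illustration: the action names, their ordering, the `AssetStatus` fields and the two rules are hypothetical and are not taken from any product. The point is only that knowing the endpoint's patch level, mitigating controls and role in the process lets the response be scaled down rather than applied as a blanket action.

```python
from dataclasses import dataclass

# Hypothetical response actions, ordered from least to most disruptive.
ACTIONS = ["monitor", "disable_dormant_accounts", "block_at_firewall", "isolate_host"]


@dataclass
class AssetStatus:
    name: str
    controls_process: bool      # directly controls a physical process
    patched_for_threat: bool    # targeted vulnerability is already patched
    whitelisting_enabled: bool  # a mitigating control is in place
    has_recent_backup: bool


def least_disruptive_response(asset: AssetStatus, severity: int) -> str:
    """Pick an action for a threat of severity 1 (low) to 3 (high)."""
    if not 1 <= severity <= 3:
        raise ValueError("severity must be 1, 2 or 3")
    level = severity
    # Illustrative rule: an existing mitigation lowers the response one step.
    if asset.patched_for_threat or asset.whitelisting_enabled:
        level -= 1
    # Illustrative rule: never select host isolation for an asset that runs
    # the process or that could not be restored from a recent backup.
    if asset.controls_process or not asset.has_recent_backup:
        level = min(level, ACTIONS.index("block_at_firewall"))
    return ACTIONS[level]


hmi = AssetStatus("hmi-01", True, False, True, True)
historian = AssetStatus("historian-02", False, False, False, True)
hmi_action = least_disruptive_response(hmi, severity=3)
historian_action = least_disruptive_response(historian, severity=3)
```

With these made-up rules, the same high-severity threat yields `block_at_firewall` for the HMI, which controls the process and already has whitelisting, and `isolate_host` for the historian, which does neither.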

Recognizing OT challenges

XDR has strong appeal, but OT is different. The following three are only examples of the challenges facing XDR in OT.

  1. The cyber systems control critical physical processes, which means that any inappropriate response action can put production at significant risk.
  2. Similarly, applying security tooling incorrectly to sensitive, old and often low-bandwidth, low-memory devices and networks can cause more operational risk than the security events the tooling is intended to protect against.
  3. Many, if not most, of the devices in OT environments do not run the traditional operating systems that IT security tools were designed for. Therefore, any application of XDR needs to address these legacy embedded-firmware devices.
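
One common way to respect the bandwidth constraint in the second challenge is to throttle how often a collector is allowed to query a device. The sketch below uses a token bucket, a general rate-limiting technique; it is an assumption about how such throttling could be done, not a description of a specific OT product. The class name, the rate and the capacity values are all hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow at most `rate_per_sec` polls per second on average,
    with short bursts of up to `capacity` polls."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # time of the previous call, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill tokens for the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should skip or defer this poll


# Hypothetical budget: one poll every two seconds, bursts of two.
bucket = TokenBucket(rate_per_sec=0.5, capacity=2)
decisions = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 2.2)]
```

Here the first two polls are allowed as a burst, the third is refused because the bucket is empty, and the fourth is allowed once enough time has passed to refill a token.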

Five components for adapting XDR for OT

Here are five key components that make XDR a practical reality in OT:

  1. OT-specific endpoint visibility: It starts with a different approach to gathering endpoint data than relying only on network traffic. This involves adapting traditional IT agent and agentless mechanisms so they are safe and effective in OT. The result is an OT-specific, proven-safe, vendor-agnostic agent and agentless architecture that provides deep visibility into each endpoint. This combination gathers hundreds of pieces of data from endpoints, such as all installed applications, all users and accounts and their security settings, full configuration status information, etc. This endpoint asset visibility is similar to what security leaders expect in their IT systems, without causing any risk to OT assets.
  2. OT-specific endpoint telemetry capture: It then adapts the gathering of real-time information directly from these assets: logs, syslog, network flows, device and user behavior, performance statistics, etc. All of this is gathered in a way that is sensitive to OT networks, so it operates without disrupting limited-bandwidth networks.
  3. OT-system inbound integrations: Then, instead of only sending data outbound to a collector such as Splunk, Verve also brings in a wide range of third-party information available within the control systems: AV alerts and logs from the various approved OEM solutions, whitelisting alerts, firewall alerts and detections, backup status data, even process control alarms. As a result, Verve’s machine learning engines aggregate telemetry from a much wider pool of sources for more accurate detections.
  4. OT-specific detections: XDR is only effective if the detections tie specifically to the environment and provide recommended response actions relevant to that system. The approach leverages machine learning and anomaly detection engines to identify OT-specific threats with hundreds of pre-built detections. But most importantly, because the XDR database contains all the endpoint status information, the detections and the response actions are precise and allow for the “least disruptive response” possible given the threat and the endpoint/system itself.
  5. OT-safe response actions: XDR must include “response”. But in OT those responses need to follow proper industrial controls engineering processes. We’ve leveraged our 30 years of controls expertise to build a platform that enables the organization to “Think Global, but Act Local”. For accuracy of detection and speed of response, the platform must enable global analysis as well as “automation” of response actions. However, those actions should go through “local” engineers who know the specifics of the process before the automated action is initiated. The “Act Local” adaptation of XDR accelerates response but includes critical OT safeguards.
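
The “Think Global, but Act Local” pattern in the fifth component can be sketched as an approval gate: central analysis proposes an action, but nothing runs until an engineer assigned to that site approves it. This is a simplified illustration under stated assumptions, not the platform's actual workflow; the `ActLocalQueue` class, its method names, the status values and the site and engineer names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    asset: str
    action: str
    site: str
    status: str = "pending"  # pending -> approved -> executed
    approved_by: str = ""


class ActLocalQueue:
    def __init__(self, site_engineers):
        # Maps a site name to the set of engineers allowed to approve there.
        self.site_engineers = site_engineers
        self.actions = []

    def propose(self, asset, action, site):
        """Called by the global (central) analysis side."""
        item = ProposedAction(asset, action, site)
        self.actions.append(item)
        return item

    def approve(self, item, engineer):
        """Called by a local engineer who knows the process."""
        if engineer not in self.site_engineers.get(item.site, set()):
            raise PermissionError(f"{engineer} cannot approve for {item.site}")
        if item.status != "pending":
            raise ValueError(f"action is already {item.status}")
        item.status = "approved"
        item.approved_by = engineer

    def execute(self, item, run):
        """Run the automated action only after local approval."""
        if item.status != "approved":
            raise RuntimeError("action has not been approved by a local engineer")
        run(item)
        item.status = "executed"


queue = ActLocalQueue({"plant-a": {"alice"}})
item = queue.propose("hmi-01", "block_at_firewall", "plant-a")
executed = []
queue.approve(item, "alice")
queue.execute(item, executed.append)
```

The design choice worth noting is that `execute` refuses to run anything still in the `pending` state, so automation speeds up the response without bypassing the engineer who understands the local process.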

These five adaptations of traditional IT XDR allow IT security teams to achieve effective and efficient multi-telemetry detection and response in OT environments. We believe the growing risks highlighted at the outset of this blog require OT security practitioners to move beyond the “cannots” or “will nots” so often found in OT and find OT-safe adaptations based on a deep understanding of industrial controls engineering to deliver IT-level security to these critical OT environments.

– This originally appeared on Verve Industrial’s website. Verve Industrial is a CFE Media and Technology content partner.

Author Bio: John Livingston leads Verve Industrial's mission to protect the world’s infrastructure. He brings 20+ years of experience from McKinsey & Co., advising large companies in strategy and operations. Recognizing the challenges of greater industrial connectivity, John joined Verve Industrial to help companies find the lowest cost and simplest solutions to their controls, data and ICS security challenges.