Recovery-Guaranteed Sensor Attack Detection for Cyber-Physical Systems

Course: EE/CSC 7700 — ML for CPS
Instructor: Dr. Xugui Zhou
Presented by: Kouhyar Sheida & Amirhossein Ghorbansarvi
Summarized by: Peyton Andras & Chisom Osigwe

Brief Summary

This presentation proposes a recovery-guaranteed sensor attack detection framework for cyber‑physical systems (CPS). Unlike prior work that treats detection and recovery separately, the method co-designs detection thresholds with online recoverability verification, ensuring alarms are raised only when there remains enough time (a window of K steps) to safely recover. The architecture integrates residual calculation, state authentication with bounded error, incremental reachability-based recoverability estimation, and dynamic threshold adjustment that tightens or loosens sensitivity to guarantee recovery while minimizing false alarms. Validation spans vehicle platoon, aircraft pitch, and lane-keeping simulators, plus a physical 4‑wheel testbed, demonstrating zero missed recovery windows across bias, delay, and replay attacks.

Contents

Slide Descriptions & Screenshots

1) State of The Art

Slide 1 — Title and presenters
Problem: sensor attacks in CPS can cause unsafe control actions.

Motivates CPS security across autonomy, industrial automation, smart grids, and aerospace: sensor spoofing can crash drones or mislead vehicles. Current approaches treat recovery and detection as separate problems, so alarms may arrive too late for safe intervention or too early, causing unnecessary disruptions. The slide introduces the critical challenge of ensuring both timely detection and guaranteed recovery in safety-critical systems, where delayed responses can lead to catastrophic failures.

2) Motivation

Slide 2 — State of the art
Detection and recovery traditionally handled separately.

Calls for a unified framework that coordinates detection decisions with recovery capabilities in real time. This slide highlights the fundamental limitation of existing approaches where detection systems raise alarms without considering whether sufficient time remains for the control system to safely recover. The gap between detection and recovery planning creates scenarios where systems either miss critical windows for safe intervention or generate excessive false alarms that degrade operational efficiency.

3) Problem Statement — Critical Gaps

Slide 3 — Gaps in CPS security
Late alarms → no time to recover; early alarms → false positives.

Highlights timing/coordination challenge: sensitivity vs specificity trade‑off, ensuring alarms arrive with sufficient recovery margin. The core technical challenge is determining the optimal detection threshold that maximizes sensitivity to attacks while guaranteeing the system retains a minimum K-step window for safe recovery. Traditional approaches fail because they set static thresholds without real-time verification of the system's current recoverability state, leading to either missed attacks or unnecessary operational disruptions that impact system performance and user trust.

4) Threat Landscape

Slide 4 — Threats
Sensor attacks, timing constraints, functional complexity.

Defines attack classes (bias, delay, replay) and the real‑time constraints that make recovery‑aware detection necessary. Bias attacks inject constant or slowly varying offsets into sensor readings; delay attacks introduce temporal misalignment between measurements and their timestamps; replay attacks retransmit previously captured legitimate sensor data to mask current malicious activity. Each attack type poses distinct challenges for detection: it can remain stealthy by staying within normal operational bounds while gradually pushing the system toward unsafe states, so detectors must distinguish malicious deviations from benign noise.
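
The three attack classes above can be illustrated on a recorded sensor trace. The following is a minimal sketch, not the presenters' attack models: the function names, parameters, and the toy trace are all assumptions chosen for illustration.

```python
import numpy as np

def bias_attack(y, start, offset):
    """Add a constant offset to every reading from index `start` onward."""
    out = np.array(y, dtype=float)
    out[start:] += offset
    return out

def delay_attack(y, start, lag):
    """From index `start` onward, report the reading taken `lag` steps earlier."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    for t in range(start, len(y)):
        out[t] = y[max(t - lag, 0)]
    return out

def replay_attack(y, start, rec_start, rec_len):
    """From index `start` onward, loop a previously recorded segment of the trace."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    recording = y[rec_start:rec_start + rec_len]
    for t in range(start, len(y)):
        out[t] = recording[(t - start) % len(recording)]
    return out

# Hypothetical clean sensor trace: a ramp 0, 1, ..., 9.
y = np.arange(10, dtype=float)
print(bias_attack(y, 5, 2.0))      # readings 5..9 are shifted by +2
print(delay_attack(y, 5, 3))       # readings 5..9 lag the truth by 3 steps
print(replay_attack(y, 5, 0, 3))   # readings 5..9 loop the first 3 samples
```

In each case the attacked values stay within the range the sensor has already produced or close to it, which is why a fixed residual threshold can miss them for several steps.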

5) Solution Overview — Recovery‑Guaranteed Detection

Slide 5 — Solution overview
Monitor state, verify recoverability, adjust threshold, guarantee recovery.

End‑to‑end loop: compute residuals; check the recoverable window K online; adapt the threshold to keep K at or above the required minimum; raise timely alarms. The framework continuously monitors system state deviations through residual calculations while verifying, via reachability analysis, whether the current state remains within a K-step recoverable region. Based on this real-time recoverability assessment, detection thresholds are adjusted dynamically: the threshold is lowered (detection becomes more sensitive) as the system approaches critical boundaries and raised when recovery margins are healthy. This ensures alarms always provide sufficient lead time for safe intervention while minimizing false positives during normal operation.
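
The loop described above can be sketched as follows. This is a hedged illustration only: the linear interpolation rule, the parameter names (`tau_min`, `tau_max`, `k_slack`), and the sample numbers are assumptions, not the rule used in the presented work. The sketch only shows the general idea that the threshold shrinks as the remaining recoverable window approaches the required minimum.

```python
def adapt_threshold(k_remaining, k_required, tau_min, tau_max, k_slack):
    """Interpolate the threshold between tau_min and tau_max.

    With k_remaining <= k_required the detector is at its most sensitive
    (tau_min); with k_slack or more spare steps it is at its least
    sensitive (tau_max).
    """
    slack = (k_remaining - k_required) / k_slack
    slack = min(max(slack, 0.0), 1.0)
    return tau_min + (tau_max - tau_min) * slack

def first_alarm(residuals, k_windows, k_required=3,
                tau_min=0.1, tau_max=1.0, k_slack=5):
    """Return the index of the first step whose residual exceeds the
    adapted threshold, or None if no alarm is raised."""
    for t, (r, k) in enumerate(zip(residuals, k_windows)):
        tau = adapt_threshold(k, k_required, tau_min, tau_max, k_slack)
        if abs(r) > tau:
            return t
    return None

# Hypothetical run: a slowly growing residual while the recoverable
# window shrinks from 10 steps to the required minimum of 3.
residuals = [0.2, 0.3, 0.4, 0.5, 0.6]
k_windows = [10, 8, 6, 4, 3]
print(first_alarm(residuals, k_windows))
```

In this toy run a static threshold fixed at `tau_max` would never fire, because no residual exceeds 1.0, whereas the adapted threshold raises the alarm while one spare step of recovery margin remains.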

6) Framework Architecture

Slide 6 — Architecture block diagram
Residual calculator, error estimator, authenticator, recoverability calculator, threshold adjuster.

Each module collaborates to balance false alarms against recovery feasibility under real‑time constraints. The residual calculator compares predicted and measured states, the error estimator bounds uncertainty in state estimation, and the authenticator verifies whether observations fall within acceptable bounds given the error model. The recoverability calculator performs incremental reachability analysis to determine the K-step recovery window, while the threshold adjuster optimizes detection sensitivity subject to maintaining minimum recovery guarantees. This creates a closed-loop system where detection and recovery planning are inseparable and mutually reinforcing components.
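
To make the first modules concrete, here is a minimal sketch of a residual calculator and an authenticator for a linear model x[k+1] = A x[k] + B u[k], y[k] = C x[k]. The matrices, the error bound, and the measurement values are hypothetical; the presentation does not give numeric values, and the actual authenticator may use a different acceptance test.

```python
import numpy as np

def residual(A, B, C, x_prev, u_prev, y):
    """Difference between the measurement y and the one-step model prediction."""
    x_pred = A @ x_prev + B @ u_prev
    return y - C @ x_pred

def authenticate(res, error_bound):
    """Accept the measurement only if every residual component lies within
    the (assumed known) estimation error bound."""
    return bool(np.all(np.abs(res) <= error_bound))

# Hypothetical double-integrator-like model with a 0.1 s step.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
C = np.eye(2)

x_prev = np.array([0.0, 1.0])
u_prev = np.array([1.0])          # model predicts x = [0.1, 1.1]
bound = np.array([0.05, 0.05])    # assumed per-state error bound

clean = np.array([0.11, 1.09])    # small noise only
spoofed = np.array([0.60, 1.09])  # position reading biased by about 0.5

print(authenticate(residual(A, B, C, x_prev, u_prev, clean), bound))
print(authenticate(residual(A, B, C, x_prev, u_prev, spoofed), bound))
```

Only authenticated states would be passed on as the starting point for the recoverability calculation, which is why the error bound matters: it determines how large the initial set in the reachability analysis has to be.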

7) Main Contributions

Slide 7 — Residuals and thresholds
Authentication, incremental verification, threshold adjustment.

8) Recoverability Calculator (Core)

Slide 8 — Contributions
Reachability‑based K‑step window with online, incremental checks.

Optimization objective: maximize the detection threshold while keeping the recoverability window K above a required minimum. The calculator uses set-based reachability techniques to compute forward reachable sets from the current estimated state under all possible control inputs, determining the maximum number of time steps K before the system would exit the safe operating region. By leveraging incremental computation methods and exploiting the structure of linear time-invariant systems, this module achieves real-time performance suitable for closed-loop operation on the resource-constrained embedded platforms commonly found in cyber-physical systems.
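
A simplified version of this computation can be sketched with axis-aligned boxes. The sketch below is an assumption-laden illustration: it over-approximates reachable sets of an LTI system with interval boxes and recomputes them from scratch, whereas the presented method is described as incremental and may use a different set representation. The function name and the scalar example are hypothetical.

```python
import numpy as np

def safe_steps(A, B, x_center, x_radius, u_lo, u_hi,
               safe_lo, safe_hi, max_steps=50):
    """Count how many steps the box over-approximation of the reachable set
    is guaranteed to stay inside the safe box [safe_lo, safe_hi].

    The initial set is the box x_center +/- x_radius, and the input may take
    any value in [u_lo, u_hi] at every step.
    """
    u_center = (u_hi + u_lo) / 2.0
    u_radius = (u_hi - u_lo) / 2.0
    c = np.asarray(x_center, dtype=float)
    r = np.asarray(x_radius, dtype=float)
    for k in range(max_steps):
        # Propagate the box one step: centre through the dynamics,
        # radius through the element-wise absolute values of A and B.
        c = A @ c + B @ u_center
        r = np.abs(A) @ r + np.abs(B) @ u_radius
        if np.any(c - r < safe_lo) or np.any(c + r > safe_hi):
            return k          # the set at step k + 1 may leave the safe region
    return max_steps

# Hypothetical scalar integrator x[k+1] = x[k] + u[k], |u| <= 1, safe |x| <= 3.5.
A = np.array([[1.0]])
B = np.array([[1.0]])
u_lo, u_hi = np.array([-1.0]), np.array([1.0])
safe_lo, safe_hi = np.array([-3.5]), np.array([3.5])

print(safe_steps(A, B, np.array([0.0]), np.array([0.0]),
                 u_lo, u_hi, safe_lo, safe_hi))   # 3
print(safe_steps(A, B, np.array([2.0]), np.array([0.0]),
                 u_lo, u_hi, safe_lo, safe_hi))   # 1
```

Starting closer to the boundary shrinks the window from 3 steps to 1, which is the quantity the threshold adjuster would compare against the required minimum K.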

9) Experimental Validation

Slide 9 — Recoverability calculator
3 simulators × 9 attack types → 0 missed recovery windows.

Vehicle platoon, aircraft pitch control, lane keeping; bias/delay/replay attacks; embedded real‑time testbed demonstration. The validation spans diverse cyber-physical domains with different dynamics and safety requirements, testing the framework against three attack categories (bias, delay, replay) with varying magnitudes and timing profiles for a total of 27 attack scenarios across the different platforms. Critically, across all experiments, the system achieved zero instances of missed recovery windows, meaning every detected attack was flagged with sufficient lead time for safe intervention, while a physical four-wheel robot testbed confirmed real-world feasibility with acceptable computational overhead on embedded hardware.

10) Impact & Future Directions

Slide 10 — Experiments
Safety guarantees, real‑world validation, broad CPS applicability.

Positions framework for life‑critical CPS and motivates probabilistic and distributed extensions. The work establishes a new paradigm for secure CPS design where detection and recovery are co-designed rather than treated as separate concerns, with immediate applicability to autonomous vehicles, industrial control systems, and aerospace platforms where safety is paramount. Future research directions include extending the deterministic framework to handle stochastic disturbances and uncertainties, developing distributed versions for large-scale networked systems, and investigating learning-based approaches to accelerate online recoverability computations for complex nonlinear dynamics.

11) Limitations

Slide 11 — Impact
LTI assumptions; no attack isolation; deterministic set‑based reasoning.

12) λ‑Detectability (Probabilistic Automata)

Slide 12 — Limitations
Quantifies probability of detection within a horizon.

Models uncertainty explicitly to enable risk‑aware detection decisions in supervisory control. Lambda-detectability extends classical detectability notions to probabilistic automata by quantifying the probability that an observer can determine the system's true state within a specified time horizon, accounting for stochastic transitions and observation uncertainties. This probabilistic framework enables designers to make quantitative trade-offs between detection confidence and response time, supporting applications where deterministic guarantees are infeasible due to inherent system randomness or incomplete information about system behavior.

13) Control Barrier Functions (CBFs) for Safe Control

Slide 13 — Control Barrier Functions
Real‑time safety via invariance constraints; avoids heavy reachability.

Convex optimization enforces safety despite partial sensor corruption with reduced compute load. Control Barrier Functions keep the system inside a safe set by imposing pointwise constraints on the control input that make the set forward invariant. These constraints are formulated as quadratic programs that can be solved in real time even on embedded platforms. Unlike reachability-based methods that compute entire forward-reachable sets, CBFs evaluate safety conditions locally at each state, offering computational efficiency while still providing formal safety guarantees.
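
For intuition, the pointwise constraint can be worked out by hand for a one-dimensional case. The sketch below assumes a single integrator x' = u with safe set h(x) = x_max - x >= 0 and a trusted state reading; the CBF condition h' >= -alpha * h then reduces to u <= alpha * (x_max - x), and the one-variable quadratic program min (u - u_nom)^2 has the closed-form solution used here. The system, the gains, and the function names are illustrative assumptions, not taken from the slides.

```python
def cbf_filter(x, u_nom, x_max, alpha):
    """Closed-form solution of: min (u - u_nom)^2  s.t.  u <= alpha * (x_max - x)."""
    return min(u_nom, alpha * (x_max - x))

def simulate(x0, u_nom, x_max=1.0, alpha=1.0, dt=0.1, steps=100):
    """Forward-Euler simulation of x' = u with the CBF filter in the loop."""
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(xs[-1], u_nom, x_max, alpha)
        xs.append(xs[-1] + dt * u)
    return xs

# The nominal controller always pushes toward the boundary with u_nom = 1.
xs = simulate(x0=0.0, u_nom=1.0)
print(max(xs) <= 1.0)   # the filtered trajectory never crosses x_max
```

Each step costs one comparison rather than a reachable-set computation, which is the efficiency argument made above. With forward-Euler discretization the invariance holds here because alpha * dt <= 1; larger products would need a discrete-time CBF condition.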

14) Detecting Zero‑Dynamics Attacks

Slide 14 — Detecting Zero-Dynamics Attacks
Event‑triggered communication; detection & isolation of stealthy attacks.

Zero-dynamics attacks are particularly insidious because they manipulate sensor readings in directions that align with the system's unobservable modes, making them invisible to traditional residual-based detectors that rely on state estimation errors. By combining geometric analysis of the system's observability structure with event-triggered communication strategies, this approach enables detection and isolation of such stealthy attacks while simultaneously reducing network bandwidth requirements, addressing both security and resource constraints in networked cyber-physical systems.

15) Cross‑Layer Cyber–Physical Security

Slide 15 — Cross-layer security
Network‑level + physical‑level detection with correlation.

Combines protocol anomaly detection with physical signal anomalies to catch coordinated threats. Cross-layer security architectures recognize that cyber-attacks often leave traces at multiple system layers, from network packet patterns to physical process deviations, and that correlating evidence across these layers can improve detection accuracy while reducing false alarms. By fusing information from network intrusion detection systems with physical-layer residual monitoring, cross-layer approaches can identify sophisticated attacks that appear benign when viewed from a single layer, such as man-in-the-middle attacks that inject physically plausible but malicious commands into control loops.

16) Discrete Event Systems under Actuator Attacks

Slide 16 — DES actuator attacks
Supervisor synthesis to restore safe operation under re‑enabled events.

Maintains correctness via event‑sequence reasoning and robust supervisor design. In discrete event systems such as manufacturing lines, transportation networks, and building automation, actuator attacks can enable forbidden event sequences that violate safety specifications or deadlock the system. Supervisor synthesis techniques from formal methods provide a principled way to design controllers that enforce safety and liveness properties even when attackers can arbitrarily enable or disable certain events. The synthesized supervisors are maximally permissive, so the system remains within its specification despite malicious interference without being restricted more than necessary.

17) Dual Detection: Faults vs Integrity Attacks

Slide 17 — Dual detection framework
Controller‑side fault detector + plant‑side attack detector.

Jointly distinguishes benign faults from malicious attacks using closed‑loop signatures. Dual detection frameworks address the fundamental challenge that sensor deviations can arise from either benign hardware faults or malicious integrity attacks, requiring different mitigation strategies. By deploying detectors at both the controller side (monitoring expected closed-loop behavior) and plant side (monitoring physical consistency), and analyzing the correlation patterns between their outputs, these systems can differentiate between fault scenarios that require maintenance and attack scenarios that demand security responses, reducing both false positives from treating faults as attacks and false negatives from dismissing attacks as faults.

18) Proposed Improvements & Future Work

Slide 18 — Proposed improvements
Stochastic guarantees, risk metrics (CVaR/DRO), Bayesian/H∞ filters, K‑surrogates + CBFs.

19) Q&A / Discussion Topics

Slide 19 — Future work
Questions.

Open questions: localize attacked sensors; learn fast K predictors for complex CPS. Key discussion points include how to incorporate probabilistic guarantees into the framework for systems with significant disturbances, whether attack localization techniques can identify specific compromised sensors to enable more targeted recovery actions, and how machine learning can accelerate online recoverability computations. Additional topics include scalability to large-scale nonlinear distributed systems.

20) Discussion Topics

Slide 20 — Q&A
Probabilistic detection, attack localization, learning‑based recoverability.

Summary of Discussion Ideas

Overall, the discussion focused mainly on the first two topics listed above, probabilistic detection and attack localization. It also covered the potential for integrating learning-based approaches with these techniques to determine which sensor or subsystem is compromised and how to recover from the attack effectively.