    Optimal Attack against Cyber-Physical Control Systems with Reactive Attack Mitigation

    This paper studies the performance and resilience of a cyber-physical control system (CPCS) with attack detection and reactive attack mitigation. It addresses the problem of deriving an optimal sequence of false data injection attacks that maximizes the state estimation error of the system. The results provide a basic understanding of the limits of the attack impact. The design of the optimal attack is based on a Markov decision process (MDP) formulation, which is solved efficiently using the value iteration method. Using the proposed framework, we quantify the effect of false positives and missed detections on the system performance, which can inform the joint design of attack detection and mitigation. To demonstrate the use of the proposed framework in a real-world CPCS, we consider the voltage control system of power grids and run extensive simulations using PowerWorld, a high-fidelity power system simulator, to validate our analysis. The results show that by carefully designing the attack sequence using our proposed approach, the attacker can cause a large deviation of the bus voltages from the desired setpoint. Further, the results verify the optimality of the derived attack sequence and show that, to cause maximum impact, the attacker must carefully craft the attack to strike a balance between attack magnitude and stealthiness, due to the simultaneous presence of attack detection and mitigation.
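
    As a minimal sketch of the value-iteration step underlying the attack design (the toy transition and reward arrays below are hypothetical stand-ins; in the paper, states and actions encode the attacker's injection choices and the reward encodes the induced state-estimation error):

        import numpy as np

        # Generic value iteration for a finite MDP (hypothetical toy data).
        n_states, n_actions, gamma = 4, 2, 0.9
        rng = np.random.default_rng(0)
        P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
        R = rng.random((n_states, n_actions))                             # immediate reward

        V = np.zeros(n_states)
        for _ in range(1000):
            Q = R + gamma * P @ V      # Bellman backup: Q[s, a] = R[s, a] + gamma * E[V(s')]
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < 1e-8:
                break
            V = V_new
        policy = Q.argmax(axis=1)      # greedy action (here: the attack choice) per state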

    Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

    The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an $\epsilon$-optimal policy is $\tilde{\Omega}\left(|S||A|H^3/\epsilon^2\right)$ over worst-case instances of an MDP with state space $S$, action space $A$, and horizon $H$. We consider a class of MDPs for which the associated optimal $Q^*$ function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in $|S|$ and $|A|$ due to the low-rank structure, we show that without imposing further assumptions beyond low rank of $Q^*$, if one is constrained to estimate the $Q$ function using only observations from a subset of entries, there is a worst-case instance in which one must incur a sample complexity exponential in the horizon $H$ to learn a near-optimal policy. We subsequently show that under stronger low-rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of $\tilde{O}\left((|S|+|A|)\,\mathrm{poly}(d,H)/\epsilon^2\right)$ for a rank-$d$ setting, which is minimax optimal with respect to the scaling of $|S|$, $|A|$, and $\epsilon$. In contrast to literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.
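
    As a hedged illustration of the low-rank idea (a generic matrix-estimation step, not the authors' LR-MCPI/LR-EVI algorithms; all sizes and data below are hypothetical), one can recover a rank-$d$ $Q$ matrix from a random subset of its entries via a truncated SVD:

        import numpy as np

        rng = np.random.default_rng(1)
        nS, nA, d = 50, 40, 3
        # Rank-d ground-truth Q matrix of shape |S| x |A| (synthetic).
        Q_true = rng.standard_normal((nS, d)) @ rng.standard_normal((d, nA))

        p = 0.3                                  # each entry observed independently with prob. p
        mask = rng.random((nS, nA)) < p
        Q_obs = np.where(mask, Q_true, 0.0) / p  # zero-fill and rescale: unbiased in expectation

        U, s, Vt = np.linalg.svd(Q_obs, full_matrices=False)
        Q_hat = (U[:, :d] * s[:d]) @ Vt[:d]      # keep only the top-d singular directions

        err = np.linalg.norm(Q_hat - Q_true) / np.linalg.norm(Q_true)
        print(f"relative recovery error: {err:.3f}")

    Only about $p|S||A|$ entries are observed, yet the rank-$d$ structure allows the full matrix to be estimated; this is the intuition behind sample complexity scaling with $(|S|+|A|)\,\mathrm{poly}(d)$ rather than $|S||A|$.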