    Formal Synthesis of Controllers for Safety-Critical Autonomous Systems: Developments and Challenges

    In recent years, formal methods have been extensively used in the design of autonomous systems. By employing mathematically rigorous techniques, formal methods can provide fully automated reasoning processes with provable safety guarantees for complex dynamic systems with intricate interactions between continuous dynamics and discrete logics. This paper provides a comprehensive review of formal controller synthesis techniques for safety-critical autonomous systems. Specifically, we categorize the formal control synthesis problem based on diverse system models, encompassing deterministic, non-deterministic, and stochastic models, and on various formal safety-critical specifications involving logic, real-time, and real-valued domains. The review covers fundamental formal control synthesis techniques, including abstraction-based approaches and abstraction-free methods. We explore the integration of data-driven synthesis approaches in formal control synthesis. Furthermore, we review formal techniques tailored for multi-agent systems (MAS), with a specific focus on various approaches to address the scalability challenges in large-scale systems. Finally, we discuss some recent trends and highlight research challenges in this area.

    Context-aware Status Updating: Wireless Scheduling for Maximizing Situational Awareness in Safety-critical Systems

    In this study, we investigate a context-aware status updating system consisting of multiple sensor-estimator pairs. A centralized monitor pulls status updates from multiple sensors that are monitoring several safety-critical situations (e.g., carbon monoxide density in forest fire detection, machine safety in industrial automation, and road safety). Based on the received sensor updates, multiple estimators determine the current safety-critical situations. Due to transmission errors and limited communication resources, the sensor updates may not be timely, so the current situation may be misunderstood. In particular, if a dangerous situation is misinterpreted as safe, the safety risk is high. In this paper, we introduce a novel framework that quantifies the penalty due to the unawareness of a potentially dangerous situation. This situation-unaware penalty function depends on two key factors: the Age of Information (AoI) and the observed signal value. For optimal estimators, we provide an information-theoretic bound of the penalty function that evaluates the fundamental performance limit of the system. To minimize the penalty, we study a pull-based multi-sensor, multi-channel transmission scheduling problem. Our analysis reveals that for optimal estimators, it is always beneficial to keep the channels busy. Due to communication resource constraints, the scheduling problem can be modelled as a Restless Multi-armed Bandit (RMAB) problem. By utilizing relaxation and Lagrangian decomposition of the RMAB, we provide a low-complexity scheduling algorithm which is asymptotically optimal. Our results hold for both reliable and unreliable channels. Numerical evidence shows that our scheduling policy can achieve up to a 100-fold performance gain over periodic updating and up to a 10-fold gain over a randomized policy. Comment: 7 pages, 4 figures; part of this manuscript has been accepted by IEEE MILCOM 2023 Workshop on QuAVo
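
    To make the Age of Information and pull-based scheduling ideas in this abstract concrete, here is a minimal Python sketch. It is not the RMAB-based algorithm of the paper: it is a simple greedy "oldest first" baseline. It assumes reliable channels, a penalty that depends on age alone, and the convention that a freshly delivered update has age 1. All function names are hypothetical.

```python
from typing import List


def schedule_max_age_first(ages: List[int], num_channels: int) -> List[int]:
    """Pick the sensors to pull in this slot: the num_channels oldest ones.

    Ties are broken by the lower sensor index. Every channel is used in
    every slot, so the channels are never left idle.
    """
    order = sorted(range(len(ages)), key=lambda i: (-ages[i], i))
    return sorted(order[:num_channels])


def step_ages(ages: List[int], delivered: List[int]) -> List[int]:
    """Advance one time slot.

    Assumed convention: a delivered sensor's age resets to 1,
    and every other sensor's age grows by 1.
    """
    delivered_set = set(delivered)
    return [1 if i in delivered_set else a + 1 for i, a in enumerate(ages)]


def simulate(num_sensors: int, num_channels: int, num_slots: int) -> List[List[int]]:
    """Run the greedy scheduler over reliable channels and record the ages."""
    ages = [1] * num_sensors
    history = [ages]
    for _ in range(num_slots):
        pulled = schedule_max_age_first(ages, num_channels)
        ages = step_ages(ages, pulled)  # reliable channel: every pull is delivered
        history.append(ages)
    return history


if __name__ == "__main__":
    for slot, ages in enumerate(simulate(num_sensors=3, num_channels=1, num_slots=4)):
        print(slot, ages)
```

    With three sensors and one channel, this greedy rule cycles through the sensors, so no age exceeds 3. A scheduler in the spirit of the abstract would also weight each sensor by its observed signal value, which this sketch deliberately omits.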

    Towards Safe Artificial General Intelligence

    The field of artificial intelligence has recently experienced a number of breakthroughs thanks to progress in deep learning and reinforcement learning. Computer algorithms now outperform humans at Go, Jeopardy, image classification, and lip reading, and are becoming very competent at driving cars and interpreting natural language. The rapid development has led many to conjecture that artificial intelligence with greater-than-human ability on a wide range of tasks may not be far off. This in turn raises concerns about whether we know how to control such systems, should we succeed in building them. Indeed, if humanity were to find itself in conflict with a system of much greater intelligence than itself, then human society would likely lose. One way to make sure we avoid such a conflict is to ensure that any future AI system with potentially greater-than-human intelligence has goals that are aligned with the goals of the rest of humanity. For example, it should not wish to kill humans or steal their resources. The main focus of this thesis will therefore be goal alignment, i.e. how to design artificially intelligent agents with goals coinciding with the goals of their designers. Focus will mainly be directed towards variants of reinforcement learning, as reinforcement learning currently seems to be the most promising path towards powerful artificial intelligence. We identify and categorize goal misalignment problems in reinforcement learning agents as designed today, and give examples of how these agents may cause catastrophes in the future. We also suggest a number of reasonably modest modifications that can be used to avoid or mitigate each identified misalignment problem. Finally, we study various choices of decision algorithms, and conditions under which a powerful reinforcement learning system will permit us to shut it down.
    The central conclusion is that while reinforcement learning systems as designed today are inherently unsafe to scale to human levels of intelligence, there are ways to potentially address many of these issues without straying too far from the currently successful reinforcement learning paradigm. Much work remains, however, in turning the high-level proposals suggested in this thesis into practical algorithms.

    ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

    We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage. Based on the concept of relative pessimism, ARMOR is designed to optimize for the worst-case relative performance when facing uncertainty. In theory, we prove that the learned policy of ARMOR never degrades the performance of the baseline policy with any admissible hyperparameter, and can learn to compete with the best policy within data coverage when the hyperparameter is well tuned and the baseline policy is supported by the data. Such a robust policy improvement property makes ARMOR especially suitable for building real-world learning systems, because in practice ensuring no performance degradation is imperative before considering any benefit learning can bring.
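
    The idea of relative pessimism can be illustrated on a toy problem. The Python sketch below is not ARMOR itself, which learns models and policies from offline data. It only assumes a small finite set of candidate models, each given as a hypothetical table of policy returns, and picks the policy whose worst-case return gap over the baseline is largest. Because the baseline always has a gap of exactly zero, the chosen policy cannot be worse than the baseline under any candidate model. The function name and all numbers are made up for illustration.

```python
from typing import Dict, List


def relative_pessimism_choice(returns: List[Dict[str, float]], baseline: str) -> str:
    """Pick the policy that maximizes the worst-case return gap over the baseline.

    returns[m][p] is the (hypothetical) return of policy p under candidate
    model m. The gap of the baseline against itself is 0 in every model, so
    the selected policy always has a worst-case gap of at least 0.
    """
    policies = list(returns[0].keys())

    def worst_case_gap(policy: str) -> float:
        return min(model[policy] - model[baseline] for model in returns)

    return max(policies, key=worst_case_gap)


if __name__ == "__main__":
    # Two candidate models that disagree about policy "a" (made-up numbers).
    models = [
        {"baseline": 1.0, "a": 2.0, "b": 1.2},
        {"baseline": 1.0, "a": 0.5, "b": 1.1},
    ]
    print(relative_pessimism_choice(models, baseline="baseline"))
```

    In this example, policy "a" looks best under the first model but falls below the baseline under the second, so the max-min rule prefers "b", which improves on the baseline under both. When every alternative is risky under some candidate model, the rule falls back to the baseline.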