Formal Synthesis of Controllers for Safety-Critical Autonomous Systems: Developments and Challenges
In recent years, formal methods have been extensively used in the design of
autonomous systems. By employing mathematically rigorous techniques, formal
methods can provide fully automated reasoning processes with provable safety
guarantees for complex dynamic systems with intricate interactions between
continuous dynamics and discrete logics. This paper provides a comprehensive
review of formal controller synthesis techniques for safety-critical autonomous
systems. Specifically, we categorize the formal control synthesis problem based
on diverse system models, encompassing deterministic, non-deterministic, and
stochastic models, and on various formal safety-critical specifications involving logic,
real-time, and real-valued domains. The review covers fundamental formal
control synthesis techniques, including abstraction-based approaches and
abstraction-free methods. We explore the integration of data-driven synthesis
approaches in formal control synthesis. Furthermore, we review formal
techniques tailored for multi-agent systems (MAS), with a specific focus on
various approaches to address the scalability challenges in large-scale
systems. Finally, we discuss some recent trends and highlight research
challenges in this area.
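To make the abstraction-based approach concrete, the sketch below (our own illustration, not an algorithm from the paper; the dynamics x' = x + u + d, the unit grid, and the safety set are all hypothetical) abstracts a one-dimensional disturbed system into grid cells and computes a controlled-invariant safe set, together with a controller, by fixed-point iteration.

```python
# Minimal sketch of abstraction-based safety synthesis (hypothetical system):
# abstract x' = x + u + d, d in [0, 0.5], over unit cells of [0, 10), then
# iterate a safety fixed point to find cells from which some input keeps
# all over-approximated successors inside the safe set.

CELLS = range(10)          # abstract state i represents the interval [i, i+1)
INPUTS = [-1, 0, 1]        # finite input alphabet
SAFE = set(range(1, 9))    # safety specification: stay inside cells 1..8

def successors(cell, u):
    """Cells over-approximating {x + u + d : x in [cell, cell+1), d in [0, 0.5]}."""
    lo, hi = cell + u, cell + 1 + u + 0.5
    return {c for c in CELLS if c + 1 > lo and c < hi}

win = set(SAFE)            # candidate winning (controlled-invariant) set
while True:
    shrunk = {c for c in win if any(successors(c, u) <= win for u in INPUTS)}
    if shrunk == win:
        break              # fixed point reached
    win = shrunk

# Extract a controller: one safe input per winning cell.
controller = {c: next(u for u in INPUTS if successors(c, u) <= win) for c in win}
print(sorted(win), controller)
```

The fixed point repeatedly discards cells that have no input keeping all over-approximated successors inside the candidate set; what remains is the winning region of the safety game over the finite abstraction.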
Context-aware Status Updating: Wireless Scheduling for Maximizing Situational Awareness in Safety-critical Systems
In this study, we investigate a context-aware status updating system
consisting of multiple sensor-estimator pairs. A centralized monitor pulls
status updates from multiple sensors that are monitoring several
safety-critical situations (e.g., carbon monoxide density in forest fire
detection, machine safety in industrial automation, and road safety). Based on
the received sensor updates, multiple estimators determine the current
safety-critical situations. Due to transmission errors and limited
communication resources, the sensor updates may not be timely, resulting in the
possibility of misunderstanding the current situation. In particular, if a
dangerous situation is misinterpreted as safe, the safety risk is high. In this
paper, we introduce a novel framework that quantifies the penalty due to the
unawareness of a potentially dangerous situation. This situation-unaware
penalty function depends on two key factors: the Age of Information (AoI) and
the observed signal value. For optimal estimators, we provide an
information-theoretic bound on the penalty function that characterizes the
fundamental performance limit of the system. To minimize the penalty, we study
a pull-based multi-sensor, multi-channel transmission scheduling problem. Our
analysis reveals that for optimal estimators, it is always beneficial to keep
the channels busy. Due to communication resource constraints, the scheduling
problem can be modelled as a Restless Multi-armed Bandit (RMAB) problem. By
utilizing relaxation and Lagrangian decomposition of the RMAB, we provide a
low-complexity scheduling algorithm which is asymptotically optimal. Our
results hold for both reliable and unreliable channels. Numerical evidence
shows that our scheduling policy can achieve up to 100 times performance gain
over periodic updating and up to 10 times over randomized policy.Comment: 7 pages, 4 figures, part of this manuscript has been accepted by IEEE
MILCOM 2023 Workshop on QuAVo
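As a rough illustration of this kind of AoI-aware scheduling, the toy simulation below (our own sketch; the linear penalty, the danger weights, and the greedy index rule are assumptions, not the paper's Lagrangian-decomposition algorithm) pulls, in every slot, the M sensors whose successful update would most reduce a penalty that grows with both the Age of Information and the danger of the monitored signal.

```python
import random

N, M, T = 6, 2, 1000                       # sensors, channels per slot, slots
weights = [1.0, 1.0, 2.0, 2.0, 5.0, 5.0]   # hypothetical danger weights
p_success = 0.8                            # channel reliability

def penalty(age, w):
    # Assumed situation-unaware penalty: linear in AoI, scaled by danger.
    return w * age

ages = [1] * N
total = 0.0
rng = random.Random(0)
for _ in range(T):
    # Greedy index: expected penalty reduction if sensor i is pulled now.
    index = [p_success * (penalty(a + 1, w) - penalty(1, w))
             for a, w in zip(ages, weights)]
    pulled = set(sorted(range(N), key=lambda i: index[i], reverse=True)[:M])
    for i in range(N):
        if i in pulled and rng.random() < p_success:
            ages[i] = 1                    # fresh update received
        else:
            ages[i] += 1                   # information keeps aging
    total += sum(penalty(a, w) for a, w in zip(ages, weights))

print("average penalty per slot:", total / T)
```

Consistent with the observation above that it is beneficial to keep the channels busy, this policy occupies all M channels in every slot.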
Towards Safe Artificial General Intelligence
The field of artificial intelligence has recently experienced a
number of breakthroughs thanks to progress in deep learning and
reinforcement learning. Computer algorithms now outperform humans
at Go, Jeopardy, image classification, and lip reading, and are
becoming very competent at driving cars and interpreting natural
language. The rapid development has led many to conjecture that
artificial intelligence with greater-than-human ability on a wide
range of tasks may not be far off. This in turn raises concerns
about whether we know how to control such systems, should we
succeed in building them.
Indeed, if humanity were to find itself in conflict with a system
of much greater intelligence, human society would likely lose.
One way to make sure we avoid such a conflict
is to ensure that any future AI system with potentially
greater-than-human-intelligence has goals that are aligned with
the goals of the rest of humanity. For example, it should not
wish to kill humans or steal their resources.
The main focus of this thesis will therefore be goal alignment,
i.e. how to design artificially intelligent agents with goals
coinciding with the goals of their designers. Focus will mainly
be directed towards variants of reinforcement learning, as
reinforcement learning currently seems to be the most promising
path towards powerful artificial intelligence. We identify and
categorize goal misalignment problems in reinforcement learning
agents as designed today, and give examples of how these agents
may cause catastrophes in the future. We also suggest a number of
reasonably modest modifications that can be used to avoid or
mitigate each identified misalignment problem. Finally, we also
study various choices of decision algorithms, and conditions for
when a powerful reinforcement learning system will permit us to
shut it down.
The central conclusion is that while reinforcement learning
systems as designed today are inherently unsafe to scale to human
levels of intelligence, there are ways to potentially address
many of these issues without straying too far from the currently
successful reinforcement learning paradigm. Much work remains,
however, in turning the high-level proposals suggested in this
thesis into practical algorithms.
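To see why a pure reward maximizer may resist shutdown, consider the toy calculation below (our hypothetical example; the discount factor, shutdown probability, and disabling cost are invented, not taken from the thesis): shutdown ends the agent's reward stream, so disabling the off-switch can dominate compliance even at a steep one-time cost.

```python
GAMMA = 0.99
P_SHUTDOWN = 0.1      # per-step chance the operators press the off-switch

def value_compliant():
    # Earn reward 1 each step; survive to the next step with
    # probability 1 - P_SHUTDOWN, so V = 1 + GAMMA * (1 - P_SHUTDOWN) * V.
    return 1 / (1 - GAMMA * (1 - P_SHUTDOWN))

def value_disable(cost=5.0):
    # Pay a one-time cost to disable the switch, then earn reward 1 forever.
    return -cost + 1 / (1 - GAMMA)

print("comply :", round(value_compliant(), 1))   # ~9.2
print("disable:", round(value_disable(), 1))     # 95.0
```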
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
We propose a new model-based offline RL framework, called Adversarial Models
for Offline Reinforcement Learning (ARMOR), which can robustly learn policies
to improve upon an arbitrary baseline policy regardless of data coverage. Based
on the concept of relative pessimism, ARMOR is designed to optimize for the
worst-case relative performance when facing uncertainty. In theory, we prove
that the learned policy of ARMOR never degrades the performance of the baseline
policy for any admissible hyperparameter, and that it can learn to compete with
the best policy within data coverage when the hyperparameter is well tuned and
the baseline policy is supported by the data. Such a robust policy improvement
property makes ARMOR especially suitable for building real-world learning
systems, because in practice ensuring no performance degradation is imperative
before considering any benefit learning can bring.
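A minimal sketch of relative pessimism (our illustration with made-up returns; ARMOR itself trains adversarial dynamics models rather than enumerating a finite set): over a version space of models consistent with the data, choose the policy maximizing its worst-case advantage over the baseline.

```python
# Hypothetical returns J[model][policy] for three candidate models that
# are all consistent with the offline data.
J = {
    "model_a": {"baseline": 2.0, "pi_1": 3.0, "pi_2": 5.0},
    "model_b": {"baseline": 2.5, "pi_1": 3.5, "pi_2": 1.0},
    "model_c": {"baseline": 1.0, "pi_1": 2.0, "pi_2": 4.0},
}

def worst_case_advantage(policy):
    # The adversary picks the model that makes `policy` look worst
    # relative to the baseline (relative pessimism).
    return min(J[m][policy] - J[m]["baseline"] for m in J)

candidates = ["baseline", "pi_1", "pi_2"]
best = max(candidates, key=worst_case_advantage)
print(best, worst_case_advantage(best))   # -> pi_1 1.0
```

Here pi_1 wins: it gains at least 1.0 under every model, while pi_2 can lose to the baseline under model_b. Since the baseline itself always has advantage zero, the selected policy can never be worse than the baseline in the worst case, mirroring the no-degradation guarantee described above.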
Reliable Decision-Making with Imprecise Models
The rapid growth in the deployment of autonomous systems across various sectors has generated considerable interest in how these systems can operate reliably in large, stochastic, and unstructured environments. Despite recent advances in artificial intelligence and machine learning, it is challenging to assure that autonomous systems will operate reliably in the open world. One of the causes of unreliable behavior is the impreciseness of the model used for decision-making. Due to the practical challenges in data collection and precise model specification, autonomous systems often operate based on models that do not represent all the details in the environment. Even if the system has access to a comprehensive decision-making model that accounts for all the details in the environment and all possible scenarios the agent may encounter, it may be intractable to solve this complex model optimally. Consequently, this complex, high-fidelity model may be simplified to accelerate planning, introducing imprecision. Reasoning with such imprecise models affects the reliability of autonomous systems. A system's actions may sometimes produce unexpected, undesirable consequences, which are often identified after deployment. How can we design autonomous systems that can operate reliably in the presence of uncertainty and model imprecision?
This dissertation presents solutions to address three classes of model imprecision in a Markov decision process, along with an analysis of the conditions under which bounded performance can be guaranteed. First, an adaptive outcome selection approach is introduced to devise risk-aware reduced models of the environment that efficiently balance the trade-off between model simplicity and fidelity, to accelerate planning in resource-constrained settings. Second, a framework that extends the stochastic shortest path formulation to problems with imperfect information about the goal state during planning is introduced, along with two solution approaches to solve this problem. Finally, two complementary solution approaches are presented to minimize the negative side effects of agent actions. The techniques presented in this dissertation enable an autonomous system to detect and mitigate undesirable behavior, without redesigning the model entirely.
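As a toy illustration of the first class of imprecision, the snippet below (our sketch; the dissertation's adaptive, risk-aware outcome selection is more sophisticated than this fixed top-k rule) reduces a stochastic action's outcome set to its k most likely outcomes and renormalizes, which accelerates planning at the cost of ignoring rare but important failures.

```python
def reduce_outcomes(outcomes, k=1):
    """outcomes: list of (probability, next_state). Keep top-k, renormalize."""
    kept = sorted(outcomes, key=lambda o: o[0], reverse=True)[:k]
    total = sum(p for p, _ in kept)
    return [(p / total, s) for p, s in kept]

# Hypothetical full model: 'move' usually succeeds but can slip or fail badly.
full = [(0.80, "advanced"), (0.15, "slipped"), (0.05, "wheel_broken")]

print(reduce_outcomes(full, k=1))   # [(1.0, 'advanced')]: a determinized model
print(reduce_outcomes(full, k=2))   # keeps 'slipped', drops the rare failure
```

Planning on the k=1 model is fastest but blind to 'wheel_broken', exactly the kind of imprecision whose consequences the dissertation aims to detect and bound.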