636 research outputs found
Barrier-Based Test Synthesis for Safety-Critical Systems Subject to Timed Reach-Avoid Specifications
We propose an adversarial, time-varying test-synthesis procedure for
safety-critical systems without requiring specific knowledge of the underlying
controller steering the system. In the broader test-and-evaluation context,
identifying difficult tests of system behavior is important because such tests
expose problematic system phenomena before they can lead to serious outcomes,
e.g. loss of human life in autonomous cars or costly failures in aircraft
systems. Our approach builds on existing,
simulation-based work in the test and evaluation literature by offering a
controller-agnostic test-synthesis procedure that provides a series of
benchmark tests with which to determine controller reliability. To achieve
this, our approach codifies the system objective as a timed reach-avoid
specification. Then, by coupling control barrier functions with this class of
specifications, we construct an instantaneous difficulty metric whose minimizer
corresponds to the most difficult test at that system state. We use this
instantaneous difficulty metric in a game-theoretic fashion, to produce an
adversarial, time-varying test-synthesis procedure that does not require
specific knowledge of the system's controller, but can still provably identify
realizable and maximally difficult tests of system behavior. Finally, we
develop this test-synthesis procedure for both continuous and discrete-time
systems and showcase our test-synthesis procedure on simulated and hardware
examples.
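As a hedged illustration of the adversarial idea in this abstract (not the paper's actual construction), the sketch below uses a toy single-integrator system and a hypothetical barrier function h(x) = 1 - x², and has the adversary pick, from a finite candidate set, the disturbance that minimizes the barrier value at the next state — a crude stand-in for minimizing an instantaneous difficulty metric:

```python
import numpy as np

def barrier(x):
    # Safe set {x : h(x) >= 0}; here a hypothetical interval |x| <= 1.
    return 1.0 - x**2

def most_difficult_disturbance(x, u, candidates, dt=0.1):
    """Pick the disturbance minimizing the barrier value at the next
    state of the single-integrator x' = u + d (an illustrative stand-in
    for the paper's instantaneous difficulty metric)."""
    next_h = [barrier(x + dt * (u + d)) for d in candidates]
    return candidates[int(np.argmin(next_h))]

x, u = 0.5, 0.0                      # current state, controller input (controller-agnostic)
d_star = most_difficult_disturbance(x, u, candidates=[-1.0, 0.0, 1.0])
# the adversary pushes the state toward the nearest boundary of the safe set
```

Here the chosen disturbance drives the state toward the safe-set boundary, which is the intuition behind selecting "maximally difficult" tests.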
Optimizing Carbon Storage Operations for Long-Term Safety
To combat global warming and mitigate the risks associated with climate
change, carbon capture and storage (CCS) has emerged as a crucial technology.
However, safely sequestering CO2 in geological formations for long-term storage
presents several challenges. In this study, we address these issues by modeling
the decision-making process for carbon storage operations as a partially
observable Markov decision process (POMDP). We solve the POMDP using belief
state planning to optimize injector and monitoring well locations, with the
goal of maximizing stored CO2 while maintaining safety. Empirical results in
simulation demonstrate that our approach is effective in ensuring safe
long-term carbon storage operations. We showcase the flexibility of our
approach by introducing three different monitoring strategies and examining
their impact on decision quality. Additionally, we introduce a neural network
surrogate model for the POMDP decision-making process to handle the complex
dynamics of the multi-phase flow. We also investigate the effects of different
fidelity levels of the surrogate model on decision quality.
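A hedged, minimal sketch of the belief-state idea this abstract describes (the paper's POMDP is far richer): maintain a two-state belief ("safe" vs. "leaking") for a hypothetical storage site, update it by Bayes' rule from noisy monitoring observations, and inject only when confidence in safety is high. All probabilities and thresholds below are illustrative assumptions:

```python
def update_belief(b_safe, obs, p_obs_given_safe=0.9, p_obs_given_leak=0.2):
    """Bayes rule for a binary state; obs=True means the monitor reads 'normal'."""
    like_safe = p_obs_given_safe if obs else 1 - p_obs_given_safe
    like_leak = p_obs_given_leak if obs else 1 - p_obs_given_leak
    return like_safe * b_safe / (like_safe * b_safe + like_leak * (1 - b_safe))

def choose_action(b_safe, threshold=0.95):
    # Inject more CO2 only when we are confident the site is safe.
    return "inject" if b_safe >= threshold else "monitor"

b = 0.8                              # prior belief that the site is safe
for obs in [True, True, True]:       # three consecutive 'normal' readings
    b = update_belief(b, obs)
action = choose_action(b)
```

Three consecutive normal readings push the belief above the threshold, at which point injection resumes; a single anomalous reading would instead drop the belief and keep the system in monitoring mode.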
Learning and Control of Dynamical Systems
Despite the remarkable success of machine learning in various domains in recent years, our understanding of its fundamental limitations remains incomplete. This knowledge gap poses a grand challenge when deploying machine learning methods in critical decision-making tasks, where incorrect decisions can have catastrophic consequences. To effectively utilize these learning-based methods in such contexts, it is crucial to explicitly characterize their performance. Over the years, significant research efforts have been dedicated to learning and control of dynamical systems where the underlying dynamics are unknown or only partially known a priori, and must be inferred from collected data. However, many of these classical results focus on asymptotic guarantees, providing limited insights into the amount of data required to achieve desired control performance while satisfying operational constraints such as safety and stability, especially in the presence of statistical noise.
In this thesis, we study the statistical complexity of learning and control of unknown dynamical systems. By utilizing recent advances in statistical learning theory, high-dimensional statistics, and control theoretic tools, we aim to establish a fundamental understanding of the number of samples required to achieve desired (i) accuracy in learning the unknown dynamics, (ii) performance in the control of the underlying system, and (iii) satisfaction of the operational constraints such as safety and stability. We provide finite-sample guarantees for these objectives and propose efficient learning and control algorithms that achieve the desired performance at these statistical limits in various dynamical systems. Our investigation covers a broad range of dynamical systems, starting from fully observable linear dynamical systems to partially observable linear dynamical systems, and ultimately, nonlinear systems.
We deploy our learning and control algorithms in various adaptive control tasks in real-world control systems and demonstrate their strong empirical performance along with their learning, robustness, and stability guarantees. In particular, we implement one of our proposed methods, Fourier Adaptive Learning and Control (FALCON), on an experimental aerodynamic testbed under extreme turbulent flow dynamics in a wind tunnel. The results show that FALCON achieves state-of-the-art stabilization performance and consistently outperforms conventional and other learning-based methods by at least 37%, despite using 8 times less data. The superior performance of FALCON arises from its physically and theoretically accurate modeling of the underlying nonlinear turbulent dynamics, which yields rigorous finite-sample learning and performance guarantees. These findings underscore the importance of characterizing the statistical complexity of learning and control of unknown dynamical systems.
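A hedged illustration of the thesis's finite-sample theme (not FALCON itself): identify an unknown linear system x_{t+1} = A x_t + w_t from a single trajectory via least squares, and observe that the estimation error shrinks as the number of samples grows. The dynamics matrix and noise level below are arbitrary assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # hypothetical stable dynamics

def estimate_A(T, noise=0.1):
    """Collect T transitions of x_{t+1} = A x_t + w_t, then fit A by least squares."""
    x = np.zeros(2)
    X, Y = [], []
    for _ in range(T):
        x_next = A_true @ x + noise * rng.standard_normal(2)
        X.append(x)
        Y.append(x_next)
        x = x_next
    X, Y = np.array(X), np.array(Y)
    # Solve Y ~ X @ A^T in the least-squares sense.
    A_hat_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_hat_T.T

err_small = np.linalg.norm(estimate_A(50) - A_true)
err_large = np.linalg.norm(estimate_A(5000) - A_true)
# more samples -> smaller identification error (finite-sample behavior)
```

The point of finite-sample analysis is to quantify exactly how fast this error decays with T, rather than only asserting that it eventually vanishes.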
Evaluating Architectural Safeguards for Uncertain AI Black-Box Components
Although tremendous progress has been made in Artificial Intelligence (AI), it entails new challenges. The growing complexity of learning tasks requires more complex AI components, which increasingly exhibit unreliable behaviour. In this book, we present a model-driven approach to model architectural safeguards for AI components and analyse their effect on the overall system reliability.
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real world deployment of RL systems. However, several challenges limit the applicability of RL to large scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios.
This thesis is motivated by bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we present the first results on several different problems, e.g. tensorization of the Bellman equation, which allows exponential sample-efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties are driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning
(RL) aims to produce RL algorithms whose policies generalise well to novel
unseen situations at deployment time, avoiding overfitting to their training
environments. Tackling this is vital if we are to deploy reinforcement learning
algorithms in real world scenarios, where the environment will be diverse,
dynamic and unpredictable. This survey is an overview of this nascent field. We
rely on a unifying formalism and terminology for discussing different ZSG
problems, building upon previous works. We go on to categorise existing
benchmarks for ZSG, as well as current methods for tackling these problems.
Finally, we provide a critical discussion of the current state of the field,
including recommendations for future work. Among other conclusions, we argue
that taking a purely procedural content generation approach to benchmark design
is not conducive to progress in ZSG, we suggest fast online adaptation and
tackling RL-specific problems as some areas for future work on methods for ZSG,
and we recommend building benchmarks in underexplored problem settings such as
offline RL ZSG and reward-function variation.
When is Agnostic Reinforcement Learning Statistically Tractable?
We study the problem of agnostic PAC reinforcement learning (RL): given a
policy class Π, how many rounds of interaction with an unknown MDP (with a
potentially large state and action space) are required to learn an
ε-suboptimal policy with respect to Π? Towards that end, we
introduce a new complexity measure, called the \emph{spanning capacity}, that
depends solely on the set Π and is independent of the MDP dynamics. With a
generative model, we show that for any policy class Π, bounded spanning
capacity characterizes PAC learnability. However, for online RL, the situation
is more subtle. We show there exists a policy class Π with a bounded
spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for agnostic learnability between
generative access and online access models (as well as between
deterministic/stochastic MDPs under online access). On the positive side, we
identify an additional \emph{sunflower} structure, which in conjunction with
bounded spanning capacity enables statistically efficient online RL via a new
algorithm called POPLER, which takes inspiration from classical importance
sampling methods as well as techniques for reachable-state identification and
policy evaluation in reward-free exploration.
Comment: Accepted to NeurIPS 202
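The abstract notes that POPLER draws on classical importance sampling. A hedged sketch of that classical ingredient only (not the POPLER algorithm): estimate the value of a target policy from data collected by a different behavior policy by reweighting each sample. The two-action setup and all probabilities are illustrative assumptions:

```python
import random

random.seed(0)
actions = [0, 1]
behavior = {0: 0.5, 1: 0.5}          # policy that collected the data
target = {0: 0.1, 1: 0.9}            # policy we want to evaluate
reward = {0: 0.0, 1: 1.0}            # deterministic reward per action

def is_estimate(n):
    """Off-policy value estimate via importance weights target(a)/behavior(a)."""
    total = 0.0
    for _ in range(n):
        a = random.choices(actions, weights=[behavior[0], behavior[1]])[0]
        w = target[a] / behavior[a]  # importance weight
        total += w * reward[a]
    return total / n

v_hat = is_estimate(100_000)
# true value of the target policy is 0.9 * 1.0 = 0.9
```

The estimator is unbiased whenever the behavior policy covers every action the target policy takes; its variance, driven by the importance weights, is what makes agnostic online RL statistically delicate.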
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an
agent has access to an offline dataset and the ability to collect experience
via real-world online interaction. The framework mitigates the challenges that
arise in both pure offline and online RL settings, allowing for the design of
simple and highly effective algorithms, in both theory and practice. We
demonstrate these advantages by adapting the classical Q learning/iteration
algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q. In
our theoretical results, we prove that the algorithm is both computationally
and statistically efficient whenever the offline dataset supports a
high-quality policy and the environment has bounded bilinear rank. Notably, we
require no assumptions on the coverage provided by the initial distribution, in
contrast with guarantees for policy gradient/iteration methods. In our
experimental results, we show that Hy-Q with neural network function
approximation outperforms state-of-the-art online, offline, and hybrid RL
baselines on challenging benchmarks, including Montezuma's Revenge.
Comment: 42 pages, 6 figures. Published at ICLR 2023. Code available at
https://github.com/yudasong/Hy
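A hedged, tabular sketch of the hybrid idea (not the paper's Hy-Q, which uses function approximation and bilinear-rank assumptions): run Q-learning updates on a mixture of stored offline transitions and freshly collected online ones, in a toy 3-state chain MDP with reward at the rightmost state. The MDP, mixing ratio, and behavior policy are all illustrative assumptions:

```python
import random

random.seed(0)
N_STATES, GOAL = 3, 2
ACTIONS = [0, 1]                      # 0 = left, 1 = right

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Offline dataset from a hypothetical behavior policy that mostly goes right.
offline = []
s = 0
for _ in range(200):
    a = 1 if random.random() < 0.8 else 0
    s2, r = step(s, a)
    offline.append((s, a, r, s2))
    s = 0 if s2 == GOAL else s2       # episode resets at the goal

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def q_update(s, a, r, s2, alpha=0.1, gamma=0.9):
    target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

s = 0
for t in range(2000):
    if t % 2 == 0:                    # half the updates replay offline data
        q_update(*random.choice(offline))
    else:                             # the other half come from online rollouts
        a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        q_update(s, a, r, s2)
        s = 0 if s2 == GOAL else s2

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

The offline replay supplies coverage of the rewarding transition even while the online greedy policy is still uninformed, which is the mechanism the hybrid setting exploits: online interaction sharpens the estimates, offline data removes the need for explicit exploration.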