Learning and Control of Dynamical Systems
Despite the remarkable success of machine learning in various domains in recent years, our understanding of its fundamental limitations remains incomplete. This knowledge gap poses a grand challenge when deploying machine learning methods in critical decision-making tasks, where incorrect decisions can have catastrophic consequences. To effectively utilize these learning-based methods in such contexts, it is crucial to explicitly characterize their performance. Over the years, significant research efforts have been dedicated to learning and control of dynamical systems where the underlying dynamics are unknown or only partially known a priori, and must be inferred from collected data. However, many of these classical results have focused on asymptotic guarantees, providing limited insight into the amount of data required to achieve desired control performance while satisfying operational constraints such as safety and stability, especially in the presence of statistical noise.
In this thesis, we study the statistical complexity of learning and control of unknown dynamical systems. By utilizing recent advances in statistical learning theory, high-dimensional statistics, and control theoretic tools, we aim to establish a fundamental understanding of the number of samples required to achieve desired (i) accuracy in learning the unknown dynamics, (ii) performance in the control of the underlying system, and (iii) satisfaction of the operational constraints such as safety and stability. We provide finite-sample guarantees for these objectives and propose efficient learning and control algorithms that achieve the desired performance at these statistical limits in various dynamical systems. Our investigation covers a broad range of dynamical systems, starting from fully observable linear dynamical systems to partially observable linear dynamical systems, and ultimately, nonlinear systems.
We deploy our learning and control algorithms in various adaptive control tasks in real-world control systems and demonstrate their strong empirical performance along with their learning, robustness, and stability guarantees. In particular, we implement one of our proposed methods, Fourier Adaptive Learning and Control (FALCON), on an experimental aerodynamic testbed under extreme turbulent flow dynamics in a wind tunnel. The results show that FALCON achieves state-of-the-art stabilization performance and consistently outperforms conventional and other learning-based methods by at least 37%, despite using 8 times less data. The superior performance of FALCON arises from its physically and theoretically accurate modeling of the underlying nonlinear turbulent dynamics, which yields rigorous finite-sample learning and performance guarantees. These findings underscore the importance of characterizing the statistical complexity of learning and control of unknown dynamical systems.
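The Fourier-based modeling idea behind FALCON can be illustrated with a generic random-Fourier-feature regression for learning unknown nonlinear dynamics from data. This is a minimal sketch, not the FALCON algorithm itself: the toy scalar system, the feature count, and the ridge penalty are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar nonlinear system (an assumption for illustration):
# x_{t+1} = sin(x_t) + 0.5 * u_t + noise
def step(x, u):
    return np.sin(x) + 0.5 * u + 0.01 * rng.standard_normal()

# Collect (state, input) -> next-state samples along one trajectory
X, U, Y = [], [], []
x = 0.0
for _ in range(500):
    u = rng.uniform(-1, 1)
    x_next = step(x, u)
    X.append(x); U.append(u); Y.append(x_next)
    x = x_next

Z = np.column_stack([X, U])      # regressors z_t = (x_t, u_t)
y = np.array(Y)

# Random Fourier features approximating an RBF kernel lift
D = 100
W = rng.standard_normal((Z.shape[1], D))
b = rng.uniform(0, 2 * np.pi, D)
Phi = np.sqrt(2.0 / D) * np.cos(Z @ W + b)

# Ridge regression in the lifted feature space
lam = 1e-3
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

rmse = np.sqrt(np.mean((Phi @ theta - y) ** 2))
print(f"one-step prediction RMSE: {rmse:.4f}")
```

A linear least-squares fit in a random Fourier basis inherits the finite-sample analysis of linear regression while capturing nonlinear dynamics, which is the flavor of guarantee the abstract describes.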
Evaluating Architectural Safeguards for Uncertain AI Black-Box Components
Although tremendous progress has been made in Artificial Intelligence (AI), it also brings new challenges. The growing complexity of learning tasks requires more complex AI components, which increasingly exhibit unreliable behaviour. In this book, we present a model-driven approach to model architectural safeguards for AI components and analyse their effect on the overall system reliability.
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by directly interacting with the environment. Creating RL methods that scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties such as performance, generalization, and robustness in potentially unseen scenarios.
This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems, e.g., tensorization of the Bellman equation, which allows exponential sample-efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g., statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
Data-Efficient Policy Selection for Navigation in Partial Maps via Subgoal-Based Abstraction
We present a novel approach for fast and reliable policy selection for
navigation in partial maps. Leveraging the recent learning-augmented
model-based Learning over Subgoals Planning (LSP) abstraction to plan, our
robot reuses data collected during navigation to evaluate how well other
alternative policies could have performed via a procedure we call offline
alt-policy replay. Costs from offline alt-policy replay constrain policy
selection among the LSP-based policies during deployment, allowing for
improvements in convergence speed, cumulative regret and average navigation
cost. With only limited prior knowledge about the nature of unseen
environments, we achieve at least 67% and as much as 96% improvements on
cumulative regret over the baseline bandit approach in our experiments in
simulated maze and office-like environments.
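The core idea, selecting among policies with a bandit whose estimates are tightened by counterfactual replay of alternative policies, can be sketched as follows. Everything here is hypothetical: the three candidate policies, their cost model, and the noise levels are invented for illustration, and the real offline alt-policy replay reuses logged navigation data rather than a noisy oracle.

```python
import math, random

random.seed(7)

# Hypothetical candidate policies with unknown mean navigation costs
TRUE_COSTS = [10.0, 7.0, 12.0]

def navigate(policy):
    """Run one navigation trial with the chosen policy; observe its cost."""
    return TRUE_COSTS[policy] + random.gauss(0, 1.0)

def replay_estimate(policy):
    """Stand-in for offline alt-policy replay: estimate what an alternative
    policy would have cost on the same trial (noisier than execution)."""
    return TRUE_COSTS[policy] + random.gauss(0, 2.0)

n = [0, 0, 0]
mean = [0.0, 0.0, 0.0]
for t in range(1, 201):
    # Lower-confidence-bound selection on cost (mirror of UCB on reward)
    scores = [mean[a] - math.sqrt(2 * math.log(t) / n[a]) if n[a] else -1e9
              for a in range(3)]
    chosen = min(range(3), key=lambda a: scores[a])
    cost = navigate(chosen)
    # Update the executed arm AND every alternative via replay, so unplayed
    # policies accumulate cost evidence too, speeding up convergence
    for p in range(3):
        est = cost if p == chosen else replay_estimate(p)
        n[p] += 1
        mean[p] += (est - mean[p]) / n[p]

best = min(range(3), key=lambda a: mean[a])
print("selected policy:", best)
```

Because every round informs every arm, the bandit converges with far fewer executed trials than a standard bandit that only learns about the arm it pulls, which is where the cumulative-regret improvement comes from.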
Investigation of risk-aware MDP and POMDP contingency management autonomy for UAS
Unmanned aircraft systems (UAS) are being increasingly adopted for various
applications. The risk UAS poses to people and property must be kept to
acceptable levels. This paper proposes risk-aware contingency management
autonomy to prevent an accident in the event of component malfunction,
specifically propulsion unit failure and/or battery degradation. The proposed
autonomy is modeled as a Markov Decision Process (MDP) whose solution is a
contingency management policy that appropriately executes emergency landing,
flight termination or continuation of planned flight actions. Motivated by the
potential for errors in fault/failure indicators, partial observability of the
MDP state space is investigated. The performance of optimal policies is
analyzed over varying observability conditions in a high-fidelity simulator.
Results indicate that both partially observable MDP (POMDP) and maximum a
posteriori MDP policies performed similarly over different state observability
criteria, given the nearly deterministic state transition model.
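The decision structure described above can be illustrated with a tiny value-iteration example. The battery-level states, transition probabilities, and reward numbers below are invented for illustration and are far simpler than the paper's high-fidelity model.

```python
import numpy as np

# Toy contingency-management MDP (illustrative, not the paper's model).
# States: battery level 3 (full) .. 1 (low); 0 = flight ended (absorbing).
# Actions: 0 = continue planned flight, 1 = emergency landing,
#          2 = flight termination.
gamma = 0.95
n_s, n_a = 4, 3
P = np.zeros((n_a, n_s, n_s))    # P[a, s, s'] transition probabilities
R = np.zeros((n_s, n_a))         # R[s, a] immediate rewards

for s in range(1, n_s):
    # Continuing: battery degrades one level with probability 0.3
    P[0, s, s] = 0.7
    P[0, s, s - 1] = 0.3
    R[s, 0] = 1.0                      # mission value per step
    # Emergency landing / termination end the flight safely
    P[1, s, 0] = 1.0; R[s, 1] = -5.0   # vehicle recoverable
    P[2, s, 0] = 1.0; R[s, 2] = -20.0  # airframe lost
P[:, 0, 0] = 1.0                       # ended state is absorbing
R[1, 0] = -100.0                       # continuing on a low battery risks a crash

V = np.zeros(n_s)
for _ in range(200):                   # value iteration to convergence
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
print("action per battery level:", policy)
```

The resulting policy continues the mission while the battery is healthy and switches to an emergency landing once the battery is low, mirroring the qualitative behaviour the abstract describes.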
Risk-aware shielding of Partially Observable Monte Carlo Planning policies
Partially Observable Monte Carlo Planning (POMCP) is a powerful online algorithm that can generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. However, the lack of an explicit policy representation hinders interpretability and a proper evaluation of the risks an agent may incur. In this work, we propose a methodology based on Maximum Satisfiability Modulo Theory (MAX-SMT) for analyzing POMCP policies by inspecting their traces, namely, sequences of belief-action pairs generated by the algorithm. The proposed method explores local properties of the policy to build a compact and informative summary of the policy behaviour. Moreover, we introduce a rich and formal language that a domain expert can use to describe the expected behaviour of a policy. In more detail, we present a formulation that directly computes the risk involved in taking actions by considering the high-level elements specified by the expert. The final formula can identify risky decisions taken by POMCP that violate the expert indications. We show that this identification process can be used offline (to improve the policy’s explainability and identify anomalous behaviours) or online (to shield the risky decisions of the POMCP algorithm). We present an extended evaluation of our approach on four domains: the well-known tiger and rocksample benchmarks, a problem of velocity regulation in mobile robots, and a problem of battery management in mobile robots. We test the methodology against a state-of-the-art anomaly detection algorithm to show that our approach can be used to identify anomalous behaviours in faulty POMCP. We also show, comparing the performance of shielded and unshielded POMCP, that the shielding mechanism can improve the system’s performance. We provide an open-source implementation of the proposed methodologies at https://github.com/GiuMaz/XPOMCP
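A drastically simplified illustration of the shielding idea: block a proposed action when logged belief-action traces show it failing too often under similar beliefs. This replaces the paper's MAX-SMT formulation and expert rule language with a plain empirical frequency check; the traces, belief summaries, and threshold are all invented.

```python
from collections import defaultdict

RISK_THRESHOLD = 0.2   # maximum tolerated empirical failure rate (assumed)

# Logged traces: (belief summary, action, failed?) triples (hypothetical)
traces = [
    ("low_battery", "explore", True),
    ("low_battery", "explore", True),
    ("low_battery", "recharge", False),
    ("high_battery", "explore", False),
    ("low_battery", "explore", False),
]

# Aggregate failure counts per (belief, action) pair
stats = defaultdict(lambda: [0, 0])    # (belief, action) -> [failures, total]
for belief, action, failed in traces:
    stats[(belief, action)][0] += int(failed)
    stats[(belief, action)][1] += 1

def shield(belief, proposed, fallback):
    """Return the proposed action unless its empirical risk is too high,
    in which case substitute the safe fallback action."""
    failures, total = stats[(belief, proposed)]
    risk = failures / total if total else 0.0
    return fallback if risk > RISK_THRESHOLD else proposed

print(shield("low_battery", "explore", "recharge"))
print(shield("high_battery", "explore", "recharge"))
```

In the logged data, exploring on a low battery failed in 2 of 3 traces, so the shield diverts to the fallback; with a high battery the same action passes the check.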
Safety of autonomous vehicles: A survey on Model-based vs. AI-based approaches
The growing advancements in Autonomous Vehicles (AVs) have emphasized the
critical need to prioritize the absolute safety of AV maneuvers, especially in
dynamic and unpredictable environments or situations. This objective becomes
even more challenging due to the uniqueness of every traffic
situation/condition. To cope with all these very constrained and complex
configurations, AVs must have appropriate control architectures with reliable
and real-time Risk Assessment and Management Strategies (RAMS). These targeted
RAMS must drastically reduce navigation risks. However, the lack of provable
safety guarantees, which is one of the key challenges to be addressed,
drastically limits the ambition to introduce AVs more broadly on our roads and
restricts their use to very limited use cases. Therefore, the focus and the
ambition of this paper is to survey research on autonomous vehicles while
focusing on the important topic of safety guarantee of AVs. For this purpose,
it is proposed to review research on relevant methods and concepts defining an
overall control architecture for AVs, with an emphasis on the safety assessment
and decision-making systems composing these architectures. Moreover, it is
intended through this reviewing process to highlight research that uses either
model-based methods or AI-based approaches. This is performed while emphasizing
the strengths and weaknesses of each methodology and investigating the research
that proposes a comprehensive multi-modal design that combines model-based and
AI approaches. This paper ends with discussions on the methods used to
guarantee the safety of AVs, namely safety verification techniques and the
standardization/generalization of safety frameworks.
Information-guided Planning : An Online Approach for Partially Observable Problems
This paper presents IB-POMCP, a novel algorithm for online planning under partial observability. Our approach enhances the decision-making process by using estimates of the entropy of the world belief to guide a tree search process and surpass the limitations of planning in scenarios with sparse reward configurations. By performing what we call information-guided planning, the algorithm, which incorporates a novel I-UCB function, shows significant improvements in reward and reasoning time compared to state-of-the-art baselines in several benchmark scenarios, along with theoretical convergence guarantees.
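The flavor of an information-guided action score can be sketched by adding an entropy-reduction bonus to the classic UCB1 score. The exact I-UCB weighting used by IB-POMCP is not reproduced here; the `alpha` weight and the expected entropy drop are assumptions for illustration.

```python
import math

def ucb1(q, n_parent, n_action, c=1.4):
    """Classic UCB1: value estimate plus an exploration bonus."""
    return q + c * math.sqrt(math.log(n_parent) / n_action)

def info_ucb(q, n_parent, n_action, expected_entropy_drop, alpha=0.5, c=1.4):
    """UCB1 plus a bonus for actions expected to shrink belief entropy
    (a sketch of the information-guided idea, not the paper's I-UCB)."""
    return ucb1(q, n_parent, n_action, c) + alpha * expected_entropy_drop

# Two actions with identical value estimates and visit counts:
# the one expected to reduce belief uncertainty scores higher.
informative = info_ucb(q=0.0, n_parent=100, n_action=10,
                       expected_entropy_drop=0.8)
uninformative = info_ucb(q=0.0, n_parent=100, n_action=10,
                         expected_entropy_drop=0.0)
print(informative > uninformative)
```

In sparse-reward settings the value term is uninformative for a long time, so an entropy-based bonus of this kind gives the tree search a useful signal to expand belief-disambiguating branches first.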
Human-Centered Autonomy for UAS Target Search
Current methods of deploying robots that operate in dynamic, uncertain
environments, such as Uncrewed Aerial Systems in search & rescue missions,
require nearly continuous human supervision for vehicle guidance and operation.
These methods do not consider high-level mission context resulting in
cumbersome manual operation or inefficient exhaustive search patterns. We
present a human-centered autonomous framework that infers geospatial mission
context through dynamic feature sets, which then guides a probabilistic target
search planner. Operators provide a set of diverse inputs, including priority
definition, spatial semantic information about ad-hoc geographical areas, and
reference waypoints, which are probabilistically fused with geographical
database information and condensed into a geospatial distribution representing
an operator's preferences over an area. An online, POMDP-based planner,
optimized for target searching, is augmented with this reward map to generate
an operator-constrained policy. Our results, simulated based on input from five
professional rescuers, display effective task mental model alignment, 18% more
victim finds, and 15 times more efficient guidance plans than current
operational methods.
Simplified Continuous High Dimensional Belief Space Planning with Adaptive Probabilistic Belief-dependent Constraints
Online decision making under uncertainty in partially observable domains,
also known as Belief Space Planning, is a fundamental problem in robotics and
Artificial Intelligence. Due to an abundance of plausible future unravelings,
calculating an optimal course of action inflicts an enormous computational
burden on the agent. Moreover, in many scenarios, e.g., information gathering,
it is required to introduce a belief-dependent constraint. Prompted by this
demand, in this paper, we consider a recently introduced probabilistic
belief-dependent constrained POMDP. We present a technique to adaptively accept
or discard a candidate action sequence with respect to a probabilistic
belief-dependent constraint, before expanding a complete set of future
observation samples and without any loss in accuracy. Moreover, using our
proposed framework, we contribute an adaptive method to find a maximal feasible
return (e.g., information gain) in terms of Value at Risk for the candidate
action sequence with substantial acceleration. On top of that, we introduce an
adaptive simplification technique for a probabilistically constrained setting.
Such an approach provably returns an identical-quality solution while
dramatically accelerating online decision making. Our universal framework
applies to any belief-dependent constrained continuous POMDP with parametric
beliefs, as well as nonparametric beliefs represented by particles. In the
context of an information-theoretic constraint, our presented framework
stochastically quantifies if a cumulative information gain along the planning
horizon is sufficiently significant (e.g., for information gathering, active
SLAM). We apply our method to active SLAM, a highly challenging problem of high
dimensional Belief Space Planning. Extensive realistic simulations corroborate
the superiority of our proposed ideas.
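The Value-at-Risk acceptance test can be illustrated in a few lines: a candidate action sequence is kept only if the alpha-quantile of its sampled returns clears a threshold. This sketch evaluates a full sample set, whereas the paper's contribution is precisely to decide adaptively without expanding all observation samples; the distributions and threshold are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def var_at_level(samples, alpha):
    """alpha-level Value at Risk of a return distribution:
    the alpha-quantile of the sampled returns."""
    return np.quantile(samples, alpha)

def feasible(samples, alpha=0.1, threshold=0.5):
    """Accept a candidate action sequence only if even the worst
    alpha-fraction of its outcomes keeps the return (e.g., information
    gain) above the threshold."""
    return var_at_level(samples, alpha) >= threshold

# Two hypothetical candidates, each with 1000 sampled information-gain returns
good = rng.normal(1.0, 0.2, size=1000)  # gains concentrated above threshold
bad = rng.normal(0.4, 0.3, size=1000)   # frequently dips below threshold
print(feasible(good), feasible(bad))
```

A constraint stated on the quantile rather than the mean is what makes the check risk-aware: a candidate with a high average gain but a heavy low tail is still rejected.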