Decision-Making Under Uncertainty: Beyond Probabilities
This position paper reflects on the state-of-the-art in decision-making under
uncertainty. A classical assumption is that probabilities can sufficiently
capture all uncertainty in a system. In this paper, the focus is on the
uncertainty that goes beyond this classical interpretation, particularly by
employing a clear distinction between aleatoric and epistemic uncertainty. The
paper features an overview of Markov decision processes (MDPs) and extensions
to account for partial observability and adversarial behavior. These models
sufficiently capture aleatoric uncertainty but fail to account for epistemic
uncertainty robustly. Consequently, we present a thorough overview of so-called
uncertainty models that exhibit uncertainty in a more robust interpretation. We
show several solution techniques for both discrete and continuous models,
ranging from formal verification, over control-based abstractions, to
reinforcement learning. As an integral part of this paper, we list and discuss
several key challenges that arise when dealing with rich types of uncertainty
in a model-based fashion.
Neural Simplex Architecture
We present the Neural Simplex Architecture (NSA), a new approach to runtime
assurance that provides safety guarantees for neural controllers (obtained e.g.
using reinforcement learning) of autonomous and other complex systems without
unduly sacrificing performance. NSA is inspired by the Simplex control
architecture of Sha et al., but with some significant differences. In the
traditional approach, the advanced controller (AC) is treated as a black box;
when the decision module switches control to the baseline controller (BC), the
BC remains in control forever. There is relatively little work on switching
control back to the AC, and there are no techniques for correcting the AC's
behavior after it generates a potentially unsafe control input that causes a
failover to the BC. Our NSA addresses both of these limitations. NSA not only
provides safety assurances in the presence of a possibly unsafe neural
controller, but can also improve the safety of such a controller in an online
setting via retraining, without overly degrading its performance. To
demonstrate NSA's benefits, we have conducted several significant case studies
in the continuous control domain. These include a target-seeking ground rover
navigating an obstacle field, and a neural controller for an artificial
pancreas system.
Comment: 12th NASA Formal Methods Symposium (NFM 2020)
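The switching behavior described in the NSA abstract (fail over to the BC on an unsafe proposal, later return control to the AC, and keep the offending inputs for retraining) can be sketched roughly as follows. This is only an illustrative sketch: the class `SimplexSwitch`, the predicates `is_recoverable` and `can_return`, the one-dimensional toy dynamics in `make_toy_switch`, and all thresholds are assumptions made here, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SimplexSwitch:
    """Hypothetical decision module arbitrating between two controllers."""
    advanced: Callable[[float], float]              # AC, e.g. a neural controller
    baseline: Callable[[float], float]              # BC, assumed to be safe
    is_recoverable: Callable[[float, float], bool]  # assumed forward-switching check
    can_return: Callable[[float], bool]             # assumed reverse-switching check
    using_baseline: bool = False
    unsafe_samples: List[Tuple[float, float]] = field(default_factory=list)

    def act(self, state: float) -> float:
        # Hand control back to the AC once the state is judged safe enough.
        if self.using_baseline and self.can_return(state):
            self.using_baseline = False
        if not self.using_baseline:
            proposed = self.advanced(state)
            if self.is_recoverable(state, proposed):
                return proposed
            # Fail over to the BC and keep the rejected input, which an
            # online retraining step could later use to correct the AC.
            self.using_baseline = True
            self.unsafe_samples.append((state, proposed))
        return self.baseline(state)


def make_toy_switch() -> SimplexSwitch:
    """Toy 1-D system x' = x + a with an assumed safe region |x| <= 1."""
    return SimplexSwitch(
        advanced=lambda x: 0.5,                  # always pushes to the right
        baseline=lambda x: -0.5 * x,             # contracts toward the origin
        is_recoverable=lambda x, a: abs(x + a) <= 1.0,
        can_return=lambda x: abs(x) <= 0.5,
    )
```

In this toy setting, `make_toy_switch().act(0.8)` rejects the AC's proposal because it would leave the assumed safe region, so the BC's action is applied and the pair `(0.8, 0.5)` is stored for retraining.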
Robust Learning Enabled Intelligence for the Internet-of-Things: A Survey From the Perspectives of Noisy Data and Adversarial Examples
This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record.
The Internet-of-Things (IoT) has been widely adopted in a range of verticals, e.g., automation, health, energy and manufacturing. Many of the applications in these sectors, such as self-driving cars and remote surgery, are critical and high-stakes, calling for advanced machine learning (ML) models for data analytics. The training and testing data collected by massive numbers of IoT devices may contain noise (e.g., abnormal data, incorrect labels and incomplete information) and adversarial examples. ML models must therefore be highly robust to make reliable decisions for IoT applications. Research on robust ML has received tremendous attention from both academia and industry in recent years. This paper investigates state-of-the-art and representative works on robust ML models that can enable high resilience and reliability of IoT intelligence. It focuses on two aspects of robustness, i.e., when the training data of ML models contains noise and when it contains adversarial examples, both of which typically occur in many real-world IoT scenarios. In addition, the reliability of both neural networks and reinforcement learning frameworks is investigated. Both of these machine learning paradigms have been widely used in handling data in IoT scenarios. Potential research challenges and open issues are discussed to provide future research directions.
Engineering and Physical Sciences Research Council (EPSRC)
Partially Observable Monte Carlo Planning with state variable constraints for mobile robot navigation
Autonomous mobile robots employed in industrial applications often operate in complex and uncertain environments. In this paper we propose an approach based on an extension of Partially Observable Monte Carlo Planning (POMCP) for robot velocity regulation in industrial-like environments characterized by uncertain motion difficulties. The velocity selected by POMCP is used by a standard engine controller which deals with path planning. This two-layer approach allows POMCP to exploit prior knowledge on the relationships between task similarities to improve performance in terms of time spent to traverse a path with obstacles. We also propose three measures to support human understanding of the strategy used by POMCP to improve the performance. The overall architecture is tested on a Turtlebot3 in two environments, a rectangular path and a realistic production line in a research lab. Tests performed on a C++ simulator confirm the capability of the proposed approach to profitably use prior knowledge, achieving a performance improvement from 0.7% to 3.1% depending on the complexity of the path. Experiments on a Unity simulator show that the proposed two-layer approach also outperforms single-layer approaches based only on the engine controller (i.e., without the POMCP layer). In this case the performance improvement is up to 37% compared to a state-of-the-art deep reinforcement learning engine controller, and up to 51% compared to the standard ROS engine controller. Finally, experiments in a real-world testing arena confirm the possibility of running the approach on real robots.
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
Offline multi-agent reinforcement learning is challenging due to the coupling
effect of the distribution shift issue common in the offline setting and the
high-dimensionality issue common in the multi-agent setting, making the action
out-of-distribution (OOD) and value overestimation phenomena excessively
severe. To mitigate this problem, we propose a novel multi-agent offline RL
algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct
conservative value estimation. Rather than regarding all the agents as a high
dimensional single one and directly applying single agent methods to it, CFCQL
calculates conservative regularization for each agent separately in a
counterfactual way and then linearly combines them to realize an overall
conservative value estimation. We prove that it still enjoys the
underestimation property and the performance guarantee as those single agent
conservative methods do, but the induced regularization and safe policy
improvement bound are independent of the agent number, which is therefore
theoretically superior to the direct treatment referred to above, especially
when the agent number is large. We further conduct experiments on four
environments including both discrete and continuous action settings on both
existing and our man-made datasets, demonstrating that CFCQL outperforms
existing methods on most datasets and even with a remarkable margin on some of
them.
Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
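The idea of computing a conservative regularizer per agent "in a counterfactual way" and then combining the results linearly can be illustrated with a small tabular sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the function name `counterfactual_penalty`, the single-state joint Q-table, the choice to resample only agent i's action from its current policy while the other agents keep their dataset actions, and the convex weights.

```python
import numpy as np


def counterfactual_penalty(q, data_actions, policies, weights):
    """Illustrative per-agent conservative penalty for a single state.

    q            : joint Q-table of shape (n_actions_1, ..., n_actions_N)
    data_actions : (batch, N) integer joint actions taken from the dataset
    policies     : policies[i][a] = probability that agent i picks action a
    weights      : N non-negative mixing weights (assumed to sum to 1)
    """
    q = np.asarray(q, dtype=float)
    data_actions = np.asarray(data_actions)
    n_agents = data_actions.shape[1]
    # Q-values of the joint actions that actually occur in the dataset.
    q_data = q[tuple(data_actions.T)]
    total = 0.0
    for i in range(n_agents):
        # Put agent i's action axis first; the other axes keep their order.
        q_i = np.moveaxis(q, i, 0)
        others = np.delete(data_actions, i, axis=1)
        # q_cf[a_i, b] = Q(a_i, dataset actions of the other agents in sample b)
        q_cf = q_i[(slice(None),) + tuple(others.T)]
        # Expectation over agent i's policy only; the others stay in-dataset.
        expected = np.asarray(policies[i], dtype=float) @ q_cf
        total += weights[i] * np.mean(expected - q_data)
    return total
```

Adding such a term to a standard temporal-difference loss would push down Q-values for actions that deviate from the dataset in one agent at a time, rather than penalizing the full joint action space at once. How the weights are chosen in the actual method is not stated in the abstract.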
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
The transformer, originally devised for natural language processing, has also
achieved significant success in computer vision. Thanks to its strong expressive
power, researchers are investigating ways to deploy transformers to
reinforcement learning (RL) and the transformer-based models have manifested
their potential in representative RL benchmarks. In this paper, we collect and
dissect recent advances on transforming RL by transformer (transformer-based RL
or TRL), in order to explore its development trajectory and future trend. We
group existing developments in two categories: architecture enhancement and
trajectory optimization, and examine the main applications of TRL in robotic
manipulation, text-based games, navigation and autonomous driving. For
architecture enhancement, these methods consider how to apply the powerful
transformer structure to RL problems under the traditional RL framework, which
model agents and environments much more precisely than deep RL methods, but
they are still limited by the inherent defects of traditional RL algorithms,
such as bootstrapping and "deadly triad". For trajectory optimization, these
methods treat RL problems as sequence modeling and train a joint state-action
model over entire trajectories under the behavior cloning framework, which are
able to extract policies from static datasets and fully use the long-sequence
modeling capability of the transformer. Given these advancements, extensions
and challenges in TRL are reviewed and proposals about future direction are
discussed. We hope that this survey can provide a detailed introduction to TRL
and motivate future research in this rapidly developing field.
Comment: 26 pages