Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning
Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user's needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and choose actions; we must learn about the domain's properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.
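A defining trait of the Bayesian nonparametric priors discussed above is that the number of components in the representation grows with the data rather than being fixed in advance. A minimal sketch of this behaviour, using a Chinese Restaurant Process (one standard Bayesian nonparametric construction, not necessarily the exact prior used in the article):

```python
import random

def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster assignments from a Chinese Restaurant Process.

    The number of distinct clusters grows (roughly logarithmically)
    with the number of points, illustrating how a Bayesian
    nonparametric prior lets model complexity scale with the data.
    """
    rng = random.Random(seed)
    counts = []          # counts[k] = number of points in cluster k
    assignments = []
    for i in range(n_points):
        # Join existing cluster k with prob counts[k] / (i + alpha),
        # or open a new cluster with prob alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments

tables = crp_assignments(500, alpha=2.0)
print(len(set(tables)), "clusters for 500 points")
```

With a concentration parameter of 2.0 the expected number of clusters for 500 points is on the order of ten, far fewer than a fixed-size model would need to reserve up front.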
Partially Observable Monte Carlo Planning with state variable constraints for mobile robot navigation
Autonomous mobile robots employed in industrial applications often operate in complex and uncertain environments. In this paper we propose an approach based on an extension of Partially Observable Monte Carlo Planning (POMCP) for robot velocity regulation in industrial-like environments characterized by uncertain motion difficulties. The velocity selected by POMCP is used by a standard engine controller which deals with path planning. This two-layer approach allows POMCP to exploit prior knowledge on the relationships between task similarities to improve performance in terms of time spent traversing a path with obstacles. We also propose three measures to support human understanding of the strategy used by POMCP to improve performance. The overall architecture is tested on a Turtlebot3 in two environments, a rectangular path and a realistic production line in a research lab. Tests performed on a C++ simulator confirm the capability of the proposed approach to profitably use prior knowledge, achieving a performance improvement from 0.7% to 3.1% depending on the complexity of the path. Experiments on a Unity simulator show that the proposed two-layer approach also outperforms single-layer approaches based only on the engine controller (i.e., without the POMCP layer). In this case the performance improvement is up to 37% compared to a state-of-the-art deep reinforcement learning engine controller, and up to 51% compared to the standard ROS engine controller. Finally, experiments in a real-world testing arena confirm the possibility of running the approach on real robots.
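The core idea in POMCP-style planning is to represent the belief over hidden state as a set of particles and to score candidate actions by Monte Carlo simulation from states sampled from that belief. A heavily simplified, illustrative sketch of that flavour of planning (the toy "velocity regulation" problem with hidden difficulty levels is our own invention, not the paper's model, and a real POMCP builds a full search tree rather than flat rollouts):

```python
import random

DIFFICULTIES = [0, 1, 2]        # hidden motion difficulty of a segment
ACTIONS = [0.2, 0.5, 0.9]       # candidate velocities

def step(difficulty, velocity, rng):
    """Toy simulator: faster is better unless the velocity exceeds what
    the hidden difficulty allows, in which case the robot is penalised."""
    limit = 1.0 - 0.3 * difficulty
    reward = velocity if velocity <= limit else velocity - 2.0
    # Noisy observation of the hidden difficulty (80% accurate).
    obs = difficulty if rng.random() < 0.8 else rng.choice(DIFFICULTIES)
    return reward, obs

def plan(particles, n_sims=200, seed=0):
    """Pick the velocity with the best mean simulated reward, where each
    simulation starts from a state sampled from the particle belief."""
    rng = random.Random(seed)
    best_a, best_v = None, float("-inf")
    for a in ACTIONS:
        total = 0.0
        for _ in range(n_sims):
            s = rng.choice(particles)   # sample a state from the belief
            r, _ = step(s, a, rng)
            total += r
        if total / n_sims > best_v:
            best_a, best_v = a, total / n_sims
    return best_a

# With a belief concentrated on high difficulty, a cautious velocity wins;
# with an easy-segment belief, the planner selects the fastest velocity.
print(plan([2] * 100), plan([0] * 100))
```

The two-layer structure in the paper corresponds to `plan` choosing only the velocity, while a separate engine controller handles the actual path following.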
The Real Deal: A Review of Challenges and Opportunities in Moving Reinforcement Learning-Based Traffic Signal Control Systems Towards Reality
Traffic signal control (TSC) is a high-stakes domain that is growing in
importance as traffic volume grows globally. An increasing number of works are
applying reinforcement learning (RL) to TSC; RL can draw on an abundance of
traffic data to improve signalling efficiency. However, RL-based signal
controllers have never been deployed. In this work, we provide the first review
of challenges that must be addressed before RL can be deployed for TSC. We
focus on four challenges involving (1) uncertainty in detection, (2)
reliability of communications, (3) compliance and interpretability, and (4)
heterogeneous road users. We show that the literature on RL-based TSC has made
some progress towards addressing each challenge. However, more work should take
a systems thinking approach that considers the impacts of other pipeline
components on RL.Comment: 26 pages; accepted version, with shortened version published at the
12th International Workshop on Agents in Traffic and Transportation (ATT '22)
at IJCAI 202
A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making
The field of Sequential Decision Making (SDM) provides tools for solving
Sequential Decision Processes (SDPs), where an agent must make a series of
decisions in order to complete a task or achieve a goal. Historically, two
competing SDM paradigms have vied for supremacy. Automated Planning (AP)
proposes to solve SDPs by performing a reasoning process over a model of the
world, often represented symbolically. Conversely, Reinforcement Learning (RL)
proposes to learn the solution of the SDP from data, without a world model, and
represent the learned knowledge subsymbolically. In the spirit of
reconciliation, we provide a review of symbolic, subsymbolic and hybrid methods
for SDM. We cover both methods for solving SDPs (e.g., AP, RL and techniques
that learn to plan) and for learning aspects of their structure (e.g., world
models, state invariants and landmarks). To the best of our knowledge, no other
review in the field provides the same scope. As an additional contribution, we
discuss what properties an ideal method for SDM should exhibit and argue that
neurosymbolic AI is the current approach which most closely resembles this
ideal method. Finally, we outline several proposals to advance the field of SDM
via the integration of symbolic and subsymbolic AI.
Uncertainty Maximization in Partially Observable Domains: A Cognitive Perspective
Faced with an ever-increasing complexity of their domains of application,
artificial learning agents are now able to scale up in their ability to process
an overwhelming amount of information coming from their interaction with an
environment. However, this process of scaling does come with a cost of encoding
and processing an increasing amount of redundant information that is not
necessarily beneficial to the learning process itself. This work exploits the
properties of the learning systems defined over partially observable domains by
selectively focusing on the specific type of information that is more likely to
express the causal interaction among the transitioning states of the
environment. Adaptive masking of the observation space based on this
criterion enabled a significant improvement in the convergence of temporal
difference algorithms defined over a partially observable Markov process.
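The mechanism described above, masking out observation components that do not carry causal information, can be sketched with tabular TD(0). The two-component observation (the first correlates with the hidden state, the second is pure noise) and the hand-chosen fixed mask are illustrative assumptions; the paper's criterion selects the mask adaptively:

```python
import random

def masked(obs, mask):
    """Project an observation tuple onto the selected components."""
    return tuple(o for o, keep in zip(obs, mask) if keep)

def td0(episodes, mask, alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0) over (masked) observations of a two-state process
    where state 1 yields reward and states switch with probability 0.1."""
    rng = random.Random(seed)
    V = {}
    for _ in range(episodes):
        state = rng.choice([0, 1])
        for _ in range(10):
            noise = rng.choice([0, 1, 2])          # uninformative component
            obs = masked((state, noise), mask)
            next_state = 1 - state if rng.random() < 0.1 else state
            reward = 1.0 if state == 1 else 0.0
            next_obs = masked((next_state, rng.choice([0, 1, 2])), mask)
            V.setdefault(obs, 0.0)
            V.setdefault(next_obs, 0.0)
            V[obs] += alpha * (reward + gamma * V[next_obs] - V[obs])
            state = next_state
    return V

# Masking the noisy component collapses the value table from up to six
# entries to two, so each entry is updated far more often.
print(len(td0(200, mask=(1, 1))), len(td0(200, mask=(1, 0))))
```

The smaller table is the source of the convergence gain: with the noise component masked, every sample contributes to one of two estimates instead of being spread across six.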
A Practical Guide to Multi-Objective Reinforcement Learning and Planning
Real-world decision-making tasks are generally complex, requiring trade-offs
between multiple, often conflicting, objectives. Despite this, the majority of
research in reinforcement learning and decision-theoretic planning either
assumes only a single objective, or that multiple objectives can be adequately
handled via a simple linear combination. Such approaches may oversimplify the
underlying problem and hence produce suboptimal results. This paper serves as a
guide to the application of multi-objective methods to difficult problems, and
is aimed at researchers who are already familiar with single-objective
reinforcement learning and planning methods who wish to adopt a multi-objective
perspective on their research, as well as practitioners who encounter
multi-objective decision problems in practice. It identifies the factors that
may influence the nature of the desired solution, and illustrates by example
how these influence the design of multi-objective decision-making systems for
complex problems.
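The "simple linear combination" the guide cautions against can be made concrete with a small sketch. The three candidate policies and their two-objective returns below are invented for illustration; the point is a known limitation of linear scalarisation, namely that it can only ever select policies on the convex hull of the Pareto front:

```python
def scalarise(returns, weights):
    """Collapse a vector-valued return into a scalar via a weighted sum."""
    return sum(r * w for r, w in zip(returns, weights))

# Hypothetical (speed-payoff, fuel-economy) returns for three policies.
# "balanced" is Pareto-optimal but lies strictly inside the convex hull
# of "fast" and "frugal".
policies = {
    "fast":     (10.0, 2.0),
    "balanced": (5.5, 5.5),
    "frugal":   (2.0, 10.0),
}

# No matter how the weights are chosen, the weighted sum never prefers
# "balanced": linear scalarisation cannot reach concave regions of the
# Pareto front, which is one way it oversimplifies the problem.
for w in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    best = max(policies, key=lambda p: scalarise(policies[p], w))
    print(w, "->", best)
```

A genuinely multi-objective method would instead return the full set of Pareto-optimal policies, or optimise a nonlinear utility, so that "balanced" remains available to a decision maker who prefers it.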