Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
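The exploration-exploitation trade-off and learning from delayed reinforcement that the survey highlights are both visible in tabular Q-learning with an ε-greedy policy. A minimal sketch follows, assuming a Gymnasium-style discrete environment; it is illustrative, not code from the survey.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumes a Gymnasium-style API: env.reset() -> (state, info) and
    env.step(a) -> (state, reward, terminated, truncated, info), with
    discrete state and action spaces.
    """
    Q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Exploration vs. exploitation: act randomly with probability epsilon.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n), key=lambda a: Q[state][a])
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # TD update: the bootstrapped target propagates delayed reinforcement.
            target = reward + (0.0 if terminated else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```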
Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning
Recent advances in combining deep neural network architectures with
reinforcement learning techniques have shown promising potential results in
solving complex control problems with high dimensional state and action spaces.
Inspired by these successes, in this paper, we build two kinds of reinforcement
learning algorithms: deep policy-gradient and value-function based agents which
can predict the best possible traffic signal for a traffic intersection. At
each time step, these adaptive traffic light control agents receive a snapshot
of the current state of a graphical traffic simulator and produce control
signals. The policy-gradient based agent maps its observation directly to the
control signal, whereas the value-function based agent first estimates values
for all legal control signals and then selects the control action with the
highest value. Our methods show promising results in a traffic
network simulated in the SUMO traffic simulator, without suffering from
instability issues during the training process.
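The two agent designs described above differ only in how an action is produced from the simulator snapshot: direct sampling from a policy network versus a greedy argmax over estimated action values. A minimal PyTorch sketch of that contrast, with hypothetical observation and signal counts (the paper's exact architectures are not reproduced here):

```python
import torch
import torch.nn as nn

OBS_DIM, N_SIGNALS = 64, 4  # hypothetical sizes for a single intersection

class PolicyAgent(nn.Module):
    """Maps the simulator snapshot directly to a distribution over signals."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_SIGNALS))

    def act(self, obs):
        # Sample a control signal from the learned policy distribution.
        logits = self.net(obs)
        return torch.distributions.Categorical(logits=logits).sample()

class ValueAgent(nn.Module):
    """Estimates a value for every legal signal, then picks the best one."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_SIGNALS))

    def act(self, obs):
        # Greedy selection over the estimated values of all legal signals.
        q_values = self.net(obs)
        return q_values.argmax(dim=-1)
```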
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table; Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at https://youtu.be/_MZYDhfWeL
Neurosymbolic Reinforcement Learning and Planning: A Survey
The area of Neurosymbolic Artificial Intelligence (Neurosymbolic AI) is
rapidly developing and has become a popular research topic, encompassing
sub-fields such as Neurosymbolic Deep Learning (Neurosymbolic DL) and
Neurosymbolic Reinforcement Learning (Neurosymbolic RL). Compared to
traditional learning methods, Neurosymbolic AI offers significant advantages by
simplifying complexity and providing transparency and explainability.
Reinforcement Learning (RL), a long-standing Artificial Intelligence (AI) concept
that mimics human behavior using rewards and punishment, is a fundamental
component of Neurosymbolic RL, a recent integration of the two fields that has
yielded promising results. The aim of this paper is to contribute to the
emerging field of Neurosymbolic RL by conducting a literature survey. Our
evaluation focuses on the three components that constitute Neurosymbolic RL:
neural, symbolic, and RL. We categorize works into three taxonomies based on
the role played by the neural and symbolic parts in RL: Learning for
Reasoning, Reasoning for Learning, and Learning-Reasoning. These categories are further
divided into sub-categories based on their applications. Furthermore, we
analyze the RL components of each research work, including the state space,
action space, policy module, and RL algorithm. Additionally, we identify
research opportunities and challenges in various applications within this
dynamic field.
Comment: 16 pages, 9 figures, IEEE Transactions on Artificial Intelligence
Safety-Critical Learning of Robot Control with Temporal Logic Specifications
Reinforcement learning (RL) is a promising approach, but its success in
real-world applications remains limited, because ensuring safe exploration
while facilitating adequate exploitation is challenging when controlling
robotic systems with unknown models and measurement uncertainties. The
learning problem becomes even more difficult for complex tasks over
continuous state-action spaces. In
this paper, we propose a learning-based robotic control framework consisting of
several aspects: (1) we leverage Linear Temporal Logic (LTL) to express complex
tasks over infinite horizons that are translated to a novel automaton
structure; (2) we detail an innovative reward scheme for LTL satisfaction with
a probabilistic guarantee. Then, by applying a reward shaping technique, we
develop a modular policy-gradient architecture exploiting the benefits of the
automaton structure to decompose overall tasks and enhance the performance of
learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate
the uncertain dynamic systems, we synthesize a model-based safe exploration
during the learning process using Exponential Control Barrier Functions (ECBFs)
that generalize systems with high-order relative degrees; (4) to further
improve the efficiency of exploration, we utilize the properties of LTL
automata and ECBFs to propose a safe guiding process. Finally, we demonstrate
the effectiveness of the framework via several robotic environments. We show an
ECBF-based modular deep RL algorithm that achieves near-perfect success rates
and safety guarding with high probability confidence during training.
Comment: Under Review. arXiv admin note: text overlap with arXiv:2102.1285
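Step (3), the ECBF-based safe exploration, amounts to a safety filter that minimally perturbs the RL action so the barrier condition holds under the estimated dynamics. A sketch for the relative-degree-1 case (the paper's ECBFs generalize this to higher relative degrees); all callables here are assumed interfaces, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def safety_filter(u_rl, x, f, g, h, grad_h, alpha=1.0):
    """Project the RL action onto the safe set {x : h(x) >= 0}.

    `f(x)` and `g(x)` are the drift and input terms of the learned dynamics
    (e.g. GP posterior means), `h` the barrier function, `grad_h` its
    gradient. Solves:
        min_u ||u - u_rl||^2
        s.t.  grad_h(x) . (f(x) + g(x) u) + alpha * h(x) >= 0
    """
    def objective(u):
        # Stay as close as possible to the action proposed by the RL policy.
        return np.sum((u - u_rl) ** 2)

    def barrier(u):
        # Barrier condition: h must not decay faster than -alpha * h.
        return grad_h(x) @ (f(x) + g(x) @ u) + alpha * h(x)

    res = minimize(objective, u_rl,
                   constraints=[{"type": "ineq", "fun": barrier}])
    return res.x
```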
Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.
Comment: NIPS 2018
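The MAP inference step can be pictured as scoring every specification in the candidate pool and keeping the best one. The sketch below assumes a uniform prior over the pool and uses a simplified maximum-entropy-style score (demonstrations should satisfy a good specification far more often than random behavior would); it is not the paper's exact likelihood model:

```python
def map_specification(demos, candidates, random_sat_prob, beta=5.0):
    """Pick the MAP specification from a candidate pool (illustrative sketch).

    Each candidate `spec` is a callable spec(demo) -> bool checking a Boolean
    trace property; `random_sat_prob(spec)` is an assumed helper estimating
    how often random rollouts satisfy the spec. With a uniform prior, MAP
    inference reduces to maximizing the (log-)likelihood score.
    """
    def log_score(spec):
        # Fraction of demonstrations satisfying the spec, penalized by how
        # easy the spec is to satisfy by chance.
        p_demo = sum(spec(d) for d in demos) / len(demos)
        return beta * (p_demo - random_sat_prob(spec))

    return max(candidates, key=log_score)
```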
Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast (or faster) than analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).
Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017; Code at http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
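The key idea, swapping the analytical optimizer for a parallel black-box one that averages over model uncertainty, can be sketched with CMA-ES. Here `rollout_model` is an assumed interface that simulates one episode under the uncertain learned model and returns its return; this is a sketch of the approach, not the released Black-DROPS code:

```python
import numpy as np
import cma  # pip install cma; black-box evolutionary optimizer

def policy_search_step(rollout_model, theta0, n_samples=20):
    """One Black-DROPS-style policy update (illustrative sketch).

    `rollout_model(theta)` simulates one episode with policy parameters
    theta under the *uncertain* learned dynamics (re-sampling the model
    uncertainty on every call) and returns the episode return. Because the
    reward and policy are only queried, not differentiated, both can be
    arbitrary black boxes.
    """
    def neg_expected_return(theta):
        # Monte-Carlo estimate of expected return; the n_samples rollouts
        # are independent and embarrassingly parallel.
        return -float(np.mean([rollout_model(theta) for _ in range(n_samples)]))

    theta_opt, _ = cma.fmin2(neg_expected_return, theta0, sigma0=0.5,
                             options={"maxfevals": 2000, "verbose": -9})
    return theta_opt
```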