ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning
A machine learning pipeline can consist of several stages of
operations like data preprocessing, feature engineering and machine learning
model training. Each operation has a set of hyper-parameters, which can become
irrelevant for the pipeline when the operation is not selected. This gives rise
to a hierarchical conditional hyper-parameter space. To optimize this mixed
continuous and discrete conditional hierarchical hyper-parameter space, we
propose an efficient pipeline search and configuration algorithm which combines
the power of Reinforcement Learning and Bayesian Optimization. Empirical
results show that our method performs favorably compared to state-of-the-art
methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search.
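To illustrate what a hierarchical conditional hyper-parameter space looks like, here is a minimal sketch that samples from one uniformly at random. It shows the search space and the Random Search baseline only, not the ReinBo algorithm itself. The stage names, operations, and ranges in `SPACE` are hypothetical, chosen for illustration.

```python
import random

# Hypothetical two-stage pipeline space. Each operation carries its own
# hyper-parameters, which are only relevant when that operation is selected.
SPACE = {
    "preprocess": {
        "none": {},
        "pca": {"n_components": ("int", 2, 10)},
    },
    "model": {
        "knn": {"k": ("int", 1, 20)},
        "svm": {"C": ("float", 0.01, 100.0)},
    },
}


def sample_config(space, rng):
    """Draw one pipeline configuration uniformly at random."""
    config = {}
    for stage, operations in space.items():
        # Discrete choice: which operation fills this stage.
        op = rng.choice(sorted(operations))
        config[stage] = op
        # Conditional part: only the chosen operation's parameters are sampled.
        for name, (kind, low, high) in operations[op].items():
            if kind == "int":
                value = rng.randint(low, high)
            else:
                value = rng.uniform(low, high)
            config[f"{stage}.{op}.{name}"] = value
    return config


config = sample_config(SPACE, random.Random(0))
print(config)
```

Parameters of operations that were not selected never appear in the sampled configuration, which is what makes the space conditional rather than a flat product of ranges.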
Learning Unmanned Aerial Vehicle Control for Autonomous Target Following
While deep reinforcement learning (RL) methods have achieved unprecedented
successes in a range of challenging problems, their applicability has been
mainly limited to simulation or game domains due to the high sample complexity
of the trial-and-error learning process. However, real-world robotic
applications often need a data-efficient learning process with safety-critical
constraints. In this paper, we consider the challenging problem of learning
unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire
a strategy that combines perception and control, we represent the policy by a
convolutional neural network. We develop a hierarchical approach that combines
a model-free policy gradient method with a conventional feedback
proportional-integral-derivative (PID) controller to enable stable learning
without catastrophic failure. The neural network is trained by a combination of
supervised learning from raw images and reinforcement learning from games of
self-play. We show that the proposed approach can learn a target-following
policy in a simulator efficiently and the learned behavior can be successfully
transferred to the DJI quadrotor platform for real-world UAV control.
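For readers unfamiliar with the feedback controller named in this abstract, the following is a textbook discrete PID controller, given as a minimal sketch. How the paper couples it with the learned policy is not described here, so this covers only the conventional controller. The gains, the time step, and the one-dimensional velocity-command plant are assumptions made for illustration.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the paper's code)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        # Integral term: accumulate the error over time.
        self.integral += error * dt
        # Derivative term: finite difference, zero on the very first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Usage: drive a 1-D position toward a target with a velocity command.
pid = PID(kp=2.0, ki=0.5, kd=0.1)  # hypothetical gains
position, target, dt = 0.0, 1.0, 0.05
for _ in range(1000):
    command = pid.step(target - position, dt)
    position += command * dt  # assumed plant: position integrates the command
print(round(position, 3))
```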
Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis
Open-ended learning benefits immensely from the use of symbolic methods for
goal representation as they offer ways to structure knowledge for efficient and
transferable learning. However, the existing Hierarchical Reinforcement
Learning (HRL) approaches relying on symbolic reasoning are often limited as
they require a manual goal representation. The challenge in autonomously
discovering a symbolic goal representation is that it must preserve critical
information, such as the environment dynamics. In this paper, we propose a
developmental mechanism for goal discovery via an emergent representation that
abstracts (i.e., groups together) sets of environment states that have similar
roles in the task. We introduce a Feudal HRL algorithm that concurrently learns
both the goal representation and a hierarchical policy. The algorithm uses
symbolic reachability analysis for neural networks to approximate the
transition relation among sets of states and to refine the goal representation.
We evaluate our approach on complex navigation tasks, showing the learned
representation is interpretable, transferable and results in data-efficient
learning.
The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors
Though deep reinforcement learning has led to breakthroughs in many difficult
domains, these successes have required an ever-increasing number of samples. As
state-of-the-art reinforcement learning (RL) systems require an exponentially
increasing number of samples, their development is restricted to a continually
shrinking segment of the AI community. Likewise, many of these systems cannot
be applied to real-world problems, where environment samples are expensive.
Resolution of these limitations requires new, sample-efficient methods. To
facilitate research in this direction, we introduce the MineRL Competition on
Sample Efficient Reinforcement Learning using Human Priors.
The primary goal of the competition is to foster the development of
algorithms which can efficiently leverage human demonstrations to drastically
reduce the number of samples needed to solve complex, hierarchical, and sparse
environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task,
a sequential decision making environment requiring long-term planning,
hierarchical control, and efficient exploration methods; and (2) the MineRL-v0
dataset, a large-scale collection of over 60 million state-action pairs of
human demonstrations that can be resimulated into embodied trajectories with
arbitrary modifications to game state and visuals.
Participants will compete to develop systems which solve the ObtainDiamond
task with a limited number of samples from the environment simulator, Malmo.
The competition is structured into two rounds in which competitors are provided
several paired versions of the dataset and environment with different game
textures. At the end of each round, competitors will submit containerized
versions of their learning algorithms and they will then be trained/evaluated
from scratch on a hold-out dataset-environment pair for a total of 4 days on a
prespecified hardware platform.
Comment: accepted at NeurIPS 2019, 28 page
From semantics to execution: Integrating action planning with reinforcement learning for robotic causal problem-solving
Reinforcement learning is an appropriate and successful method to robustly
perform low-level robot control under noisy conditions. Symbolic action
planning is useful to resolve causal dependencies and to break a causally
complex problem down into a sequence of simpler high-level actions. A problem
with the integration of both approaches is that action planning is based on
discrete high-level action- and state spaces, whereas reinforcement learning is
usually driven by a continuous reward function. However, recent advances in
reinforcement learning, specifically, universal value function approximators
and hindsight experience replay, have focused on goal-independent methods based
on sparse rewards. In this article, we build on these novel methods to
facilitate the integration of action planning with reinforcement learning by
exploiting the reward-sparsity as a bridge between the high-level and low-level
state- and control spaces. As a result, we demonstrate that the integrated
neuro-symbolic method is able to solve object manipulation problems that
involve tool use and non-trivial causal dependencies under noisy conditions,
exploiting both data and knowledge.
Hierarchical Reinforcement Learning for Quadruped Locomotion
Legged locomotion is a challenging task for learning algorithms, especially
when the task requires a diverse set of primitive behaviors. To solve these
problems, we introduce a hierarchical framework to automatically decompose
complex locomotion tasks. A high-level policy issues commands in a latent space
and also selects for how long the low-level policy will execute the latent
command. Concurrently, the low-level policy uses the latent command and only
the robot's on-board sensors to control the robot's actuators. Our approach
allows the high-level policy to run at a lower frequency than the low-level
one. We test our framework on a path-following task for a dynamic quadruped
robot and we show that steering behaviors automatically emerge in the latent
command space as low-level skills are needed for this task. We then show
efficient adaptation of the trained policy to a different task by transfer of
the trained low-level policy. Finally, we validate the policies on a real
quadruped robot. To the best of our knowledge, this is the first application of
end-to-end hierarchical learning to a real robotic locomotion task.
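The control flow described in this abstract, where a high-level policy picks a latent command and how long to hold it while a low-level policy acts at every step, can be sketched as below. Both policies are random or trivial placeholders rather than trained networks, and the function names, latent dimension, and duration range are hypothetical.

```python
import random


def high_level_policy(observation, rng):
    """Placeholder: return a latent command and how many steps to hold it."""
    latent = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    duration = rng.randint(1, 5)
    return latent, duration


def low_level_policy(sensors, latent):
    """Placeholder: map on-board sensor readings and the latent to actuator targets."""
    return [s + z for s, z in zip(sensors, latent)]


def run_episode(total_steps, rng):
    actions = []
    high_level_calls = 0
    step = 0
    while step < total_steps:
        # The high level is queried only when the previous command expires,
        # so it runs at a lower frequency than the low level.
        latent, duration = high_level_policy(None, rng)
        high_level_calls += 1
        for _ in range(min(duration, total_steps - step)):
            sensors = [0.0, 0.0]  # stand-in for real sensor readings
            actions.append(low_level_policy(sensors, latent))
            step += 1
    return actions, high_level_calls


actions, calls = run_episode(20, random.Random(0))
print(len(actions), calls)
```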
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to facilitate a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and variant
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to achieve
the multi-scale ride-hailing platform, we conduct the decision-making process
in a hierarchical way where a multi-head attention mechanism is utilized to
incorporate the impacts of neighbor agents and capture the key agent in each
scale. The whole framework is named CoRide. Extensive experiments
based on real-world data from multiple cities as well as analytic synthetic data
demonstrate that CoRide provides superior performance in terms of platform
revenue and user experience in the task of city-wide hybrid order dispatching
and fleet management over strong baselines.
Comment: CIKM 201
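The idea of using a ranking weight vector as the action, so that orders and fleet management destinations are scored in one unified formulation, can be sketched as a dot product between the weights and per-candidate features. This is an illustrative reading of the abstract, not CoRide's actual formulation. The feature layout (payoff, pickup distance) and the candidate names are hypothetical.

```python
def rank_candidates(weights, candidates):
    """Score each candidate by weights . features and return names, best first."""
    scored = []
    for name, features in candidates.items():
        score = sum(w * f for w, f in zip(weights, features))
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical features per candidate: [expected payoff, pickup distance].
candidates = {
    "order_a": [10.0, 4.0],
    "order_b": [6.0, 1.0],
    "reposition_north": [0.0, 2.0],
}
weights = [1.0, -0.5]  # the agent's action: reward payoff, penalize distance

ranking = rank_candidates(weights, candidates)
print(ranking)
```

Because orders and repositioning destinations share one feature space, a single weight vector ranks both kinds of candidate, and the action dimension stays fixed even when the number of candidates varies.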
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent
years, with notable achievements such as DeepMind's AlphaGo. It has been
successfully deployed in commercial vehicles like Mobileye's path planning
system. However, the vast majority of work on DRL is focused on toy examples in
controlled synthetic car simulator environments such as TORCS and CARLA. In
general, DRL is still in its infancy in terms of usability in real-world
applications. Our goal in this paper is to encourage real-world deployment of
DRL in various autonomous driving (AD) applications. We first provide an
overview of the tasks in autonomous driving systems, reinforcement learning
algorithms and applications of DRL to AD systems. We then discuss the
challenges which must be addressed to enable further progress towards
real-world deployment.
Comment: Accepted for Oral Presentation at VISAPP 201
Neural Modular Control for Embodied Question Answering
We present a modular approach for learning policies for navigation over long
planning horizons from language input. Our hierarchical policy operates at
multiple timescales, where the higher-level master policy proposes subgoals to
be executed by specialized sub-policies. Our choice of subgoals is
compositional and semantic, i.e. they can be sequentially combined in arbitrary
orderings, and assume human-interpretable descriptions (e.g. 'exit room', 'find
kitchen', 'find refrigerator', etc.).
We use imitation learning to warm-start policies at each level of the
hierarchy, dramatically increasing sample efficiency, followed by reinforcement
learning. Independent reinforcement learning at each level of hierarchy enables
sub-policies to adapt to consequences of their actions and recover from errors.
Subsequent joint hierarchical training enables the master policy to adapt to
the sub-policies.
On the challenging EQA (Das et al., 2018) benchmark in House3D (Wu et al.,
2018), requiring navigating diverse realistic indoor environments, our approach
outperforms prior work by a significant margin, both in terms of navigation and
question answering.
Comment: 10 pages, 3 figures, 2 tables. Published at CoRL 2018. Webpage:
https://embodiedqa.org
Machine learning model selection with multi-objective Bayesian optimization and reinforcement learning
A machine learning system, including one used in reinforcement learning, is usually fed only limited data while aiming to train a model with good predictive performance that generalizes to the underlying data distribution. Within certain hypothesis classes, model selection chooses a model based on selection criteria calculated from the available data, which usually serve as estimators of the model's generalization performance.
One major challenge for model selection that has drawn increasing attention is the discrepancy between the data distribution from which training data is sampled and the data distribution at deployment. The model can over-fit to the training distribution and fail to extrapolate to unseen deployment distributions, which can greatly harm the reliability of a machine learning system. Such a distribution shift challenge can become even more pronounced in high-dimensional data types like gene expression data, functional data and image data, especially in a decentralized learning scenario. Another challenge for model selection is efficient search in the hypothesis space. Since training a machine learning model usually takes a fair amount of resources, searching for an appropriate model with favorable configurations is inherently an expensive process, thus calling for efficient optimization algorithms.
To tackle the challenge of distribution shift, novel resampling methods for evaluating the robustness of neural networks were proposed, as well as a domain generalization method using multi-objective Bayesian optimization in a decentralized learning scenario and variational inference in a domain-unsupervised manner.
To tackle the expensive model search problem, combining Bayesian optimization and reinforcement learning in an interleaved manner was proposed for efficient search in a hierarchical conditional configuration space. Additionally, the use of multi-objective Bayesian optimization for model search in decentralized learning scenarios was proposed and its effectiveness verified.
A model selection perspective on reinforcement learning was proposed, with associated contributions in tackling the problem of exploration in high-dimensional state-action spaces and sparse rewards. Connections between statistical inference and control were summarized.
Additionally, contributions were made to open-source software development in related machine learning sub-topics such as feature selection and functional data analysis, with advanced tuning methods and extensive benchmarking.