Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind's AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye's path planning system. However, the vast majority of work on DRL focuses on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA. In general, DRL is still in its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms, and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.
Comment: Accepted for Oral Presentation at VISAPP 201
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
We propose a general and model-free approach for Reinforcement Learning (RL) on real robots with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to make use of demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer, and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable agents to explore efficiently in high-dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations outperforms DDPG and does not require engineered rewards. Finally, we demonstrate the method on a real robotics task consisting of inserting a clip (a flexible object) into a rigid object.
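The key mechanism is a single replay buffer that mixes demonstration and agent transitions, with prioritized sampling setting the ratio between them automatically. Below is a minimal Python sketch of that idea, not the authors' implementation; the priority exponent and the eps/eps_demo bonuses are illustrative assumptions.

```python
import numpy as np

class MixedReplayBuffer:
    """Replay buffer holding both demonstration and agent transitions."""

    def __init__(self, alpha=0.6, eps=0.01, eps_demo=0.1):  # assumed constants
        self.alpha, self.eps, self.eps_demo = alpha, eps, eps_demo
        self.storage, self.priorities, self.is_demo = [], [], []

    def add(self, transition, demo=False, init_priority=1.0):
        self.storage.append(transition)
        self.priorities.append(init_priority)
        self.is_demo.append(demo)

    def sample(self, batch_size):
        # Sampling probability ~ priority^alpha, so the demo/agent mix
        # self-tunes as TD errors evolve instead of being hand-set.
        p = np.array(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=p)
        return idx, [self.storage[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Demonstrations get a larger bonus so they keep being replayed.
        for i, err in zip(idx, td_errors):
            bonus = self.eps_demo if self.is_demo[i] else self.eps
            self.priorities[i] = abs(err) + bonus
```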
Randomized Value Functions via Multiplicative Normalizing Flows
Randomized value functions offer a promising approach to the challenge of efficient exploration in complex environments with high-dimensional state and action spaces. Unlike traditional point-estimate methods, randomized value functions maintain a posterior distribution over action values. This
prevents the agent's behavior policy from prematurely exploiting early
estimates and falling into local optima. In this work, we leverage recent
advances in variational Bayesian neural networks and combine these with
traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG)
to achieve randomized value functions for high-dimensional domains. In
particular, we augment DQN and DDPG with multiplicative normalizing flows in
order to track a rich approximate posterior distribution over the parameters of
the value function. This allows the agent to perform approximate Thompson
sampling in a computationally efficient manner via stochastic gradient methods.
We demonstrate the benefits of our approach through an empirical comparison in high-dimensional environments.
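The exploration mechanism is approximate Thompson sampling: draw one set of value-function parameters from the approximate posterior, then act greedily under that draw for the whole episode. The sketch below substitutes a factorized Gaussian posterior for the paper's multiplicative normalizing flows, purely for illustration; the linear Q-function and the env interface are assumptions.

```python
import numpy as np

class BayesianQNet:
    """Toy Q-network with a factorized Gaussian posterior over weights."""

    def __init__(self, state_dim, n_actions):
        self.mu = np.zeros((state_dim, n_actions))
        self.log_sigma = np.full((state_dim, n_actions), -1.0)

    def sample_weights(self):
        # One posterior draw; normalizing flows would transform this
        # noise into a much richer distribution.
        eps = np.random.randn(*self.mu.shape)
        return self.mu + np.exp(self.log_sigma) * eps

def run_episode(env, qnet):
    w = qnet.sample_weights()       # fixed for the episode: Thompson sampling
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = int(np.argmax(state @ w))  # greedy under the sampled Q
        state, reward, done = env.step(action)
        total += reward
    return total
```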
Deep Generative Models with Learnable Knowledge Constraints
The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge into end-to-end DGMs. Posterior regularization (PR) offers a principled framework for imposing structured constraints on probabilistic models, but it has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL) and, based on this connection, extend PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic, applying to any DGM, and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm improve greatly over the base generative models.
Comment: Neural Information Processing Systems (NeurIPS) 201
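For orientation, the PR-RL correspondence rests on the standard posterior regularization objective; the following is a compact restatement in my own notation (constants and temperatures absorbed), not the paper's exact formulation.

```latex
% PR for a generative model p_\theta(x) with constraint f(x),
% using an auxiliary distribution q:
\min_{\theta,\,q}\;
  \mathrm{KL}\!\left(q(x)\,\|\,p_\theta(x)\right)
  \;-\; \lambda\,\mathbb{E}_{q(x)}\!\left[f(x)\right]
% The optimal q has the closed form
%   q^\star(x) \propto p_\theta(x)\, e^{\lambda f(x)},
% the same form as the optimal policy of entropy-regularized RL with
% reward f. Reading the constraint f as a reward is what allows f
% itself to be learned with RL-style updates.
```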
Deep Reinforcement Learning for Autonomous Driving
Reinforcement learning has steadily improved since the resurgence of deep neural networks and now outperforms humans in many traditional games. However, this success is not easily carried over to autonomous driving: state spaces in the real world are extremely complex, action spaces are continuous, and fine control is required. Moreover, autonomous driving vehicles must maintain functional safety in complex environments. To deal with these challenges, we first adopt the deep deterministic policy gradient (DDPG) algorithm, which has the capacity to handle complex state and action spaces in continuous domains. We then choose The Open Racing Car Simulator (TORCS) as our environment to avoid physical damage. Meanwhile, we select a set of appropriate sensor information from TORCS and design our own reward function. To fit the DDPG algorithm to TORCS, we design network architectures for both the actor and the critic within the DDPG paradigm. To demonstrate the effectiveness of our model, we evaluate on different modes in TORCS and show both quantitative and qualitative results.
Comment: no time for further improvement
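As a rough illustration of the actor-critic pairing inside DDPG, here is a minimal PyTorch-style sketch; the layer widths and the sensor/action dimensions are invented for the example and are not the paper's architecture.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 29, 3  # TORCS-like sizes; assumptions, not the paper's

class Actor(nn.Module):
    """Deterministic policy: state -> action in [-1, 1] (steer, etc.)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 300), nn.ReLU(),
            nn.Linear(300, ACTION_DIM), nn.Tanh(),
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q-function: (state, action) -> scalar value estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```

As in standard DDPG, the critic would be trained against a TD target built from target copies of both networks, and the actor updated by ascending the critic's value of its own actions.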
Parameter Sharing Reinforcement Learning Architecture for Multi Agent Driving Behaviors
Multi-agent learning provides a potential framework for learning and simulating traffic behaviors. This paper proposes a novel architecture to learn multiple driving behaviors in a traffic scenario. The proposed architecture can learn multiple behaviors independently as well as simultaneously. We take advantage of the homogeneity of agents and learn in a parameter-sharing paradigm; to further speed up the training process, asynchronous updates are employed in the architecture. While learning different behaviors simultaneously, the framework was also able to learn cooperation between the agents without any explicit communication. We applied this framework to learn two important driving behaviors: 1) lane-keeping and 2) overtaking. Results indicate faster convergence and the learning of a more generic behavior that is scalable to any number of agents. When compared with existing approaches, our results indicate equal and, in some cases, better performance.
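The core trick is that every homogeneous agent steps the same set of weights, so experience from all agents updates one network. A minimal sketch; the one-hot behavior flag is my own illustrative addition, not necessarily how the paper conditions on behaviors.

```python
import numpy as np

def shared_policy_action(obs, behavior_id, weights, n_behaviors=2):
    # weights shape: (obs_dim + n_behaviors, n_actions), shared by all agents
    x = np.concatenate([obs, np.eye(n_behaviors)[behavior_id]])
    return int(np.argmax(x @ weights))

def step_all_agents(observations, behavior_ids, weights):
    # One forward pass of the same network per agent; gradients from
    # every agent's experience would update the single parameter set.
    return [shared_policy_action(o, b, weights)
            for o, b in zip(observations, behavior_ids)]
```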
InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
The goal of imitation learning is to mimic expert behavior without access to
an explicit reward signal. Expert demonstrations provided by humans, however,
often show significant variability due to latent factors that are typically not
explicitly modeled. In this paper, we propose a new algorithm that can infer
the latent structure of expert demonstrations in an unsupervised way. Our
method, built on top of Generative Adversarial Imitation Learning, can not only
imitate complex behaviors, but also learn interpretable and meaningful
representations of complex behavioral data, including visual demonstrations. In
the driving domain, we show that a model learned from human demonstrations is
able to both accurately reproduce a variety of behaviors and accurately
anticipate human actions using raw visual inputs. Compared with various
baselines, our method can better capture the latent structure underlying expert
demonstrations, often recovering semantically meaningful factors of variation
in the data.
Comment: 14 pages, NIPS 201
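Mechanically, InfoGAIL conditions the imitation policy on a latent code and adds a mutual-information term so the code remains recoverable from the resulting behavior. A minimal sketch of that interface; the linear policy and code size are placeholders, not the paper's architecture.

```python
import numpy as np

def sample_latent_code(n_modes=3):
    # Discrete code c ~ Categorical(uniform); one mode per behavior style.
    return np.random.randint(n_modes)

def policy_action(state, code, weights, n_modes=3):
    # Policy conditioned on (state, latent code): the code selects
    # which expert "mode" (e.g. passing left vs. right) to imitate.
    one_hot = np.eye(n_modes)[code]
    x = np.concatenate([state, one_hot])
    return x @ weights  # continuous action, e.g. steering/throttle

# Training adds a term like -lambda * log Q(c | trajectory) to the GAIL
# objective, rewarding trajectories from which the code can be inferred.
```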
Challenges of Real-World Reinforcement Learning
Reinforcement learning (RL) has proven its worth in a series of artificial
domains, and is beginning to show some successes in real-world scenarios.
However, much of the research advances in RL are often hard to leverage in
real-world systems due to a series of assumptions that are rarely satisfied in
practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify metrics for evaluating it. An approach that addresses all nine challenges would be applicable to a large number of real-world problems. We also present an example domain that has been modified to exhibit these challenges, as a testbed for practical RL research.
Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp Merge
Multiple automakers have automated driving systems (ADS) in development or in production that offer freeway-pilot functions. This type of ADS is typically limited to restricted-access freeways; that is, the transition from manual to automated modes takes place only after the ramp merging process has been completed manually. One major challenge in extending the automation to ramp merging is that
the automated vehicle needs to incorporate and optimize long-term objectives
(e.g. successful and smooth merge) when near-term actions must be safely
executed. Moreover, the merging process involves interactions with other
vehicles whose behaviors are sometimes hard to predict but may influence the merging vehicle's optimal actions. To tackle such a complicated control problem,
we propose to apply Deep Reinforcement Learning (DRL) techniques for finding an
optimal driving policy by maximizing the long-term reward in an interactive
environment. Specifically, we apply a Long Short-Term Memory (LSTM)
architecture to model the interactive environment, from which an internal state
containing historical driving information is conveyed to a Deep Q-Network
(DQN). The DQN is used to approximate the Q-function, which takes the internal
state as input and generates Q-values as output for action selection. With this
DRL architecture, the historical impact of the interactive environment on the long-term reward can be captured and taken into account when deciding the
optimal control policy. The proposed architecture has the potential to be
extended and applied to other autonomous driving scenarios such as driving
through a complex intersection or changing lanes under varying traffic flow
conditions.
Comment: IEEE International Conference on Intelligent Transportation Systems, Yokohama, Japan, 201
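The architecture described is a pipeline: an LSTM compresses the interaction history into an internal state, and a DQN maps that state to Q-values for action selection. A minimal PyTorch sketch of that wiring; all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMQNet(nn.Module):
    """LSTM encodes driving history; a DQN head outputs Q-values."""

    def __init__(self, obs_dim=16, hidden=64, n_actions=5):  # assumed sizes
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, history):
        # history: (batch, time, obs_dim) of past observations of
        # surrounding vehicles and the ego car
        _, (h_n, _) = self.lstm(history)
        internal_state = h_n[-1]             # summary of the history
        return self.q_head(internal_state)   # one Q-value per action

# q = LSTMQNet()(torch.randn(1, 10, 16)); action = q.argmax(dim=-1)
```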
Bayesian policy selection using active inference
Learning to take actions based on observations is a core requirement for artificial agents to be successful and robust at their tasks. Reinforcement Learning (RL) is a well-known technique for learning such policies. However, current RL algorithms often depend on reward shaping, have difficulty generalizing to other environments, and are most often sample-inefficient. In this paper, we explore active inference and the
free energy principle, a normative theory from neuroscience that explains how
self-organizing biological systems operate by maintaining a model of the world
and casting action selection as an inference problem. We apply this concept to
a typical problem known to the RL community, the mountain car problem, and show
how active inference encompasses both RL and learning from demonstrations.
Comment: ICLR 2019 Workshop on Structure & priors in reinforcement learning
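In active inference, action selection is itself an inference problem: candidate policies are scored by their expected free energy, which trades off reaching preferred outcomes (risk) against observation uncertainty (ambiguity). A toy sketch of that selection rule; the distributions, the scalar ambiguity term, and the temperature are all illustrative assumptions.

```python
import numpy as np

def expected_free_energy(pred_outcomes, preferred, ambiguity):
    # risk: KL(predicted outcome distribution || preferred outcomes);
    # ambiguity: expected observation uncertainty, a scalar here.
    risk = np.sum(pred_outcomes * np.log(pred_outcomes / preferred))
    return risk + ambiguity

def select_policy(candidates, preferred, temperature=1.0):
    # Softmax over negative expected free energy: low-G policies are
    # both goal-directed and uncertainty-resolving.
    G = np.array([expected_free_energy(p, preferred, a)
                  for p, a in candidates])
    probs = np.exp(-G / temperature)
    probs /= probs.sum()
    return np.random.choice(len(candidates), p=probs)
```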