Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
Behavior and motion planning play an important role in automated driving.
Traditionally, behavior planners instruct local motion planners with predefined
behaviors. Due to the high scene complexity in urban environments,
unpredictable situations may occur in which behavior planners fail to match
predefined behavior templates. Recently, general-purpose planners have been
introduced, combining behavior and local motion planning. These general-purpose
planners allow behavior-aware motion planning given a single reward function.
However, two challenges arise: first, this function has to map a complex
feature space to rewards; second, it has to be tuned manually by an expert,
which is a tedious task. In this paper, we propose an approach that relies on
human driving
demonstrations to automatically tune reward functions. This study offers
important insights into the driving style optimization of general-purpose
planners with maximum entropy inverse reinforcement learning. We evaluate our
approach based on the expected value difference between learned and
demonstrated policies. Furthermore, we compare the similarity of human driven
trajectories with optimal policies of our planner under learned and
expert-tuned reward functions. Our experiments show that we are able to learn
reward functions exceeding the level of manual expert tuning without prior
domain knowledge.
Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote, minor correction in preliminaries.
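
As a minimal sketch of the maximum entropy IRL machinery the abstract refers to: for a linear reward, the gradient of the MaxEnt log-likelihood reduces to the difference between demonstrated and planner-induced feature expectations. The feature names, dimensions, and learning rate below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def maxent_irl_step(weights, mu_expert, mu_policy, lr=0.05):
    """One MaxEnt IRL gradient step on a linear reward r(s) = w . phi(s).

    mu_expert: mean feature vector of the human demonstrations.
    mu_policy: expected feature vector under the planner's current
               optimal policy (estimated from rollouts).
    The MaxEnt log-likelihood gradient is their difference.
    """
    return weights + lr * (mu_expert - mu_policy)

# Hypothetical driving features, e.g. (jerk, lane-center offset, time gap).
w = np.zeros(3)
mu_expert = np.array([0.2, 0.1, 1.5])  # from demonstrated trajectories
mu_policy = np.array([0.5, 0.3, 1.0])  # from planner rollouts
w = maxent_irl_step(w, mu_expert, mu_policy)
print(w)  # weights shift toward explaining the human driving style
```

Iterating this step until the feature expectations match is what lets a learned reward reproduce the demonstrated driving style without manual tuning.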
Modified DDPG car-following model with a real-world human driving experience with CARLA simulator
In the autonomous driving field, fusion of human knowledge into Deep
Reinforcement Learning (DRL) is often based on the human demonstration recorded
in a simulated environment. This limits the generalization and the feasibility
of application in real-world traffic. We propose a two-stage DRL method to
train a car-following agent that refines its policy by leveraging real-world
human driving experience and achieves performance superior to a pure DRL
agent. The DRL agent is trained within the CARLA framework with the Robot
Operating System (ROS). For evaluation, we designed different driving scenarios
to compare the proposed two-stage DRL car-following agent with other agents.
After extracting the "good" behavior from the human driver, the agent becomes
more efficient and behaves more reasonably, making it more suitable for
Human-Robot Interaction (HRI) traffic.
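
A hedged sketch of how such a two-stage scheme could look (the fusion mechanism, network shape, and feature choices below are assumptions for illustration, not the paper's code): stage one fits the policy to recorded real-world car-following data, and stage two would hand the pretrained actor to a DDPG loop running in CARLA via ROS.

```python
import torch
import torch.nn as nn

# Hypothetical actor: (gap, ego speed, relative speed) -> acceleration command.
actor = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                      nn.Linear(64, 1), nn.Tanh())
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

def pretrain_on_human_data(states, actions, epochs=10):
    """Stage 1: regress the actor onto real-world human driving logs so
    stage-2 DRL exploration starts from human-like behavior."""
    for _ in range(epochs):
        loss = nn.functional.mse_loss(actor(states), actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Placeholder tensors standing in for recorded driving data.
states = torch.randn(256, 3)
actions = torch.tanh(states[:, :1])
pretrain_on_human_data(states, actions)
```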
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Recent successes combine reinforcement learning algorithms and deep neural
networks, yet reinforcement learning is not widely applied to robotics and
real-world scenarios. This can be attributed to the fact that current
state-of-the-art, end-to-end reinforcement learning approaches still require
thousands or millions of data samples to converge to a satisfactory policy and
are subject to catastrophic failures during training. Conversely, in real-world
scenarios and after just a few data samples, humans are able to either provide
demonstrations of the task, intervene to prevent catastrophic actions, or
simply evaluate if the policy is performing correctly. This research
investigates how to integrate these human interaction modalities into the
reinforcement learning loop, increasing sample efficiency and enabling
real-time reinforcement learning in robotics and real-world scenarios. This
novel theoretical foundation is called Cycle-of-Learning, a reference to how
different human interaction modalities, namely, task demonstration,
intervention, and evaluation, are cycled and combined with reinforcement learning
algorithms. Results presented in this work show that the reward signal that is
learned based upon human interaction accelerates the rate of learning of
reinforcement learning algorithms and that learning from a combination of human
demonstrations and interventions is faster and more sample efficient when
compared to traditional supervised learning algorithms. Finally,
Cycle-of-Learning develops an effective transition between policies learned
using human demonstrations and interventions to reinforcement learning. The
theoretical foundation developed by this research opens new research paths to
human-agent teaming scenarios where autonomous agents are able to learn from
human teammates and adapt to mission performance metrics in real time and in
real-world scenarios.
Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more information, see https://vggoecks.com
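
One concrete piece of this pipeline, learning a reward signal from human evaluations, can be sketched as a simple regression; the linear feature map, ridge term, and toy data below are illustrative assumptions, not the thesis's actual model:

```python
import numpy as np

def fit_reward_from_feedback(phi, human_scores, ridge=1e-3):
    """Ridge-regress a linear reward r(s, a) = w . phi(s, a) onto scalar
    human evaluations; a toy stand-in for the learned reward signal."""
    A = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(A, phi.T @ human_scores)

# Toy data: 100 state-action feature vectors with noisy human ratings.
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 4))
true_w = np.array([1.0, -0.5, 0.0, 2.0])
scores = phi @ true_w + 0.1 * rng.normal(size=100)
w = fit_reward_from_feedback(phi, scores)
print(np.round(w, 2))  # close to true_w: feedback distilled into a reward
```

Once fitted, such a reward can drive ordinary RL updates between rounds of demonstration and intervention.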
Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks
Autonomous robots need to interact with unknown, unstructured and changing
environments, constantly facing novel challenges. Therefore, continuous online
adaptation for lifelong learning and sample-efficient mechanisms for adapting
to changes in the environment, the constraints, the tasks, or the robot
itself are crucial. In this work, we propose a novel framework for
probabilistic online motion planning with online adaptation based on a
bio-inspired stochastic recurrent neural network. By using learning signals
that mimic the intrinsic motivation signal of cognitive dissonance, combined
with a mental replay strategy to intensify experiences, the stochastic
recurrent network can learn from a few physical interactions and adapt to novel
environments within seconds. We evaluate our online planning and adaptation
framework on an anthropomorphic KUKA LWR arm. The rapid online adaptation is
shown by learning unknown workspace constraints sample-efficiently from few
physical interactions while following given waypoints.
Comment: Accepted in Neural Networks.
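
The two mechanisms in the abstract can be illustrated with a toy online learner: the intrinsic signal is the mismatch between predicted and observed outcome (the "cognitive dissonance"), and mental replay reuses one physical interaction for several updates. The linear forward model, dimensions, and rates are assumptions for illustration:

```python
import numpy as np

def adaptation_step(W, x, y_observed, replays=5, lr=0.1):
    """Online adaptation of a toy linear forward model y = W x."""
    for _ in range(replays):              # mental replay of one experience
        dissonance = y_observed - W @ x   # intrinsic learning signal
        W = W + lr * np.outer(dissonance, x)
    return W

rng = np.random.default_rng(1)
W = np.zeros((2, 3))
x = rng.normal(size=3)                    # one physical interaction ...
y = np.array([0.5, -1.0])                 # ... and its observed outcome
W = adaptation_step(W, x, y)
print(W @ x)                              # prediction moves toward the observation
```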
Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
Combined with demonstrations, deep reinforcement learning can efficiently
develop policies for manipulators. However, collecting sufficient high-quality
demonstrations is time-consuming in practice, and human demonstrations may be
unsuitable for robots. Non-Markovian processes and over-reliance on
demonstrations pose further challenges. For example, we found that RL agents in
manipulation tasks are sensitive to demonstration quality and struggle to adapt
to demonstrations taken directly from humans. It is therefore challenging to
leverage low-quality and insufficient demonstrations to help reinforcement
learning train better policies; sometimes, limited demonstrations even lead to
worse performance.
We propose a new algorithm named TD3fG (TD3 learning from a generator) to
solve these problems. It forms a smooth transition from learning from experts
to learning from experience. This innovation can help agents extract prior
knowledge while reducing the detrimental effects of the demonstrations. Our
algorithm performs well in Adroit manipulation and MuJoCo tasks with limited
demonstrations.
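
The "smooth transition from experts to experience" suggests an actor objective that mixes a TD3-style Q term with a behavior-cloning term toward a generator trained on the demonstrations, with the imitation weight annealed over training. The networks, dimensions, and decay schedule below are assumptions sketched from the abstract, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                      nn.Linear(32, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(),
                       nn.Linear(32, 1))
generator = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                          nn.Linear(32, action_dim), nn.Tanh())

def actor_loss(states, step, decay=1e-4):
    """Q maximization plus an annealed imitation term toward the
    demonstration-trained generator's action."""
    actions = actor(states)
    q_term = -critic(torch.cat([states, actions], dim=-1)).mean()
    bc_term = nn.functional.mse_loss(actions, generator(states).detach())
    lam = max(0.0, 1.0 - decay * step)  # expert influence fades with training
    return q_term + lam * bc_term

loss = actor_loss(torch.randn(8, state_dim), step=1000)
loss.backward()
```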
Reinforcement Learning for Generative Art
Reinforcement learning (RL) is an efficient class of sequential decision-making algorithms that has achieved remarkable success in a broad range of applications, such as robotic manipulation, strategic games, and autonomous driving. The most well-known example of reinforcement learning is AlphaGo, a computer program that plays the board game Go and outperforms top human Go players. Unlike the other two major machine learning categories, supervised learning and unsupervised learning, in which media artists are actively engaged, reinforcement learning has yet to result in many creative applications. Generative art is usually driven, in whole or in part, by autonomous systems derived from a set of rules. Interestingly, an RL policy can be seen as an autonomous system whose rules are learned by interacting with its environment. Regardless of its initial purpose, reinforcement learning has the potential to expand the boundary of generative art. However, a formal process for applying reinforcement learning to generative art does not yet exist, and current RL tools require an in-depth understanding of RL concepts.

To bridge the gap, the first part of the dissertation introduces a conceptual framework for adapting reinforcement learning to generative art. The framework proposes the term RL-based generative art to denote a novel form of generative art in which the use of RL agents is the key element. The creative process of RL-based generative art and possible emergent behaviors are discussed in the framework. This leads to a discussion of several of the author's related practices in generative art, deep-learning art, and reinforcement learning; those practices are critical for understanding the conceptual and technical details of each component needed to construct the framework.

The second part introduces RL5, a JavaScript library for rapidly prototyping RL environments and training RL policies in web browsers. The library combines RL algorithms and RL environments into one framework and is fully compatible with p5.js. RL5 is developed with a particular focus on simplicity to favor the (re)usability of RL algorithms and the development of RL environments. Specifically, the library implements three RL algorithms, Tabular Q-learning, REINFORCE, and DDPG, covering all three families of model-free RL, and nine RL environments, six of which address autonomous agents performing steering behaviors that can serve as building blocks for complex systems. Finally, the author demonstrates four use cases of applying RL5 to pedagogical and creative applications.
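
RL5's actual JavaScript API is not reproduced here; as a stand-in for the first of the three algorithms the library implements, the textbook tabular Q-learning update looks like this (sketched in Python for consistency with the other examples):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Textbook tabular Q-learning: move Q(s, a) toward the bootstrapped
    TD target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage: 5 states, 2 actions.
Q = np.zeros((5, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0])
```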
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
Learning from demonstration (LfD) is a popular technique that uses expert
demonstrations to learn robot control policies. However, the difficulty in
acquiring expert-quality demonstrations limits the applicability of LfD
methods: real-world data collection is often costly, and the quality of the
demonstrations depends greatly on the demonstrator's abilities and safety
concerns. A number of works have leveraged data augmentation (DA) to
inexpensively generate additional demonstration data, but most DA works
generate augmented data in a random fashion and ultimately produce highly
suboptimal data. In this work, we propose Guided Data Augmentation (GuDA), a
human-guided DA framework that generates expert-quality augmented data. The key
insight of GuDA is that while it may be difficult to demonstrate the sequence
of actions required to produce expert data, a user can often easily identify
when an augmented trajectory segment represents task progress. Thus, the user
can impose a series of simple rules on the DA process to automatically generate
augmented samples that approximate expert behavior. To extract a policy from
GuDA, we use off-the-shelf offline reinforcement learning and behavior cloning
algorithms. We evaluate GuDA on a physical robot soccer task as well as
simulated D4RL navigation tasks, a simulated autonomous driving task, and a
simulated soccer task. Empirically, we find that GuDA enables learning from a
small set of potentially suboptimal demonstrations and substantially
outperforms a DA strategy that samples augmented data randomly.
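
The key insight lends itself to a short sketch: sample random augmentations, but keep only those that a user-supplied rule judges to represent task progress. The transform, rule, and 1-D toy task below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def guided_augment(segment, augment, makes_progress, tries=20):
    """Keep only augmented trajectory segments that a simple
    user-imposed rule accepts as task progress."""
    kept = []
    for _ in range(tries):
        candidate = augment(segment)
        if makes_progress(candidate):   # cheap to check, hard to demonstrate
            kept.append(candidate)
    return kept

# Toy example: 1-D positions; the goal is x = 0.
rng = np.random.default_rng(2)
segment = np.array([3.0, 2.5, 2.0])
shift = lambda seg: seg + rng.normal(scale=2.0)   # random translation
progress = lambda seg: abs(seg[-1]) < abs(seg[0]) and np.all(np.abs(seg) < 10)
augmented = guided_augment(segment, shift, progress)
print(len(augmented), "accepted augmentations")
```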