Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning
Reinforcement learning has shown great promise for training robot
behavior due to its sequential decision-making nature. However, the
enormous amount of interactive and informative training data it requires
remains the major stumbling block to progress. In this study, we focus on accelerating
reinforcement learning (RL) training and improving the performance of
multi-goal reaching tasks. Specifically, we propose a precision-based
continuous curriculum learning (PCCL) method in which the requirements are
gradually adjusted during the training process, instead of fixing the parameter
in a static schedule. To this end, we explore various continuous curriculum
strategies for controlling a training process. This approach is tested using a
Universal Robot 5e arm in both simulated and real-world multi-goal reaching
experiments. Experimental results support the hypothesis that a static training
schedule is suboptimal, and that an appropriate decay function for the
curriculum yields superior results in less training time.
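The decaying precision requirement described above can be sketched as a small schedule function paired with a sparse goal-reaching reward. This is an illustrative sketch only: the function names, default tolerances, and decay forms are assumptions, not the paper's exact PCCL formulation.

```python
import numpy as np

def precision_schedule(step, total_steps, eps_start=0.5, eps_final=0.01, decay="exp"):
    """Return the goal-reaching tolerance for a given training step.

    Instead of fixing the tolerance in a static schedule, it starts loose
    (eps_start) and is tightened toward eps_final by a chosen decay function.
    """
    t = min(step / total_steps, 1.0)
    if decay == "linear":
        return eps_start + (eps_final - eps_start) * t
    if decay == "exp":
        # geometric interpolation between the start and final tolerance
        return eps_start * (eps_final / eps_start) ** t
    raise ValueError(f"unknown decay: {decay}")

def sparse_reach_reward(achieved, goal, eps):
    """Sparse multi-goal reward: success iff the end-effector is within eps of the goal."""
    return 0.0 if np.linalg.norm(achieved - goal) <= eps else -1.0
```

Early in training, loose tolerances make successful (rewarded) episodes frequent; as `eps` decays, the same reward function demands ever-higher precision.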
Masked Autoencoding for Scalable and Generalizable Decision Making
We are interested in learning scalable agents for reinforcement learning that
can learn from large-scale, diverse sequential data similar to current large
vision and language models. To this end, this paper presents masked decision
prediction (MaskDP), a simple and scalable self-supervised pretraining method
for reinforcement learning (RL) and behavioral cloning (BC). In our MaskDP
approach, we apply a masked autoencoder (MAE) to state-action trajectories,
wherein we randomly mask state and action tokens and reconstruct the missing
data. By doing so, the model is required to infer masked-out states and actions
and extract information about dynamics. We find that masking different
proportions of the input sequence significantly helps with learning a better
model that generalizes well to multiple downstream tasks. In our empirical
study, we find that a MaskDP model gains the capability of zero-shot transfer
to new BC tasks, such as single and multiple goal reaching, and it can
zero-shot infer skills from a few example transitions. In addition, MaskDP
transfers well to offline RL and shows promising scaling behavior w.r.t.
model size. It is amenable to data-efficient finetuning, achieving competitive
results with prior methods based on autoregressive pretraining.
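The random masking of state and action tokens can be illustrated with a small preprocessing sketch. The specifics here (independent per-token masks, zeros as the mask token) are assumptions for illustration, not MaskDP's exact tokenization scheme.

```python
import numpy as np

def mask_trajectory(states, actions, mask_ratio=0.5, rng=None):
    """Randomly mask state and action tokens in a trajectory (MaskDP-style sketch).

    Returns masked copies of both sequences plus the boolean masks; a model
    would be trained to reconstruct the tokens where the mask is True, forcing
    it to infer the missing states/actions from the environment dynamics.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    T = len(states)
    # one independent mask decision per state token and per action token
    state_mask = rng.random(T) < mask_ratio
    action_mask = rng.random(T) < mask_ratio
    masked_states = np.where(state_mask[:, None], 0.0, states)
    masked_actions = np.where(action_mask[:, None], 0.0, actions)
    return masked_states, masked_actions, state_mask, action_mask
```

Varying `mask_ratio` across batches is one plausible way to realize the abstract's observation that different masking proportions help generalization.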
Finding safe policies in model-based active learning
Task learning in robotics is a time-consuming process, and model-based reinforcement learning algorithms have been proposed to learn from just a small number of experiences. However, reducing the number of experiences used for learning implies that the algorithm may overlook crucial actions required to achieve optimal behavior. For example, a robot may learn simple policies that carry a high risk of not reaching the goal because they often fall into dead-ends.
We propose a new method that allows the robot to reason about dead-ends and their causes. Analyzing its current model and experiences, the robot hypothesizes the possible causes of a dead-end and identifies the actions that may cause it, marking them as dangerous. Afterwards, whenever a dangerous action is included in a plan that has a high risk of leading to a dead-end, the robot triggers a special action, request teacher confirmation, to actively confirm with a teacher that the planned risky action should be executed.
This method permits learning safer policies with the addition of just a few teacher demonstration requests. Experimental validation of the approach is provided in two different scenarios: a robotic assembly task and a domain from the international planning competition. Our approach achieves success ratios very close to 1 in problems where previous approaches had high probabilities of reaching dead-ends.
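The teacher-confirmation mechanism can be illustrated with a minimal sketch, assuming a hypothetical per-action dead-end risk estimate; the function names, data shapes, and threshold are assumptions, not the paper's formulation.

```python
def plan_with_safety(plan, dangerous_actions, risk, risk_threshold=0.3):
    """Insert teacher-confirmation requests before risky actions (sketch).

    Whenever an action marked as dangerous appears in a plan and its estimated
    dead-end risk exceeds a threshold, a 'request teacher confirmation' step is
    inserted before it, so the teacher can approve or veto the risky action.
    """
    safe_plan = []
    for action in plan:
        if action in dangerous_actions and risk.get(action, 0.0) > risk_threshold:
            safe_plan.append(("request_teacher_confirmation", action))
        safe_plan.append(action)
    return safe_plan
```

The point of the design is that teacher queries are triggered only for the few actions whose hypothesized dead-end risk is high, keeping the number of demonstration requests small.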
Predicting Pilot Behavior in Medium Scale Scenarios Using Game Theory and Reinforcement Learning
Effective automation is critical to achieving the capacity and safety goals of the Next Generation Air Traffic System. Unfortunately, creating integration and validation tools for such automation is difficult, as the interactions between automation and its human counterparts are complex and unpredictable. This validation becomes even more difficult as we integrate wide-reaching technologies that affect the behavior of different decision makers in the system, such as pilots, controllers, and airlines. While overt short-term behavior changes can be explicitly modeled with traditional agent modeling systems, subtle behavior changes caused by the integration of new technologies may snowball into larger problems and be very hard to detect. To overcome these obstacles, we show how the integration of new technologies can be validated by learning behavior models based on goals. In this framework, human participants are not modeled explicitly. Instead, their goals are modeled, and their actions are predicted through reinforcement learning. The main advantage of this approach is that modeling is done within the context of the entire system, allowing for accurate modeling of all participants as they interact as a whole. In addition, such an approach allows for efficient trade studies and feasibility testing on a wide range of automation scenarios. The goal of this paper is to test whether such an approach is feasible. To do this, we implement the approach using a simple discrete-state learning system in a scenario where 50 aircraft need to self-navigate using Automatic Dependent Surveillance-Broadcast (ADS-B) information. In this scenario, we show how the approach can be used to predict the ability of pilots to adequately balance aircraft separation and fly efficient paths. We present results with several levels of complexity and airspace congestion.
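A "simple discrete-state learning system" of the kind the abstract mentions can be sketched as tabular Q-learning, where a participant's goal is encoded in the reward and actions are then predicted from the learned values. The `env_step` interface and all hyperparameters below are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

def q_learning(env_step, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning over a discrete state space (sketch).

    env_step(s, a) -> (s_next, reward, done) is an assumed interface.
    The agent's goal lives entirely in the reward, so the learned greedy
    policy is a prediction of how a goal-driven participant would act.
    """
    random.seed(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[(s, act)])
            s2, r, done = env_step(s, a)
            best_next = max(Q[(s2, a2)] for a2 in range(n_actions))
            target = r + gamma * (0.0 if done else best_next)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

In the paper's setting the states would encode the local traffic picture from ADS-B and the reward would balance separation against path efficiency; the sketch only shows the learning mechanism itself.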
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Imitation learning is an effective approach for autonomous systems to acquire
control policies when an explicit reward function is unavailable, using
supervision provided as demonstrations from an expert, typically a human
operator. However, standard imitation learning methods assume that the agent
receives examples of observation-action tuples that could be provided, for
instance, to a supervised learning algorithm. This stands in contrast to how
humans and animals imitate: we observe another person performing some behavior
and then figure out which actions will realize that behavior, compensating for
changes in viewpoint, surroundings, object positions and types, and other
factors. We term this kind of imitation learning "imitation-from-observation,"
and propose an imitation learning method based on video prediction with context
translation and deep reinforcement learning. This lifts the assumption in
imitation learning that the demonstration should consist of observations in the
same environment configuration, and enables a variety of interesting
applications, including learning robotic skills that involve tool use simply by
observing videos of human tool use. Our experimental results show the
effectiveness of our approach in learning a wide range of real-world robotic
tasks modeled after common household chores from videos of a human
demonstrator, including sweeping, ladling almonds, and pushing objects, as well as a
number of tasks in simulation.
Comment: Accepted at ICRA 2018, Brisbane. YuXuan Liu and Abhishek Gupta contributed equally.
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
learning jointly. The extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. A video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 2017.