Visual Reinforcement Learning with Imagined Goals
For an autonomous agent to fulfill a wide range of user-specified goals at
test time, it must be able to learn broadly applicable and general-purpose
skill repertoires. Furthermore, to provide the requisite level of generality,
these skills must handle raw sensory input such as images. In this paper, we
propose an algorithm that acquires such general-purpose skills by combining
unsupervised representation learning and reinforcement learning of
goal-conditioned policies. Since the particular goals that might be required at
test-time are not known in advance, the agent performs a self-supervised
"practice" phase where it imagines goals and attempts to achieve them. We learn
a visual representation with three distinct purposes: sampling goals for
self-supervised practice, providing a structured transformation of raw sensory
inputs, and computing a reward signal for goal reaching. We also propose a
retroactive goal relabeling scheme to further improve the sample-efficiency of
our method. Our off-policy algorithm is efficient enough to learn policies that
operate on raw image observations and goals for a real-world robotic system,
and substantially outperforms prior techniques.
Comment: 15 pages, NeurIPS 201
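The retroactive goal relabeling and latent-space reward described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the negative Euclidean distance reward, the choice of relabeling goals from later states of the same trajectory, and the names `latent_reward` and `relabel` are all assumptions made for the example. The latent vectors stand in for the output of a learned image encoder.

```python
import math
import random


def latent_reward(z, z_goal):
    """Assumed reward: negative Euclidean distance between latent vectors."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_goal)))


def relabel(trajectory, rng):
    """Relabel each transition with a goal taken from a later state.

    trajectory: list of (z, action, z_next) tuples, where z and z_next are
    latent encodings of consecutive observations (hypothetical format).
    Returns a list of (z, action, z_next, z_goal, reward) tuples.
    """
    relabeled = []
    for t, (z, action, z_next) in enumerate(trajectory):
        # Pick the index of the current or a later transition and use the
        # state it reached as the new goal.
        future = rng.randrange(t, len(trajectory))
        z_goal = trajectory[future][2]
        reward = latent_reward(z_next, z_goal)
        relabeled.append((z, action, z_next, z_goal, reward))
    return relabeled
```

Because the relabeled goals are states the agent actually reached, every stored trajectory yields some transitions with high reward, which is one common motivation for relabeling in off-policy goal-conditioned learning.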
Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention
Recent progress in AI and reinforcement learning has shown great success in
solving complex problems with high dimensional state spaces. However, most of
these successes have been primarily in simulated environments where failure is
of little or no consequence. Most real-world applications, however, require
training solutions that are safe to operate, as catastrophic failures are
inadmissible, especially when human interaction is involved. Currently,
safe RL systems use human oversight during training and exploration to
ensure that the RL agent does not enter a catastrophic state. These methods
require a large amount of human labor and are very difficult to scale up. We
present a hybrid method for reducing the human intervention time by combining
model-based approaches and training a supervised learner to improve sample
efficiency while also ensuring safety. We evaluate these methods on various
grid-world environments using both standard and visual representations and show
that our approach achieves better performance in terms of sample efficiency,
number of catastrophic states reached, and overall task performance
compared to traditional model-free approaches.
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Skilled robotic manipulation benefits from complex synergies between
non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing
can help rearrange cluttered objects to make space for arms and fingers;
likewise, grasping can help displace objects to make pushing movements more
precise and collision-free. In this work, we demonstrate that it is possible to
discover and learn these synergies from scratch through model-free deep
reinforcement learning. Our method involves training two fully convolutional
networks that map from visual observations to actions: one infers the utility
of pushes for a dense pixel-wise sampling of end effector orientations and
locations, while the other does the same for grasping. Both networks are
trained jointly in a Q-learning framework and are entirely self-supervised by
trial and error, where rewards are provided from successful grasps. In this
way, our policy learns pushing motions that enable future grasps, while
learning grasps that can leverage past pushes. During picking experiments in
both simulation and real-world scenarios, we find that our system quickly
learns complex behaviors amid challenging cases of clutter, and achieves better
grasping success rates and picking efficiencies than baseline alternatives
after only a few hours of training. We further demonstrate that our method is
capable of generalizing to novel objects. Qualitative results (videos), code,
pre-trained models, and simulation environments are available at
http://vpg.cs.princeton.edu
Comment: To appear at the International Conference On Intelligent Robots and
Systems (IROS) 2018. Project webpage: http://vpg.cs.princeton.edu Summary
video: https://youtu.be/-OkyX7Zlhi
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With the widespread application of artificial intelligence (AI), the
perception, understanding, decision-making and control capabilities of
autonomous systems have improved significantly in the past years. When
accuracy and transferability are the performance criteria for autonomous
systems, several AI methods, such as adversarial learning, reinforcement
learning (RL) and meta-learning, perform strongly. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same
task or data distribution with the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both to show
the advantages of adversarial learning, such as generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image superresolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning in terms of accuracy,
transferability, or both in autonomous systems, covering pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots
Millirobots are a promising robotic platform for many applications due to
their small size and low manufacturing costs. Legged millirobots, in
particular, can provide increased mobility in complex environments and improved
scaling of obstacles. However, controlling these small, highly dynamic, and
underactuated legged systems is difficult. Hand-engineered controllers can
sometimes control these legged millirobots, but they have difficulties with
dynamic maneuvers and complex terrains. We present an approach for controlling
a real-world legged millirobot that is based on learned neural network models.
Using less than 17 minutes of data, our method can learn a predictive model of
the robot's dynamics that can enable effective gaits to be synthesized on the
fly for following user-specified waypoints on a given terrain. Furthermore, by
leveraging expressive, high-capacity neural network models, our approach allows
for these predictions to be directly conditioned on camera images, endowing the
robot with the ability to predict how different terrains might affect its
dynamics. This enables sample-efficient and effective learning for locomotion
of a dynamic legged millirobot on various terrains, including gravel, turf,
carpet, and styrofoam. Experiment videos can be found at
https://sites.google.com/view/imageconddy
Efficient Dialog Policy Learning via Positive Memory Retention
This paper is concerned with the training of recurrent neural networks as
goal-oriented dialog agents using reinforcement learning. Training such agents
with policy gradients typically requires a large number of samples. However,
the collection of the required data in the form of conversations between chat-bots
and human agents is time-consuming and expensive. To mitigate this problem, we
describe an efficient policy gradient method using positive memory retention,
which significantly increases the sample-efficiency. We show that our method is
10 times more sample-efficient than policy gradients in extensive experiments
on a new synthetic number guessing game. Moreover, in a real-world visual object
discovery game, the proposed method is twice as sample-efficient as policy
gradients and shows state-of-the-art performance.
Comment: Published in IEEE Spoken Language Technology (SLT 2018), Athens,
Greece
Learning Unmanned Aerial Vehicle Control for Autonomous Target Following
While deep reinforcement learning (RL) methods have achieved unprecedented
successes in a range of challenging problems, their applicability has been
mainly limited to simulation or game domains due to the high sample complexity
of the trial-and-error learning process. However, real-world robotic
applications often need a data-efficient learning process with safety-critical
constraints. In this paper, we consider the challenging problem of learning
unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire
a strategy that combines perception and control, we represent the policy by a
convolutional neural network. We develop a hierarchical approach that combines
a model-free policy gradient method with a conventional feedback
proportional-integral-derivative (PID) controller to enable stable learning
without catastrophic failure. The neural network is trained by a combination of
supervised learning from raw images and reinforcement learning from games of
self-play. We show that the proposed approach can learn a target-following
policy in a simulator efficiently and the learned behavior can be successfully
transferred to the DJI quadrotor platform for real-world UAV control.
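The conventional feedback controller mentioned in the abstract above follows the standard discrete-time PID law, sketched below. The abstract does not say how the learned policy and the PID loop are connected; a common arrangement, assumed here, is that the policy outputs a high-level setpoint and the PID loop tracks it. The class name `PID`, the gains, and the time step are hypothetical.

```python
class PID:
    """Discrete-time proportional-integral-derivative controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp = kp  # proportional gain
        self.ki = ki  # integral gain
        self.kd = kd  # derivative gain
        self.dt = dt  # control period in seconds
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        """Return the control output for the current tracking error."""
        self.integral += error * self.dt
        # No derivative term on the first call, since there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In such a hierarchy, `error` would be the difference between the setpoint proposed by the policy and the measured state, so the low-level loop stays stable even while the policy is still exploring.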
Continuous Deep Q-Learning with Model-based Acceleration
Model-free reinforcement learning has been successfully applied to a range of
challenging problems, and has recently been extended to handle large neural
network policies and value functions. However, the sample complexity of
model-free algorithms, particularly when using high-dimensional function
approximators, tends to limit their applicability to physical systems. In this
paper, we explore algorithms and representations to reduce the sample
complexity of deep reinforcement learning for continuous control tasks. We
propose two complementary techniques for improving the efficiency of such
algorithms. First, we derive a continuous variant of the Q-learning algorithm,
which we call normalized advantage functions (NAF), as an alternative to the
more commonly used policy gradient and actor-critic methods. The NAF representation
allows us to apply Q-learning with experience replay to continuous tasks, and
substantially improves performance on a set of simulated robotic control tasks.
To further improve the efficiency of our approach, we explore the use of
learned models for accelerating model-free reinforcement learning. We show that
iteratively refitted local linear models are especially effective for this, and
demonstrate substantially faster learning on domains where such models are
applicable.
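The idea behind normalized advantage functions can be illustrated with the usual quadratic form, in which the Q-value is a state value plus an advantage term, Q(s, a) = V(s) - 0.5 (a - mu(s))^T P(s) (a - mu(s)), with P(s) = L(s) L(s)^T built from a lower-triangular matrix. The abstract does not spell this out, so treat the sketch below as an illustration of that formulation. In practice `v`, `mu`, and `L` would be network outputs for a given state; here they are plain numbers, and the function name `naf_q_value` is hypothetical.

```python
def naf_q_value(v, mu, L, a):
    """Q(s, a) = V(s) - 0.5 * (a - mu)^T (L L^T) (a - mu).

    v:  scalar state value V(s)
    mu: list of floats, the greedy action mu(s)
    L:  lower-triangular matrix as a list of rows
    a:  list of floats, the action to evaluate
    """
    n = len(a)
    d = [a[i] - mu[i] for i in range(n)]
    # y = L^T d, so d^T (L L^T) d equals the squared norm of y.
    y = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    advantage = -0.5 * sum(x * x for x in y)
    return v + advantage
```

Because the advantage is never positive and is zero at `a == mu`, the maximizing action is available in closed form, which is what lets Q-learning be applied to continuous actions without a separate search over actions.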
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research over the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies has been adopted broadly by the building industry, due to common challenges including (1) lack of large-scale labeled data to train and validate the model, (2) lack of model transferability, which prevents a model trained on one data-rich building from being used in another building with limited data, (3) lack of strong justification of the costs and benefits of deploying machine learning, and (4) performance that might not be reliable and robust for the stated goals, as a method might work for some buildings but not generalize to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science.
Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning
In this paper, we present a new intrinsically motivated actor-critic
algorithm for learning continuous motor skills directly from raw visual input.
Our neural architecture is composed of a critic and an actor network. Both
networks receive the hidden representation of a deep convolutional autoencoder
which is trained to reconstruct the visual input, while the centre-most hidden
representation is also optimized to estimate the state value. Separately, an
ensemble of predictive world models generates, based on its learning progress,
an intrinsic reward signal which is combined with the extrinsic reward to guide
the exploration of the actor-critic learner. Our approach is more
data-efficient and inherently more stable than the existing actor-critic
methods for continuous control from pixel data. We evaluate our algorithm for
the task of learning robotic reaching and grasping skills on a realistic
physics simulator and on a humanoid robot. The results show that the control
policies learned with our approach can achieve better performance than the
compared state-of-the-art and baseline algorithms in both dense-reward and
challenging sparse-reward settings.
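The combination of an intrinsic, learning-progress-based reward with the extrinsic reward can be sketched as follows. The abstract does not define how learning progress is measured or how the two rewards are weighted. The sketch assumes progress is the drop in mean prediction error between two consecutive windows, clipped at zero, and that the rewards are mixed with a weight `beta`. The window size, `beta`, and both function names are hypothetical.

```python
def learning_progress(errors, window):
    """Mean error of the older window minus mean error of the latest window."""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


def combined_reward(extrinsic, errors, window=2, beta=0.5):
    """Extrinsic reward plus a weighted, non-negative learning-progress bonus."""
    intrinsic = max(0.0, learning_progress(errors, window))
    return extrinsic + beta * intrinsic
```

Here `errors` would be the recent prediction errors of a world model. Rewarding a decrease in error rather than the error itself avoids rewarding the agent for visiting states that are simply unpredictable.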