Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide
an abundant source of data and alleviate certain safety concerns during the
training process. But the behaviours developed by agents in simulation are
often specific to the characteristics of the simulator. Due to modeling error,
strategies that are successful in simulation may not transfer to their real
world counterparts. In this paper, we demonstrate a simple method to bridge
this "reality gap". By randomizing the dynamics of the simulator during
training, we are able to develop policies that are capable of adapting to very
different dynamics, including ones that differ significantly from the dynamics
on which the policies were trained. This adaptivity enables the policies to
generalize to the dynamics of the real world without any training on the
physical system. Our approach is demonstrated on an object pushing task using a
robotic arm. Despite being trained exclusively in simulation, our policies are
able to maintain a similar level of performance when deployed on a real robot,
reliably moving an object to a desired location from random initial
configurations. We explore the impact of various design decisions and show that
the resulting policies are robust to significant calibration error.
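As a rough illustration of the dynamics randomization described in this abstract, the sketch below resamples simulator parameters at the start of every training episode. The parameter names (`mass`, `friction`, `damping`), their ranges, and the helpers `sample_dynamics` and `training_episodes` are hypothetical, chosen for illustration and not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class DynamicsParams:
    """Hypothetical simulator parameters; not taken from the paper."""
    mass: float
    friction: float
    damping: float


# Illustrative (low, high) sampling ranges; these values are assumptions.
RANGES = {
    "mass": (0.5, 2.0),
    "friction": (0.2, 1.0),
    "damping": (0.01, 0.5),
}


def sample_dynamics(rng: random.Random) -> DynamicsParams:
    """Draw one set of dynamics parameters uniformly from RANGES."""
    return DynamicsParams(
        **{name: rng.uniform(low, high) for name, (low, high) in RANGES.items()}
    )


def training_episodes(num_episodes: int, seed: int = 0):
    """Yield a freshly randomized dynamics configuration for each episode."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        # A real training loop would reset the simulator with these
        # parameters and then roll out the policy for one episode.
        yield sample_dynamics(rng)
```

Because the policy never sees the same dynamics twice, it cannot overfit to a single simulator configuration, which is the intuition the abstract gives for transfer to the real robot.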
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Learning high-quality Q-value functions plays a key role in the success of
many modern off-policy deep reinforcement learning (RL) algorithms. Previous
works focus on addressing the value overestimation issue, an outcome of
adopting function approximators and off-policy learning. Deviating from the
common viewpoint, we observe that Q-values are indeed underestimated in the
latter stage of the RL training process, primarily related to the use of
inferior actions from the current policy in Bellman updates as compared to the
more optimal action samples in the replay buffer. We hypothesize that this
long-neglected phenomenon potentially hinders policy learning and reduces
sample efficiency. Our insight to address this issue is to incorporate
sufficient exploitation of past successes while maintaining exploration
optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a
simple yet effective approach that updates Q-value using both historical
best-performing actions and the current policy. The instantiations of our
method in both model-free and model-based settings outperform state-of-the-art
methods in various continuous control tasks and achieve strong performance in
failure-prone scenarios and real-world robot tasks.
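The abstract does not give the exact form of the BEE operator, so the following is only a minimal sketch of the general idea of blending two Bellman targets: one bootstrapped from the best past action in the replay buffer (exploitation) and one from the current policy's action (exploration). The function name `blended_target` and the mixing weight `lam` are assumptions made for illustration.

```python
def blended_target(
    reward: float,
    gamma: float,
    q_next_policy: float,
    q_next_buffer_best: float,
    lam: float = 0.5,
) -> float:
    """Blend an exploitation target with an exploration target.

    q_next_policy:      Q(s', a') with a' drawn from the current policy.
    q_next_buffer_best: Q(s', a*) with a* the best-performing past action.
    lam:                hypothetical mixing weight in [0, 1] (an assumption).
    """
    exploit = reward + gamma * q_next_buffer_best
    explore = reward + gamma * q_next_policy
    return lam * exploit + (1.0 - lam) * explore
```

With `lam = 0` this reduces to the usual current-policy target, and with `lam = 1` it bootstraps only from past successes.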
On the Utility of Learning about Humans for Human-AI Coordination
While we would like agents that can coordinate with humans, current
algorithms such as self-play and population-based training create agents that
can coordinate with themselves. Agents that assume their partner to be optimal
or similar to them can converge to coordination protocols that fail to
understand and be understood by humans. To demonstrate this, we introduce a
simple environment that requires challenging coordination, based on the popular
game Overcooked, and learn a simple model that mimics human play. We evaluate
the performance of agents trained via self-play and population-based training.
These agents perform very well when paired with themselves, but when paired
with our human model, they are significantly worse than agents designed to play
with the human model. An experiment with a planning algorithm yields the same
conclusion, though only when the human-aware planner is given the exact human
model that it is playing with. A user study with real humans shows this pattern
as well, though less strongly. Qualitatively, we find that the gains come from
having the agent adapt to the human's gameplay. Given this result, we suggest
several approaches for designing agents that learn about humans in order to
better coordinate with them. Code is available at
https://github.com/HumanCompatibleAI/overcooked_ai.
Comment: Published at NeurIPS 2019
(http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination)
Multi-resolution mapping and planning for UAV navigation in attitude constrained environments
In this thesis we aim to bridge the gap between high-quality map reconstruction and SE(3) motion planning for Unmanned Aerial Vehicles (UAVs) in challenging environments with narrow openings, such as disaster areas, where attitude must be considered. We propose an efficient system that leverages the concept of adaptive-resolution volumetric mapping, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt mapping of occupancy probabilities in log-odds representation, which allows representation of both surfaces and the entire free (i.e. observed) space, as opposed to unobserved space. We introduce a method for choosing resolution on the fly, in real time, by means of a multi-scale max-min pooling of the input depth image. The notion of explicit free space mapping, paired with the spatial hierarchy in the data structure as well as map resolution, allows for collision queries, as needed for robot motion planning, at unprecedented speed. Our mapping strategy supports pinhole cameras as well as spherical sensor models. Additionally, we introduce a first-of-a-kind global minimum-cost path search method based on A* that considers attitude along the path. State-of-the-art methods incorporate attitude only in the refinement stage. To make the problem tractable, our method exploits an adaptive, coarse-to-fine approach using global and local A* runs, plus an efficient method to introduce the UAV attitude in the process. We integrate our method with an SE(3) trajectory optimisation method based on a safe-flight-corridor, yielding a complete path planning pipeline.
We quantitatively evaluate our mapping strategy in terms of mapping accuracy, memory, runtime performance, and planning performance, showing improvements over the state-of-the-art, particularly in cases requiring high resolution maps. Furthermore, extensive evaluation is undertaken using the AirSim flight simulator under closed loop control in a set of randomised maps, allowing us to quantitatively assess our path initialisation method. We show that it achieves significantly higher success rates than the baselines, at a reduced computational burden.
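The log-odds occupancy representation mentioned in this abstract can be illustrated with the standard per-voxel update below. This is a generic textbook sketch, not the thesis implementation: the inverse sensor model probabilities (0.7 for a hit, 0.4 for a miss) and the clamping bounds are assumed values.

```python
import math


def prob_to_log_odds(p: float) -> float:
    """Convert an occupancy probability to log-odds."""
    return math.log(p / (1.0 - p))


def log_odds_to_prob(l: float) -> float:
    """Convert log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-l))


# Assumed inverse sensor model and clamping bounds (illustrative values).
L_HIT = prob_to_log_odds(0.7)   # measurement ends in this voxel
L_MISS = prob_to_log_odds(0.4)  # ray passes through this voxel
L_MIN, L_MAX = -5.0, 5.0


def update_voxel(log_odds: float, hit: bool) -> float:
    """Fuse one measurement into a voxel by adding log-odds, then clamp."""
    log_odds += L_HIT if hit else L_MISS
    return max(L_MIN, min(L_MAX, log_odds))
```

A voxel that has never been observed stays at log-odds 0 (probability 0.5), so free, occupied, and unobserved space remain distinguishable, which is the property the abstract contrasts with a TSDF.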
Goal-conditioned Offline Planning from Curious Exploration
Curiosity has established itself as a powerful exploration strategy in deep
reinforcement learning. Notably, leveraging expected future novelty as
intrinsic motivation has been shown to efficiently generate exploratory
trajectories, as well as a robust dynamics model. We consider the challenge of
extracting goal-conditioned behavior from the products of such unsupervised
exploration techniques, without any additional environment interaction. We find
that conventional goal-conditioned reinforcement learning approaches for
extracting a value function and policy fall short in this difficult offline
setting. By analyzing the geometry of optimal goal-conditioned value functions,
we relate this issue to a specific class of estimation artifacts in learned
values. In order to mitigate their occurrence, we propose to combine
model-based planning over learned value landscapes with a graph-based value
aggregation scheme. We show how this combination can correct both local and
global artifacts, obtaining significant improvements in zero-shot goal-reaching
performance across diverse simulated environments.
From a Competition for Self-Driving Miniature Cars to a Standardized Experimental Platform: Concept, Models, Architecture, and Evaluation
Context: Competitions for self-driving cars have facilitated development and
research in the domain of autonomous vehicles towards potential solutions for
future mobility.
Objective: Miniature vehicles can bridge the gap between simulation-based
evaluations of algorithms relying on simplified models, and those
time-consuming vehicle tests on real-scale proving grounds.
Method: This article combines findings from a systematic literature review,
an in-depth analysis of results and technical concepts from contestants in a
competition for self-driving miniature cars, and experiences of participating
in the 2013 competition for self-driving cars.
Results: A simulation-based development platform for real-scale vehicles has
been adapted to support the development of a self-driving miniature car.
Furthermore, a standardized platform was designed and realized to enable
research and experiments in the context of future mobility solutions.
Conclusion: A clear separation between algorithm conceptualization and
validation in a model-based simulation environment enabled efficient and
riskless experiments and validation. The design of a reusable, low-cost, and
energy-efficient hardware architecture utilizing a standardized
software/hardware interface enables experiments, which would otherwise require
resources like a large real-scale test track.
Comment: 17 pages, 19 figures, 2 tables
Path planning and energy management of solar-powered unmanned ground vehicles
Many of the applications pertinent to unmanned vehicles, such as environmental research and analysis, communications, and information-surveillance and reconnaissance, benefit from prolonged vehicle operation time. Efforts to increase the operational time of electric-powered unmanned vehicles have traditionally focused on the design of energy-efficient components and the identification of energy-efficient search patterns, while little attention has been paid to the vehicle's mission-level path plan and power management. This thesis explores the formulation and generation of integrated motion plans and power schedules for solar-panel-equipped mobile robots operating under strict energy constraints, which cannot be effectively addressed through conventional motion planning algorithms. Transit problems are considered to design time-optimal paths using both Balkcom-Mason and Pseudo-Dubins curves. Additionally, a more complicated problem of generating mission plans for vehicles that must persistently travel between certain locations, similar to the traveling salesperson problem (TSP), is presented. A comparison between one of the common motion-planning algorithms and experimental results of the prescribed algorithms, made possible by use of a test environment and mobile robot designed and developed specifically for this research, is presented and discussed.