Search CORE

3,260 research outputs found

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

Author: Abbeel Pieter
Andrychowicz Marcin
Peng Xue Bin
Zaremba Wojciech
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/03/2018
Field of study

Simulations are attractive environments for training agents as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real world counterparts. In this paper, we demonstrate a simple method to bridge this "reality gap". By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained. This adaptivity enables the policies to generalize to the dynamics of the real world without any training on the physical system. Our approach is demonstrated on an object pushing task using a robotic arm. Despite being trained exclusively in simulation, our policies are able to maintain a similar level of performance when deployed on a real robot, reliably moving an object to a desired location from random initial configurations. We explore the impact of various design decisions and show that the resulting policies are robust to significant calibration error

arXiv.org e-Print Archive

Crossref

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

Author: Ji Tianying
Luo Yu
Sun Fuchun
Xu Huazhe
Zhan Xianyuan
Zhang Jianwei
Publication venue
Publication date: 12/10/2023
Field of study

Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from the common viewpoint, we observe that Q-values are indeed underestimated in the latter stage of the RL training process, primarily related to the use of inferior actions from the current policy in Bellman updates as compared to the more optimal action samples in the replay buffer. We hypothesize that this long-neglected phenomenon potentially hinders policy learning and reduces sample efficiency. Our insight to address this issue is to incorporate sufficient exploitation of past successes while maintaining exploration optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a simple yet effective approach that updates Q-value using both historical best-performing actions and the current policy. The instantiations of our method in both model-free and model-based settings outperform state-of-the-art methods in various continuous control tasks and achieve strong performance in failure-prone scenarios and real-world robot tasks

arXiv.org e-Print Archive

On the Utility of Learning about Humans for Human-AI Coordination

Author: Abbeel Pieter
Carroll Micah
Dragan Anca
Griffiths Thomas L.
Ho Mark K.
Seshia Sanjit A.
Shah Rohin
Publication venue
Publication date: 01/01/2019
Field of study

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves. Agents that assume their partner to be optimal or similar to them can converge to coordination protocols that fail to understand and be understood by humans. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model that it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from having the agent adapt to the human's gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them. Code is available at https://github.com/HumanCompatibleAI/overcooked_ai.Comment: Published at NeurIPS 2019 (http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination

arXiv.org e-Print Archive

eScholarship - University of California

Multi-resolution mapping and planning for UAV navigation in attitude constrained environments

Author: Funk Nils
Publication venue: Computing, Imperial College London
Publication date: 01/10/2023
Field of study

In this thesis we aim to bridge the gap between high quality map reconstruction and Unmanned Aerial Vehicles (UAVs) SE(3) motion planning in challenging environments with narrow openings, such as disaster areas, which requires attitude to be considered. We propose an efficient system that leverages the concept of adaptive-resolution volumetric mapping, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt mapping of occupancy probabilities in log-odds representation, which allows representation of both surfaces, as well as the entire free, i.e.\ observed space, as opposed to unobserved space. We introduce a method for choosing resolution -on the fly- in real-time by means of a multi-scale max-min pooling of the input depth image. The notion of explicit free space mapping paired with the spatial hierarchy in the data structure, as well as map resolution, allows for collision queries, as needed for robot motion planning, at unprecedented speed. Our mapping strategy supports pinhole cameras as well as spherical sensor models. Additionally, we introduce a first-of-a-kind global minimum cost path search method based on A* that considers attitude along the path. State-of-the-art methods incorporate attitude only in the refinement stage. To make the problem tractable, our method exploits an adaptive and coarse-to-fine approach using global and local A* runs, plus an efficient method to introduce the UAV attitude in the process. We integrate our method with an SE(3) trajectory optimisation method based on a safe-flight-corridor, yielding a complete path planning pipeline. We quantitatively evaluate our mapping strategy in terms of mapping accuracy, memory, runtime performance, and planning performance showing improvements over the state-of-the-art, particularly in cases requiring high resolution maps. Furthermore, extensive evaluation is undertaken using the AirSim flight simulator under closed loop control in a set of randomised maps, allowing us to quantitatively assess our path initialisation method. We show that it achieves significantly higher success rates than the baselines, at a reduced computational burden.Open Acces

Spiral - Imperial College Digital Repository

Goal-conditioned Offline Planning from Curious Exploration

Author: Bagatella Marco
Martius Georg
Publication venue
Publication date: 28/11/2023
Field of study

Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments

arXiv.org e-Print Archive

From a Competition for Self-Driving Miniature Cars to a Standardized Experimental Platform: Concept, Models, Architecture, and Evaluation

Author: Berger Christian
Publication venue
Publication date: 01/01/2014
Field of study

Context: Competitions for self-driving cars facilitated the development and research in the domain of autonomous vehicles towards potential solutions for the future mobility. Objective: Miniature vehicles can bridge the gap between simulation-based evaluations of algorithms relying on simplified models, and those time-consuming vehicle tests on real-scale proving grounds. Method: This article combines findings from a systematic literature review, an in-depth analysis of results and technical concepts from contestants in a competition for self-driving miniature cars, and experiences of participating in the 2013 competition for self-driving cars. Results: A simulation-based development platform for real-scale vehicles has been adapted to support the development of a self-driving miniature car. Furthermore, a standardized platform was designed and realized to enable research and experiments in the context of future mobility solutions. Conclusion: A clear separation between algorithm conceptualization and validation in a model-based simulation environment enabled efficient and riskless experiments and validation. The design of a reusable, low-cost, and energy-efficient hardware architecture utilizing a standardized software/hardware interface enables experiments, which would otherwise require resources like a large real-scale test track.Comment: 17 pages, 19 figues, 2 table

arXiv.org e-Print Archive

Chalmers Research

Path planning and energy management of solar-powered unmanned ground vehicles

Author: Kaplan Adam
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2015
Field of study

Many of the applications pertinent to unmanned vehicles, such as environmental research and analysis, communications, and information-surveillance and reconnaissance, benefit from prolonged vehicle operation time. Conventional efforts to increase the operational time of electric-powered unmanned vehicles have traditionally focused on the design of energy-efficient components and the identification of energy efficient search patterns, while little attention has been paid to the vehicle\u27s mission-level path plan and power management. This thesis explores the formulation and generation of integrated motion-plans and power-schedules for solar-panel equipped mobile robots operating under strict energy constraints, which cannot be effectively addressed through conventional motion planning algorithms. Transit problems are considered to design time-optimal paths using both Balkcom-Mason and Pseudo-Dubins curves. Additionally, a more complicated problem to generate mission plans for vehicles which must persistently travel between certain locations, similar to the traveling salesperson problem (TSP), is presented. A comparison between one of the common motion-planning algorithms and experimental results of the prescribed algorithms, made possible by use of a test environment and mobile robot designed and developed specifically for this research, are presented and discussed

Digital Repository @ Iowa State University (ISU)