A Physically-Consistent Bayesian Non-Parametric Mixture Model for Dynamical System Learning
Figueroa N, Billard A. A Physically-Consistent Bayesian Non-Parametric Mixture Model for Dynamical System Learning. In: Billard A, Dragan A, Peters J, Morimoto J, eds. 2nd Annual Conference on Robot Learning, CoRL 2018, Zürich, Switzerland, 29-31 October 2018, Proceedings. Proceedings of Machine Learning Research. Vol 87. 2018: 927-946
Simulation-based reinforcement learning for real-world autonomous driving
We use reinforcement learning in simulation to obtain a driving system
controlling a full-size real-world vehicle. The driving policy takes RGB images
from a single camera and their semantic segmentation as input. We use mostly
synthetic data, with labelled real-world data appearing only in the training of
the segmentation network.
Using reinforcement learning in simulation and synthetic data is motivated by
lowering costs and engineering effort.
In real-world experiments we confirm that we achieved successful sim-to-real
policy transfer. Based on the extensive evaluation, we analyze how design
decisions about perception, control, and training impact the real-world
performance.
RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph
Developing robotic intelligent systems that can adapt quickly to unseen wild
situations is one of the critical challenges in pursuing autonomous robotics.
Although some impressive progress has been made in walking stability and skill
learning in the field of legged robots, their ability to adapt quickly is
still inferior to that of animals in nature. Animals are born with massive
skills needed to survive, and can quickly acquire new ones, by composing
fundamental skills with limited experience. Inspired by this, we propose a
novel framework, named Robot Skill Graph (RSG) for organizing massive
fundamental skills of robots and dexterously reusing them for fast adaptation.
Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of
massive dynamic behavioral skills instead of static knowledge in KG and enables
discovering implicit relations that exist between the learning context and
acquired skills of robots, serving as a starting point for understanding subtle
patterns existing in robots' skill learning. Extensive experimental results
demonstrate that RSG can provide rational skill inference upon new tasks and
environments and enable quadruped robots to adapt to new scenarios and learn
new skills rapidly.
ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution
We consider the problem of efficient blackbox optimization over a large
hybrid search space, consisting of a mixture of a high dimensional continuous
space and a complex combinatorial space. Such examples arise commonly in
evolutionary computation, but also more recently, neuroevolution and
architecture search for Reinforcement Learning (RL) policies. Unfortunately
however, previous mutation-based approaches suffer in high dimensional
continuous spaces both theoretically and practically. We thus instead propose
ES-ENAS, a simple joint optimization procedure by combining Evolutionary
Strategies (ES) and combinatorial optimization techniques in a highly scalable
and intuitive way, inspired by the one-shot or supernet paradigm introduced in
Efficient Neural Architecture Search (ENAS). Through this relatively simple
marriage between two different lines of research, we are able to gain the best
of both worlds, and empirically demonstrate our approach by optimizing BBOB
functions over hybrid spaces as well as combinatorial neural network
architectures via edge pruning and quantization on popular RL benchmarks. Due
to the modularity of the algorithm, we are also able to incorporate a wide variety
of popular techniques, ranging from the use of different continuous and
combinatorial optimizers to constrained optimization.
Comment: 22 pages. See
https://github.com/google-research/google-research/tree/master/es_enas for
associated code.
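The joint procedure described in this abstract can be illustrated with a minimal sketch. It is not the ES-ENAS implementation: the toy objective `blackbox`, the constants `TARGET` and `BONUS`, and the moving-average scoring of discrete choices are all assumptions made for illustration. Each simulated worker evaluates one antithetic ES perturbation of the continuous parameters together with one sampled discrete choice, and the same rewards feed both the ES gradient estimate and a deliberately naive stand-in for the combinatorial optimizer.

```python
import numpy as np

# Hypothetical toy problem: none of these names come from the paper.
TARGET = np.array([1.0, -2.0])      # optimum of the continuous part
BONUS = np.array([0.0, 1.0, 0.5])   # payoff of each discrete choice


def blackbox(x, choice):
    """Toy hybrid objective: continuous quadratic term plus a discrete bonus."""
    return BONUS[choice] - np.sum((x - TARGET) ** 2)


def joint_search(iters=200, workers=16, sigma=0.1, lr=0.05, ema=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                   # continuous parameters
    scores = np.zeros(len(BONUS))     # running score per discrete choice
    for _ in range(iters):
        grad = np.zeros_like(x)
        for _ in range(workers):
            # Each worker gets one discrete choice and one perturbation.
            choice = rng.integers(len(BONUS))
            eps = rng.standard_normal(x.shape)
            r_plus = blackbox(x + sigma * eps, choice)
            r_minus = blackbox(x - sigma * eps, choice)
            # Antithetic ES estimate of the gradient w.r.t. x.
            grad += (r_plus - r_minus) / (2 * sigma) * eps
            # Assumed stand-in for the combinatorial optimizer:
            # exponential moving average of the reward seen per choice.
            scores[choice] += ema * (0.5 * (r_plus + r_minus) - scores[choice])
        x += lr * grad / workers      # gradient ascent on the reward
    return x, int(np.argmax(scores))


x_final, best_choice = joint_search()
print(x_final, best_choice)
```

In this sketch the discrete choices are sampled uniformly; a real combinatorial optimizer would bias sampling toward promising choices, which is where methods such as those mentioned in the abstract would plug in.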
Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers
We propose to address quadrupedal locomotion tasks using Reinforcement
Learning (RL) with a Transformer-based model that learns to combine
proprioceptive information and high-dimensional depth sensor inputs. While
learning-based locomotion has made great advances using RL, most methods still
rely on domain randomization for training blind agents that generalize to
challenging terrains. Our key insight is that proprioceptive states only offer
contact measurements for immediate reaction, whereas an agent equipped with
visual sensory observations can learn to proactively maneuver environments with
obstacles and uneven terrain by anticipating changes in the environment many
steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL
method for quadrupedal locomotion that leverages a Transformer-based model for
fusing proprioceptive states and visual observations. We evaluate our method in
challenging simulated environments with different obstacles and uneven terrain.
We show that our method obtains significant improvements over policies with
only proprioceptive state inputs, and that Transformer-based models further
improve generalization across environments. Our project page with videos is at
https://RchalYang.github.io/LocoTransformer.
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
In the field of reinforcement learning, because of the high cost and risk of
policy training in the real world, policies are trained in a simulation
environment and transferred to the corresponding real-world environment.
However, the simulation environment does not perfectly mimic the real-world
environment, leading to model misspecification. Multiple studies report
significant deterioration of policy performance in a real-world environment. In
this study, we focus on scenarios involving a simulation environment with
uncertainty parameters and the set of their possible values, called the
uncertainty parameter set. The aim is to optimize the worst-case performance on
the uncertainty parameter set to guarantee the performance in the corresponding
real-world environment. To obtain a policy for the optimization, we propose an
off-policy actor-critic approach called the Max-Min Twin Delayed Deep
Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min
optimization problem using a simultaneous gradient ascent descent approach.
Experiments in multi-joint dynamics with contact (MuJoCo) environments show
that the proposed method exhibited a worst-case performance superior to several
baseline approaches.
Comment: Neural Information Processing Systems 2022 (NeurIPS '22)
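The simultaneous gradient ascent descent idea mentioned in this abstract can be sketched on a toy scalar max-min problem. This is not M2TD3: the function `objective`, the scalar "policy" parameter `theta`, the scalar uncertainty parameter `omega`, and the interval used as the uncertainty parameter set are hypothetical, and the gradients are written analytically rather than estimated by a critic.

```python
import numpy as np


def objective(theta, omega):
    """Toy stand-in for policy performance under uncertainty parameter omega.

    Concave in theta (the maximizing player) and convex in omega
    (the minimizing, worst-case player).
    """
    return -(theta - 1.0) ** 2 + theta * omega + 0.5 * omega ** 2


def simultaneous_gda(steps=500, lr=0.1, omega_bounds=(-1.0, 1.0)):
    theta, omega = 0.0, 0.0
    for _ in range(steps):
        grad_theta = -2.0 * (theta - 1.0) + omega   # dJ/dtheta
        grad_omega = theta + omega                  # dJ/domega
        # Both players move at once, each using the other's current value:
        # ascent for theta, descent for omega.
        theta, omega = theta + lr * grad_theta, omega - lr * grad_omega
        # Keep omega inside the assumed uncertainty parameter set.
        omega = float(np.clip(omega, *omega_bounds))
    return theta, omega


theta, omega = simultaneous_gda()
print(theta, omega, objective(theta, omega))
```

For this toy objective the saddle point can be found by hand by setting both gradients to zero, which gives theta = 2/3 and omega = -2/3; the iteration should settle near those values.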
BC-IRL: Learning Generalizable Reward Functions from Demonstrations
How well do reward functions learned with inverse reinforcement learning
(IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which
maximize a maximum-entropy objective, learn rewards that overfit to the
demonstrations. Such rewards struggle to provide meaningful rewards for states
not covered by the demonstrations, a major detriment when using the reward to
learn policies in new situations. We introduce BC-IRL, a new inverse
reinforcement learning method that learns reward functions that generalize
better when compared to maximum-entropy IRL approaches. In contrast to the
MaxEnt framework, which learns to maximize rewards around demonstrations,
BC-IRL updates reward parameters such that the policy trained with the new
reward matches the expert demonstrations better. We show that BC-IRL learns
rewards that generalize better on an illustrative simple task and two
continuous robotic control tasks, achieving over twice the success rate of
baselines in challenging generalization settings.
Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees
We study the problem of learning controllers for discrete-time non-linear
stochastic dynamical systems with formal reach-avoid guarantees. This work
presents the first method for providing formal reach-avoid guarantees, which
combine and generalize stability and safety guarantees, with a tolerable
probability threshold over the infinite time horizon. Our method
leverages advances in the machine learning literature and represents formal
certificates as neural networks. In particular, we learn a certificate in the
form of a reach-avoid supermartingale (RASM), a novel notion that we introduce
in this work. Our RASMs provide reachability and avoidance guarantees by
imposing constraints on what can be viewed as a stochastic extension of level
sets of Lyapunov functions for deterministic systems. Our approach solves
several important problems -- it can be used to learn a control policy from
scratch, to verify a reach-avoid specification for a fixed control policy, or
to fine-tune a pre-trained policy if it does not satisfy the reach-avoid
specification. We validate our approach on stochastic non-linear
reinforcement learning tasks.
Comment: Accepted at AAAI 202