A nonparametric Bayesian approach toward robot learning by demonstration
In recent years, many authors have applied machine learning methodologies to robot learning by demonstration, with Gaussian mixture regression (GMR) among the most successful. A major limitation of GMR models is the automatic selection of the proper number of model states, i.e., the number of model component densities. Existing criteria, based on likelihood or entropy, tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models have emerged as a cornerstone of nonparametric Bayesian statistics and as promising candidates for clustering applications where the number of clusters is unknown a priori. Motivated by this, and to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, this paper introduces a nonparametric Bayesian formulation of the GMR model: the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy on a number of demanding robot learning by demonstration scenarios.
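The core idea can be illustrated compactly. The sketch below is not the authors' implementation: it uses scikit-learn's BayesianGaussianMixture (a truncated Dirichlet-process mixture fit by variational inference) as a stand-in for the paper's model, with toy demonstration data and an illustrative GMR conditioning step; the names and dimensions are assumptions.

```python
# Minimal sketch: Dirichlet-process Gaussian mixture regression (DP-GMR).
# scikit-learn's variational DP mixture stands in for the paper's model;
# the data and variable names are illustrative.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy demonstrations: inputs t (e.g., time) and outputs y (e.g., a joint angle).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=(500, 1))
y = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal((500, 1))
X = np.hstack([t, y])  # joint (input, output) samples

# Truncated DP mixture: n_components is only an upper bound; the
# stick-breaking prior prunes unneeded components automatically.
dpgmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
).fit(X)

def gmr_predict(x, model, d_in=1):
    """Condition the joint mixture on the input dims (standard GMR)."""
    means, covs, w = model.means_, model.covariances_, model.weights_
    h, preds = [], []
    for k in range(len(w)):
        mu_i, mu_o = means[k][:d_in], means[k][d_in:]
        S_ii = covs[k][:d_in, :d_in]
        S_oi = covs[k][d_in:, :d_in]
        diff = x - mu_i
        # responsibility of component k for input x (unnormalized)
        dens = np.exp(-0.5 * diff @ np.linalg.solve(S_ii, diff))
        dens /= np.sqrt(np.linalg.det(2 * np.pi * S_ii))
        h.append(w[k] * dens)
        # conditional mean of the output dims given the input
        preds.append(mu_o + S_oi @ np.linalg.solve(S_ii, diff))
    h = np.array(h) / np.sum(h)
    return sum(hk * pk for hk, pk in zip(h, preds))

print(gmr_predict(np.array([0.25]), dpgmm))  # ~ sin(pi/2) = 1
```

Because the stick-breaking prior concentrates mass on a few components, the effective model size is inferred from the data rather than chosen by a likelihood- or entropy-based criterion.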
Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation
Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between their learning assumptions (e.g., a single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms around actual demonstrator behavioral characteristics, so that these characteristics can be captured and exploited. It presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, which are simultaneously robustified by injecting risk-sensitive disturbances that induce human recovery actions and ensure demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (table-sweep, shaft-reach, and shaft-insertion tasks) on a UR5e 6-DOF robotic arm, demonstrating improved characterization of behavior. Results show significant improvements in task performance through greater flexibility, robustness, and demonstration feasibility.

Comment: 69 pages, 9 figures, accepted by Elsevier Neural Networks - Journal
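The abstract describes a loop in which disturbances, scaled by the policy's uncertainty, are injected during demonstration collection so that the expert's recovery behavior enters the dataset. Below is a minimal numpy sketch of that loop under strong simplifying assumptions: the expert, 1-D dynamics, and trivial linear policy are hypothetical stand-ins for the paper's variational non-parametric multi-action policy and real-robot setup.

```python
# Minimal sketch of uncertainty-scaled disturbance injection for imitation
# learning. All components below are toy stand-ins, not the BDI framework.
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    """Hypothetical expert: steer the 1-D state toward zero."""
    return -0.8 * state

def step(state, action):
    """Toy 1-D dynamics with small process noise."""
    return state + 0.1 * action + 0.01 * rng.standard_normal()

states, actions = [], []
beta = 0.5  # risk-sensitivity: disturbance gain

for episode in range(20):
    # Fit a trivial linear policy a = w*s to the data gathered so far, and
    # use its residual std as a crude stand-in for predictive uncertainty.
    if states:
        s, a = np.array(states), np.array(actions)
        w = (s @ a) / (s @ s)
        sigma = np.std(a - w * s) + 1e-6
    else:
        sigma = 1.0

    state = rng.uniform(-1.0, 1.0)
    for t in range(50):
        a_exp = expert_action(state)
        # Inject a disturbance scaled by policy uncertainty, pushing the
        # expert off-trajectory so recovery actions get demonstrated.
        a_injected = a_exp + beta * sigma * rng.standard_normal()
        states.append(state)
        actions.append(a_exp)            # label with the expert's action
        state = step(state, a_injected)  # but execute the disturbed one

print(f"collected {len(states)} state-action pairs, final sigma={sigma:.3f}")
```

As the policy's residual uncertainty shrinks over episodes, so does the injected disturbance, which is one way to keep the perturbed demonstrations feasible for the human.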
Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies
Scenarios requiring humans to choose from multiple seemingly optimal actions are commonplace, yet standard imitation learning often fails to capture this behavior. Instead, an over-reliance on replicating expert actions induces inflexible and unstable policies, leading to poor generalizability in applications. To address this problem, this paper presents the first imitation learning framework that incorporates Bayesian variational inference for learning flexible non-parametric multi-action policies, while simultaneously robustifying the policies against sources of error by introducing and optimizing disturbances to create a richer demonstration dataset. This combined approach forces the policy to adapt to challenging situations, enabling stable multi-action policies to be learned efficiently. The effectiveness of the proposed method is evaluated through simulations and real-robot experiments on a table-sweep task using a UR3 6-DOF robotic arm. Results show that, through improved flexibility and robustness, both learning performance and control safety surpass those of comparison methods.

Comment: 7 pages, accepted by the 2021 International Conference on Robotics and Automation (ICRA 2021)
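The "multiple seemingly optimal actions" failure mode is easy to reproduce: a unimodal policy trained by behavioral cloning averages two valid actions into an invalid one, while a mixture policy keeps both modes. The sketch below illustrates this with a variational mixture in scikit-learn as an illustrative stand-in; the bimodal toy data and all names are assumptions, not the paper's setup.

```python
# Minimal sketch: a mixture policy captures multiple optimal actions where
# unimodal behavioral cloning would average them. Illustrative only.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)

# Bimodal expert: from states near 0, going left (-1) or right (+1)
# around an obstacle are equally optimal.
s = rng.uniform(-0.1, 0.1, size=2000)
a = rng.choice([-1.0, 1.0], size=2000) + 0.05 * rng.standard_normal(2000)

X = np.column_stack([s, a])
policy = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
).fit(X)

def sample_action(state, model, n=5):
    """Sample actions from the mixture conditioned on the state."""
    mus, covs, w = model.means_, model.covariances_, model.weights_
    out = []
    for _ in range(n):
        # component responsibilities at this state (constants cancel
        # after normalization)
        dens = w * np.exp(-0.5 * (state - mus[:, 0])**2 / covs[:, 0, 0]) \
               / np.sqrt(covs[:, 0, 0])
        k = rng.choice(len(w), p=dens / dens.sum())
        # conditional Gaussian of action given state for component k
        mu = mus[k, 1] + covs[k, 1, 0] / covs[k, 0, 0] * (state - mus[k, 0])
        var = covs[k, 1, 1] - covs[k, 1, 0]**2 / covs[k, 0, 0]
        out.append(mu + np.sqrt(max(var, 1e-9)) * rng.standard_normal())
    return np.array(out)

print(sample_action(0.0, policy))  # a mix of ~ -1 and ~ +1, not their mean 0
```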
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but they typically require a very large number of samples to achieve good performance. Model-based algorithms can, in principle, learn much more efficiently, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, combining the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our purely model-based approach, trained on just random-action data, can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbm
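The model-based component couples a learned dynamics model with sampling-based MPC: sample many candidate action sequences, roll each out through the model, and execute only the first action of the best sequence. A minimal sketch of that control loop follows, with a toy analytic model standing in for the learned neural network; the dimensions, cost, and function names are illustrative assumptions.

```python
# Minimal sketch of random-shooting MPC over a learned dynamics model.
# A toy analytic model stands in for a neural network f(s, a).
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 2, 1, 15, 1000

def dynamics_model(s, a):
    """Stand-in for the learned model predicting the next state.
    In practice this would be a network trained on observed transitions."""
    pos, vel = s[..., 0], s[..., 1]
    new_vel = vel + 0.05 * a[..., 0]
    return np.stack([pos + 0.05 * new_vel, new_vel], axis=-1)

def cost(s):
    """Hypothetical task cost: drive position to 1.0 (a target location)."""
    return (s[..., 0] - 1.0) ** 2

def mpc_action(state):
    """Random-shooting MPC: best first action among sampled sequences."""
    # Candidate action sequences, shape (N, H, action_dim).
    seqs = rng.uniform(-1.0, 1.0, size=(N_CANDIDATES, HORIZON, ACTION_DIM))
    s = np.broadcast_to(state, (N_CANDIDATES, STATE_DIM)).copy()
    total = np.zeros(N_CANDIDATES)
    for t in range(HORIZON):
        s = dynamics_model(s, seqs[:, t])  # batched model rollout
        total += cost(s)
    return seqs[np.argmin(total), 0]       # execute first action only

state = np.zeros(STATE_DIM)
for step in range(100):
    state = dynamics_model(state, mpc_action(state))
print("final state:", state)               # position settles near 1.0
```

Replanning at every step means the controller never commits to a full open-loop sequence, which compensates for model error; the rollouts collected this way can then seed a model-free learner, as the abstract describes.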