Data-efficient, Explainable and Safe Box Manipulation: Illustrating the Advantages of Physical Priors in Model-Predictive Control
Model-based RL/control have gained significant traction in robotics. Yet,
these approaches often remain data-inefficient and lack the explainability of
hand-engineered solutions. This makes them difficult to debug/integrate in
safety-critical settings. However, in many systems, prior knowledge of
environment kinematics/dynamics is available. Incorporating such priors can
help address the aforementioned problems by reducing problem complexity and the
need for exploration, while also facilitating the expression of the decisions
taken by the agent in terms of physically meaningful entities. Our aim with
this paper is to illustrate and support this point of view via a case study. We
model a payload manipulation problem based on a real robotic system, and show
that leveraging prior knowledge about the dynamics of the environment in an MPC
framework can lead to improvements in explainability, safety, and
data-efficiency, yielding satisfactory generalization from less data.
Comment: Accepted for publication at L4DC 2024; 12 pages (with references), 4 figures, 2 tables.
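The entry includes no code; the sketch below only illustrates the general idea of baking a physical prior into an MPC loop. The double-integrator payload model and all names (known_dynamics, mpc_random_shooting) are illustrative assumptions, not the paper's actual system or method.

```python
import numpy as np

def known_dynamics(state, action, dt=0.05, mass=1.0, friction=0.1):
    """Hand-specified payload model (assumed): a 1-D double integrator
    with viscous friction, standing in for the paper's physical priors."""
    pos, vel = state
    acc = action / mass - friction * vel
    return np.array([pos + vel * dt, vel + acc * dt])

def mpc_random_shooting(state, goal, horizon=10, n_samples=256):
    """Return the first action of the best sampled action sequence,
    scored by rolling out the *known* dynamics (no learned model)."""
    best_cost, best_action = np.inf, 0.0
    for _ in range(n_samples):
        seq = np.random.uniform(-1.0, 1.0, size=horizon)
        s, cost = state.copy(), 0.0
        for a in seq:
            s = known_dynamics(s, a)
            cost += (s[0] - goal) ** 2 + 1e-3 * a ** 2
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action

# Closed-loop rollout toward a target position.
state, goal = np.array([0.0, 0.0]), 1.0
for _ in range(50):
    a = mpc_random_shooting(state, goal)
    state = known_dynamics(state, a)
print("final position:", state[0])
```

Because the dynamics are hand-specified rather than learned, no interaction data is spent on model fitting, and each planned action can be traced back to physically meaningful quantities (mass, friction), which is the kind of explainability gain the abstract argues for.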
Adaptive Asynchronous Control Using Meta-learned Neural Ordinary Differential Equations
Model-based Reinforcement Learning and Control have demonstrated great
potential in various sequential decision-making problem domains, including in
robotics settings. However, real-world robotics systems often present
challenges that limit the applicability of those methods. In particular, we
note two problems that often occur together in industrial systems: 1)
irregular/asynchronous observations and actions, and 2) dramatic changes in
environment dynamics from one episode to the next (e.g., varying payload
inertial properties). We propose a general framework that overcomes those difficulties
properties). We propose a general framework that overcomes those difficulties
by meta-learning adaptive dynamics models for continuous-time prediction and
control. The proposed approach is task-agnostic and can be adapted to new tasks
in a straightforward manner. We present evaluations in two different robot
simulations and on a real industrial robot.
Comment: 16 double-column pages, 14 figures, 3 tables.
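As a rough illustration of continuous-time prediction over irregular intervals, here is a minimal sketch of a neural ODE dynamics model integrated with fixed-step RK4. The meta-learned context vector, all dimensions, and names are assumptions; the paper's actual architecture and meta-learning procedure may differ.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Learned time derivative ds/dt = f(s, a, c). The context c is an
    assumed stand-in for a meta-learned embedding of episode-specific
    dynamics (e.g., payload inertial properties)."""
    def __init__(self, state_dim=4, action_dim=2, context_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, context):
        return self.net(torch.cat([state, action, context], dim=-1))

def predict(func, state, action, context, dt, n_steps=4):
    """Fixed-step RK4 integration over an *irregular* interval dt, so
    predictions line up with asynchronous observation timestamps."""
    h = dt / n_steps
    for _ in range(n_steps):
        k1 = func(state, action, context)
        k2 = func(state + 0.5 * h * k1, action, context)
        k3 = func(state + 0.5 * h * k2, action, context)
        k4 = func(state + h * k3, action, context)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

func = ODEFunc()
s, a, c = torch.zeros(1, 4), torch.zeros(1, 2), torch.zeros(1, 8)
print(predict(func, s, a, c, dt=0.37).shape)  # torch.Size([1, 4])
```

The key point the sketch captures is that dt is a free argument: the same model serves observations arriving at arbitrary, non-uniform times, which a fixed-step discrete-time model cannot do.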
Few-shot Quality-Diversity Optimization
In the past few years, a considerable amount of research has been dedicated
to the exploitation of previous learning experiences and the design of Few-shot
and Meta Learning approaches, in problem domains ranging from Computer Vision
to Reinforcement Learning based control. A notable exception, where, to the
best of our knowledge, little to no effort has been made in this direction, is
Quality-Diversity (QD) optimization. QD methods have been shown to be effective
tools in dealing with deceptive minima and sparse rewards in Reinforcement
Learning. However, they remain costly due to their reliance on inherently
sample-inefficient evolutionary processes. We show that, given examples from a
task distribution, information about the paths taken by optimization in
parameter space can be leveraged to build a prior population, which when used
to initialize QD methods in unseen environments, allows for few-shot
adaptation. Our proposed method does not require backpropagation, is simple to
implement and scale, and is agnostic to the underlying models being trained.
Experiments carried out in both sparse and dense
reward settings using robotic manipulation and navigation benchmarks show that
it considerably reduces the number of generations that are required for QD
optimization in these environments.
Comment: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L) journal.
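To make the initialization idea concrete, below is a minimal MAP-Elites sketch in which the archive is seeded from a prior population instead of random genomes. The toy evaluate function, grid size, and mutation scale are assumptions, not the paper's setup; only the seeding mechanism reflects the abstract's idea.

```python
import numpy as np

def evaluate(params, task):
    """Toy stand-in for a QD evaluation: returns (fitness, behavior)."""
    fitness = -np.sum((params - task) ** 2)
    behavior = np.clip(params[:2], -1, 1)  # 2-D behavior descriptor
    return fitness, behavior

def map_elites(task, init_pop, n_gens=100, bins=10, dim=8):
    """Minimal MAP-Elites. The only change for few-shot QD is that the
    archive is bootstrapped from init_pop (a prior population built from
    optimization paths on training tasks) instead of random genomes."""
    archive = {}  # cell -> (fitness, params)

    def add(p):
        f, b = evaluate(p, task)
        cell = tuple(((b + 1) / 2 * bins).astype(int).clip(0, bins - 1))
        if cell not in archive or archive[cell][0] < f:
            archive[cell] = (f, p)

    for p in init_pop:
        add(p)
    for _ in range(n_gens):
        parent = archive[list(archive)[np.random.randint(len(archive))]][1]
        add(parent + 0.1 * np.random.randn(dim))  # mutation only, no backprop
    return archive

prior_pop = [np.random.randn(8) for _ in range(20)]  # stand-in prior population
archive = map_elites(task=np.ones(8), init_pop=prior_pop)
print(len(archive), "cells filled")
```

Note that the inner loop uses only mutation and selection, consistent with the abstract's claim that the method requires no backpropagation and is agnostic to the underlying models.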
Behavioral Repertoire via Generative Adversarial Policy Networks
Learning algorithms are enabling robots to solve increasingly challenging
real-world tasks. These approaches often rely on demonstrations and reproduce
the behavior shown. Unexpected changes in the environment may require using
different behaviors to achieve the same effect, for instance to reach and grasp
an object in changing clutter. An emerging paradigm addressing this robustness
issue is to learn a diverse set of successful behaviors for a given task, from
which a robot can select the most suitable policy when faced with a new
environment. In this paper, we explore a novel realization of this vision by
learning a generative model over policies. Rather than learning a single
policy, or a small fixed repertoire, our generative model for policies
compactly encodes an unbounded number of policies and allows novel controller
variants to be sampled. Leveraging our generative policy network, a robot can
sample novel behaviors until it finds one that works for a new environment. We
demonstrate this idea with an application of robust ball-throwing in the
presence of obstacles. We show that this approach achieves a greater diversity
of behaviors than an existing evolutionary approach, while maintaining good
efficacy of sampled behaviors, allowing a Baxter robot to hit targets more
often when throwing balls in the presence of obstacles.
Comment: In Proceedings of the 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pages 320 - 32
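A minimal sketch of the sample-until-success idea follows: a generator maps latent codes to flat policy parameter vectors, and the robot keeps sampling until a rollout succeeds. The network sizes, the works_in_env placeholder, and the parameter count are all assumptions; the paper's adversarial training of the generator is not shown.

```python
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):
    """GAN-style generator mapping a latent code z to the flat parameter
    vector of a small policy network (dimensions are assumptions)."""
    def __init__(self, latent_dim=16, n_policy_params=162):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, n_policy_params),
        )

    def forward(self, z):
        return self.net(z)

def works_in_env(policy_params):
    """Placeholder success check; in the paper this would be a rollout of
    the ball throw against the current obstacle layout."""
    return torch.rand(()) < 0.1

def sample_until_success(gen, latent_dim=16, max_tries=100):
    """Draw novel policies from the generator until one works."""
    for _ in range(max_tries):
        z = torch.randn(1, latent_dim)
        params = gen(z)
        if works_in_env(params):
            return params
    return None

gen = PolicyGenerator()
params = sample_until_success(gen)
print("found policy" if params is not None else "no success within budget")
```

Because the generator encodes a continuous distribution over policy parameters rather than a fixed archive, each draw can yield a controller variant never seen during training, which is what distinguishes this from selecting among a finite repertoire.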