Stable Motion Primitives via Imitation and Contrastive Learning
Learning from humans allows non-experts to program robots with ease, lowering
the resources required to build complex robotic solutions. Nevertheless, such
data-driven approaches often lack the ability to provide guarantees regarding
their learned behaviors, which is critical for avoiding failures and/or
accidents. In this work, we focus on reaching/point-to-point motions, where
robots must always reach their goal, independently of their initial state. This
can be achieved by modeling motions as dynamical systems and ensuring that they
are globally asymptotically stable. Hence, we introduce a novel Contrastive
Learning loss for training Deep Neural Networks (DNNs) that, when used together
with an Imitation Learning loss, enforces the aforementioned stability in the
learned motions. Unlike previous work, our method does not restrict
the structure of its function approximator, enabling its use with arbitrary
DNNs and allowing it to learn complex motions with high accuracy. We validate
it using datasets and a real robot. In the former case, motions are 2- and
4-dimensional, modeled as first- and second-order dynamical systems. In the
latter, motions are 3-, 4-, and 6-dimensional, of first and second order, and
are used to control a 7-DoF robot manipulator in its end-effector space and
joint space. More details regarding the real-world experiments are presented
at: https://youtu.be/OM-2edHBRfc
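To make the idea concrete, here is a minimal sketch (our illustration, not the authors' implementation) of how an imitation loss and a contrastive-style stability loss could be combined for a first-order system x_dot = f_theta(x): an unconstrained network predicts velocities, the imitation term is a mean-squared error on demonstrated velocities, and the stability term penalizes simulated rollouts whenever the distance to the goal fails to shrink. The network sizes, Euler step, horizon, and margin are all assumptions.

```python
# Hedged sketch (not the paper's exact formulation): an unconstrained
# DNN models a first-order system x_dot = f_theta(x). The imitation
# term is an MSE on demonstrated velocities; the contrastive-style
# stability term penalizes simulated rollouts whenever the distance
# to the goal fails to shrink. Network sizes, dt, horizon, and the
# margin are illustrative assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                       nn.Linear(128, 128), nn.ReLU(),
                       nn.Linear(128, 2))  # arbitrary architecture

def imitation_loss(x_demo, xdot_demo):
    # Behavioral cloning on demonstrated velocities.
    return ((policy(x_demo) - xdot_demo) ** 2).mean()

def stability_loss(x0, goal, horizon=20, dt=0.01, margin=0.0):
    # Roll out the learned dynamics with Euler steps and require the
    # distance to the goal to be non-increasing at every step.
    x, loss = x0, 0.0
    for _ in range(horizon):
        x_next = x + dt * policy(x)
        d_next = torch.norm(x_next - goal, dim=-1)
        d_curr = torch.norm(x - goal, dim=-1)
        loss = loss + torch.relu(d_next - d_curr + margin).mean()
        x = x_next
    return loss / horizon

# Total objective: a weighted sum of both terms, e.g.
# total = imitation_loss(x_demo, xdot_demo) + 10.0 * stability_loss(x0, goal)
```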
Deep Metric Imitation Learning for Stable Motion Primitives
Imitation Learning (IL) is a powerful technique for intuitive robotic
programming. However, ensuring the reliability of learned behaviors remains a
challenge. In the context of reaching motions, a robot should consistently
reach its goal, regardless of its initial conditions. To meet this requirement,
IL methods often employ specialized function approximators that guarantee this
property by construction. Although effective, these approaches come with a set
of limitations: 1) they are unable to fully exploit the capabilities of modern
Deep Neural Network (DNN) architectures, 2) some are restricted in the family
of motions they can model, resulting in suboptimal IL capabilities, and 3) they
require explicit extensions to account for the geometry of motions that
consider orientations. To address these challenges, we introduce a novel
stability loss function, drawing inspiration from the triplet loss used in the
deep metric learning literature. This loss does not constrain the DNN's
architecture and enables learning policies that yield accurate results.
Furthermore, it is easily adaptable to the geometry of the robot's state space.
We provide a proof of the stability properties induced by this loss and
empirically validate our method in various settings. These settings include
Euclidean and non-Euclidean state spaces, as well as first-order and
second-order motions, both in simulation and with real robots. More details
about the experimental results can be found at: https://youtu.be/ZWKLGntCI6w
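The triplet analogy can be sketched as follows (a hedged illustration under our own assumptions; the encoder, margin, and Euclidean latent distance are not taken from the paper): the goal acts as the anchor, the next state along a rollout as the positive, and the current state as the negative, so the hinge term asks the latent distance to the goal to decrease along trajectories.

```python
# Hedged sketch of a triplet-style stability objective; the encoder,
# margin, and Euclidean latent distance are illustrative assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 8))

def triplet_stability_loss(x_t, x_next, goal, margin=0.1):
    # Anchor: the goal; positive: the next state along a rollout;
    # negative: the current state. The hinge asks the latent distance
    # to the goal to decrease by at least `margin` along trajectories.
    z_a, z_p, z_n = encoder(goal), encoder(x_next), encoder(x_t)
    d_pos = torch.norm(z_p - z_a, dim=-1)
    d_neg = torch.norm(z_n - z_a, dim=-1)
    return torch.relu(d_pos - d_neg + margin).mean()
```

For non-Euclidean state spaces, such as unit quaternions representing orientations, the Euclidean norm above would be swapped for a distance that respects the manifold's geometry, e.g., a geodesic distance.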
Interactive Imitation Learning in Robotics: A Survey
Interactive Imitation Learning (IIL) is a branch of Imitation Learning (IL)
where human feedback is provided intermittently during robot execution,
allowing online improvement of the robot's behavior. In recent years, IIL has
increasingly started to carve out its own space as a promising data-driven
alternative for solving complex robotic tasks. The advantages of IIL are its
data efficiency, as the human feedback guides the robot directly toward
improved behavior, and its robustness, as the distribution mismatch between the
teacher and learner trajectories is minimized by providing feedback directly
over the learner's trajectories. Nevertheless, despite the opportunities that
IIL presents, its terminology, structure, and applicability are neither clear
nor unified in the literature, slowing its development and, consequently,
research on innovative formulations and discoveries. In this article, we
attempt to facilitate research in IIL and lower entry barriers for new
practitioners by providing a survey of the field that unifies and structures
it. In addition, we aim to raise awareness of its potential, of what has been
accomplished, and of which research questions remain open. We organize the most
relevant works in IIL in terms of human-robot interaction (i.e., types of
feedback), interfaces (i.e., means of providing feedback), learning (i.e.,
models learned from feedback and function approximators), user experience
(i.e., human perception about the learning process), applications, and
benchmarks. Furthermore, we analyze similarities and differences between IIL
and RL, discussing how the concepts of offline, online, off-policy, and
on-policy learning should be transferred to IIL from the RL literature. We
particularly focus on robotic applications in the real world and discuss their
implications, limitations, and promising future areas of research.
Interactive learning with corrective feedback for continuous-action policies based on deep neural networks
Thesis submitted for the degree of Master of Science in Engineering (Electrical Engineering) and for the professional title of Electrical Civil Engineer. Deep Reinforcement Learning (DRL) has become a powerful methodology for solving complex sequential decision-making problems. However, DRL has several limitations when applied to real-world problems (e.g., robotics applications). For instance, it requires long training times (which cannot be accelerated), in contrast to simulated environments, and reward functions can be difficult to specify/model and/or compute. Moreover, transferring policies learned in simulation to the real world is not straightforward (the reality gap). On the other hand, machine learning methods based on transferring human knowledge to an agent have been shown to obtain well-performing policies without necessarily requiring a reward function, while being time-efficient.
In this context, this thesis introduces an Interactive Machine Learning (IML) strategy for training policies modeled as Deep Neural Networks (DNNs), based on human corrective feedback, with a method called D-COACH. It combines Deep Learning (DL) with the Corrective Advice Communicated by Humans (COACH) method, in which non-expert humans can train policies by correcting the actions the agent takes during execution. D-COACH has the potential to solve complex problems without requiring large amounts of data or time. Experimental results validate the efficiency of the proposed method on simulated and real-world platforms, in low- and high-dimensional state spaces, showing its capacity to effectively learn policies in continuous action spaces.
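The corrective-feedback update at the heart of D-COACH can be sketched as follows (simplified for illustration; the thesis additionally uses replay buffers and CNN encoders, and the error magnitude e, state dimension, and network below are assumptions): the human signals a per-dimension correction h in {-1, 0, +1}, the executed action is shifted by e*h, and the policy takes a supervised step toward that corrected action.

```python
# Hedged sketch of a D-COACH-style corrective update, simplified from
# the thesis (which also uses replay buffers and CNN encoders). The
# error magnitude e, state dimension, and network are assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                       nn.Linear(64, 2), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def corrective_update(state, h, e=0.1):
    # h: per-dimension human correction in {-1, 0, +1}.
    with torch.no_grad():
        target = policy(state) + e * h  # corrected-action label
    loss = ((policy(state) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```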
The proposed method showed particularly interesting results when policies parameterized with Convolutional Neural Networks (CNNs) were used to solve problems with high-dimensional state spaces, such as raw image pixels. With CNNs, agents can build valuable representations of the environment's state without feature engineering on the designer's side (which was always necessary in classical Reinforcement Learning (RL)). These properties can be very useful in robotics, where it is common to find applications in which the information acquired by the system's sensors is high dimensional, such as RGB images. Giving robots the ability to learn from high-dimensional data will increase the complexity of the problems they can solve.
Throughout this thesis, three variations of D-COACH are proposed and validated. The first introduces a general structure for solving problems with low- and high-dimensional state spaces. The second proposes a variation of the first method for high-dimensional state problems, reducing the time and effort a human needs to train a policy. Finally, the third introduces Recurrent Neural Networks to give agents memory in problems with partial observability.
FONDECYT 116150
Interactive Learning of Temporal Features for Control: Shaping Policies and State Representations from Human Feedback
The ongoing industry revolution demands more flexible products, including robots for household environments and medium-scale factories. Such robots should be able to adapt to new conditions and environments, and be programmed with ease. As an example, suppose that robot manipulators working on an industrial production line need to perform a new task. If these robots were hard-coded, adapting them to the new settings could take days, halting production at the factory. Robots that non-expert humans could easily program would speed up the process considerably.
Netherlands Organization for Scientific Research project Cognitive Robots for Flexible Agro-Food Technology (P17-01)
European Research Council (ERC) (804907)
Chile's National Fund for Scientific and Technological Development (FONDECYT) (1201170)
Chile's Associative Research Program of the National Research and Development Agency (ANID/PIA) (AFB18000)