Temporal Segmentation of Pair-Wise Interaction Phases in Sequential Manipulation Demonstrations
We consider the problem of learning from complex sequential demonstrations. We propose to analyze demonstrations in terms of the concurrent interaction phases which arise between pairs of involved bodies (hand-object and object-object). These interaction phases are the key to decomposing a full demonstration into its atomic manipulation actions and to extracting their respective consequences. In particular, one may assume that the goal of each interaction phase is to achieve specific geometric constraints between objects. This generalizes previous Learning from Demonstration approaches by considering not just the motion of the end-effector but also the relational properties of the objects' motion. We present a linear-chain Conditional Random Field model to detect the pair-wise interaction phases and extract the geometric constraints that are established in the environment, which represent a high-level, task-oriented description of the demonstrated manipulation. We test our system on single- and multi-agent demonstrations of assembly tasks, respectively of a wooden toolbox and a plastic chair.
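As an illustration of the decoding step such a linear-chain model performs, the sketch below runs Viterbi decoding over per-frame phase scores for one hand-object pair. The three-phase alphabet and all scores are hypothetical stand-ins; a trained CRF would supply the real emission and transition potentials.

```python
# Illustrative Viterbi decoding for a linear-chain model that labels each
# time step of a hand-object pair with an interaction phase.
# The phase set and scores are assumptions for illustration only.

PHASES = ["free", "approach", "contact"]

def viterbi(emission, transition):
    """emission[t][s]: score of phase s at step t;
    transition[p][s]: score of moving from phase p to phase s."""
    T, S = len(emission), len(PHASES)
    score = [emission[0][:]]
    back = []
    for t in range(1, T):
        row, ptr = [], []
        for s in range(S):
            best = max(range(S), key=lambda p: score[-1][p] + transition[p][s])
            row.append(score[-1][best] + transition[best][s] + emission[t][s])
            ptr.append(best)
        score.append(row)
        back.append(ptr)
    # Backtrack from the best final phase to recover the phase sequence.
    last = max(range(S), key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [PHASES[s] for s in path]
```

With emission scores that favor a different phase at each step and neutral transitions, the decoder recovers the expected free → approach → contact progression.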
Online Robot Introspection via Wrench-based Action Grammars
Robotic failure is all too common in unstructured robot tasks. Despite well-designed controllers, robots often fail due to unexpected events. How do robots measure unexpected events? Many do not. Most robots are driven by the sense-plan-act paradigm; more recently, robots have begun adopting a sense-plan-act-verify paradigm. In this work, we present a principled methodology to bootstrap online robot introspection for contact tasks. In effect, we try to enable the robot to answer the questions: what did I do? Is my behavior as expected or not? To this end, we analyze noisy wrench data and postulate that it inherently contains patterns that can be effectively represented by a vocabulary. The vocabulary is generated by segmenting and encoding the data. When the wrench information represents a sequence of sub-tasks, the vocabulary forms a sentence (a set of words with grammar rules) for a given sub-task, allowing that sub-task to be uniquely represented. The grammar, which can also include unexpected events, was classified in offline and online scenarios, for both simulated and real robot experiments. Multiclass Support Vector Machines (SVMs) were used offline, while online probabilistic SVMs were used to give temporal confidence to the introspection result. The contribution of our work is a generalizable online semantic scheme that enables a robot to understand its high-level state, whether nominal or abnormal. It is shown to work in offline and online scenarios for a particularly challenging contact task: snap assemblies. We perform the snap assembly in one-arm simulated and real experiments and in a simulated two-arm experiment. This verification mechanism can be used by high-level planners or reasoning systems to enable intelligent failure recovery or to determine the next most optimal manipulation skill.
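The segment-and-encode idea behind the vocabulary can be sketched on a toy 1-D wrench trace: fixed windows are each encoded by the sign of their average slope, and the resulting "word" characterizes the sub-task. The window size and three-letter alphabet are assumptions for illustration, not the authors' actual encoding.

```python
# Toy vocabulary generation from a 1-D wrench trace.
# Real wrench data is 6-D and noisy; this sketch only shows the
# segment-then-encode pipeline on a clean scalar signal.

def encode_wrench(trace, window=4):
    word = []
    for i in range(0, len(trace) - window + 1, window):
        seg = trace[i:i + window]
        slope = seg[-1] - seg[0]
        if slope > 0.5:
            word.append("i")   # increasing force: loading
        elif slope < -0.5:
            word.append("d")   # decreasing force: release / snap event
        else:
            word.append("c")   # roughly constant: steady contact
    return "".join(word)

# A snap insertion might read as load, sharp drop, then settle:
# encode_wrench([0, 1, 2, 3, 3, 1, -1, -2, -2, -2, -2, -2]) -> "idc"
```

A classifier (such as the multiclass SVMs in the paper) would then map such words to nominal or anomalous sub-task labels.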
Robots Learning Manipulation Tasks from Demonstrations and Practice
Developing personalized cognitive robots that help with everyday tasks is an ongoing topic in robotics research. Such robots should have the capability to learn skills and perform tasks in new situations. In this thesis, we study three research problems to explore robot learning methods in the setting of manipulation tasks. In the first problem, we investigate hand movement learning from human demonstrations. For practical purposes, we propose a system for learning hand actions from markerless demonstrations, which are captured using the Kinect sensor. The algorithm autonomously segments an example trajectory into multiple action units, each described by a movement primitive, and forms a task-specific model. With that, similar movements can be generated for different scenarios and performed on the Baxter robot.
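The segmentation step can be sketched with a common heuristic: cut the trajectory where the hand's speed drops near zero, which tends to separate movement primitives. The scalar trajectory and the threshold are simplifying assumptions; the thesis segments full hand trajectories from Kinect data.

```python
# Minimal sketch of autonomous trajectory segmentation by minimum-speed
# points (a stand-in for the thesis's segmentation algorithm).

def segment(positions, eps=0.05):
    """Split a 1-D position sequence into action units at pauses."""
    speeds = [abs(b - a) for a, b in zip(positions, positions[1:])]
    cuts = [0]
    for t in range(1, len(speeds)):
        if speeds[t] < eps <= speeds[t - 1]:
            cuts.append(t + 1)   # movement pauses here: start a new unit
    cuts.append(len(positions))
    return [positions[a:b] for a, b in zip(cuts, cuts[1:])]
```

Each returned unit would then be summarized by a movement primitive in the task-specific model.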
The second problem aims to address learning robot movement adaptation under various environmental constraints. A common approach is to adopt motion primitives to generate target motions from demonstrations. However, their generalization capability is weak for novel environments. Additionally, traditional motion generation methods do not consider versatile constraints from different users, tasks, and environments. In this work, we propose a co-active learning framework for learning to adapt the movement of robot end-effectors for manipulation tasks. It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with different constraints. The framework also considers user feedback towards the adapted trajectories, and it learns to adapt movement through human-in-the-loop interactions. Experiments on a humanoid platform validate the effectiveness of our approach.
In order to further adapt robots to perform more complex manipulation tasks, as the third problem we investigate a framework in which the robot can not only plan and execute a sequential task in a new environment, but also refine its actions by learning subgoals through re-planning and re-execution during practice. A sequential task is naturally considered as a sequence of pre-learned action primitives, where each action primitive has its own goal parameters corresponding to a subgoal. We propose a system that learns the subgoal distribution of a given task model using reinforcement learning, iteratively updating the parameters over trials. As a result, by considering the learned subgoal distribution in sequential motion planning, the proposed framework can adaptively select better subgoals to generate movements for the robot to execute the task successfully. We implement the framework for the task of "opening a microwave", which involves a sequence of primitive actions and subgoals, and validate it on the Baxter platform.
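One simple way to realize the iterative update of a subgoal distribution is reward-weighted averaging: after a batch of practice trials, the distribution's mean shifts toward the sampled goal parameters that led to success. The Gaussian parameterization and this particular update rule are illustrative assumptions, not necessarily the thesis's exact method.

```python
# Sketch of reward-weighted updating of one subgoal's distribution.
# Each trial samples a goal parameter from the current Gaussian and
# records whether executing with it succeeded.

def update_subgoal(mean, sigma, trials, lr=0.5):
    """trials: list of (sampled_goal, success) pairs from practice."""
    good = [g for g, ok in trials if ok]
    if good:
        target = sum(good) / len(good)      # mean of successful samples
        mean = mean + lr * (target - mean)  # move the subgoal toward it
    return mean, sigma
```

Repeating this over many practice iterations concentrates the subgoal distribution on parameters that let the planner generate successful movements.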
Robot Programming from Demonstration, Feedback and Transfer
This paper presents a novel approach to robot instruction for assembly tasks. We consider that robot programming can be made more efficient, precise and intuitive if we leverage the advantages of complementary approaches such as learning from demonstration, learning from feedback and knowledge transfer. Starting from low-level demonstrations of assembly tasks, the system is able to extract a high-level relational plan of the task. A graphical user interface (GUI) then allows the user to iteratively correct the acquired knowledge by refining the high-level plans and the low-level geometrical knowledge of the task. This combination leads to a faster programming phase that is more precise than demonstrations alone and more intuitive than a GUI alone. A final process allows reusing high-level task knowledge for similar tasks in a transfer-learning fashion. Finally, we present a user study illustrating the advantages of this approach.
From Line Drawings to Human Actions: Deep Neural Networks for Visual Data Representation
In recent years, deep neural networks have been very successful in computer vision, speech recognition, and artificial intelligence systems. The rapid growth of data and fast-increasing computational tools provide solid foundations for applications which rely on learning large-scale deep neural networks with millions of parameters. Deep learning approaches have been shown to learn powerful representations of the inputs in various tasks, such as image classification, object recognition, and scene understanding. This thesis demonstrates the generality and capacity of deep learning approaches through a series of case studies including image matching and human activity understanding. In these studies, I explore combinations of neural network models with existing machine learning techniques and extend the deep learning approach for each task. Four related tasks are investigated: 1) image matching through similarity learning; 2) human action prediction; 3) finger force estimation in manipulation actions; and 4) bimodal learning for human action understanding.
Deep neural networks have been shown to be very efficient in supervised learning. Further, in some tasks, one would like to group the features of samples in the same category close to each other, in addition to obtaining a discriminative representation. Such properties are desired in a number of applications, such as semantic retrieval, image quality measurement, and social network analysis. My first study develops a similarity learning method based on deep neural networks for image matching between sketch images and 3D models. In this task, I propose to use a Siamese network to learn the similarities of sketches and develop a novel method for sketch-based 3D shape retrieval. The proposed method can successfully learn the representations of sketch images as well as the similarities; the 3D shape retrieval problem can then be solved with off-the-shelf nearest neighbor methods.
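The Siamese retrieval setup can be sketched as follows: both inputs pass through the same embedding function, and similarity is the negative distance between embeddings, so retrieval reduces to nearest-neighbor search. The fixed linear map and its weights below are hypothetical stand-ins for the trained deep branches.

```python
# Toy Siamese-style retrieval: a shared embedding plus nearest-neighbor
# lookup. In the thesis both branches are deep networks trained so that
# matching sketch / 3D-view pairs end up close in embedding space.

W = [[0.5, -0.2], [0.1, 0.9]]  # shared embedding weights (assumed)

def embed(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def similarity(a, b):
    ea, eb = embed(a), embed(b)
    return -sum((p - q) ** 2 for p, q in zip(ea, eb)) ** 0.5

def retrieve(query, shapes):
    # Off-the-shelf nearest-neighbor retrieval in the embedding space.
    return max(range(len(shapes)), key=lambda i: similarity(query, shapes[i]))
```

Because both branches share `embed`, a query sketch and a candidate 3D view are compared in one common space, which is the core of the Siamese idea.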
After studying representation learning methods for static inputs, my focus turns to learning representations of sequential data. To be specific, I focus on manipulation actions, because they are ubiquitous in daily life and play an important part in human-robot collaboration systems. Deep neural networks have been shown to be powerful at representing short video clips [Donahue et al., 2015]. However, most existing methods treat action recognition as a classification task: they assume the inputs are pre-segmented videos and the outputs are category labels. In scenarios such as human-robot collaboration, the ability to predict ongoing human actions at an early stage is highly important. I first address this issue with a fast manipulation action prediction method. I then build an action prediction model based on the Long Short-Term Memory (LSTM) architecture. The proposed approach processes the sequential inputs as continuous signals and keeps updating the prediction of the intended action based on the learned action representations.
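The per-frame updating can be sketched as follows: as each new frame arrives, per-class evidence is accumulated into a running state and renormalized, so a prediction is available at any point before the action completes. The hand-written frame scores below stand in for LSTM outputs and are assumptions for illustration.

```python
import math

# Sketch of early action prediction that refines its belief frame by
# frame. The action set and scores are illustrative stand-ins.

ACTIONS = ["pour", "cut", "stir"]

def running_prediction(frame_scores):
    """frame_scores[t][a]: evidence for action a from frame t."""
    logits = [0.0] * len(ACTIONS)
    predictions = []
    for scores in frame_scores:
        logits = [l + s for l, s in zip(logits, scores)]  # update state
        z = sum(math.exp(v) for v in logits)
        probs = [math.exp(v) / z for v in logits]         # current belief
        predictions.append(ACTIONS[probs.index(max(probs))])
    return predictions
```

Early frames may favor one action while later evidence overturns it, which is exactly the behavior an anticipatory human-robot collaboration system needs.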
Further, I study the relationship between visual inputs and the physical information, such as finger forces, involved in manipulation actions. This is motivated by recent studies in cognitive science which show that a subject's intention is strongly related to the hand movements during an action execution. Human observers can interpret others' actions in terms of movements and forces, which can be used to repeat the observed actions. If a robot system can estimate force feedback, it can learn how to manipulate an object by watching human demonstrations. In this work, the finger forces are estimated by watching only the movement of the hands. A modified LSTM model is used to regress the finger forces from video frames. To facilitate this study, a specially designed sensor glove has been used to collect finger force data, and a new dataset has been collected to provide synchronized streams of videos and finger forces.
Last, I investigate the usefulness of physical information in human action recognition, an application of bimodal learning where both the vision inputs and the additional information are used to learn the action representation. My study demonstrates that, by combining additional information with the vision inputs, the accuracy of human action recognition can be consistently improved. I extend the LSTM architecture to accept both video frames and sensor data as bimodal inputs to predict the action. A hallucination network is jointly trained to approximate the representations of the additional inputs. During the testing stage, the hallucination network generates approximated representations that are used for classification. In this way, the proposed method does not rely on the additional inputs at test time.
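The hallucination idea can be sketched in miniature: during training, a branch is fit to reproduce the sensor branch's representation from video features alone; at test time only video is needed. The one-parameter least-squares fit below is an illustrative assumption standing in for training a full network.

```python
# Toy hallucination branch: learn to predict the sensor-side
# representation from the video-side feature, then use only video
# at test time. A real hallucination network would be a deep model
# trained jointly with the recognition network.

def fit_hallucination(video_feats, sensor_reprs):
    """Fit a single scale so scale * video ~= sensor (least squares)."""
    num = sum(v * s for v, s in zip(video_feats, sensor_reprs))
    den = sum(v * v for v in video_feats)
    return num / den

def hallucinate(scale, video_feat):
    # Test-time substitute for the missing sensor representation.
    return scale * video_feat
```

The classifier then consumes the hallucinated representation in place of real sensor data, so the glove is only required during training.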