Probabilistic movement modeling for intention inference in human-robot interaction.
Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM), which probabilistically models the generative process of movements directed by an intention. The IDDM makes it possible to infer the intention from observed movements using Bayes' theorem. It simultaneously finds a latent state representation of noisy and high-dimensional observations and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.
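The Bayes-rule step can be illustrated with a minimal discrete sketch (a toy stand-in, not the IDDM itself): given per-intention log-likelihoods of an observed movement and a prior, the posterior over intentions follows directly. All names and numbers below are illustrative assumptions.

```python
import numpy as np

def intention_posterior(log_likelihoods, prior):
    """Posterior over discrete intentions g: p(g | z) ∝ p(z | g) p(g)."""
    log_post = np.log(prior) + log_likelihoods
    log_post -= log_post.max()            # for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Two candidate intentions with a uniform prior; the log-likelihoods
# stand in for the intention-driven dynamics models of the movement z.
prior = np.array([0.5, 0.5])
log_lik = np.array([-1.0, -3.0])
posterior = intention_posterior(log_lik, prior)
```

Working in log space and subtracting the maximum avoids underflow when the movement likelihoods differ by many orders of magnitude.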
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
We present a tutorial on Bayesian optimization, a method of finding the
maximum of expensive cost functions. Bayesian optimization employs the Bayesian
technique of setting a prior over the objective function and combining it with
evidence to get a posterior function. This permits a utility-based selection of
the next observation to make on the objective function, which must take into
account both exploration (sampling from areas of high uncertainty) and
exploitation (sampling areas likely to offer improvement over the current best
observation). We also present two detailed extensions of Bayesian optimization,
with experiments---active user modelling with preferences, and hierarchical
reinforcement learning---and a discussion of the pros and cons of Bayesian
optimization based on our experiences.
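The loop described above can be sketched minimally (an illustrative GP with a fixed RBF kernel and an expected-improvement acquisition; the function names, length scale, and toy objective are assumptions, not the tutorial's code):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-8):
    """GP regression: posterior mean and variance at test points Xs."""
    Kinv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(Xs, X)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)  # prior variance is 1
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    """EI trades off exploitation (high mu) and exploration (high var)."""
    ei = []
    for m, v in zip(mu, var):
        s = sqrt(v)
        z = (m - best) / s
        ei.append((m - best) * 0.5 * (1 + erf(z / sqrt(2)))
                  + s * exp(-0.5 * z * z) / sqrt(2 * pi))
    return np.array(ei)

f = lambda x: -(x - 0.6) ** 2           # stand-in "expensive" objective
X = np.array([0.1, 0.4, 0.9]); y = f(X)
Xs = np.linspace(0.0, 1.0, 101)
mu, var = gp_posterior(X, y, Xs)
next_query = Xs[np.argmax(expected_improvement(mu, var, y.max()))]
```

Each iteration of the real loop would evaluate the objective at `next_query`, append the result to the data, and recompute the posterior and acquisition.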
Active Improvement of Control Policies with Bayesian Gaussian Mixture Model
Learning from demonstration (LfD) is an intuitive framework allowing
non-expert users to easily (re-)program robots. However, the quality and
quantity of demonstrations have a great influence on the generalization
performances of LfD approaches. In this paper, we introduce a novel active
learning framework in order to improve the generalization capabilities of
control policies. The proposed approach is based on the epistemic uncertainties
of Bayesian Gaussian mixture models (BGMMs). We determine the new query point
location by optimizing a closed-form information-density cost based on the
quadratic R\'enyi entropy. Furthermore, to better represent uncertain regions
and to avoid the local optima problem, we propose to approximate the active
learning cost with a Gaussian mixture model (GMM). We demonstrate our active
learning framework in the context of a reaching task in a cluttered environment
with an illustrative toy example and a real experiment with a Panda robot.
Comment: Accepted for publication in IROS'2
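The quadratic Rényi entropy is attractive here because it admits a closed form for a GMM: the product of two Gaussian densities integrates to a Gaussian density evaluated at the difference of the means. A univariate sketch (illustrative only; the paper applies this quantity inside an information-density cost):

```python
import numpy as np

def gauss(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def renyi2_entropy_gmm(weights, means, variances):
    """H_2(p) = -log ∫ p(x)^2 dx for a 1-D Gaussian mixture.

    Uses ∫ N(x; m_i, v_i) N(x; m_j, v_j) dx = N(m_i; m_j, v_i + v_j).
    """
    w, m, v = map(np.asarray, (weights, means, variances))
    integral = 0.0
    for i in range(len(w)):
        for j in range(len(w)):
            integral += w[i] * w[j] * gauss(m[i], m[j], v[i] + v[j])
    return -np.log(integral)
```

For a single standard Gaussian this reduces to log(2√π), which makes the closed form easy to sanity-check.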
Interactive Imitation Learning in Robotics: A Survey
Interactive Imitation Learning (IIL) is a branch of Imitation Learning (IL)
where human feedback is provided intermittently during robot execution allowing
an online improvement of the robot's behavior. In recent years, IIL has
increasingly started to carve out its own space as a promising data-driven
alternative for solving complex robotic tasks. The advantages of IIL are its
data efficiency, as the human feedback guides the robot directly towards an
improved behavior, and its robustness, as the distribution mismatch between the
teacher and learner trajectories is minimized by providing feedback directly
over the learner's trajectories. Nevertheless, despite the opportunities that
IIL presents, its terminology, structure, and applicability are neither clear
nor unified in the literature, slowing down its development and, therefore, the
research of innovative formulations and discoveries. In this article, we
attempt to facilitate research in IIL and lower entry barriers for new
practitioners by providing a survey of the field that unifies and structures
it. In addition, we aim to raise awareness of its potential, what has been
accomplished, and which research questions remain open. We organize the most
relevant works in IIL in terms of human-robot interaction (i.e., types of
feedback), interfaces (i.e., means of providing feedback), learning (i.e.,
models learned from feedback and function approximators), user experience
(i.e., human perception about the learning process), applications, and
benchmarks. Furthermore, we analyze similarities and differences between IIL
and RL, providing a discussion on how the concepts offline, online, off-policy
and on-policy learning should be transferred to IIL from the RL literature. We
particularly focus on robotic applications in the real world and discuss their
implications, limitations, and promising future areas of research.
Machine Learning through Exploration for Perception-Driven Robotics
The ability of robots to perform tasks in human environments has
largely been limited to rather simple and specific tasks, such as lawn mowing
and vacuum cleaning. As such, current robots are far away from the robot butlers, assistants,
and housekeepers that are depicted in science fiction movies. Part of this gap can be
explained by the fact that human environments are hugely varied, complex and unstructured.
For example, the homes that a domestic robot might end up in differ widely. Since
every home has a different layout with different objects and furniture, it is impossible for
a human designer to anticipate all challenges a robot might
face, and equip the robot a priori with all the necessary perceptual and manipulation skills.
Instead, robots could be programmed in a way that allows them to adapt to any
environment that they are in. In that case, the robot designer would not
need to precisely anticipate such environments. The ability to adapt can be provided by
robot learning techniques, which can be applied to learn skills for perception and manipulation.
Many of the current
robot learning techniques,
however, rely on human supervisors to provide annotations or demonstrations, and to fine-tune the methods' parameters and heuristics. As such,
it can require a significant amount of human time investment to
make a robot perform a task in a novel environment, even if statistical learning techniques are used.
In this thesis, I focus on another way of obtaining the data a robot needs to
learn about the environment and how to successfully
perform skills in it. By exploring the environment using its own sensors and actuators, rather than
passively waiting for annotations or demonstrations, a
robot can obtain this data by itself. I investigate multiple approaches that allow a robot
to explore its environment autonomously, while trying to minimize the design effort
required to deploy such algorithms in different situations.
First, I consider an unsupervised robot with minimal prior knowledge
about its environment. It can only learn through observed
sensory feedback obtained through interactive exploration of its
environment. In a bottom-up, probabilistic approach, the robot tries to segment
the objects in its environment through clustering with minimal prior knowledge. This clustering is
based on static visual scene features and observed movement. Information theoretic principles are used to autonomously select actions that maximize
the expected information gain, and thus learning speed. Our evaluations
on a real robot system equipped with an on-board camera show that the proposed
method handles noisy inputs better than previous methods, and that
action selection according to the information gain criterion does increase the learning speed.
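The information-gain criterion can be sketched for a discrete belief (a hypothetical toy, not the thesis' implementation): the robot scores each action by the expected reduction in Shannon entropy of its belief over hypotheses.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_information_gain(belief, obs_models):
    """Score actions by the expected entropy reduction of the belief.

    obs_models[a][o, h] = p(o | h, a); belief[h] = current p(h).
    """
    gains = []
    for L in obs_models:
        p_o = L @ belief                       # predictive p(o | a)
        gain = entropy(belief)
        for o in range(L.shape[0]):
            if p_o[o] > 0:
                gain -= p_o[o] * entropy(L[o] * belief / p_o[o])
        gains.append(gain)
    return np.array(gains)

# Two hypothetical actions: one fully disambiguates two object
# hypotheses, the other yields an uninformative observation.
belief = np.array([0.5, 0.5])
gains = expected_information_gain(
    belief, [np.eye(2), np.full((2, 2), 0.5)])
```

Selecting the argmax of `gains` is the greedy "maximize expected information gain" rule; the fully informative action scores log 2 nats here, the uninformative one scores zero.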
Often, however, the goal of a robot is not just to learn the structure of the environment, but to learn
how to perform a task encoded by a reward signal.
In addition to the weak feedback provided by reward signals, the robot has access to rich sensory data that, even for
simple tasks, is often non-linear and high-dimensional. Sensory data can be
leveraged to learn a system model, but in high-dimensional sensory spaces this
step often requires manually designing features. I propose a robot
reinforcement learning algorithm with learned non-parametric models, value
functions, and policies that can deal with high-dimensional state representations.
As such, the proposed algorithm is well-suited to deal with high-dimensional signals
such as camera images. To prevent the robot from converging prematurely to a sub-optimal solution,
the information loss of policy updates is limited. This constraint makes sure the robot keeps exploring the effects
of its behavior on the environment. The experiments show that the proposed non-parametric
relative entropy policy search algorithm performs better than prior methods that either do not employ bounded updates,
or that try to cover the state-space with general-purpose radial basis functions. Furthermore,
the method is validated on a
real-robot setup with high-dimensional camera image inputs.
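The bounded-update idea can be sketched in its episode-based form (a bisection stand-in for the dual optimization of relative entropy policy search; the function name and the epsilon value are assumptions): samples are reweighted by exponentiated return, with the temperature chosen so the reweighting stays within a KL budget of the sampling distribution.

```python
import numpy as np

def bounded_weights(returns, epsilon=0.5):
    """Soft-max weights whose KL to the uniform sample distribution is
    kept below epsilon, so each policy update only moves a bounded amount.
    """
    R = returns - returns.max()
    lo, hi = 1e-3, 1e6                  # temperature (eta) bracket
    for _ in range(200):
        eta = np.sqrt(lo * hi)          # geometric bisection
        p = np.exp(R / eta); p /= p.sum()
        m = p > 0
        kl = np.sum(p[m] * np.log(p[m] * len(R)))  # KL(p || uniform)
        if kl > epsilon:
            lo = eta                    # too greedy: raise temperature
        else:
            hi = eta
    p = np.exp(R / hi)                  # hi always satisfies the bound
    return p / p.sum()

weights = bounded_weights(np.array([0.0, 1.0, 2.0, 4.0]))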
One problem with typical exploration strategies is that the behavior is perturbed independently
in each time step, for example through selecting a random action or random policy parameters.
As such, the resulting exploration behavior might be incoherent. Incoherence leads to
inefficient random-walk behavior, makes the system less robust, and causes wear and tear on the robot.
A typical solution is to perturb the policy parameters directly, and use the same perturbation for an entire episode. However, this
strategy
tends to increase the number of episodes needed, since only a single perturbation can be evaluated per episode. I introduce a
strategy that can make a more balanced trade-off between the advantages of these two approaches.
The experiments show that intermediate trade-offs, rather than independent or episode-based exploration,
are beneficial across different tasks and learning algorithms.
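One way to realize such intermediate trade-offs is temporally correlated parameter noise (an illustrative first-order autoregressive scheme; the coefficient name is an assumption): β = 0 recovers independent per-step perturbations, β = 1 holds a single perturbation for the whole episode, and intermediate β gives coherent but still time-varying exploration.

```python
import numpy as np

def correlated_perturbations(steps, dim, beta, rng):
    """AR(1) exploration noise with stationary N(0, 1) marginals.

    beta=0: a fresh perturbation each step (incoherent exploration);
    beta=1: one perturbation held per episode; 0<beta<1 interpolates.
    """
    eps = rng.standard_normal((steps, dim))
    noise = np.empty_like(eps)
    noise[0] = eps[0]
    for t in range(1, steps):
        noise[t] = beta * noise[t - 1] + np.sqrt(1.0 - beta ** 2) * eps[t]
    return noise
```

The sqrt(1 - β²) scaling keeps the per-step marginal variance constant, so changing β changes only the coherence of the exploration, not its overall magnitude.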
This thesis thus addresses how robots can learn autonomously by exploring the world through
unsupervised learning and reinforcement learning. Throughout the thesis, new approaches
and algorithms are introduced: a probabilistic interactive segmentation approach, the non-parametric
relative entropy policy search algorithm, and a framework for generalized exploration.
To allow the learning algorithms to be applied in different and unknown environments,
the design effort and supervision required from human designers or users is minimized.
These approaches and algorithms contribute
towards the capability of robots to autonomously learn useful skills in human environments in a practical manner.
Estimation of Phases for Compliant Motion
Nowadays, equipping robots with the skills to interact with their environment is a primary goal of many researchers. Such intelligence can be achieved by segmenting a manipulation task into phases, which are subgoals of the task, and identifying the transitions between them.
This thesis proposes an approach for predicting the number of phases of a compliant-motion-based manipulation task and estimating the HMM model that best fits each segmented phase of the task. It also addresses the problem of phase-transition monitoring using recorded data. The captured data is utilized for building an HMM model, and the phase transitions are addressed within the framework of task segmentation. In this thesis, the concept of a non-homogeneous HMM is used to model the manipulation task, wherein the hidden phase depends on the observed effect of performing an action (force). The expectation-maximization (EM) algorithm is employed to estimate the parameters of the HMM model. The EM algorithm guarantees the estimation of the optimal parameters for each phase of the manipulation task. Hence, the modeling accuracy of the force-based transition is significantly enhanced compared to the position-based transition. To assess the performance of the phase-transition detection, a Viterbi algorithm was implemented. A Cartesian impedance controller, defined as in [6], is used for each detected phase to reproduce the learned task. The proposed approach is investigated with a KUKA LWR4+ arm in two test setups: in the first, we use parameter estimation for a single demonstration with three phases, and in the second, we find a generalization of the parameter estimation for multiple demonstrations. In both experiments, the transitions between the phases of the manipulation task are identified.
We conclude that our method provides a convenient platform for modeling, and estimating the model parameters of, the phases of a manipulation task from single and multiple demonstrations.
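The Viterbi step used for phase-transition detection can be sketched for a discrete-observation HMM (illustrative only; the thesis works with force observations and learned emission models):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely phase sequence for a discrete-observation HMM.

    log_pi: initial log probs; log_A[i, j]: i -> j transition log prob;
    log_B[i, o]: log prob of emitting observation o in phase i.
    """
    T, S = len(obs), len(log_pi)
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

With sticky transitions and distinct emission profiles per phase, the decoded path recovers where the task switches from one phase to the next.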