Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
A critical flaw of existing inverse reinforcement learning (IRL) methods is
their inability to significantly outperform the demonstrator. This is because
IRL typically seeks a reward function that makes the demonstrator appear
near-optimal, rather than inferring the underlying intentions of the
demonstrator that may have been poorly executed in practice. In this paper, we
introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked
Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately)
ranked demonstrations in order to infer high-quality reward functions from a
set of potentially poor demonstrations. When combined with deep reinforcement
learning, T-REX outperforms state-of-the-art imitation learning and IRL methods
on multiple Atari and MuJoCo benchmark tasks, often achieving more than twice
the performance of the best demonstration. We also
demonstrate that T-REX is robust to ranking noise and can accurately
extrapolate intention by simply watching a learner noisily improve at a task
over time.
Comment: In proceedings of the Thirty-sixth International Conference on Machine Learning (ICML 2019)
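The abstract's core idea, learning a reward that explains the ranking of trajectories rather than the trajectories themselves, reduces to a pairwise ranking (Bradley-Terry) loss over predicted trajectory returns. Below is a minimal PyTorch sketch under that reading; RewardNet, the toy dimensions, and the random stand-in trajectories are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a single observation to a scalar reward estimate."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, traj):             # traj: (T, obs_dim)
        return self.net(traj).sum()      # predicted return of the trajectory

def trex_ranking_loss(reward_net, traj_low, traj_high):
    # Bradley-Terry pairwise loss: the higher-ranked trajectory should
    # have the larger predicted return; minimize -log P(high > low).
    returns = torch.stack([reward_net(traj_low), reward_net(traj_high)])
    return -torch.log_softmax(returns, dim=0)[1]

# Toy usage with random stand-ins for two (approximately) ranked demos.
obs_dim = 8
net = RewardNet(obs_dim)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = trex_ranking_loss(net, torch.randn(50, obs_dim), torch.randn(60, obs_dim))
opt.zero_grad(); loss.backward(); opt.step()
```

Because the loss depends only on relative returns, the learned reward can assign new demonstrations scores above anything seen in training, which is what enables extrapolation beyond the demonstrator.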
Few-Shot Goal Inference for Visuomotor Learning and Planning
Reinforcement learning and planning methods require an objective or reward
function that encodes the desired behavior. Yet, in practice, there is a wide
range of scenarios where an objective is difficult to provide programmatically,
such as tasks with visual observations involving unknown object positions or
deformable objects. In these cases, prior methods use engineered
problem-specific solutions, e.g., by instrumenting the environment with
additional sensors to measure a proxy for the objective. Such solutions require
a significant engineering effort on a per-task basis, and make it impractical
for robots to continuously learn complex skills outside of laboratory settings.
We aim to find a more general and scalable solution for specifying goals for
robot learning in unconstrained environments. To that end, we formulate the
few-shot objective learning problem, where the goal is to learn a task
objective from only a few example images of successful end states for that
task. We propose a simple solution to this problem: meta-learn a classifier
that can recognize new goals from a few examples. We show how this approach can
be used with both model-free reinforcement learning and visual model-based
planning and show results in three domains: rope manipulation from images in
simulation, visual navigation in a simulated 3D environment, and object
arrangement into user-specified configurations on a real robot.
Comment: Videos available at https://sites.google.com/view/few-shot-goal
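The proposed recipe, meta-learn a classifier that recognizes success from a few goal images and use its score in place of an instrumented objective, can be sketched as a reward wrapper. The sketch below assumes PyTorch; SuccessClassifier and classifier_reward are hypothetical names standing in for the meta-learned model after its few-shot adaptation.

```python
import torch
import torch.nn as nn

class SuccessClassifier(nn.Module):
    # Stand-in for the meta-learned goal classifier: after adapting on a few
    # example images of successful end states, it scores new observations.
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def success_prob(self, obs):
        return torch.sigmoid(self.net(obs)).squeeze(-1)

def classifier_reward(classifier, obs):
    # The classifier's success score replaces a hand-instrumented objective
    # as the reward fed to an RL agent or a visual planner.
    with torch.no_grad():
        return torch.log(classifier.success_prob(obs) + 1e-8)

clf = SuccessClassifier(obs_dim=16)          # image features in a real system
r = classifier_reward(clf, torch.randn(16))  # reward for one observation
```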
Unsupervised adaptation of brain machine interface decoders
The performance of neural decoders can degrade over time due to
nonstationarities in the relationship between neuronal activity and behavior.
When this happens, brain-machine interfaces (BMIs) require adaptation of their
decoders to maintain high performance over time. One way to achieve this is
through periodic calibration phases, during which the BMI system (or an
external human demonstrator) instructs the user to perform certain movements or
behaviors. This approach has two disadvantages: (i) calibration phases
interrupt the autonomous operation of the BMI, and (ii) between calibration
phases, BMI performance may not remain stable but instead decrease
continuously. A better alternative is a BMI decoder that continuously adapts
in an unsupervised manner during autonomous BMI operation, i.e., without
knowing the user's movement intentions.
In this article, we present an efficient method for such unsupervised
training of BMI systems for continuous movement control. The proposed method
uses a cost function derived from neuronal recordings to guide a learning
algorithm in evaluating the decoding parameters. We verify the
performance of our adaptive method by simulating a BMI user with an optimal
feedback control model and its interaction with our adaptive BMI decoder. The
simulation results show that the cost function and the algorithm yield fast and
precise trajectories towards targets at random orientations on a 2-dimensional
computer screen. For initially unknown and non-stationary tuning parameters,
our unsupervised method is still able to generate precise trajectories and to
keep its performance stable in the long term. The algorithm can optionally work
with neuronal error signals instead of, or in conjunction with, the proposed
unsupervised adaptation.
Comment: 28 pages, 13 figures, submitted to Frontiers in Neuroprosthetics
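The abstract specifies only that a cost function derived from neuronal recordings guides adaptation of the decoding parameters, not the learning rule itself. As one plausible reading, the sketch below adapts a linear decoder by scoring small random perturbations against a user-supplied cost; unsupervised_decoder_update and smoothness_cost are hypothetical stand-ins, not the paper's algorithm.

```python
import numpy as np

def unsupervised_decoder_update(W, neural_batch, cost_fn, sigma=0.01, n_candidates=8):
    # One step of gradient-free adaptation of a linear decoder
    # (velocity = W @ neural_activity): perturbed copies of W are scored
    # with a cost computed from the neural data alone, and the best is kept.
    candidates = [W] + [W + sigma * np.random.randn(*W.shape)
                        for _ in range(n_candidates)]
    costs = [cost_fn(Wc @ neural_batch) for Wc in candidates]
    return candidates[int(np.argmin(costs))]

# Toy usage: a hypothetical cost that penalizes jerky decoded velocities.
def smoothness_cost(decoded):                 # decoded: (2, T) velocities
    return np.mean(np.diff(decoded, axis=1) ** 2)

W = np.random.randn(2, 30) * 0.1              # 30 neural channels -> 2D velocity
neural = np.random.randn(30, 200)             # simulated activity over 200 bins
W = unsupervised_decoder_update(W, neural, smoothness_cost)
```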
One-Shot Imitation Learning
Imitation learning has been commonly applied to solve different tasks in
isolation. This usually requires either careful feature engineering, or a
significant number of samples. This is far from what we desire: ideally, robots
should be able to learn from very few demonstrations of any given task, and
instantly generalize to new situations of the same task, without requiring
task-specific engineering. In this paper, we propose a meta-learning framework
for achieving such capability, which we call one-shot imitation learning.
Specifically, we consider the setting where there is a very large set of
tasks, and each task has many instantiations. For example, a task could be to
stack all blocks on a table into a single tower, another task could be to place
all blocks on a table into two-block towers, etc. In each case, different
instances of the task would consist of different sets of blocks with different
initial states. At training time, our algorithm is presented with pairs of
demonstrations for a subset of all tasks. A neural net is trained to take as
input one demonstration and the current state (which initially is the initial
state of the other demonstration in the pair) and to output an action, with the
goal that the resulting sequence of states and actions matches the second
demonstration as closely as possible. At test time, a demonstration of a
single instance of a new task is presented, and the neural net is expected to
perform well on new instances of this new task. The use of soft attention
allows the model to generalize to conditions and tasks unseen in the training
data. We anticipate that by training this model on a much greater variety of
tasks and settings, we will obtain a general system that can turn any
demonstrations into robust policies that can accomplish an overwhelming variety
of tasks.
Videos available at https://bit.ly/nips2017-oneshot
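The described architecture, a network that conditions on one full demonstration plus the current state and is trained to reproduce the paired demonstration's actions, can be sketched with soft attention over demonstration steps. This is a minimal PyTorch sketch; the particular attention form, dimensions, and behavioral-cloning loss are simplifying assumptions.

```python
import torch
import torch.nn as nn

class OneShotImitationNet(nn.Module):
    # Conditions on a full demonstration and the current state, attends
    # softly over demonstration steps, and predicts an action.
    def __init__(self, state_dim, act_dim, hidden=64):
        super().__init__()
        self.embed_demo = nn.Linear(state_dim + act_dim, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.policy = nn.Linear(2 * hidden, act_dim)

    def forward(self, demo, state):
        # demo: (T, state_dim + act_dim) state-action pairs; state: (state_dim,)
        keys = self.embed_demo(demo)                  # (T, hidden)
        query = self.embed_state(state)               # (hidden,)
        attn = torch.softmax(keys @ query, dim=0)     # weights over demo steps
        context = attn @ keys                         # attended demo summary
        return self.policy(torch.cat([context, query]))

# Training signal (behavioral cloning): match the paired demo's actions.
net = OneShotImitationNet(state_dim=10, act_dim=4)
demo = torch.randn(30, 14)                            # one conditioning demo
pred = net(demo, torch.randn(10))
loss = nn.functional.mse_loss(pred, torch.randn(4))   # target: paired demo action
```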
Meta-learners' learning dynamics are unlike learners'
Meta-learning is a tool that allows us to build sample-efficient learning
systems. Here we show that, once meta-trained, LSTM Meta-Learners aren't just
faster learners than their sample-inefficient deep learning (DL) and
reinforcement learning (RL) brethren, but that they actually pursue
fundamentally different learning trajectories. We study their learning dynamics
on three sets of structured tasks for which the corresponding learning dynamics
of DL and RL systems have been previously described: linear regression (Saxe et
al., 2013), nonlinear regression (Rahaman et al., 2018; Xu et al., 2018), and
contextual bandits (Schaul et al., 2019). In each case, while
sample-inefficient DL and RL Learners uncover the task structure in a staggered
manner, meta-trained LSTM Meta-Learners uncover almost all task structure
concurrently, congruent with the patterns expected from Bayes-optimal inference
algorithms. This has implications for research areas wherever the learning
behaviour itself is of interest, such as safety, curriculum design, and
human-in-the-loop machine learning.
Comment: 26 pages, 23 figures
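For readers unfamiliar with the setup being analyzed: an LSTM meta-learner adapts within its hidden state rather than its weights, consuming a task as a sequence of (input, previous target) pairs. A minimal PyTorch sketch for the regression case follows; the dimensions and the toy task distribution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMMetaLearner(nn.Module):
    # Learns-to-learn in its activations: at each step it sees the current
    # input x_t and the previous target y_{t-1}, and must predict y_t.
    # All within-task adaptation happens in the hidden state, not the weights.
    def __init__(self, x_dim=1, y_dim=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(x_dim + y_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, y_dim)

    def forward(self, xs, ys):
        # xs: (B, T, x_dim); ys: (B, T, y_dim). Targets are shifted one step.
        prev_y = torch.cat([torch.zeros_like(ys[:, :1]), ys[:, :-1]], dim=1)
        h, _ = self.lstm(torch.cat([xs, prev_y], dim=-1))
        return self.head(h)

# One meta-training step on a batch of random linear-regression tasks.
model = LSTMMetaLearner()
w = torch.randn(8, 1, 1)                      # one slope per task
xs = torch.randn(8, 20, 1)
ys = xs * w
loss = nn.functional.mse_loss(model(xs, ys), ys)
```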
Theory of mind and decision science: Towards a typology of tasks and computational models
The ability to form a Theory of Mind (ToM), i.e., to theorize about others’ mental states in order to explain and predict behavior in relation to attributed intentional states, constitutes a hallmark of human cognition. These abilities are multi-faceted and include a variety of different cognitive sub-functions. Here, we focus on decision processes in social contexts and review a number of experimental and computational modeling approaches in this field. We provide an overview of experimental accounts and formal computational models with respect to two dimensions: interactivity and uncertainty. In doing so, we aim to capture the nuances of ToM functions in the context of social decision processes. We suggest that ToM engagement and multiplexing increase as social cognitive decision-making tasks become more interactive and uncertain. We propose that representing others as intentional, goal-directed agents who perform consequential actions is elicited only at the edges of these two dimensions. Further, we argue that computational models of valuation and beliefs should follow these dimensions to best allow researchers to effectively model sophisticated ToM processes. Finally, we relate this typology to neuroimaging findings in neurotypical (NT) humans, studies of persons on the autism spectrum (AS), and studies of nonhuman primates.
Multi-task Maximum Entropy Inverse Reinforcement Learning
Multi-task Inverse Reinforcement Learning (IRL) is the problem of inferring
multiple reward functions from expert demonstrations. Prior work, built on
Bayesian IRL, is unable to scale to complex environments due to computational
constraints. This paper contributes a formulation of multi-task IRL in the more
computationally efficient Maximum Causal Entropy (MCE) IRL framework.
Experiments show our approach can perform one-shot imitation learning in a
gridworld environment that single-task IRL algorithms need hundreds of
demonstrations to solve. We outline preliminary work using meta-learning to
extend our method to the function approximator setting of modern MCE IRL
algorithms. Evaluating on multi-task variants of common simulated robotics
benchmarks, we discover serious limitations of these IRL algorithms, and
conclude with suggestions for further work.
Comment: Presented at the 1st Workshop on Goal Specifications for Reinforcement Learning (ICML/IJCAI/AAMAS 2018)
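In the tabular setting this line of work builds on, the maximum (causal) entropy IRL gradient is simply the difference between expert and soft-optimal-policy feature expectations. A minimal numpy sketch of one gradient step follows; the feature matrix and visitation vectors are toy stand-ins, and computing policy_sv would require soft value iteration, which is omitted here.

```python
import numpy as np

def maxent_irl_step(theta, features, expert_sv, policy_sv, lr=0.1):
    # Gradient of the MaxEnt IRL objective w.r.t. reward weights theta is
    # the feature-expectation difference between the expert demonstrations
    # and the current soft-optimal policy under reward r(s) = features @ theta.
    grad = features.T @ (expert_sv - policy_sv)
    return theta + lr * grad

# Toy usage on a 5x5 gridworld with 4 state features.
S, d = 25, 4
theta = np.zeros(d)
features = np.random.rand(S, d)
expert_sv = np.random.dirichlet(np.ones(S))   # expert state-visitation freqs
policy_sv = np.random.dirichlet(np.ones(S))   # would come from soft value iteration
theta = maxent_irl_step(theta, features, expert_sv, policy_sv)
```

A multi-task formulation in this framework shares structure (e.g., a prior) across the per-task weight vectors theta, which is what lets a single demonstration pin down a new task's reward.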
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
Providing a suitable reward function to reinforcement learning can be
difficult in many real world applications. While inverse reinforcement learning
(IRL) holds promise for automatically learning reward functions from
demonstrations, several major challenges remain. First, existing IRL methods
learn reward functions from scratch, requiring large numbers of demonstrations
to correctly infer the reward for each task the agent may need to perform.
Second, existing methods typically assume homogeneous demonstrations for a
single behavior or task, while in practice, it might be easier to collect
datasets of heterogeneous but related behaviors. To this end, we propose a deep
latent variable model that is capable of learning rewards from demonstrations
of distinct but related tasks in an unsupervised way. Critically, our model can
infer rewards for new, structurally-similar tasks from a single demonstration.
Our experiments on multiple continuous control tasks demonstrate the
effectiveness of our approach compared to state-of-the-art imitation and
inverse reinforcement learning methods.
Comment: NeurIPS 2019
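The described deep latent variable model can be read as two pieces: an encoder that infers a probabilistic context z from a demonstration, and a reward conditioned on both state and z, so that a single demonstration of a new, structurally similar task suffices to instantiate its reward. A hedged PyTorch sketch follows; ContextEncoder, ContextualReward, and the pooled demonstration summary are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    # Infers a probabilistic task context z from one demonstration.
    def __init__(self, demo_dim, z_dim=8):
        super().__init__()
        self.net = nn.Linear(demo_dim, 2 * z_dim)
        self.z_dim = z_dim

    def forward(self, demo_summary):          # e.g. mean-pooled demo features
        mu, log_std = self.net(demo_summary).split(self.z_dim, dim=-1)
        return mu + torch.randn_like(mu) * log_std.exp()   # reparameterized sample

class ContextualReward(nn.Module):
    # Reward conditioned on both the state and the inferred context.
    def __init__(self, state_dim, z_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + z_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

# Inferring the reward for a new task from a single demonstration.
enc, reward = ContextEncoder(demo_dim=32), ContextualReward(state_dim=10)
z = enc(torch.randn(32))                      # context from one (pooled) demo
r = reward(torch.randn(10), z)
```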
Getting to Know One Another: Calibrating Intent, Capabilities and Trust for Human-Robot Collaboration
Common experience suggests that agents who know each other well are better
able to work together. In this work, we address the problem of calibrating
intention and capabilities in human-robot collaboration. In particular, we
focus on scenarios where the robot is attempting to assist a human who is
unable to directly communicate her intent. Moreover, both agents may have
differing capabilities that are unknown to one another. We adopt a
decision-theoretic approach and propose the TICC-POMDP for modeling this
setting, with an associated online solver. Experiments show our approach leads
to better team performance both in simulation and in a real-world study with
human subjects.
Comment: IROS 2020
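At the heart of any such decision-theoretic formulation is a belief over the human's latent intent and capability that the robot updates from observed human actions. The numpy sketch below shows only that generic Bayesian update; the observation model and the discretization into intents and capability levels are hypothetical, not the TICC-POMDP's specification.

```python
import numpy as np

def belief_update(belief, likelihoods):
    # Bayesian update of the robot's belief over latent (intent, capability)
    # pairs, given the likelihood of the observed human action under each pair.
    # belief, likelihoods: (n_intents, n_capabilities) arrays.
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# Toy usage: 3 candidate intents x 2 capability levels, uniform prior.
belief = np.full((3, 2), 1.0 / 6.0)
obs_likelihood = np.array([[0.7, 0.4],        # P(action | intent, capability)
                           [0.2, 0.3],
                           [0.1, 0.3]])
belief = belief_update(belief, obs_likelihood)
```

An online POMDP solver then plans assistive actions against this belief, trading off task progress against information gathering about the human's intent and capabilities.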
Meta-Adversarial Inverse Reinforcement Learning for Decision-making Tasks
Learning from demonstrations has made great progress over the past few years.
However, it is generally data-hungry and task-specific. In other words, it
requires a large amount of data to train a decent model on a particular task,
and the model often fails to generalize to new tasks that have a different
distribution. In practice, demonstrations from new tasks will be continuously
observed and the data might be unlabeled or only partially labeled. Therefore,
it is desirable for the trained model to adapt to new tasks that have limited
data samples available. In this work, we build an adaptable imitation learning
model based on the integration of Meta-learning and Adversarial Inverse
Reinforcement Learning (Meta-AIRL). We exploit the adversarial learning and
inverse reinforcement learning mechanisms to learn policies and reward
functions simultaneously from available training tasks and then adapt them to
new tasks with the meta-learning framework. Simulation results show that the
adapted policy trained with Meta-AIRL can effectively learn from a limited
number of demonstrations and quickly reach performance comparable to that of
experts on unseen tasks.
Comment: 2021 International Conference on Robotics and Automation (ICRA 2021)
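AIRL's discriminator has the special form D(s,a) = exp(f(s,a)) / (exp(f(s,a)) + pi(a|s)), so that f doubles as the learned reward; Meta-AIRL, as described, wraps this adversarial objective in a meta-learning loop across tasks. The PyTorch sketch below shows just the discriminator piece; the meta-training loop, dimensions, and batch construction are assumptions.

```python
import torch
import torch.nn as nn

class AIRLDiscriminator(nn.Module):
    # AIRL-style discriminator: D(s,a) = exp(f(s,a)) / (exp(f(s,a)) + pi(a|s)),
    # whose logit is f(s,a) - log pi(a|s). The network f serves as the learned
    # reward; in a Meta-AIRL setup its initial weights would be meta-trained
    # across tasks and then fine-tuned on a new task's few demonstrations.
    def __init__(self, sa_dim, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(sa_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, sa, log_pi):
        return self.f(sa).squeeze(-1) - log_pi   # logit of D(s,a)

disc = AIRLDiscriminator(sa_dim=12)
sa = torch.randn(32, 12)                          # batch of state-action pairs
log_pi = torch.randn(32)                          # policy log-probs for those actions
labels = torch.ones(32)                           # 1 = expert, 0 = policy samples
loss = nn.functional.binary_cross_entropy_with_logits(disc(sa, log_pi), labels)
```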