Gaze Following as Goal Inference: A Bayesian Model
The ability to follow the gaze of another human plays a critical role in cognitive development. Infants as young as 12 months old have been shown to follow the gaze of adults. Recent experimental results indicate that gaze following is not merely an imitation of head movement. We propose that children learn a probabilistic model of the consequences of their movements, and later use this learned model of self as a surrogate for another human. We introduce a Bayesian model where gaze following occurs as a consequence of goal inference in a learned probabilistic graphical model. Bayesian inference over this learned model provides both an estimate of another’s fixation location and the appropriate action to follow their gaze. The model can be regarded as a probabilistic instantiation of Meltzoff’s “Like me” hypothesis. We present simulation results based on a nonparametric Gaussian process implementation of the model, and compare the model’s performance to infant gaze-following results.
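To make the goal-inference idea concrete, here is a minimal sketch that inverts a learned forward model to recover another agent's goal. It uses a small tabular transition model and a soft-greedy action likelihood in place of the paper's nonparametric Gaussian process implementation; the variable names and the softmax likelihood are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Minimal sketch of goal inference over a learned forward model (discrete case).
# A tabular transition model p(s' | s, a) stands in for the paper's Gaussian
# process model; all names and parameters here are illustrative.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Learned transition model: T[s, a] is a distribution over next states.
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def action_likelihood(s, g, beta=5.0):
    """p(a | s, g): actions that make goal state g more probable under the
    learned model are exponentially preferred (soft-greedy assumption)."""
    scores = np.array([T[s, a, g] for a in range(n_actions)])
    w = np.exp(beta * scores)
    return w / w.sum()

def goal_posterior(s, a_observed, prior=None):
    """p(g | s, a) ∝ p(a | s, g) p(g): invert the learned self-model to infer
    the goal that best explains another agent's observed action."""
    prior = np.ones(n_states) / n_states if prior is None else prior
    lik = np.array([action_likelihood(s, g)[a_observed] for g in range(n_states)])
    post = lik * prior
    return post / post.sum()

# Example: observe another agent take action 1 from state 2 and infer its goal.
print(goal_posterior(s=2, a_observed=1))
```

The same learned self-model serves two purposes in this reading of the abstract: choosing one's own actions toward a goal, and, via Bayes' rule, explaining the observed actions of another agent.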
Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning
Standard approaches to sequential decision-making exploit an agent's ability to continually interact with its environment and improve its control policy. However, due to safety, ethical, and practicality constraints, this type of trial-and-error experimentation is often infeasible in real-world domains such as healthcare and robotics. Instead, control policies in these domains are typically trained offline from previously logged data or in a growing-batch manner. In this setting, a fixed policy is deployed to the environment to gather an entire batch of new data, which is then aggregated with past batches and used to update the policy. This improvement cycle can then be repeated multiple times. While a limited number of such cycles is feasible in real-world domains, the quality and diversity of the resulting data are much lower than in the standard continually-interacting approach. However, data collection in these domains is often performed in conjunction with human experts, who are able to label or annotate the collected data. In this paper, we first explore the trade-offs present in this growing-batch setting, and then investigate how information provided by a teacher (i.e., demonstrations, expert actions, and gradient information) can be leveraged at training time to mitigate the sample complexity and coverage requirements for actor-critic methods. We validate our contributions on tasks from the DeepMind Control Suite.
Comment: Reincarnating Reinforcement Learning Workshop at ICLR 202
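The improvement cycle described above can be sketched as a short training loop: deploy a fixed policy, collect a batch, aggregate it with past batches, update the policy offline, and repeat, with teacher-provided expert actions folded in as an auxiliary imitation term. Everything below (the toy environment, the toy teacher, and the update rule) is an illustrative stand-in, not the paper's actor-critic method.

```python
import numpy as np

# Sketch of the growing-batch improvement cycle with teacher labels.
rng = np.random.default_rng(0)

def deploy_and_collect(policy_params, n_steps=100):
    """Roll out the current fixed policy and return a batch of transitions (toy env)."""
    states = rng.normal(size=(n_steps, 4))
    actions = states @ policy_params + rng.normal(scale=0.1, size=n_steps)
    rewards = -np.abs(actions)  # toy reward
    return {"s": states, "a": actions, "r": rewards}

def teacher_label(batch):
    """Teacher annotates the batch with expert actions (assumed to be available)."""
    batch["a_expert"] = -0.5 * batch["s"].sum(axis=1)  # toy expert
    return batch

def update_policy(policy_params, replay, bc_weight=0.5, lr=1e-2):
    """One illustrative offline update: a return-weighted term plus a
    behavior-cloning term toward the teacher's expert actions."""
    s, a, r = replay["s"], replay["a"], replay["r"]
    a_pred = s @ policy_params
    grad_rl = (r * (a - a_pred))[:, None] * s             # crude policy-improvement term
    grad_bc = (replay["a_expert"] - a_pred)[:, None] * s  # imitation term from teacher labels
    return policy_params + lr * (grad_rl + bc_weight * grad_bc).mean(axis=0)

policy = np.zeros(4)
replay = {"s": np.empty((0, 4)), "a": np.empty(0), "r": np.empty(0), "a_expert": np.empty(0)}

for cycle in range(5):                  # a small, fixed number of deployment cycles
    batch = teacher_label(deploy_and_collect(policy))
    replay = {k: np.concatenate([replay[k], batch[k]]) for k in replay}  # aggregate batches
    for _ in range(50):                 # many offline updates per collected batch
        policy = update_policy(policy, replay)
```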
Robotic task: Results.
(a) Most likely goals: Initial and final states are at the top of each column. The height of the bar represents the posterior probability of each goal state, with the true goal state marked by an asterisk. (b) Inferring actions: For each initial and desired final state, the plots show the posterior probability of each of the six actions, with the MAP action indicated by an asterisk. (c) Predicting final state: The plots show the posterior probability of reaching the desired final state, given the initial state and the corresponding MAP action shown in (b). The red bar marks 0.5, the threshold below which the robot asks for human help in the Interactive Goal-Based mode.
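The Interactive Goal-Based behavior in panel (c) can be read as a simple threshold test: execute the MAP action if the predicted probability of reaching the desired goal is at least 0.5, otherwise ask a human for help. The sketch below assumes a precomputed table of goal-reaching probabilities; the names and values are made up for illustration.

```python
import numpy as np

# Illustrative decision rule: MAP action plus a help threshold of 0.5.
HELP_THRESHOLD = 0.5

def choose_action(p_goal_given_action, goal):
    """p_goal_given_action[a, g]: probability of ending in goal state g after
    executing action a from the current initial state (assumed precomputed)."""
    map_action = int(np.argmax(p_goal_given_action[:, goal]))
    p_success = p_goal_given_action[map_action, goal]
    if p_success < HELP_THRESHOLD:
        return map_action, "ask human for help"
    return map_action, "execute"

# Toy table over six actions and three goal states.
p = np.array([[0.70, 0.20, 0.10],
              [0.10, 0.40, 0.50],
              [0.30, 0.30, 0.40],
              [0.20, 0.60, 0.20],
              [0.05, 0.15, 0.80],
              [0.33, 0.33, 0.34]])
print(choose_action(p, goal=1))   # -> (3, 'execute'); predicted success 0.6 >= 0.5
```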
Robotic tabletop organization task setup.
(a) The robot is located on the left side of the work area, and the Kinect looks down from the left side as seen from the robot's perspective. The three predefined areas that distinguish object states are labeled. (b) Toy tabletop objects.
Graphical models for robotic goal-based imitation.
(a) through (f) illustrate the use of graphical models for learning state-transitions, action inference, goal inference, goal-based imitation, and state prediction. Shaded nodes denote observed variables.