Bayesian multitask inverse reinforcement learning
We generalise the problem of inverse reinforcement learning to multiple
tasks, from multiple demonstrations. Each demonstration may represent one expert
trying to solve a different task, or different experts trying to solve the same
task. Our main contribution is to formalise the problem as statistical
preference elicitation, via a number of structured priors, whose form captures
our biases about the relatedness of different tasks or expert policies. In
doing so, we introduce a prior on policy optimality, which is more natural to
specify. We show that our framework allows us not only to learn efficiently
from multiple experts but also to differentiate effectively between the goals
of each. Possible applications include analysing the intrinsic motivations of
subjects in behavioural experiments and learning from multiple teachers.
Comment: Corrected version. 13 pages, 8 figures
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013)
Deep reinforcement learning from human preferences
For sophisticated reinforcement learning (RL) systems to interact usefully
with real-world environments, we need to communicate complex goals to these
systems. In this work, we explore goals defined in terms of (non-expert) human
preferences between pairs of trajectory segments. We show that this approach
can effectively solve complex RL tasks without access to the reward function,
including Atari games and simulated robot locomotion, while providing feedback
on less than one percent of our agent's interactions with the environment. This
reduces the cost of human oversight far enough that it can be practically
applied to state-of-the-art RL systems. To demonstrate the flexibility of our
approach, we show that we can successfully train complex novel behaviors with
about an hour of human time. These behaviors and environments are considerably
more complex than any that have been previously learned from human feedback.
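
The abstract above does not say how a preference between two trajectory segments becomes a training signal. A minimal sketch of one common choice follows, assuming a Bradley-Terry (logistic) model over summed predicted rewards; this model and the names `segment_return`, `preference_probability`, and `preference_loss` are illustrative assumptions, not details taken from the abstract.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumption: a Bradley-Terry (logistic) model on the difference of
    summed predicted rewards. The abstract does not specify this.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a human label.

    label is 1.0 if the human preferred A, 0.0 if B, 0.5 if indifferent.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    # Hypothetical per-step reward predictions for two short segments.
    seg_a = [0.5, 1.0, 0.5]
    seg_b = [0.0, 0.5, 0.0]
    print(preference_probability(seg_a, seg_b))
    print(preference_loss(seg_a, seg_b, 1.0))
```

Under this assumption, minimising the loss over many labelled pairs would push a learned reward predictor to rank segments the way the human did. An RL agent could then be trained against that predictor rather than the true reward function.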
Enabling Environment Design via Active Indirect Elicitation
Many situations arise in which an interested party wishes to
affect the decisions of an agent; e.g., a teacher that seeks to
promote particular study habits, a Web 2.0 site that seeks to
encourage users to contribute content, or an online retailer
that seeks to encourage consumers to write reviews. In the
problem of environment design, one assumes an interested
party who is able to alter limited aspects of the environment
for the purpose of promoting desirable behaviors. A critical
aspect of environment design is understanding preferences,
but by assumption direct queries are unavailable. We work in
the inverse reinforcement learning framework, adopting here
the idea of active indirect preference elicitation to learn the
reward function of the agent by observing behavior in response
to incentives. We show that the process is convergent and
obtain desirable bounds on the number of elicitation rounds.
We briefly discuss generalizations of the elicitation method to
other forms of environment design, e.g., modifying the state
space, transition model, and available actions.
Engineering and Applied Science
"So, Tell Me What Users Want, What They Really, Really Want!"
Equating users' true needs and desires with behavioural measures of
'engagement' is problematic. However, good metrics of 'true preferences' are
difficult to define, as cognitive biases make people's preferences change with
context and exhibit inconsistencies over time. Yet, HCI research often glosses
over the philosophical and theoretical depth of what it means to infer what
users really want. In this paper, we present an alternative yet very real
discussion of this issue, via a fictive dialogue between senior executives in a
tech company aimed at helping people live the life they 'really' want to live.
How will the designers settle on a metric for their product to optimise
- …