
    Bayesian multitask inverse reinforcement learning

    We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers. Comment: Corrected version. 13 pages, 8 figures.
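
    The abstract stays at the level of priors, so here is a minimal sketch of what a structured prior coupling several experts can look like: each expert's reward vector is tied to a shared mean by a Gaussian prior, and demonstrations enter through a Boltzmann-rational choice likelihood. The model, the names, and the MAP-by-gradient-ascent procedure are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n_experts, n_states = 3, 5

# Demonstration data: how often each expert visited each state.
demo_counts = rng.integers(0, 10, size=(n_experts, n_states)).astype(float)

def log_posterior(rewards, mu, tau=1.0, beta=2.0):
    """Structured prior  r_k ~ N(mu, tau^2 I)  couples the experts' tasks;
    likelihood  p(s | r_k) ∝ exp(beta * r_k[s])  models near-optimal behaviour."""
    log_prior = -0.5 * np.sum((rewards - mu) ** 2) / tau**2
    log_pi = beta * rewards - logsumexp(beta * rewards, axis=1, keepdims=True)
    return log_prior + np.sum(demo_counts * log_pi)

def gradients(rewards, mu, tau=1.0, beta=2.0):
    # Analytic gradients of the log posterior w.r.t. each expert's rewards
    # and the shared mean reward.
    pi = np.exp(beta * rewards - logsumexp(beta * rewards, axis=1, keepdims=True))
    g_r = beta * (demo_counts - demo_counts.sum(axis=1, keepdims=True) * pi)
    g_r -= (rewards - mu) / tau**2
    g_mu = np.sum(rewards - mu, axis=0) / tau**2
    return g_r, g_mu

rewards, mu = np.zeros((n_experts, n_states)), np.zeros(n_states)
for _ in range(500):  # crude MAP estimate by gradient ascent
    g_r, g_mu = gradients(rewards, mu)
    rewards += 0.01 * g_r
    mu += 0.01 * g_mu

print("log posterior at MAP estimate:", round(float(log_posterior(rewards, mu)), 2))
```

    Shrinking each expert's reward estimate toward a shared mean is one simple way to encode the belief that the experts' tasks are related; the paper's priors are richer, but the coupling idea is the same.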

    Probabilistic inverse reinforcement learning in unknown environments

    We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013).
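
    To make the "flat prior gives a convex MAP problem" remark concrete, here is a rough sketch of one simplified policy model with that property: the demonstrator chooses actions via a softmax over a utility that is linear in state-action features, so the MAP estimate under a flat prior reduces to a convex maximum-likelihood fit. The linear-softmax model and all names are assumptions for illustration, not the paper's exact derivation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(1)
n_demos, n_actions, n_features = 200, 4, 6

# phi[i, a] is a feature vector for taking action a in the i-th observed state.
phi = rng.normal(size=(n_demos, n_actions, n_features))
true_theta = rng.normal(size=n_features)
logits = phi @ true_theta
actions = np.array([rng.choice(n_actions, p=np.exp(l - logsumexp(l))) for l in logits])

def neg_log_likelihood(theta):
    # Softmax (Boltzmann) policy model: p(a | s) ∝ exp(theta · phi(s, a)).
    logits = phi @ theta                                    # (n_demos, n_actions)
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    return -log_probs[np.arange(n_demos), actions].sum()   # convex in theta

# Flat prior => the MAP estimate is just the maximiser of the likelihood.
theta_map = minimize(neg_log_likelihood, np.zeros(n_features)).x
print("recovered utility weights:", np.round(theta_map, 2))
```

    Nothing here requires knowledge of the transition dynamics: the model is fit directly to observed state-action pairs, which is the regime the abstract describes.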

    Deep reinforcement learning from human preferences

    For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
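
    The core mechanism the abstract describes is fitting a reward model to pairwise human preferences over trajectory segments and then handing that model to an ordinary RL algorithm. The sketch below shows one common way to do that, using a logistic (Bradley-Terry style) comparison of predicted segment returns; the network size, training loop, and data here are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

obs_dim, seg_len, batch = 8, 25, 32

# Small reward model mapping an observation to a scalar predicted reward.
reward_model = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Synthetic preference data: two trajectory segments per comparison plus a
# human label saying which one was preferred (1.0 = first segment preferred).
seg_a = torch.randn(batch, seg_len, obs_dim)
seg_b = torch.randn(batch, seg_len, obs_dim)
prefers_a = torch.randint(0, 2, (batch,)).float()

for _ in range(100):
    # Sum predicted reward over each segment, then model the probability that
    # the first segment is preferred as a logistic function of the difference.
    ret_a = reward_model(seg_a).squeeze(-1).sum(dim=1)
    ret_b = reward_model(seg_b).squeeze(-1).sum(dim=1)
    p_a = torch.sigmoid(ret_a - ret_b)
    loss = nn.functional.binary_cross_entropy(p_a, prefers_a)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained reward_model can then label environment transitions so that a
# standard RL agent optimises the learned reward instead of a hand-coded one.
```

    Because each comparison only asks a human which of two short clips they prefer, labels are cheap relative to full demonstrations, which is what makes the "less than one percent of interactions" figure plausible.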

    Enabling Environment Design via Active Indirect Elicitation

    Many situations arise in which an interested party wishes to affect the decisions of an agent; e.g., a teacher that seeks to promote particular study habits, a Web 2.0 site that seeks to encourage users to contribute content, or an online retailer that seeks to encourage consumers to write reviews. In the problem of environment design, one assumes an interested party who is able to alter limited aspects of the environment for the purpose of promoting desirable behaviors. A critical aspect of environment design is understanding preferences, but by assumption direct queries are unavailable. We work in the inverse reinforcement learning framework, adopting the idea of active indirect preference elicitation to learn the reward function of the agent by observing behavior in response to incentives. We show that the process is convergent and obtain desirable bounds on the number of elicitation rounds. We briefly discuss generalizations of the elicitation method to other forms of environment design, e.g., modifying the state space, transition model, and available actions.
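
    The elicitation loop the abstract refers to can be pictured roughly as follows: the designer cannot query preferences directly, so it perturbs incentives, observes which option the agent chooses, and discards reward hypotheses inconsistent with that choice. The finite hypothesis set and random-incentive strategy below are illustrative assumptions, not the paper's algorithm or the source of its bounds.

```python
import numpy as np

rng = np.random.default_rng(2)
n_options, n_hypotheses = 5, 1000

true_reward = rng.normal(size=n_options)                 # hidden from the designer
hypotheses = rng.normal(size=(n_hypotheses, n_options))  # candidate reward vectors

def agent_choice(incentive):
    """The agent responds rationally to its true reward plus the incentive."""
    return int(np.argmax(true_reward + incentive))

for round_ in range(20):
    incentive = rng.normal(scale=0.5, size=n_options)    # designer alters the environment
    chosen = agent_choice(incentive)
    # Keep only the hypotheses under which the observed choice was optimal.
    consistent = (hypotheses + incentive).argmax(axis=1) == chosen
    hypotheses = hypotheses[consistent]
    print(f"round {round_}: {len(hypotheses)} hypotheses remain")
```

    Each observed choice prunes the hypothesis set, which is the intuition behind bounding the number of elicitation rounds; the paper's incentives are chosen actively rather than at random.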

    "So, Tell Me What Users Want, What They Really, Really Want!"

    Equating users' true needs and desires with behavioural measures of 'engagement' is problematic. However, good metrics of 'true preferences' are difficult to define, as cognitive biases make people's preferences change with context and exhibit inconsistencies over time. Yet, HCI research often glosses over the philosophical and theoretical depth of what it means to infer what users really want. In this paper, we present an alternative yet very real discussion of this issue, via a fictive dialogue between senior executives in a tech company aimed at helping people live the life they 'really' want to live. How will the designers settle on a metric for their product to optimise?