
    A decision-theoretic model of assistance

    There is a growing interest in intelligent assistants for a variety of applications, from organizing tasks for knowledge workers to helping people with dementia. In this paper, we present and evaluate a decision-theoretic framework that captures the general notion of assistance. The objective is to observe a goal-directed agent and to select assistive actions in order to minimize the overall cost. We model the problem as an assistant POMDP whose hidden state corresponds to the agent's unobserved goals. This formulation allows us to exploit domain models both for estimating the agent's goals and for selecting assistive actions. In addition, the formulation naturally handles uncertainty, varying action costs, and customization to specific agents via learning. We argue that in many domains myopic heuristics will be adequate for selecting actions in the assistant POMDP and present two such heuristics. We evaluate our approach in two domains where human subjects perform tasks in game-like computer environments. The results show that the assistant substantially reduces user effort at only a modest computational cost.
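
    A minimal sketch of the idea described above, under illustrative assumptions: maintain a posterior over the user's hidden goal from observed actions, then choose an assistive action myopically (one-step lookahead) against that posterior. The goal set, likelihood, and cost function below are hypothetical stand-ins, not the paper's actual domain models or heuristics.

```python
# Sketch: Bayesian goal estimation plus a myopic (one-step) assistive choice.
# Goals, likelihoods, and costs are hypothetical placeholders.

def update_goal_belief(belief, observed_action, action_likelihood):
    """Bayesian update: P(goal | action) is proportional to P(action | goal) * P(goal)."""
    posterior = {g: p * action_likelihood(observed_action, g) for g, p in belief.items()}
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()} if total > 0 else belief

def myopic_assist(belief, assist_actions, expected_cost):
    """Pick the assistive action with the lowest expected cost under the goal belief."""
    return min(assist_actions,
               key=lambda a: sum(p * expected_cost(a, g) for g, p in belief.items()))

# Toy example: two candidate goals; the observed user action is more likely under goal_A.
belief = {"goal_A": 0.5, "goal_B": 0.5}
belief = update_goal_belief(belief, "move_left",
                            lambda act, g: 0.8 if g == "goal_A" else 0.2)
action = myopic_assist(belief, ["fetch_item", "do_nothing"],
                       lambda a, g: 0.2 if (a == "fetch_item" and g == "goal_A") else 1.0)
print(belief, action)  # belief shifts toward goal_A; the assistant chooses fetch_item
```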

    Imitation Learning with Demonstrations and Shaping Rewards

    Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has led to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improving the learning efficiency of IL by providing a shaping reward function in addition to the usual demonstrations. Shaping rewards are numeric functions of states (and possibly actions) that are generally easy to specify and capture general principles of desired behavior without necessarily completely specifying that behavior. Shaping rewards have been used extensively in reinforcement learning but have seldom been considered for IL. Our main contribution is an IL approach that learns from both shaping rewards and demonstrations. We demonstrate the effectiveness of the approach across several IL problems, even when the shaping reward is not fully consistent with the demonstrations.
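
    A hedged sketch of combining the two signals, assuming a tabular softmax policy: the gradient mixes an imitation term (raise the log-probability of demonstrated actions) with a shaping term (raise the expected shaping reward under the policy). The weights, toy data, and update rule are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

n_states, n_actions = 4, 3
logits = np.zeros((n_states, n_actions))          # policy parameters (one softmax per state)
shaping = np.random.rand(n_states, n_actions)     # shaping reward R_shape(s, a), assumed given
demos = [(0, 1), (1, 2), (2, 0)]                  # (state, action) demonstration pairs
lam, lr = 0.5, 0.1                                # shaping weight and learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(200):
    grad = np.zeros_like(logits)
    # Imitation term: gradient of the log-probability of each demonstrated action.
    for s, a in demos:
        p = softmax(logits[s])
        grad[s] -= p
        grad[s, a] += 1.0
    # Shaping term: gradient of the expected shaping reward under the current policy.
    for s in range(n_states):
        p = softmax(logits[s])
        baseline = p @ shaping[s]
        grad[s] += lam * p * (shaping[s] - baseline)
    logits += lr * grad

print(np.argmax(logits, axis=1))  # greedy action per state after training
```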

    Reinforcement Learning via Practice and Critique Advice

    We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practice sessions, where learning is based on actual world experience, and end-user critique sessions, where advice is gathered. During each critique session, the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate the approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. Results are given for an evaluation involving ten end-users, showing the promise of the approach and highlighting the challenges of inserting end-users into the RL loop.
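
    A minimal sketch of such a combined objective, assuming a tabular softmax policy: the update linearly combines a practice term (REINFORCE-style, weighted by observed returns) with a critique term that pushes probability toward actions labeled good and away from actions labeled bad. The representation, loss weight, and toy data are assumptions for illustration, not the prototype system itself.

```python
import numpy as np

n_states, n_actions = 5, 4
theta = np.zeros((n_states, n_actions))              # softmax policy parameters
practice = [(0, 1, 2.0), (1, 3, -1.0), (2, 0, 1.0)]  # (state, action, observed return)
critiques = [(0, 1, +1), (1, 3, -1), (3, 2, +1)]     # (state, action, good=+1 / bad=-1)
alpha, lr = 1.0, 0.1                                 # critique weight and step size

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(300):
    grad = np.zeros_like(theta)
    # Practice term: weight log-probability gradients by the return observed in practice.
    for s, a, ret in practice:
        p = softmax(theta[s])
        grad[s] -= ret * p
        grad[s, a] += ret
    # Critique term: push probability up for "good" actions, down for "bad" ones.
    for s, a, label in critiques:
        p = softmax(theta[s])
        grad[s] -= alpha * label * p
        grad[s, a] += alpha * label
    theta += lr * grad

print(np.argmax(theta, axis=1))  # greedy action per state after training
```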

    A Decision-Theoretic Model of Assistance: Evaluation, Extensions and Open Problems

    There is a growing interest in intelligent assistants for a variety of applications, from organizing tasks for knowledge workers to helping people with dementia. In our earlier work, we presented a decision-theoretic framework that captures the general notion of assistance. The objective was to observe a goal-directed agent and to select assistive actions in order to minimize the overall cost. We modeled the assistant as a POMDP whose hidden state was the goal of the agent. In this work, we evaluate our model of assistance on a real-world domain and establish that it is effective in reducing the effort of the user. We compare the results of our model against a cost-sensitive supervised learning algorithm. We also describe our current work on extending the model to include relational hierarchies. We then analyze some problems with our model and suggest possible extensions to handle them.

    Multi-Agent Inverse Reinforcement Learning

    Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning and has applications in learning from demonstration or apprenticeship learning. We introduce the problem of multi-agent inverse reinforcement learning, where the reward functions of multiple agents are learned by observing their uncoordinated behavior. A centralized controller then learns to coordinate their behavior by optimizing a weighted sum of the reward functions of all the agents. We evaluate our approach on a traffic-routing domain, in which a controller coordinates the actions of multiple traffic signals to regulate traffic density. We show that the learner not only matches but significantly outperforms the expert.
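
    A hedged sketch of the coordination step, assuming per-agent reward functions have already been recovered by single-agent IRL: a centralized controller scores each joint action by a weighted sum of the agents' rewards and picks the best. The reward tables, weights, and joint-action space below are toy assumptions, not the traffic-routing domain from the paper.

```python
import itertools
import numpy as np

n_agents, n_actions = 2, 3
rng = np.random.default_rng(0)
# recovered_rewards[i][joint_action] would come from per-agent IRL; random here for illustration.
recovered_rewards = [rng.random((n_actions,) * n_agents) for _ in range(n_agents)]
weights = [0.7, 0.3]  # relative weight on each agent's learned reward

def coordinate(rewards, weights):
    """Return the joint action maximizing the weighted sum of the agents' rewards."""
    joint_actions = itertools.product(range(n_actions), repeat=len(rewards))
    return max(joint_actions,
               key=lambda ja: sum(w * r[ja] for w, r in zip(weights, rewards)))

print(coordinate(recovered_rewards, weights))  # best joint action under the weighted objective
```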