
    A decision-theoretic model of assistance

    There is a growing interest in intelligent assistants for a variety of applications, from organizing tasks for knowledge workers to helping people with dementia. In this paper, we present and evaluate a decision-theoretic framework that captures the general notion of assistance. The objective is to observe a goal-directed agent and to select assistive actions in order to minimize the overall cost. We model the problem as an assistant POMDP whose hidden state corresponds to the agent's unobserved goals. This formulation allows us to exploit domain models both for estimating the agent's goals and for selecting assistive actions. In addition, the formulation naturally handles uncertainty, varying action costs, and customization to specific agents via learning. We argue that in many domains myopic heuristics will be adequate for selecting actions in the assistant POMDP and present two such heuristics. We evaluate our approach in two domains where human subjects perform tasks in game-like computer environments. The results show that the assistant substantially reduces user effort at only a modest computational cost.
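
    A minimal sketch of the idea described above, under illustrative assumptions: maintain a posterior over the user's hidden goal from observed actions, then choose an assistive action myopically (one-step lookahead) against that posterior. The goal set, likelihood, and cost function below are hypothetical stand-ins, not the paper's actual domain models or heuristics.

```python
# Sketch: Bayesian goal estimation plus a myopic (one-step) assistive choice.
# Goals, likelihoods, and costs are hypothetical placeholders.

def update_goal_belief(belief, observed_action, action_likelihood):
    """Bayesian update: P(goal | action) is proportional to P(action | goal) * P(goal)."""
    posterior = {g: p * action_likelihood(observed_action, g) for g, p in belief.items()}
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()} if total > 0 else belief

def myopic_assist(belief, assist_actions, expected_cost):
    """Pick the assistive action with the lowest expected cost under the goal belief."""
    return min(assist_actions,
               key=lambda a: sum(p * expected_cost(a, g) for g, p in belief.items()))

# Toy example: two candidate goals; the observed user action is more likely under goal_A.
belief = {"goal_A": 0.5, "goal_B": 0.5}
belief = update_goal_belief(belief, "move_left",
                            lambda act, g: 0.8 if g == "goal_A" else 0.2)
action = myopic_assist(belief, ["fetch_item", "do_nothing"],
                       lambda a, g: 0.2 if (a == "fetch_item" and g == "goal_A") else 1.0)
print(belief, action)  # belief shifts toward goal_A; the assistant chooses fetch_item
```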

    Imitation Learning with Demonstrations and Shaping Rewards

    Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has led to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improving the learning efficiency of IL by providing a shaping reward function in addition to the usual demonstrations. Shaping rewards are numeric functions of states (and possibly actions) that are generally easy to specify and capture general principles of desired behavior without necessarily completely specifying that behavior. Shaping rewards have been used extensively in reinforcement learning but have seldom been considered for IL. Our main contribution is an IL approach that learns from both shaping rewards and demonstrations. We demonstrate the effectiveness of the approach across several IL problems, even when the shaping reward is not fully consistent with the demonstrations.
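
    A hedged sketch of combining the two signals, assuming a tabular softmax policy: the gradient mixes an imitation term (raise the log-probability of demonstrated actions) with a shaping term (raise the expected shaping reward under the policy). The weights, toy data, and update rule are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

n_states, n_actions = 4, 3
logits = np.zeros((n_states, n_actions))          # policy parameters (one softmax per state)
shaping = np.random.rand(n_states, n_actions)     # shaping reward R_shape(s, a), assumed given
demos = [(0, 1), (1, 2), (2, 0)]                  # (state, action) demonstration pairs
lam, lr = 0.5, 0.1                                # shaping weight and learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(200):
    grad = np.zeros_like(logits)
    # Imitation term: gradient of the log-probability of each demonstrated action.
    for s, a in demos:
        p = softmax(logits[s])
        grad[s] -= p
        grad[s, a] += 1.0
    # Shaping term: gradient of the expected shaping reward under the current policy.
    for s in range(n_states):
        p = softmax(logits[s])
        baseline = p @ shaping[s]
        grad[s] += lam * p * (shaping[s] - baseline)
    logits += lr * grad

print(np.argmax(logits, axis=1))  # greedy action per state after training
```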

    Reinforcement Learning via Practice and Critique Advice

    We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practice sessions, where learning is based on actual world experience, and end-user critique sessions, where advice is gathered. During each critique session, the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate the approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. Results are given for an evaluation involving ten end-users, showing the promise of the approach and highlighting the challenges of inserting end-users into the RL loop.
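
    A minimal sketch of such a combined objective, assuming a tabular softmax policy: the update linearly combines a practice term (REINFORCE-style, weighted by observed returns) with a critique term that pushes probability toward actions labeled good and away from actions labeled bad. The representation, loss weight, and toy data are assumptions for illustration, not the prototype system itself.

```python
import numpy as np

n_states, n_actions = 5, 4
theta = np.zeros((n_states, n_actions))              # softmax policy parameters
practice = [(0, 1, 2.0), (1, 3, -1.0), (2, 0, 1.0)]  # (state, action, observed return)
critiques = [(0, 1, +1), (1, 3, -1), (3, 2, +1)]     # (state, action, good=+1 / bad=-1)
alpha, lr = 1.0, 0.1                                 # critique weight and step size

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(300):
    grad = np.zeros_like(theta)
    # Practice term: weight log-probability gradients by the return observed in practice.
    for s, a, ret in practice:
        p = softmax(theta[s])
        grad[s] -= ret * p
        grad[s, a] += ret
    # Critique term: push probability up for "good" actions, down for "bad" ones.
    for s, a, label in critiques:
        p = softmax(theta[s])
        grad[s] -= alpha * label * p
        grad[s, a] += alpha * label
    theta += lr * grad

print(np.argmax(theta, axis=1))  # greedy action per state after training
```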

    A Decision-Theoretic Model of Assistance: Evaluation, Extensions and Open Problems

    There is a growing interest in intelligent assistants for a variety of applications, from organizing tasks for knowledge workers to helping people with dementia. In our earlier work, we presented a decision-theoretic framework that captures the general notion of assistance. The objective was to observe a goal-directed agent and to select assistive actions in order to minimize the overall cost. We modeled the assistant as a POMDP whose hidden state was the goal of the agent. In this work, we evaluate our model of assistance on a real-world domain and establish that it is effective in reducing the effort of the user. We compare the results of our model against a cost-sensitive supervised learning algorithm. We also describe our current work on extending the model to include relational hierarchies. We then analyze some problems with our model and suggest possible extensions to handle them.

    Multi-Agent Inverse Reinforcement Learning

    Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning and has applications in learning from demonstration or apprenticeship learning. We introduce the problem of multi-agent inverse reinforcement learning, where the reward functions of multiple agents are learned by observing their uncoordinated behavior. A centralized controller then learns to coordinate their behavior by optimizing a weighted sum of the reward functions of all the agents. We evaluate our approach on a traffic-routing domain, in which a controller coordinates the actions of multiple traffic signals to regulate traffic density. We show that the learner not only matches but significantly outperforms the expert.
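
    A hedged sketch of the coordination step, assuming per-agent reward functions have already been recovered by single-agent IRL: a centralized controller scores each joint action by a weighted sum of the agents' rewards and picks the best. The reward tables, weights, and joint-action space below are toy assumptions, not the traffic-routing domain from the paper.

```python
import itertools
import numpy as np

n_agents, n_actions = 2, 3
rng = np.random.default_rng(0)
# recovered_rewards[i][joint_action] would come from per-agent IRL; random here for illustration.
recovered_rewards = [rng.random((n_actions,) * n_agents) for _ in range(n_agents)]
weights = [0.7, 0.3]  # relative weight on each agent's learned reward

def coordinate(rewards, weights):
    """Return the joint action maximizing the weighted sum of the agents' rewards."""
    joint_actions = itertools.product(range(n_actions), repeat=len(rewards))
    return max(joint_actions,
               key=lambda ja: sum(w * r[ja] for w, r in zip(weights, rewards)))

print(coordinate(recovered_rewards, weights))  # best joint action under the weighted objective
```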