Search CORE

3,131 research outputs found

Shared Autonomy via Hindsight Optimization

Author: Bagnell J. Andrew
Javdani Shervin
Srinivasa Siddhartha S.
Publication venue
Publication date: 17/04/2015
Field of study

In shared autonomy, user input and robot autonomy are combined to control a robot to achieve a goal. Often, the robot does not know a priori which goal the user wants to achieve, and must both predict the user's intended goal, and assist in achieving that goal. We formulate the problem of shared autonomy as a Partially Observable Markov Decision Process with uncertainty over the user's goal. We utilize maximum entropy inverse optimal control to estimate a distribution over the user's goal based on the history of inputs. Ideally, the robot assists the user by solving for an action which minimizes the expected cost-to-go for the (unknown) goal. As solving the POMDP to select the optimal action is intractable, we use hindsight optimization to approximate the solution. In a user study, we compare our method to a standard predict-then-blend approach. We find that our method enables users to accomplish tasks more quickly while utilizing less input. However, when asked to rate each system, users were mixed in their assessment, citing a tradeoff between maintaining control authority and accomplishing tasks quickly

arXiv.org e-Print Archive

CiteSeerX

The Complexity of POMDPs with Long-run Average Objectives

Author: Chatterjee Krishnendu
Saona Raimundo
Ziliotto Bruno
Publication venue
Publication date: 30/04/2019
Field of study

We study the problem of approximation of optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives rewards are associated with every transition of the POMDP and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and the related decision problem is recursively enumerable complete

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)

Reinforcement Learning: A Survey

Author: Kaelbling L. P.
Littman M. L.
Moore A. W.
Publication venue
Publication date: 01/01/1996
Field of study

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

General-Purpose Planning Algorithms In Partially-Observable Stochastic Games

Author: McKenney Bryan
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/01/2023
Field of study

Partially observable stochastic games (POSGs) are difficult domains to plan in because they feature multiple agents with potentially opposing goals, parts of the world are hidden from the agents, and some actions have random outcomes. It is infeasible to solve a large POSG optimally. While it may be tempting to design a specialized algorithm for finding suboptimal solutions to a particular POSG, general-purpose planning algorithms can work just as well, but with less complexity and domain knowledge required. I explore this idea in two different POSGs: Navy Defense and Duelyst. In Navy Defense, I show that a specialized algorithm framework, goal-driven autonomy, which requires a complex subsystem separate from the planner for explicitly reasoning about goals, is unnecessary, as simple general planners such as hindsight optimization exhibit implicit goal reasoning and have strong performance. In Duelyst, I show that a specialized expert-rule-based AI can be consistently beaten by a simple general planner using only a small amount of domain knowledge. I also introduce a modification to Monte Carlo tree search that increases performance when rollouts are slow and there are time constraints on planning

UNH Scholars' Repository