Extreme State Aggregation Beyond MDPs

Author: A.L. Strehl
I. Fazekas
M. Hutter
M.L. Puterman
O.-A. Maillard
P. Nguyen
P. Sunehag
R. Givan
R.S. Sutton
S.J. Russell
T. Jaksch
T. Lattimore
V. Vovk
Publication date: 01/01/2014

We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and, more generally, feature reinforcement learning are concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state space size that holds uniformly for all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs.

Comment: 28 LaTeX pages. 8 Theorem
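As a rough illustration of the state-aggregation idea in the abstract above, the following minimal sketch covers only the classical, easy case: a small raw MDP whose states aggregate exactly. It does not reproduce the paper's more general non-MDP result. The toy dynamics, the map `phi`, the uniform weighting inside each aggregated state, and the helper names `aggregate_mdp` and `q_value_iteration` are all assumptions made for this example.

```python
def aggregate_mdp(P, R, phi, n_agg):
    """Build an aggregated MDP by uniformly averaging raw dynamics per group.

    P[s][a][s2] is a raw transition probability, R[s][a] an expected reward,
    and phi[s] the aggregated state of raw state s (hypothetical toy inputs).
    Assumes every aggregated state has at least one raw member.
    """
    n_actions = len(P[0])
    members = [[s for s in range(len(phi)) if phi[s] == z] for z in range(n_agg)]
    P_agg = [[[0.0] * n_agg for _ in range(n_actions)] for _ in range(n_agg)]
    R_agg = [[0.0] * n_actions for _ in range(n_agg)]
    for z, group in enumerate(members):
        w = 1.0 / len(group)  # uniform weight over the raw states in group z
        for s in group:
            for a in range(n_actions):
                R_agg[z][a] += w * R[s][a]
                for s2, p in enumerate(P[s][a]):
                    P_agg[z][a][phi[s2]] += w * p
    return P_agg, R_agg


def q_value_iteration(P, R, gamma=0.9, iters=500):
    """Plain q-value iteration on a finite MDP given as nested lists."""
    n, m = len(P), len(P[0])
    Q = [[0.0] * m for _ in range(n)]
    for _ in range(iters):
        V = [max(q) for q in Q]
        Q = [[R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n))
              for a in range(m)] for s in range(n)]
    return Q


# Toy raw MDP: 4 raw states, 2 actions (0 = stay in group, 1 = switch group).
phi = [0, 0, 1, 1]                      # raw states {0,1} -> 0, {2,3} -> 1
stay = {0: 1, 1: 0, 2: 3, 3: 2}
switch = {0: 2, 1: 3, 2: 0, 3: 1}
P = [[[1.0 if s2 == nxt[s] else 0.0 for s2 in range(4)] for nxt in (stay, switch)]
     for s in range(4)]
R = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # reward 1 for staying in group 1

P_agg, R_agg = aggregate_mdp(P, R, phi, n_agg=2)
Q_agg = q_value_iteration(P_agg, R_agg)   # solve the small aggregated MDP
Q_raw = q_value_iteration(P, R)           # solve the raw MDP for comparison

# Lift the aggregated greedy policy back to raw states through phi.
policy = [max(range(2), key=lambda a: Q_agg[phi[s]][a]) for s in range(4)]
```

In this toy case the raw q-values depend on the raw state only through `phi`, so the two-state aggregated MDP already recovers them; the abstract's claim concerns what still holds when the reduced process is not an MDP at all.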

arXiv.org e-Print Archive

Crossref

The Australian National University

Metareasoning for Planning Under Uncertainty

Author: Horvitz Eric
Kamar Ece
Kolobov Andrey
Lin Christopher H.
Publication date: 03/05/2015

The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning, a process that trades off the cost of planning against the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.

Comment: Extended version of IJCAI 2015 paper
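The trade-off described in the abstract above can be made concrete with a deliberately simplified, myopic stopping rule: keep planning only while the estimated gain from one more planning step exceeds that step's cost. This is not the paper's BRTDP-based procedure. The function name `plan_with_thinking_cost`, the fixed per-step cost, and the assumption that the sequence of value estimates is known in advance are all hypothetical simplifications for illustration.

```python
def plan_with_thinking_cost(value_estimates, cost_per_step):
    """Myopic metareasoning sketch: think another step only if it pays off.

    value_estimates[k] is the (assumed known) estimated policy value after
    k planning steps; cost_per_step is the price of one planning step.
    Returns (steps_taken, net_value), where net_value subtracts planning cost.
    """
    steps = 0
    while steps + 1 < len(value_estimates):
        gain = value_estimates[steps + 1] - value_estimates[steps]
        if gain <= cost_per_step:
            break  # one more step of thinking is not worth its cost
        steps += 1
    return steps, value_estimates[steps] - cost_per_step * steps


# Hypothetical improvement profile with diminishing returns.
estimates = [0.0, 5.0, 8.0, 9.0, 9.5]
costly = plan_with_thinking_cost(estimates, cost_per_step=1.5)
free = plan_with_thinking_cost(estimates, cost_per_step=0.0)
```

With a cost of 1.5 per step the rule stops after two steps, while with free thinking it plans to the end of the profile. A real agent does not know future estimates and must predict them, which is part of what makes the general problem hard.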

arXiv.org e-Print Archive

CiteSeerX