Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an
environment in observation-reward-action cycles without any (esp.\ MDP)
assumptions on the environment. State aggregation and more generally feature
reinforcement learning is concerned with mapping histories/raw-states to
reduced/aggregated states. The idea behind both is that the resulting reduced
process (approximately) forms a small stationary finite-state MDP, which can
then be efficiently solved or learnt. We considerably generalize existing
aggregation results by showing that even if the reduced process is not an MDP,
the (q-)value functions and (optimal) policies of an associated MDP with same
state-space size solve the original problem, as long as the solution can
approximately be represented as a function of the reduced states. This implies
an upper bound on the required state space size that holds uniformly for all RL
problems. It may also explain why RL algorithms designed for MDPs sometimes
perform well beyond MDPs.
Comment: 28 LaTeX pages, 8 theorems.
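The core idea of the abstract above, mapping histories to a small set of reduced states and then solving the associated finite MDP, can be illustrated with a minimal sketch. Everything below (the aggregation map phi, the transition and reward tables, the sizes) is an assumed toy example, not the paper's construction:

```python
# Sketch of state aggregation: histories are reduced by an aggregation map
# phi to a small abstract state space, and the associated finite MDP over
# those states is solved by value iteration. All numbers are illustrative.
import numpy as np

def phi(history):
    """Aggregation map (assumed): reduce a history (tuple of integer
    observations) to one of two abstract states via the latest observation's
    parity."""
    return history[-1] % 2

# Associated MDP over the 2 reduced states and 2 actions (assumed).
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[s][a] = next-state distribution
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                 # R[s][a] = expected reward
              [0.0, 2.0]])

# Value iteration on the reduced MDP; per the paper's claim, the resulting
# q-values and greedy policy (a function of reduced states only) can
# approximately solve the original history-based problem.
Q = np.zeros((n_states, n_actions))
for _ in range(500):
    V = Q.max(axis=1)
    Q = R + gamma * P @ V

policy = Q.argmax(axis=1)  # optimal action for each reduced state
```

An agent would then act by computing `policy[phi(history)]` at each step; the paper's result bounds how far this can be from optimal when the optimal value function is approximately representable over the reduced states.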
Performance Guarantees for Homomorphisms Beyond Markov Decision Processes
Most real-world problems have huge state and/or action spaces. Therefore, a
naive application of existing tabular solution methods is not tractable on such
problems. Nonetheless, these solution methods are quite useful if an agent has
access to a relatively small state-action space homomorphism of the true
environment and near-optimal performance is guaranteed by the map. A plethora
of research is focused on the case when the homomorphism is a Markovian
representation of the underlying process. However, we show that near-optimal
performance is sometimes guaranteed even if the homomorphism is non-Markovian.
Moreover, we can aggregate significantly more states by lifting the Markovian
requirement without compromising on performance. In this work, we expand
the Extreme State Aggregation (ESA) framework to joint state-action
aggregations. We also lift the policy uniformity condition for aggregation in
ESA, which allows even coarser modeling of the true environment.
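The joint state-action aggregation described above can be pictured with a small sketch: a map collapses ground (state, action) pairs into a smaller abstract state-action space, an abstract policy is computed on the small model, and that policy is lifted back to the ground process. The map `h`, the sizes, and the abstract policy below are all assumptions for illustration, not the paper's construction:

```python
# Illustrative joint state-action aggregation (homomorphism-style sketch).
# Ground problem: 4 states x 3 actions; abstract: 2 states x 2 actions.

def h(state, action):
    """Assumed joint state-action map: collapse states by halves and actions
    by whether they are 'move' (0, 1) or 'stay' (2)."""
    return (state // 2, 0 if action < 2 else 1)

# Abstract policy, assumed to come from solving the small abstract model.
abstract_policy = {0: 1, 1: 0}

def lift(state):
    """Lift the abstract policy back to the ground process: pick any ground
    action whose image under h matches the prescribed abstract action."""
    s_bar = state // 2
    a_bar = abstract_policy[s_bar]
    return next(a for a in range(3) if h(state, a) == (s_bar, a_bar))

ground_actions = [lift(s) for s in range(4)]
```

The performance-guarantee question the abstract raises is exactly about this lifting step: how much value is lost when the abstract model (possibly non-Markovian) stands in for the ground one.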
Large Markov Decision Processes and Combinatorial Optimization
Markov decision processes continue to gain in popularity for modeling a wide
range of applications ranging from analysis of supply chains and queuing
networks to cognitive science and control of autonomous vehicles. Nonetheless,
they quickly become numerically intractable as the size of the model grows.
Recent works use machine learning techniques to overcome this crucial
issue, but with no convergence guarantee. This note provides a brief overview
of literature on solving large Markov decision processes, and exploiting them
to solve important combinatorial optimization problems.
Solving Factored MDPs with Hybrid State and Action Variables
Efficient representations and solutions for large decision problems with
continuous and discrete variables are among the most important challenges faced
by the designers of automated decision support systems. In this paper, we
describe a novel hybrid factored Markov decision process (MDP) model that
allows for a compact representation of these problems, and a new hybrid
approximate linear programming (HALP) framework that permits their efficient
solutions. The central idea of HALP is to approximate the optimal value
function by a linear combination of basis functions and optimize its weights by
linear programming. We analyze both theoretical and computational aspects of
this approach, and demonstrate its scale-up potential on several hybrid
optimization problems.
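The central HALP idea stated above, approximating the optimal value function by a linear combination of basis functions whose weights are found by linear programming, can be sketched for a purely discrete toy MDP (HALP itself targets hybrid continuous/discrete spaces). The MDP data, the basis matrix `Phi`, and the state-relevance weights `c` below are assumptions for illustration:

```python
# Minimal approximate-linear-programming (ALP) sketch for a tiny MDP:
# V(s) ~ Phi[s] @ w, with w chosen by an LP whose constraints enforce
# Phi w >= R[:, a] + gamma * P[:, a] @ (Phi w) for every action a.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

# Two assumed basis functions: a constant and a linear ramp over states.
Phi = np.column_stack([np.ones(n_states), np.arange(n_states, dtype=float)])
c = np.full(n_states, 1.0 / n_states)  # state-relevance weights (assumed)

# LP: minimize c^T (Phi w) subject to (Phi - gamma * P[:, a] @ Phi) w >= R[:, a]
# for each action; linprog expects A_ub x <= b_ub, so we negate both sides.
A_ub, b_ub = [], []
for a in range(n_actions):
    A_ub.append(-(Phi - gamma * P[:, a] @ Phi))
    b_ub.append(-R[:, a])
res = linprog(c @ Phi, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
              bounds=[(None, None)] * Phi.shape[1])

V_approx = Phi @ res.x  # approximate value function over the states
```

Any feasible solution upper-bounds the optimal value function pointwise, which is why minimizing the weighted objective yields a principled approximation; HALP extends this by integrating the constraints and objective over continuous variables.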
Abstractions of General Reinforcement Learning
The field of artificial intelligence (AI) is devoted to the creation of artificial decision-makers that can perform (at least) on par with their human counterparts on a domain of interest. Unlike the agents in traditional AI, the agents in artificial general intelligence (AGI) are required to replicate human intelligence in almost every domain of interest. Moreover, an AGI agent should be able to achieve this without (virtually any) further changes, retraining, or fine-tuning of its parameters. The real world is non-stationary, non-ergodic, and non-Markovian: we humans can neither revisit our past, nor are the most recent observations sufficient statistics for acting optimally. Yet, we excel at a variety of complex tasks, many of which require long-term planning. We can attribute this success to our natural faculty for abstracting away task-irrelevant information from our overwhelming sensory experience. We make task-specific mental models of the world without much effort. Owing to this ability to abstract, we can plan on a significantly compact representation of a task without much loss of performance. Not only that, we also abstract our actions to produce high-level plans: the level of action abstraction can lie anywhere between small muscle movements and a mental notion of "doing an action". It is natural to assume that any AGI agent competing with humans (in every plausible domain) should also have these abilities to abstract its experiences and actions. This thesis is an inquiry into the existence of such abstractions which aid efficient planning for a wide range of domains and, most importantly, come with optimality guarantees. We use a history-based reinforcement learning (RL) setup, appropriately called general reinforcement learning (GRL), to model such general-purpose decision-makers. We show that if such GRL agents have access to appropriate abstractions, then they can perform optimally in a huge set of domains.
That is, we argue that GRL with abstractions, called abstraction reinforcement learning (ARL), is an appropriate framework to model and analyze AGI agents. This work builds on and extends a powerful class of (state-only) abstractions called extreme state abstractions (ESA). We analyze a variety of such extreme abstractions, both state-only and state-action, to formally establish representation and convergence guarantees. We also make several minor contributions to the ARL framework along the way. Last but not least, we collect a series of ideas that lay the foundations for designing (extreme) abstraction learning algorithms.