Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an
environment in observation-reward-action cycles without any (esp.\ MDP)
assumptions on the environment. State aggregation and more generally feature
reinforcement learning is concerned with mapping histories/raw-states to
reduced/aggregated states. The idea behind both is that the resulting reduced
process (approximately) forms a small stationary finite-state MDP, which can
then be efficiently solved or learnt. We considerably generalize existing
aggregation results by showing that even if the reduced process is not an MDP,
the (q-)value functions and (optimal) policies of an associated MDP with same
state-space size solve the original problem, as long as the solution can
approximately be represented as a function of the reduced states. This implies
an upper bound on the required state space size that holds uniformly for all RL
problems. It may also explain why RL algorithms designed for MDPs sometimes
perform well beyond MDPs.
Comment: 28 LaTeX pages, 8 theorems.
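The core idea of the abstract above, mapping histories to a small set of reduced states and then solving the associated finite MDP, can be illustrated with a minimal sketch. Everything below (the aggregation map phi, the transition and reward tables, the sizes) is an assumed toy example, not the paper's construction:

```python
# Sketch of state aggregation: histories are reduced by an aggregation map
# phi to a small abstract state space, and the associated finite MDP over
# those states is solved by value iteration. All numbers are illustrative.
import numpy as np

def phi(history):
    """Aggregation map (assumed): reduce a history (tuple of integer
    observations) to one of two abstract states via the latest observation's
    parity."""
    return history[-1] % 2

# Associated MDP over the 2 reduced states and 2 actions (assumed).
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[s][a] = next-state distribution
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                 # R[s][a] = expected reward
              [0.0, 2.0]])

# Value iteration on the reduced MDP; per the paper's claim, the resulting
# q-values and greedy policy (a function of reduced states only) can
# approximately solve the original history-based problem.
Q = np.zeros((n_states, n_actions))
for _ in range(500):
    V = Q.max(axis=1)
    Q = R + gamma * P @ V

policy = Q.argmax(axis=1)  # optimal action for each reduced state
```

An agent would then act by computing `policy[phi(history)]` at each step; the paper's result bounds how far this can be from optimal when the optimal value function is approximately representable over the reduced states.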
Performance Guarantees for Homomorphisms Beyond Markov Decision Processes
Most real-world problems have huge state and/or action spaces. Therefore, a
naive application of existing tabular solution methods is not tractable on such
problems. Nonetheless, these solution methods are quite useful if an agent has
access to a relatively small state-action space homomorphism of the true
environment and near-optimal performance is guaranteed by the map. A plethora
of research is focused on the case when the homomorphism is a Markovian
representation of the underlying process. However, we show that near-optimal
performance is sometimes guaranteed even if the homomorphism is non-Markovian.
Moreover, we can aggregate significantly more states by lifting the Markovian
requirement without compromising on performance. In this work, we expand
the Extreme State Aggregation (ESA) framework to joint state-action
aggregations. We also lift the policy uniformity condition for aggregation in
ESA, which allows even coarser modeling of the true environment.
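The joint state-action aggregation described above can be pictured with a small sketch: a map collapses ground (state, action) pairs into a smaller abstract state-action space, an abstract policy is computed on the small model, and that policy is lifted back to the ground process. The map `h`, the sizes, and the abstract policy below are all assumptions for illustration, not the paper's construction:

```python
# Illustrative joint state-action aggregation (homomorphism-style sketch).
# Ground problem: 4 states x 3 actions; abstract: 2 states x 2 actions.

def h(state, action):
    """Assumed joint state-action map: collapse states by halves and actions
    by whether they are 'move' (0, 1) or 'stay' (2)."""
    return (state // 2, 0 if action < 2 else 1)

# Abstract policy, assumed to come from solving the small abstract model.
abstract_policy = {0: 1, 1: 0}

def lift(state):
    """Lift the abstract policy back to the ground process: pick any ground
    action whose image under h matches the prescribed abstract action."""
    s_bar = state // 2
    a_bar = abstract_policy[s_bar]
    return next(a for a in range(3) if h(state, a) == (s_bar, a_bar))

ground_actions = [lift(s) for s in range(4)]
```

The performance-guarantee question the abstract raises is exactly about this lifting step: how much value is lost when the abstract model (possibly non-Markovian) stands in for the ground one.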
Large Markov Decision Processes and Combinatorial Optimization
Markov decision processes continue to gain in popularity for modeling a wide
range of applications ranging from analysis of supply chains and queuing
networks to cognitive science and control of autonomous vehicles. Nonetheless,
they quickly become numerically intractable as the size of the model grows.
Recent works use machine learning techniques to overcome this crucial
issue, but with no convergence guarantee. This note provides a brief overview
of literature on solving large Markov decision processes, and exploiting them
to solve important combinatorial optimization problems.
Solving Factored MDPs with Hybrid State and Action Variables
Efficient representations and solutions for large decision problems with
continuous and discrete variables are among the most important challenges faced
by the designers of automated decision support systems. In this paper, we
describe a novel hybrid factored Markov decision process (MDP) model that
allows for a compact representation of these problems, and a new hybrid
approximate linear programming (HALP) framework that permits their efficient
solutions. The central idea of HALP is to approximate the optimal value
function by a linear combination of basis functions and optimize its weights by
linear programming. We analyze both theoretical and computational aspects of
this approach, and demonstrate its scale-up potential on several hybrid
optimization problems.
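The central HALP idea stated above, approximating the optimal value function by a linear combination of basis functions whose weights are found by linear programming, can be sketched for a purely discrete toy MDP (HALP itself targets hybrid continuous/discrete spaces). The MDP data, the basis matrix `Phi`, and the state-relevance weights `c` below are assumptions for illustration:

```python
# Minimal approximate-linear-programming (ALP) sketch for a tiny MDP:
# V(s) ~ Phi[s] @ w, with w chosen by an LP whose constraints enforce
# Phi w >= R[:, a] + gamma * P[:, a] @ (Phi w) for every action a.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

# Two assumed basis functions: a constant and a linear ramp over states.
Phi = np.column_stack([np.ones(n_states), np.arange(n_states, dtype=float)])
c = np.full(n_states, 1.0 / n_states)  # state-relevance weights (assumed)

# LP: minimize c^T (Phi w) subject to (Phi - gamma * P[:, a] @ Phi) w >= R[:, a]
# for each action; linprog expects A_ub x <= b_ub, so we negate both sides.
A_ub, b_ub = [], []
for a in range(n_actions):
    A_ub.append(-(Phi - gamma * P[:, a] @ Phi))
    b_ub.append(-R[:, a])
res = linprog(c @ Phi, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
              bounds=[(None, None)] * Phi.shape[1])

V_approx = Phi @ res.x  # approximate value function over the states
```

Any feasible solution upper-bounds the optimal value function pointwise, which is why minimizing the weighted objective yields a principled approximation; HALP extends this by integrating the constraints and objective over continuous variables.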
Abstractions of General Reinforcement Learning
The field of artificial intelligence (AI) is devoted to the creation of artificial decision-makers that can perform (at least) on par with their human counterparts on a domain of interest. Unlike the agents in traditional AI, the agents in artificial general intelligence (AGI) are required to replicate human intelligence in almost every domain of interest. Moreover, an AGI agent should be able to achieve this without (virtually any) further changes, retraining, or fine-tuning of its parameters. The real world is non-stationary, non-ergodic, and non-Markovian: we humans can neither revisit our past, nor are the most recent observations sufficient statistics for acting optimally. Yet, we excel at a variety of complex tasks, many of which require long-term planning. We can attribute this success to our natural faculty for abstracting away task-irrelevant information from our overwhelming sensory experience. We make task-specific mental models of the world without much effort. Owing to this ability to abstract, we can plan on a significantly compact representation of a task without much loss of performance. Not only that, we also abstract our actions to produce high-level plans: the level of action abstraction can lie anywhere between small muscle movements and a mental notion of "doing an action". It is natural to assume that any AGI agent competing with humans (in every plausible domain) should also have these abilities to abstract its experiences and actions. This thesis is an inquiry into the existence of such abstractions which aid efficient planning for a wide range of domains and, most importantly, come with optimality guarantees. We use a history-based reinforcement learning (RL) setup, appropriately called general reinforcement learning (GRL), to model such general-purpose decision-makers. We show that if such GRL agents have access to appropriate abstractions, then they can perform optimally in a huge set of domains.
That is, we argue that GRL with abstractions, called abstraction reinforcement learning (ARL), is an appropriate framework to model and analyze AGI agents. This work builds on and extends a powerful class of (state-only) abstractions called extreme state abstractions (ESA). We analyze a variety of such extreme abstractions, both state-only and state-action, to formally establish representation and convergence guarantees. We also make several minor contributions to the ARL framework along the way. Last but not least, we collect a series of ideas that lay the foundations for designing (extreme) abstraction learning algorithms.