Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an
environment in observation-reward-action cycles without any (especially MDP)
assumptions on the environment. State aggregation and more generally feature
reinforcement learning is concerned with mapping histories/raw-states to
reduced/aggregated states. The idea behind both is that the resulting reduced
process (approximately) forms a small stationary finite-state MDP, which can
then be efficiently solved or learnt. We considerably generalize existing
aggregation results by showing that even if the reduced process is not an MDP,
the (q-)value functions and (optimal) policies of an associated MDP with the same
state-space size solve the original problem, as long as the solution can
approximately be represented as a function of the reduced states. This implies
an upper bound on the required state space size that holds uniformly for all RL
problems. It may also explain why RL algorithms designed for MDPs sometimes
perform well beyond MDPs.
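Concretely, the aggregate-then-solve pipeline behind extreme state aggregation can be sketched as follows. This is a minimal illustration, not the paper's construction: the aggregation map, the toy dynamics, and all numbers are assumed for the example. The small abstract MDP is solved by standard Q-value iteration and its policy is lifted back through the map.

```python
import numpy as np

# Hypothetical toy setup: 6 raw states aggregated into 2 abstract states.
phi = np.array([0, 0, 0, 1, 1, 1])  # aggregation map: raw state -> abstract state

# Abstract MDP over the 2 aggregated states and 2 actions (assumed dynamics).
P = np.array([                       # P[a, s, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.6, 0.4]],
])
R = np.array([[1.0, 0.0],            # R[a, s] expected reward
              [0.0, 2.0]])
gamma = 0.9

# Q-value iteration on the small aggregated MDP.
q = np.zeros((2, 2))
for _ in range(500):
    v = q.max(axis=0)                # greedy state values
    q = R + gamma * P @ v            # Bellman optimality backup

policy_abstract = q.argmax(axis=0)   # optimal policy on abstract states
policy_raw = policy_abstract[phi]    # lift back through the aggregation map
```

The lifted `policy_raw` assigns every raw state the action of its aggregate, which is exactly the kind of solution the upper bound on state-space size concerns.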
Performance Guarantees for Homomorphisms Beyond Markov Decision Processes
Most real-world problems have huge state and/or action spaces. Therefore, a
naive application of existing tabular solution methods is not tractable on such
problems. Nonetheless, these solution methods are quite useful if an agent has
access to a relatively small state-action space homomorphism of the true
environment and near-optimal performance is guaranteed by the map. A plethora
of research is focused on the case when the homomorphism is a Markovian
representation of the underlying process. However, we show that near-optimal
performance is sometimes guaranteed even if the homomorphism is non-Markovian.
Moreover, we can aggregate significantly more states by lifting the Markovian
requirement without compromising on performance. In this work, we expand the
Extreme State Aggregation (ESA) framework to joint state-action aggregations.
We also lift the policy-uniformity condition for aggregation in ESA, which
allows even coarser modeling of the true environment.
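A joint state-action aggregation can be sketched as a homomorphism on (state, action) pairs. The map, the abstract policy, and the tiny task below are hypothetical; the point is only how a policy defined on the small aggregated space is executed in the raw environment.

```python
# Hypothetical joint state-action aggregation for a 4-state, 2-action task.
# Raw (state, action) pairs map to abstract (state, action) pairs.
h = {
    (0, 0): (0, 0), (0, 1): (0, 1),
    (1, 0): (0, 0), (1, 1): (0, 1),   # raw states 0 and 1 are merged
    (2, 0): (1, 0), (2, 1): (1, 0),   # both actions of state 2 are also merged
    (3, 0): (1, 0), (3, 1): (1, 1),
}

# An abstract policy defined only on the small aggregated space (assumed given).
abstract_policy = {0: 1, 1: 0}

def lift(raw_state):
    """Choose a raw action whose aggregated image agrees with the abstract policy."""
    target = abstract_policy[h[(raw_state, 0)][0]]
    for a in (0, 1):
        if h[(raw_state, a)][1] == target:
            return a
    raise ValueError("no raw action realizes the abstract action")

actions = [lift(s) for s in range(4)]
```

Because the map merges actions as well as states, the abstract space can be much coarser than a pure state aggregation of the same task.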
Reduction of Markov Chains using a Value-of-Information-Based Approach
In this paper, we propose an approach to obtain reduced-order models of
Markov chains. Our approach is composed of two information-theoretic processes.
The first is a means of comparing pairs of stationary chains on different state
spaces, which is done via the negative Kullback-Leibler divergence defined on a
model joint space. Model reduction is achieved by solving a
value-of-information criterion with respect to this divergence. Optimizing the
criterion leads to a probabilistic partitioning of the states in the high-order
Markov chain. A single free parameter that emerges through the optimization
process dictates both the partition uncertainty and the number of state groups.
We provide a data-driven means of choosing the 'optimal' value of this free
parameter, which sidesteps the need to know a priori the number of state groups
in an arbitrary chain.
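The probabilistic partitioning that the free parameter controls can be illustrated with a soft clustering of transition rows. This is a simplified sketch, not the paper's value-of-information criterion: the toy chain, the fixed group prototypes, and the use of `beta` as the free parameter are assumptions for illustration.

```python
import numpy as np

# Toy 4-state chain whose transition rows fall into two clear groups (assumed data).
P = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.30],
    [0.05, 0.05, 0.70, 0.20],
])

def kl(p, q):
    """Kullback-Leibler divergence between two strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def soft_partition(P, centers, beta):
    """Soft assignment of states to group prototypes; beta is the free parameter
    trading partition uncertainty against fidelity (larger beta = harder groups)."""
    d = np.array([[kl(row, c) for c in centers] for row in P])
    w = np.exp(-beta * d)
    return w / w.sum(axis=1, keepdims=True)

centers = [P[0], P[2]]                       # two candidate group prototypes
hard = soft_partition(P, centers, beta=50.0)
soft = soft_partition(P, centers, beta=0.0)  # beta -> 0 gives a uniform partition
```

Sweeping `beta` from 0 upward moves the partition from maximally uncertain to effectively deterministic, which mirrors how the single free parameter dictates both partition uncertainty and the number of state groups.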
Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which each page has obligatory
links, facultative links, and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph.
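The continuous variant can be illustrated with a toy controlled chain: one page chooses its outlink intensities, and the objective is that page's stationary (PageRank) mass. The three-page graph, the damping factor, and the grid search below are assumptions for the sketch; the paper's polyhedral methods are far more efficient than enumeration.

```python
import numpy as np

# Toy 3-page web (assumed graph): page 0 is controlled, pages 1-2 are fixed.
# The webmaster chooses the intensity x of the link 0 -> 1 (the rest goes
# 0 -> 2), i.e. the continuous variant where transition probabilities are
# the controls.
def stationary(x, damping=0.85, n=3):
    P = np.array([
        [0.0, x, 1.0 - x],        # controlled page
        [0.5, 0.0, 0.5],          # fixed pages
        [1.0, 0.0, 0.0],
    ])
    G = damping * P + (1 - damping) / n  # PageRank's damped transition matrix
    pi = np.full(n, 1.0 / n)
    for _ in range(200):                 # power iteration for the stationary law
        pi = pi @ G
    return pi

# Maximize the PageRank of page 0 over the link intensity by coarse grid search.
xs = np.linspace(0.0, 1.0, 101)
best_x = max(xs, key=lambda x: stationary(x)[0])
```

Here the webmaster's choice of `x` literally determines the websurfers' transition probabilities, which is the ergodic-reward MDP view taken in the abstract.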
Towards Swarm Calculus: Urn Models of Collective Decisions and Universal Properties of Swarm Performance
Methods of general applicability are sought in swarm intelligence with the aim
of gaining new insights about natural swarms and of developing design
methodologies for artificial swarms. An ideal solution would be a 'swarm
calculus' that allows one to calculate key features of swarms, such as expected
swarm performance and robustness, based on only a few parameters. To work
towards this ideal, one needs to find methods and models with high degrees of
generality. In this paper, we report two models that might be examples of
exceptional generality. First, an abstract model is presented that describes
swarm performance depending on swarm density based on the dichotomy between
cooperation and interference. Typical swarm experiments are given as examples
to show how the model fits several different results. Second, we give an
abstract model of collective decision making that is inspired by urn models.
The effects of a positive-feedback probability that increases over time in a
decision-making system are understood with the help of a parameter that
controls the feedback based on the swarm's current consensus. Several applicable
methods, such as description as a Markov process, calculation of splitting
probabilities, mean first passage times, and measurement of positive feedback,
are discussed, and applications to artificial and natural swarms are reported.
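The urn-based decision model can be sketched with a two-color urn whose reinforcement depends on a positive-feedback probability. The specific update rule and all parameter values below are our assumptions for illustration, not the paper's exact model.

```python
import random

def urn_decision(steps=2000, feedback=0.9, seed=1):
    """Urn model of a collective decision (a sketch): with probability
    `feedback` the drawn color is reinforced (positive feedback), otherwise
    a ball of the opposite color is added (negative feedback)."""
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(steps):
        draw_red = rng.random() < red / (red + blue)
        reinforce = rng.random() < feedback
        if draw_red == reinforce:   # reinforce drawn red, or counter drawn blue
            red += 1
        else:
            blue += 1
    return red / (red + blue)       # final consensus toward "red"

strong = urn_decision(feedback=0.95)
weak = urn_decision(feedback=0.05)
```

With weak positive feedback the restoring force keeps the urn near an even split, while strong positive feedback lets fluctuations break the symmetry toward one option; the split and hitting behavior of such a process is what the splitting probabilities and mean first passage times quantify.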
The speed of range shifts in fragmented landscapes
Large Markov Decision Processes and Combinatorial Optimization
Markov decision processes continue to gain in popularity for modeling a wide
range of applications ranging from analysis of supply chains and queuing
networks to cognitive science and control of autonomous vehicles. Nonetheless,
they tend to become numerically intractable as the size of the model grows.
Recent works use machine learning techniques to overcome this crucial
issue, but with no convergence guarantee. This note provides a brief overview
of literature on solving large Markov decision processes, and exploiting them
to solve important combinatorial optimization problems.
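As a minimal illustration of casting a combinatorial optimization problem as an MDP (our example, not the note's method): a shortest-path instance on a small hypothetical graph, solved by value iteration over a deterministic MDP with edge costs as negative rewards.

```python
# Hypothetical graph: node -> {neighbor: edge cost}; node 3 is the goal.
INF = float("inf")
edges = {
    0: {1: 4.0, 2: 1.0},
    1: {3: 1.0},
    2: {1: 2.0, 3: 5.0},
    3: {},
}
goal = 3

# Value iteration: V[s] = min over successors of (cost + V[s']); the goal is
# absorbing with value 0, so V[s] is the shortest-path cost from s to the goal.
V = {s: (0.0 if s == goal else INF) for s in edges}
for _ in range(len(edges)):
    for s, nbrs in edges.items():
        if s != goal and nbrs:
            V[s] = min(c + V[t] for t, c in nbrs.items())
```

For this tiny instance the Bellman backups recover the shortest-path costs exactly; the note's point is that on large models such tabular sweeps stop being tractable.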