Solving Factored MDPs with Hybrid State and Action Variables
Efficient representations and solutions for large decision problems with
continuous and discrete variables are among the most important challenges faced
by the designers of automated decision support systems. In this paper, we
describe a novel hybrid factored Markov decision process (MDP) model that
allows for a compact representation of these problems, and a new hybrid
approximate linear programming (HALP) framework that permits their efficient
solutions. The central idea of HALP is to approximate the optimal value
function by a linear combination of basis functions and optimize its weights by
linear programming. We analyze both theoretical and computational aspects of
this approach, and demonstrate its scale-up potential on several hybrid
optimization problems.
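The core HALP idea — represent the value function as a weighted sum of basis functions and fit the weights by linear programming — can be sketched on a small discrete MDP. This is an illustrative toy, not the paper's hybrid (continuous + discrete) formulation; the MDP data, basis choice, and state-relevance weights below are invented for the example:

```python
import numpy as np
from scipy.optimize import linprog

# Toy discrete MDP (hypothetical data, for illustration only).
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=-1, keepdims=True)           # P[a, s, s'] = transition probs
R = rng.random((n_actions, n_states))        # R[a, s]    = rewards

# Basis functions: one column per basis function, evaluated at each state.
# The identity basis makes the approximation exact, so we can sanity-check.
Phi = np.eye(n_states)

# Approximate LP:  minimize  sum_s c(s) * V_w(s)
#                  s.t.      V_w(s) >= R(a, s) + gamma * E[V_w(s') | s, a]
# for all (s, a), with V_w = Phi @ w, rewritten as A_ub @ w <= b_ub.
c = Phi.sum(axis=0)                          # uniform state-relevance weights
A_ub = np.concatenate([-(Phi - gamma * P[a] @ Phi) for a in range(n_actions)])
b_ub = np.concatenate([-R[a] for a in range(n_actions)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n_states)
V_alp = Phi @ res.x                          # approximate value function
```

With the identity basis the LP reduces to the exact linear-programming formulation of the MDP, so `V_alp` matches the optimal value function; with fewer basis functions than states the same LP yields a compact approximation, which is what makes the approach scale.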
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agent's belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to dealing with
continuous action spaces. Experimental results show the potential of Perseus in
large-scale POMDP problems.
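A minimal sketch of the randomized backup stage follows. The toy POMDP parameters and the fixed belief set are invented for illustration (the real Perseus collects beliefs by simulating the agent); the stage itself implements the mechanism described above — back up randomly chosen beliefs until every belief in the set has improved:

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, nO, gamma = 3, 2, 2, 0.95

# Toy POMDP (hypothetical parameters, for illustration only).
T = rng.random((nA, nS, nS)); T /= T.sum(-1, keepdims=True)  # T[a, s, s']
O = rng.random((nA, nS, nO)); O /= O.sum(-1, keepdims=True)  # O[a, s', o]
R = rng.random((nA, nS))                                     # R[a, s]

def backup(b, V):
    """Point-based Bellman backup at belief b; V is a list of alpha-vectors."""
    best, best_val = None, -np.inf
    for a in range(nA):
        g = R[a].copy()
        for o in range(nO):
            # project each alpha-vector through (a, o); keep the best at b
            proj = [(T[a] * O[a][:, o]) @ alpha for alpha in V]
            g = g + gamma * max(proj, key=lambda v: b @ v)
        if b @ g > best_val:
            best, best_val = g, b @ g
    return best

def perseus_stage(B, V):
    """One Perseus stage: improve the value of every belief in B while
    backing up only a randomly selected subset of beliefs."""
    V_new, todo = [], list(range(len(B)))
    while todo:
        i = todo[rng.integers(len(todo))]      # random not-yet-improved belief
        alpha = backup(B[i], V)
        old = max(V, key=lambda v: B[i] @ v)
        if B[i] @ alpha < B[i] @ old:          # never let a value decrease
            alpha = old
        V_new.append(alpha)
        todo = [j for j in todo
                if max(B[j] @ v for v in V_new) < max(B[j] @ v for v in V)]
    return V_new

B = [rng.dirichlet(np.ones(nS)) for _ in range(20)]  # fixed belief set
V = [np.zeros(nS)]                                   # initial value function
for _ in range(10):
    V = perseus_stage(B, V)
```

Note how a single backed-up alpha-vector typically removes many beliefs from `todo` at once — that is the "one backup improves many points" observation that lets the stage terminate after far fewer backups than beliefs.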
Learning to Control in Metric Space with Optimal Regret
We study online reinforcement learning for finite-horizon deterministic
control systems with arbitrary state and action spaces. Suppose that the
transition dynamics and reward function are unknown, but the state and action
space is endowed with a metric that characterizes the proximity between
different states and actions. We provide a surprisingly simple upper-confidence
reinforcement learning algorithm that uses a function approximation oracle to
estimate optimistic Q functions from experiences. We show that the regret of
the algorithm after $K$ episodes is $O\big(HL(KH)^{\frac{d-1}{d}}\big)$, where
$H$ is the horizon, $L$ is a smoothness parameter, and $d$ is the doubling
dimension of the state-action space with respect to the given metric. We also
establish a near-matching regret lower bound. The proposed method can be
adapted to work for more structured transition systems, including the
finite-state case and the case where value functions are linear combinations
of features, where the method also achieves the optimal regret.
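The optimistic estimation behind such metric-space algorithms can be sketched as a Lipschitz upper-confidence construction: any Q function that is $L$-smooth in the metric is bounded above by the smallest observed value plus $L$ times the distance to it. This is a generic illustration with invented names and a toy Euclidean metric, not the paper's exact oracle or bonus terms:

```python
import math

def lipschitz_ucb(sa, data, L, q_max):
    """Optimistic Q estimate at state-action pair `sa` (a sketch).

    data  : list of (sa_i, q_i) pairs already observed
    L     : assumed smoothness (Lipschitz) constant of Q in the metric
    q_max : trivial optimistic bound, returned for unexplored regions
    """
    def dist(x, y):                  # toy metric: Euclidean distance
        return math.dist(x, y)
    bound = q_max
    for sa_i, q_i in data:
        # smoothness forbids Q(sa) from exceeding q_i + L * d(sa, sa_i)
        bound = min(bound, q_i + L * dist(sa, sa_i))
    return bound

# Usage: estimates tighten near observed pairs, stay optimistic far away.
data = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0)]
near = lipschitz_ucb((0.1, 0.0), data, L=1.0, q_max=10.0)  # close to data
far  = lipschitz_ucb((9.0, 9.0), data, L=1.0, q_max=10.0)  # far from data
```

Acting greedily with respect to such optimistic estimates drives exploration toward regions of the metric space that the observed data cannot yet rule out as high-value, which is what the doubling dimension $d$ ends up controlling in the regret.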
Nonlinear Markov Processes in Big Networks
Big networks encompass a variety of large-scale networks arising in practical
areas such as computer networks, the internet of things, cloud computing,
manufacturing systems, transportation networks, and healthcare systems. This
paper analyzes
such big networks, and applies the mean-field theory and the nonlinear Markov
processes to set up a broad class of nonlinear continuous-time block-structured
Markov processes, which can be applied to deal with many practical stochastic
systems. Firstly, a nonlinear Markov process is derived from a large number of
interacting big networks with symmetric interactions, each of which is
described as a continuous-time block-structured Markov process. Secondly, some
effective algorithms are given for computing the fixed points of the nonlinear
Markov process by means of the UL-type RG-factorization. Finally, the Birkhoff
center, the Lyapunov functions and the relative entropy are used to analyze
stability or metastability of the big network, and several interesting open
problems are proposed with detailed interpretation. We believe that the results
given in this paper can be useful and effective in the study of big networks.
Comment: 28 pages in Special Matrices; 201
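The fixed-point structure of a nonlinear Markov process can be illustrated by iterating between the mean field and the stationary distribution of the generator it induces. The 3-state generator below is an invented toy (rates depending on the mean field), and a generic linear solve stands in for the paper's UL-type RG-factorization:

```python
import numpy as np

def generator(pi):
    """Toy 3-state generator whose rates depend on the mean field pi
    (hypothetical model, for illustration only)."""
    lam = 1.0 + pi[2]        # arrival rate grows with the mass in state 2
    mu = 2.0
    return np.array([[-lam,         lam,  0.0],
                     [  mu, -(mu + lam),  lam],
                     [ 0.0,          mu,  -mu]])

def stationary(Q):
    """Stationary distribution: solve pi @ Q = 0 with sum(pi) = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Fixed-point iteration: the mean field must be stationary for the very
# generator it induces, i.e. pi @ Q(pi) = 0.
pi = np.full(3, 1.0 / 3.0)
for _ in range(200):
    pi_next = stationary(generator(pi))
    if np.max(np.abs(pi_next - pi)) < 1e-12:
        break
    pi = pi_next
```

The resulting `pi` is a fixed point of the nonlinear process: perturbing it and re-running the iteration returns to the same distribution, which is the kind of point whose stability or metastability the Lyapunov-function and relative-entropy analysis in the paper addresses.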