Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs
Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in
solving decentralized POMDPs with large horizons. We generalize the algorithm
and improve its scalability by reducing the complexity with respect to the
number of observations from exponential to polynomial. We derive error bounds
on solution quality with respect to this new approximation and analyze the
convergence behavior. To evaluate the effectiveness of the improvements, we
introduce a new, larger benchmark problem. Experimental results show that
despite the high complexity of decentralized POMDPs, scalable solution
techniques such as MBDP perform surprisingly well.
Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI 2007).
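The memory-bounding idea can be illustrated with a sketch: grow policy trees bottom-up via exhaustive backups, but keep only a fixed number of candidates per horizon step. This is illustrative only (function names and the scoring callback are assumptions, not the paper's algorithm); note that the exhaustive backup below is exponential in the number of observations, which is exactly the cost the improved algorithm reduces to polynomial.

```python
import itertools

def mbdp_sketch(actions, observations, horizon, max_trees, evaluate):
    """Memory-bounded DP sketch for Dec-POMDP-style policy trees.

    Illustrative only: real MBDP evaluates trees against heuristic
    belief states; here `evaluate` is an arbitrary scoring callback.
    """
    # Horizon-1 policy trees: a single action, no subtrees.
    trees = [(a, ()) for a in actions]
    for _ in range(horizon - 1):
        # Exhaustive backup: every root action combined with every
        # assignment of a kept subtree to each observation.
        candidates = [
            (a, subtrees)
            for a in actions
            for subtrees in itertools.product(trees, repeat=len(observations))
        ]
        # Memory bound: retain only the best-scoring max_trees trees,
        # instead of the full (exponentially growing) candidate set.
        candidates.sort(key=evaluate, reverse=True)
        trees = candidates[:max_trees]
    return trees
```

With `max_trees` fixed, the number of trees kept per step stays constant, at the price of the approximation error the paper bounds.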
Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model
In this paper we consider the problem of computing an $\epsilon$-optimal
policy of a discounted Markov Decision Process (DMDP) provided we can only
access its transition function through a generative sampling model that, given
any state-action pair, samples from the transition function in $O(1)$ time.
Given such a DMDP with states $S$, actions $A$, discount factor
$\gamma \in (0, 1)$, and rewards in range $[0, 1]$, we provide an algorithm which
computes an $\epsilon$-optimal policy with probability $1 - \delta$ where
\emph{both} the time spent and number of samples taken are upper bounded by
$$O\left[\frac{|S||A|}{(1-\gamma)^{3}\epsilon^{2}} \log\left(\frac{|S||A|}{(1-\gamma)\delta\epsilon}\right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right].$$
For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of
$(1-\gamma)^{-1}$ and matches the sample complexity lower bounds proved in
Azar et al. (2013) up to logarithmic factors. We also extend our method to
computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative
model and provide a nearly matching sample complexity lower bound.
Comment: 31 pages. Accepted to NeurIPS 2018.
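For intuition about the generative-model setting, a plain model-based baseline draws a fixed number of samples per state-action pair, forms an empirical transition model, and runs value iteration on it. The sketch below shows that baseline, not the paper's variance-reduced algorithm; all names and parameters are illustrative.

```python
import numpy as np

def empirical_value_iteration(sample, n_states, n_actions, reward,
                              gamma, n_samples, iters, rng):
    """Model-based baseline under a generative model.

    `sample(s, a, rng)` returns one next state drawn from P(.|s, a).
    Illustrative only -- not the variance-reduced method of the paper.
    """
    # Empirical transition model from n_samples draws per (s, a).
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample(s, a, rng)] += 1.0
    P_hat /= n_samples
    # Value iteration on the empirical MDP.
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward + gamma * (P_hat @ V)  # shape (n_states, n_actions)
        V = Q.max(axis=1)
    greedy_policy = Q.argmax(axis=1)
    return V, greedy_policy
```

This baseline needs roughly $|S||A| \cdot n_{\text{samples}}$ draws in total; the paper's contribution is bringing the per-pair sample count down to near the information-theoretic minimum.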