    Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs

    Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)
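    The abstract states the observation reduction only in asymptotic terms. As a rough illustration, and not the paper's algorithm, the sketch below assumes that an exhaustive backup builds each new policy tree from a root action plus one of `max_trees` retained subtrees per observation, and compares that count with a hypothetical variant that enumerates only `max_obs` observation branches. Picking the most probable observations is one plausible way to choose those branches; it is an assumption here, not something the abstract specifies. The names `max_trees`, `max_obs`, and `most_likely_observations` are illustrative.

```python
from typing import Sequence


def full_backup_count(num_actions: int, max_trees: int, num_obs: int) -> int:
    """Candidate policy trees per agent in an exhaustive backup.

    Each new tree picks a root action and, for every one of the num_obs
    observations, one of the max_trees subtrees kept from the previous step.
    """
    return num_actions * max_trees ** num_obs


def bounded_backup_count(num_actions: int, max_trees: int, max_obs: int) -> int:
    """Candidate trees when only max_obs observation branches are enumerated.

    Hypothetical simplification: the remaining observation branches are
    assumed to be filled in by a cheaper procedure that this count ignores.
    """
    return num_actions * max_trees ** max_obs


def most_likely_observations(probs: Sequence[float], max_obs: int) -> list[int]:
    """Indices of the max_obs most probable observations (ties broken by index)."""
    order = sorted(range(len(probs)), key=lambda o: (-probs[o], o))
    return sorted(order[:max_obs])


if __name__ == "__main__":
    # Exhaustive count grows exponentially in the number of observations;
    # the bounded count does not depend on it at all.
    for num_obs in (2, 4, 8):
        print(num_obs,
              full_backup_count(2, 3, num_obs),
              bounded_backup_count(2, 3, 2))
    print(most_likely_observations([0.1, 0.5, 0.05, 0.35], max_obs=2))
```

    With 2 actions and 3 retained trees, the exhaustive count rises from 18 to 13122 as the number of observations goes from 2 to 8, while the bounded count stays at 18. Under this assumption, any remaining dependence on the number of observations comes from how the unenumerated branches are filled in.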

    Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

    In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with states $S$, actions $A$, discount factor $\gamma \in (0,1)$, and rewards in range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$, where both the time spent and the number of samples taken are upper bounded by

    $$O\left[\frac{|S||A|}{(1-\gamma)^3 \epsilon^2} \log\left(\frac{|S||A|}{(1-\gamma)\delta\epsilon}\right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right].$$

    For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the sample complexity lower bounds proved in Azar et al. (2013) up to logarithmic factors. We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative model and provide a nearly matching sample complexity lower bound. Comment: 31 pages. Accepted to NeurIPS, 201
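    To make the scaling of the bound concrete, the sketch below evaluates the expression inside $O[\cdot]$ with its hidden constant arbitrarily set to 1, so only ratios between outputs are meaningful. The comparison function `bound_with_extra_horizon_factor` is hypothetical: it models a prior bound as this one multiplied by exactly $(1-\gamma)^{-1}$, which assumes identical logarithmic factors, whereas the abstract only states a $(1-\gamma)^{-1}$ improvement for fixed $\epsilon$. The code does not implement the paper's algorithm or its generative model.

```python
import math


def bound_expression(num_states: int, num_actions: int, gamma: float,
                     eps: float, delta: float) -> float:
    """Bound inside O[...] from the abstract, hidden constant set to 1."""
    sa = num_states * num_actions
    h = 1.0 - gamma  # (1 - gamma); its inverse is the effective horizon
    return (sa / (h ** 3 * eps ** 2)
            * math.log(sa / (h * delta * eps))
            * math.log(1.0 / (h * eps)))


def bound_with_extra_horizon_factor(num_states: int, num_actions: int,
                                    gamma: float, eps: float,
                                    delta: float) -> float:
    """Hypothetical bound that is worse by exactly one factor of 1/(1 - gamma)."""
    base = bound_expression(num_states, num_actions, gamma, eps, delta)
    return base / (1.0 - gamma)


if __name__ == "__main__":
    args = dict(num_states=100, num_actions=4, eps=0.1, delta=0.05)
    for gamma in (0.9, 0.99):
        new = bound_expression(gamma=gamma, **args)
        old = bound_with_extra_horizon_factor(gamma=gamma, **args)
        print(f"gamma={gamma}: ratio = {old / new:.1f}")
```

    For $\gamma = 0.99$ the ratio is 100, the effective horizon $1/(1-\gamma)$. Halving $\epsilon$ raises the expression by somewhat more than a factor of 4, because the $\epsilon^{-2}$ term is multiplied by two logarithms that also grow as $\epsilon$ shrinks.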