26,514 research outputs found

    Solving Factored MDPs with Hybrid State and Action Variables

    Full text link
    Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems

    Perseus: Randomized Point-based Value Iteration for POMDPs

    Full text link
    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agents belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems

    Learning to Control in Metric Space with Optimal Regret

    Full text link
    We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces. Suppose that the transition dynamics and reward function is unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after KK episodes is O(HL(KH)d−1d)O(HL(KH)^{\frac{d-1}{d}}) where LL is a smoothness parameter, and dd is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieve the optimal regret

    Nonlinear Markov Processes in Big Networks

    Full text link
    Big networks express various large-scale networks in many practical areas such as computer networks, internet of things, cloud computation, manufacturing systems, transportation networks, and healthcare systems. This paper analyzes such big networks, and applies the mean-field theory and the nonlinear Markov processes to set up a broad class of nonlinear continuous-time block-structured Markov processes, which can be applied to deal with many practical stochastic systems. Firstly, a nonlinear Markov process is derived from a large number of interacting big networks with symmetric interactions, each of which is described as a continuous-time block-structured Markov process. Secondly, some effective algorithms are given for computing the fixed points of the nonlinear Markov process by means of the UL-type RG-factorization. Finally, the Birkhoff center, the Lyapunov functions and the relative entropy are used to analyze stability or metastability of the big network, and several interesting open problems are proposed with detailed interpretation. We believe that the results given in this paper can be useful and effective in the study of big networks.Comment: 28 pages in Special Matrices; 201
    • …
    corecore