15 research outputs found

    Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes

    Full text link
    Policy iteration enjoys a local quadratic rate of contraction, but its iterations are computationally expensive for Markov decision processes (MDPs) with a large number of states. In light of the connection between policy iteration and the semismooth Newton method, and taking inspiration from the inexact variants of the latter, we propose inexact policy iteration, a new class of methods for large-scale finite MDPs with local contraction guarantees. We then design an instance based on the deployment of GMRES for the approximate policy evaluation step, which we call inexact GMRES policy iteration. Finally, we demonstrate the superior practical performance of inexact GMRES policy iteration on an MDP with 10,000 states, where it achieves a 5.8× and 2.2× speedup with respect to policy iteration and optimistic policy iteration, respectively.
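    Below is a minimal sketch of the idea described above: policy iteration in which the policy evaluation system (I − γP_π)v = r_π is solved only approximately with GMRES. The MDP data layout (transition tensor P, reward matrix R), the fixed tolerance, and the reward-maximization convention are illustrative assumptions rather than details taken from the paper.

```python
# A minimal sketch of inexact policy iteration with GMRES-based policy
# evaluation, assuming tabular MDP data: P is an (A, S, S) transition tensor,
# R is an (S, A) expected-reward matrix, gamma is the discount factor.
import numpy as np
from scipy.sparse.linalg import gmres

def inexact_gmres_policy_iteration(P, R, gamma, tol=1e-3, max_iters=100):
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(max_iters):
        # Approximate policy evaluation: solve (I - gamma * P_pi) v = r_pi with
        # GMRES, warm-started at the previous value estimate, loose tolerance.
        P_pi = P[policy, np.arange(n_states), :]           # (S, S) under current policy
        r_pi = R[np.arange(n_states), policy]              # (S,)
        A_pi = np.eye(n_states) - gamma * P_pi
        v, _ = gmres(A_pi, r_pi, x0=v, rtol=tol)           # SciPy >= 1.12; older versions use tol=
        # Greedy policy improvement on the approximate values.
        q = R + gamma * np.einsum("ask,k->sa", P, v)       # (S, A) action values
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, v
```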

    Policy Iteration for Multiplicative Noise Output Feedback Control

    Full text link
    We propose a policy iteration algorithm for solving the multiplicative noise linear quadratic output feedback design problem. The algorithm solves a set of coupled Riccati equations for estimation and control arising from a partially observable Markov decision process (POMDP) under a class of linear dynamic control policies. In numerical experiments we show far faster convergence than value iteration, formerly the only known algorithm for solving this class of problems. The results suggest promising future research directions for policy optimization algorithms in more general POMDPs, including the potential to develop novel approximate data-driven approaches when model parameters are not available.
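    As a simplified illustration of policy iteration on Riccati-type equations, the sketch below shows the classical deterministic, full-state LQR case (Hewer's iteration): evaluate a stabilizing gain via a Lyapunov equation, then improve it. This is an assumed baseline for intuition only; it does not implement the multiplicative-noise, output-feedback algorithm with coupled Riccati equations proposed in the paper.

```python
# A simplified illustration only: policy iteration for the deterministic,
# full-state discrete-time LQR problem (Hewer's iteration), not the
# multiplicative-noise output-feedback setting of the paper.
# K0 is assumed to be a stabilizing gain for u = -K x.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, n_iters=50):
    K = K0
    for _ in range(n_iters):
        # Policy evaluation: the cost matrix of the gain u = -K x solves the
        # Lyapunov equation P = (A - B K)' P (A - B K) + Q + K' R K.
        A_cl = A - B @ K
        P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # Policy improvement: Newton-like gain update.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P
```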

    Hessian-CoCoA: a general parallel and distributed framework for non-strongly convex regularizers

    No full text
    The scale of modern datasets necessitates the development of efficient distributed and parallel optimization methods for machine learning. We present Hessian-CoCoA, a general-purpose framework for distributed and parallel environments that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. Hessian-CoCoA is designed starting from an existing communication-efficient distributed framework, CoCoA, which has been shown to be competitive with state-of-the-art distributed methods for non-strongly convex regularizers and non-smooth loss functions. In both frameworks, instead of optimizing the original problem directly, the computation is distributed by defining data-local subproblems that are solved independently in parallel. The main idea behind Hessian-CoCoA is to include more refined second-order information in the subproblems, providing a tighter local approximation than the one used by CoCoA. The new framework has markedly improved performance in terms of both convergence rate and total convergence time, as we illustrate with an extensive set of experiments.
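    The sketch below illustrates only the general pattern the abstract describes, namely data-local subproblems with second-order information solved independently and then aggregated, on a toy ridge-regression objective. The exact subproblem definition, aggregation rule, and guarantees of CoCoA and Hessian-CoCoA are more refined; everything here is an assumed simplification.

```python
# A schematic of the "data-local subproblem + aggregation" pattern on a toy
# ridge-regression objective; each worker solves a local second-order model of
# its own shard and the server averages the updates. This is an assumed
# simplification, not the actual (Hessian-)CoCoA subproblem.
import numpy as np

def distributed_second_order_rounds(shards, lam=0.1, n_rounds=10):
    """shards: list of (X_k, y_k) data partitions held by the workers."""
    d = shards[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(n_rounds):
        updates = []
        for X_k, y_k in shards:                           # run in parallel in practice
            n_k = X_k.shape[0]
            g_k = X_k.T @ (X_k @ w - y_k) / n_k + lam * w     # local gradient
            H_k = X_k.T @ X_k / n_k + lam * np.eye(d)         # local Hessian
            updates.append(-np.linalg.solve(H_k, g_k))        # local Newton step
        w = w + np.mean(updates, axis=0)                  # communication round: average
    return w
```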

    On the Synthesis of Bellman Inequalities for Data-Driven Optimal Control

    No full text
    In the context of the linear programming (LP) approach to data-driven control, one assumes that the dynamical system is unknown but can be observed indirectly through data on its evolution. Both theoretical and empirical evidence suggest that a desired suboptimality gap is often only achieved with massive exploration of the state space. In the case of linear systems, we discuss how a relatively small but sufficiently rich dataset can be exploited to generate new constraints offline, without observing the corresponding transitions. Moreover, we show how to reconstruct the associated unknown stage costs and, when the system is stochastic, we offer insights on the related problem of estimating the expected value in the Bellman operator without re-initializing the dynamics at the same state-input pairs.
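    For context, here is a minimal sketch of the data-driven LP this line of work starts from: one Bellman inequality per observed transition, with a linear value-function parameterization. The feature map phi, the transition tuples, and the uniform state weighting are assumptions for illustration; the paper's offline synthesis of additional constraints is not reproduced here.

```python
# A minimal sketch of a data-driven LP with Bellman inequality constraints,
# one per observed transition (x, u, cost, x_next), using an assumed linear
# value-function parameterization V(x) = theta' phi(x).
import numpy as np
from scipy.optimize import linprog

def solve_data_driven_lp(transitions, phi, gamma):
    d = phi(transitions[0][0]).shape[0]
    A_ub, b_ub, c = [], [], np.zeros(d)
    for x, _, cost, x_next in transitions:
        # Bellman inequality: theta' phi(x) - gamma * theta' phi(x_next) <= cost
        A_ub.append(phi(x) - gamma * phi(x_next))
        b_ub.append(cost)
        c -= phi(x)        # linprog minimizes, so this maximizes sum of V over the data
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * d)
    return res.x           # theta parameterizing the approximate value function
```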

    Data-driven optimal control with a relaxed linear program

    No full text
    The linear programming (LP) approach has a long history in the theory of approximate dynamic programming. When it comes to computation, however, the LP approach often suffers from poor scalability. In this work, we introduce a relaxed version of the Bellman operator for q-functions and prove that it is still a monotone contraction mapping with a unique fixed point. In the spirit of the LP approach, we exploit the new operator to build a relaxed linear program (RLP). Compared to the standard LP formulation, our RLP has only one family of constraints and half the decision variables, making it more scalable and computationally efficient. For deterministic systems, the RLP trivially returns the correct q-function. For stochastic linear systems in continuous spaces, the solution to the RLP preserves the minimizer of the optimal q-function and hence retrieves the optimal policy. Theoretical results are backed up in simulations, where we solve sampled versions of the LPs with data collected by interacting with the environment. For general nonlinear systems, we observe that the RLP again tends to preserve the minimizers of the solution to the LP, though the relative performance is influenced by the specific geometry of the problem.
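    The sketch below shows one plausible tabular construction in the spirit of the abstract: an LP whose only decision variables are the q-values and whose single family of linear constraints has the form q(x,u) ≤ cost(x,u) + γ E[q(x⁺,u′)], one constraint per (x,u,u′) triple. The paper's precise definition of the relaxed Bellman operator may differ; the MDP data here are assumed example inputs.

```python
# An illustrative tabular construction: only q-values as decision variables and
# a single family of linear constraints q(x,u) <= C(x,u) + gamma * E[q(x',u')],
# one per (x, u, u') triple. P is an (S, A, S) transition tensor and C an
# (S, A) stage-cost matrix; both are assumed example data.
import numpy as np
from scipy.optimize import linprog

def solve_relaxed_lp(P, C, gamma):
    S, A, _ = P.shape
    n = S * A                                    # one decision variable per (x, u)
    idx = lambda x, u: x * A + u
    A_ub, b_ub = [], []
    for x in range(S):
        for u in range(A):
            for u_next in range(A):
                row = np.zeros(n)
                row[idx(x, u)] += 1.0
                for x_next in range(S):          # subtract gamma * E[q(x_next, u_next)]
                    row[idx(x_next, u_next)] -= gamma * P[x, u, x_next]
                A_ub.append(row)
                b_ub.append(C[x, u])
    c = -np.ones(n)                              # maximize the sum of all q-values
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n)
    return res.x.reshape(S, A)
```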