    Distributed Stochastic Approximation for Solving Network Optimization Problems Under Random Quantization

    We study distributed optimization problems over a network when the communication between the nodes is constrained, so the information exchanged between the nodes must be quantized. This imperfect communication poses a fundamental challenge and, if not properly accounted for, prevents the convergence of these algorithms. Our first contribution in this paper is to propose a modified consensus-based gradient method for solving such problems using random (dithered) quantization. This algorithm can be interpreted as a distributed variant of a well-known two-time-scale stochastic algorithm. We then study the convergence and derive upper bounds on the rates of convergence of the proposed method as a function of the bandwidths available between the nodes and the underlying network topology, for both convex and strongly convex objective functions. Our results complement the existing literature, where such convergence guarantees and explicit formulas for the convergence rates are missing. Finally, we provide numerical simulations to compare the convergence properties of the distributed gradient methods with and without quantization for solving well-known regression problems over networks, for both quadratic and absolute loss functions.
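    A minimal NumPy sketch of the kind of update described above, combining consensus averaging over dithered-quantized states with a diminishing-step gradient correction; the mixing matrix `W`, the quantization step `delta`, and the step-size scaling are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def dithered_quantize(x, delta):
    """Random (dithered) uniform quantizer with step size delta; the uniform
    dither makes the quantized value unbiased in expectation."""
    u = np.random.uniform(-0.5, 0.5, size=np.shape(x))
    return delta * np.round(x / delta + u)

def quantized_consensus_gradient(grads, W, x0, delta=0.1, steps=500):
    """Consensus-based gradient step in which nodes only exchange quantized
    states; grads is a list of per-node gradient oracles g_i(x) and W a doubly
    stochastic mixing matrix."""
    n = len(grads)
    x = np.tile(np.asarray(x0, dtype=float), (n, 1))      # one local iterate per node
    for k in range(steps):
        alpha = 0.1 / np.sqrt(k + 1)                      # diminishing step size (toy scaling)
        q = dithered_quantize(x, delta)                   # neighbors only see quantized states
        x = W @ q - alpha * np.array([grads[i](x[i]) for i in range(n)])
    return x.mean(axis=0)

# Toy usage: a least-squares (quadratic-loss) regression problem split across
# 3 fully connected nodes.
rng = np.random.default_rng(0)
A = [rng.standard_normal((5, 2)) for _ in range(3)]
b = [rng.standard_normal(5) for _ in range(3)]
grads = [lambda x, Ai=Ai, bi=bi: Ai.T @ (Ai @ x - bi) for Ai, bi in zip(A, b)]
W = np.full((3, 3), 1.0 / 3.0)
print(quantized_consensus_gradient(grads, W, np.zeros(2)))
```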

    Fast Convergence Rates of Distributed Subgradient Methods with Adaptive Quantization

    We study distributed optimization problems over a network when the communication between the nodes is constrained, so the information exchanged between the nodes must be quantized. Recent advances using the distributed gradient algorithm with a quantization scheme at a fixed resolution have established convergence, but at rates significantly slower than when the communications are unquantized. In this paper, we introduce a novel quantization method, which we refer to as adaptive quantization, that allows us to match the convergence rates under perfect communications. Our approach adjusts the quantization scheme used by each node as the algorithm progresses: as we approach the solution, we become more certain about where the state variables are localized and adapt the quantizer codebook accordingly. We bound the convergence rates of the proposed method as a function of the communication bandwidth, the underlying network topology, and structural properties of the constituent objective functions. In particular, we show that if the objective functions are convex or strongly convex, then quantizing the communications with our adaptive scheme does not affect the rate of convergence of the distributed subgradient methods, except for a constant that depends on the resolution of the quantizer. To the best of our knowledge, the rates achieved in this paper are better than those of any existing work in the literature on distributed gradient methods under finite communication bandwidths. We also provide numerical simulations that compare the convergence properties of the distributed gradient methods with and without quantization for solving distributed regression problems, for both quadratic and absolute loss functions.
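    A rough sketch of the "zoom-in" idea behind adaptive quantization: the quantizer grid is re-centered on the last quantized value and its radius shrinks as the iterates localize. The function names and the geometric shrinkage rule are assumptions for illustration only.

```python
import numpy as np

def make_adaptive_quantizer(dim, num_bits=4, r0=5.0, gamma=0.9):
    """Keeps a center c and a radius r; inputs are quantized on a uniform grid
    with 2**num_bits points over [c - r, c + r], after which the grid is
    re-centered on the output and the radius shrinks by the factor gamma."""
    levels = 2 ** num_bits
    state = {"c": np.zeros(dim), "r": r0}

    def quantize(x):
        c, r = state["c"], state["r"]
        step = 2.0 * r / (levels - 1)
        q = (c - r) + step * np.round(np.clip(x - (c - r), 0.0, 2.0 * r) / step)
        state["c"], state["r"] = q, gamma * r        # zoom the codebook toward the last output
        return q

    return quantize

# Usage: quantize a sequence that settles near 1.0; the grid follows it and refines.
quantize = make_adaptive_quantizer(dim=1)
for k in range(10):
    print(k, quantize(np.array([1.0 + 2.0 / (k + 1)])))
```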

    Finite-Time Performance of Distributed Two-Time-Scale Stochastic Approximation

    Two-time-scale stochastic approximation is a popular iterative method for finding the solution of a system of two equations. Such methods have found broad applications in many areas, especially in machine learning and reinforcement learning. In this paper, we propose a distributed variant of this method over a network of agents, where the agents use two graphs representing their communication at different speeds, reflecting the nature of their two-time-scale updates. Our main contribution is to provide a finite-time analysis of the performance of the proposed method. In particular, we establish an upper bound on the rate at which the mean square errors at the agents converge to zero, as a function of the step sizes and the network topology.
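    A sketch of a distributed two-time-scale update of the kind discussed above, with a slow iterate x mixed over one graph and a fast iterate y mixed over another; the step-size exponents and the toy fixed-point problem in the usage example are assumptions.

```python
import numpy as np

def distributed_two_time_scale(f_ops, g_ops, Wx, Wy, x0, y0, steps=2000):
    """Each agent i runs a slow update on x_i driven by f_i and a fast update on
    y_i driven by g_i, mixing x over the graph Wx and y over the graph Wy.
    The step-size schedules (1/k and k^{-2/3}) are illustrative assumptions."""
    n = len(f_ops)
    x = np.tile(np.asarray(x0, dtype=float), (n, 1))
    y = np.tile(np.asarray(y0, dtype=float), (n, 1))
    for k in range(1, steps + 1):
        alpha = 1.0 / k               # slow step size
        beta = 1.0 / k ** (2 / 3)     # fast step size (larger, so y tracks its fixed point)
        fx = np.array([f_ops[i](x[i], y[i]) for i in range(n)])
        gy = np.array([g_ops[i](x[i], y[i]) for i in range(n)])
        x = Wx @ x + alpha * (fx + 0.01 * np.random.randn(*fx.shape))   # noisy slow update
        y = Wy @ y + beta * (gy + 0.01 * np.random.randn(*gy.shape))    # noisy fast update
    return x.mean(axis=0), y.mean(axis=0)

# Toy usage: 3 agents solve the coupled equations f(x, y) = -(x + y) = 0 and
# g(x, y) = x/2 - y = 0, whose joint root is the origin, over a complete graph.
f = [lambda x, y: -(x + y) for _ in range(3)]
g = [lambda x, y: x / 2 - y for _ in range(3)]
W = np.full((3, 3), 1.0 / 3.0)
print(distributed_two_time_scale(f, g, W, W, np.ones(1), np.ones(1)))
```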

    A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

    We study a novel two-time-scale stochastic gradient method for solving optimization problems where the gradient samples are generated from a time-varying Markov random process parameterized by the underlying optimization variable. These time-varying samples make the stochastic gradient biased and dependent, which can potentially lead to the divergence of the iterates. To address this issue, we consider a two-time-scale update scheme, where one scale is used to estimate the true gradient from the Markovian samples and the other scale is used to update the decision variable with the estimated gradient. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different objective functions, namely, strong convexity, convexity, non-convexity under the PL condition, and general non-convexity. Our second contribution is to apply our framework to study the performance of the popular actor-critic methods in solving stochastic control and reinforcement learning problems. First, we study an online natural actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of $\mathcal{O}(k^{-2/3})$ is achieved. This is the first time such a result is known in the literature. Second, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of $\mathcal{O}(k^{-2/5})$, which recovers the best known rate derived specifically for this problem. Finally, we support our theoretical analysis with numerical simulations where the convergence rate is visualized.
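    One plausible reading of the two-time-scale scheme in code: a fast running average tracks the biased Markovian gradient samples while a slow iterate descends along that estimate. The oracle `markov_grad` and the step-size exponents are illustrative assumptions.

```python
import numpy as np

def two_time_scale_sgd(markov_grad, x0, steps=5000):
    """A fast iterate g_est averages the biased, Markovian gradient samples while
    a slow iterate x descends along g_est. markov_grad(x, s) should return a
    (gradient sample, next Markov state) pair; the step-size exponents below are
    illustrative, not necessarily the paper's exact choice."""
    x = np.asarray(x0, dtype=float)
    g_est = np.zeros_like(x)
    state = 0                                        # initial state of the Markov chain
    for k in range(1, steps + 1):
        beta = 1.0 / k ** (2 / 3)                    # fast scale: gradient estimation
        alpha = 1.0 / k                              # slow scale: decision variable
        sample, state = markov_grad(x, state)
        g_est = (1 - beta) * g_est + beta * sample   # running average of the samples
        x = x - alpha * g_est                        # descend along the estimated gradient
    return x

# Toy usage: minimize 0.5*||x||^2 when the gradient is corrupted by a 2-state Markov chain.
def markov_grad(x, s):
    s_next = s if np.random.rand() < 0.9 else 1 - s  # slowly mixing 2-state chain
    bias = 0.5 if s_next == 0 else -0.5              # state-dependent, zero on average
    return x + bias, s_next

print(two_time_scale_sgd(markov_grad, np.ones(3)))
```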

    Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

    We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the value (global discounted accumulative reward) associated with each environmental state. Over a series of time steps, the agents act, get rewarded, update their local estimate of the value function, then communicate with their neighbors. The local update at each agent can be interpreted as a distributed variant of the popular temporal difference learning method TD$(\lambda)$. Our main contribution is to provide a finite-time analysis of the performance of this distributed TD$(\lambda)$ algorithm for both constant and time-varying step sizes. The key idea in our analysis is to use the geometric mixing time $\tau$ of the underlying Markov chain: although the "noise" in our algorithm is Markovian, its dependence is very weak between samples spaced out by $\tau$ steps. We provide an explicit upper bound on the convergence rate of the proposed method as a function of the network topology, the discount factor, the constant $\lambda$, and the mixing time $\tau$. Our results also provide a mathematical explanation for observations that have appeared previously in the literature about the choice of $\lambda$. Our upper bound illustrates the trade-off between approximation accuracy and convergence speed implicit in the choice of $\lambda$: when $\lambda=1$, the solution corresponds to the best possible approximation of the value function, while choosing $\lambda=0$ leads to faster convergence when the noise in the algorithm has large variance.
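    A compact sketch of a consensus-based TD(λ) update with linear function approximation, in the spirit of the algorithm described above; the interfaces `env_step`, `phi`, and `rewards`, and all constants in the toy example, are assumptions.

```python
import numpy as np

def distributed_td_lambda(env_step, phi, rewards, W, dim, lam=0.5, gamma=0.95,
                          alpha=0.05, steps=5000):
    """Every agent i keeps a weight vector w_i and an eligibility trace z_i,
    applies a local TD(lambda) update using its own reward, then averages w with
    its neighbors through the mixing matrix W."""
    n = len(rewards)
    w = np.zeros((n, dim))
    z = np.zeros((n, dim))
    s = 0
    for _ in range(steps):
        s_next = env_step(s)
        for i in range(n):
            z[i] = gamma * lam * z[i] + phi(s)                          # eligibility trace
            delta = rewards[i](s) + gamma * w[i] @ phi(s_next) - w[i] @ phi(s)
            w[i] = w[i] + alpha * delta * z[i]                          # local TD(lambda) step
        w = W @ w                                                       # consensus step
        s = s_next
    return w.mean(axis=0)

# Toy usage: a 4-state random walk evaluated by 3 agents with different local rewards.
phi = lambda s: np.eye(4)[s]                        # one-hot (tabular) features
env_step = lambda s: (s + np.random.choice([-1, 1])) % 4
rewards = [lambda s, i=i: float(s == i) for i in range(3)]
W = np.full((3, 3), 1.0 / 3.0)
print(distributed_td_lambda(env_step, phi, rewards, W, dim=4))
```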

    Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

    We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward, which is composed of local rewards observed by the agents. Over a series of time steps, the agents act, get rewarded, update their local estimate of the value function, then communicate with their neighbors. The local update at each agent can be interpreted as a distributed consensus-based variant of the popular temporal difference learning algorithm TD(0). While distributed reinforcement learning algorithms have been presented in the literature, almost nothing is known about their convergence rate. Our main contribution is to provide a finite-time analysis of the convergence of the distributed TD(0) algorithm, under the general setting in which the communication network between the agents is time-varying. We obtain an explicit upper bound on the rate of convergence of this algorithm as a function of the network topology and the discount factor. Our results mirror what we would expect from using distributed stochastic gradient descent for solving convex optimization problems.
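    For comparison, a sketch of the λ = 0 special case over a time-varying network, where the mixing matrix is allowed to change at every step; the `mixing(k)` interface is an assumed stand-in for the time-varying communication graph.

```python
import numpy as np

def distributed_td0(env_step, phi, rewards, mixing, dim, gamma=0.95, alpha=0.05,
                    steps=5000):
    """mixing(k) returns the (doubly stochastic) weight matrix used at step k, so
    the communication graph may change over time; otherwise this is the
    lambda = 0 special case of the TD(lambda) sketch above."""
    n = len(rewards)
    w = np.zeros((n, dim))
    s = 0
    for k in range(steps):
        s_next = env_step(s)
        for i in range(n):
            delta = rewards[i](s) + gamma * w[i] @ phi(s_next) - w[i] @ phi(s)
            w[i] = w[i] + alpha * delta * phi(s)        # local TD(0) update
        w = mixing(k) @ w                               # time-varying consensus step
        s = s_next
    return w.mean(axis=0)
```

    A time-varying `mixing(k)` could, for instance, alternate between two different connected graphs on even and odd steps.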

    A Reinforcement Learning Framework for Sequencing Multi-Robot Behaviors

    Given a list of behaviors and associated parameterized controllers for solving different individual tasks, we study the problem of selecting an optimal sequence of coordinated behaviors in multi-robot systems for completing a given mission that could not be handled by any single behavior. In addition, we are interested in the case where part of the information about the underlying mission is unknown; the robots must therefore cooperatively learn this information through their course of actions. Such a problem can be formulated as an optimal decision problem in multi-robot systems; however, it is in general intractable due to modeling imperfections and the curse of dimensionality of the decision variables. To circumvent these issues, we first consider an alternate formulation of the original problem by introducing a sequence of behavior switching times. Our main contribution is then to propose a novel reinforcement learning based method, combining Q-learning and online gradient descent, for solving this reformulated problem. In particular, the optimal sequence of the robots' behaviors is found using Q-learning, while the optimal parameters of the associated controllers are obtained through an online gradient descent method. Finally, to illustrate the effectiveness of the proposed method, we implement it on a team of differential-drive robots for solving two different missions, namely convoy protection and object manipulation.
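    A hypothetical sketch of how Q-learning over behavior choices can be interleaved with online gradient descent on the controllers' parameters; `run_behavior` and `grad_cost` are assumed, user-supplied simulators rather than anything specified in the paper.

```python
import numpy as np

def sequence_behaviors(num_stages, num_behaviors, run_behavior, grad_cost,
                       episodes=200, alpha_q=0.1, alpha_g=0.01, gamma=0.9, eps=0.2):
    """A Q-table selects which behavior to run at each mission stage (Q-learning),
    while that behavior's switching-time/controller parameter theta[b] is tuned by
    online gradient descent on the observed cost. run_behavior(stage, b, theta_b)
    returns a reward (e.g. the negative cost) and grad_cost(stage, b, theta_b) a
    gradient of the cost in theta_b; both are assumed, user-supplied simulators."""
    Q = np.zeros((num_stages, num_behaviors))
    theta = np.ones(num_behaviors)                 # one tunable parameter per behavior
    for _ in range(episodes):
        for stage in range(num_stages):
            if np.random.rand() < eps:             # epsilon-greedy over behaviors
                b = np.random.randint(num_behaviors)
            else:
                b = int(np.argmax(Q[stage]))
            reward = run_behavior(stage, b, theta[b])
            theta[b] -= alpha_g * grad_cost(stage, b, theta[b])     # online gradient step
            next_q = 0.0 if stage == num_stages - 1 else np.max(Q[stage + 1])
            Q[stage, b] += alpha_q * (reward + gamma * next_q - Q[stage, b])
    return Q, theta
```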

    A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

    We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state spaces but different rewards and dynamics. We highlight two fundamental challenges in MTRL that are not present in its single-task counterpart, and illustrate them with simple examples. We then develop a decentralized entropy-regularized policy gradient method for solving the MTRL problem and study its finite-time convergence rate. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "GridWorld" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to navigate an airborne drone in multiple (simulated) environments.
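    A minimal sketch of a decentralized entropy-regularized policy gradient step for tabular softmax policies, with one agent per task and consensus averaging of the policy parameters; `task_grads` is an assumed oracle for each agent's (sampled) policy-gradient estimate.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def decentralized_entropy_pg(task_grads, W, num_states, num_actions,
                             tau=0.1, alpha=0.1, iters=500):
    """Each agent i holds softmax policy logits theta_i for its own task, takes a
    local gradient step that adds an entropy bonus with weight tau, and then
    averages parameters with its neighbors via the mixing matrix W.
    task_grads[i](theta) is an assumed oracle returning a policy-gradient
    estimate of shape (num_states, num_actions)."""
    n = len(task_grads)
    theta = np.zeros((n, num_states, num_actions))
    for _ in range(iters):
        for i in range(n):
            pg = task_grads[i](theta[i])
            ent_rows = []
            for s in range(num_states):
                pi = softmax(theta[i, s])
                logp = np.log(pi + 1e-12)
                h = -np.sum(pi * logp)              # policy entropy at state s
                ent_rows.append(-pi * (logp + h))   # gradient of that entropy w.r.t. the logits
            theta[i] = theta[i] + alpha * (pg + tau * np.stack(ent_rows))
        theta = np.tensordot(W, theta, axes=1)      # consensus on the policy parameters
    return theta.mean(axis=0)                       # common policy parameters
```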

    Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm

    Actor-critic style two-time-scale algorithms are very popular in reinforcement learning and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the global convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory. Our analysis applies to very general settings, as we only assume that the underlying Markov chain is ergodic under all policies (the so-called Recurrence assumption). We employ $\epsilon$-greedy sampling in order to ensure enough exploration. For a fixed exploration parameter $\epsilon$, we show that the natural actor-critic algorithm is $\mathcal{O}(\frac{1}{\epsilon T^{1/4}}+\epsilon)$ close to the global optimum after $T$ iterations of the algorithm. By carefully diminishing the exploration parameter $\epsilon$ as the iterations proceed, we also show convergence to the global optimum at a rate of $\mathcal{O}(1/T^{1/6})$.
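    A tabular sketch of an online two-time-scale natural actor-critic with ε-greedy exploration, consistent with the setting described above but not the paper's exact algorithm; `step_env`, the critic target, and the step-size choices are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def online_natural_actor_critic(step_env, num_states, num_actions,
                                gamma=0.95, eps=0.1, T=20000):
    """Single-trajectory sketch: the critic Q is updated on the fast time scale by
    an on-policy TD rule, the actor's softmax logits move on the slow time scale
    along the estimated advantage (the natural-gradient direction for a tabular
    softmax policy), and actions come from an epsilon-greedy mixture.
    step_env(s, a) -> (reward, next_state) is an assumed environment interface."""
    Q = np.zeros((num_states, num_actions))
    theta = np.zeros((num_states, num_actions))
    s = 0
    for k in range(1, T + 1):
        beta = 1.0 / k ** (2 / 3)                  # fast (critic) step size
        alpha = 1.0 / k                            # slow (actor) step size
        pi = (1 - eps) * softmax(theta[s]) + eps / num_actions   # epsilon-greedy mixture
        a = np.random.choice(num_actions, p=pi)
        r, s_next = step_env(s, a)
        pi_next = (1 - eps) * softmax(theta[s_next]) + eps / num_actions
        Q[s, a] += beta * (r + gamma * pi_next @ Q[s_next] - Q[s, a])   # expected-SARSA target
        adv = Q[s] - softmax(theta[s]) @ Q[s]      # advantage under the softmax policy
        theta[s] = theta[s] + alpha * adv          # natural policy gradient step on the logits
        s = s_next
    return theta, Q

# Toy usage: a random 3-state, 2-action MDP.
P = np.random.dirichlet(np.ones(3), size=(3, 2))   # P[s, a] is a distribution over next states
R = np.random.rand(3, 2)
step_env = lambda s, a: (R[s, a], np.random.choice(3, p=P[s, a]))
theta, Q = online_natural_actor_critic(step_env, num_states=3, num_actions=2)
```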

    Convergence Rates of Accelerated Markov Gradient Descent with Applications in Reinforcement Learning

    Motivated by broad applications in machine learning, we study the popular accelerated stochastic gradient descent (ASGD) algorithm for solving (possibly nonconvex) optimization problems. We characterize the finite-time performance of this method when the gradients are sampled from Markov processes, and hence biased and dependent from time step to time step; in contrast, the analysis in existing work relies heavily on the stochastic gradients being independent and sometimes unbiased. Our main contributions show that, under certain (standard) assumptions on the underlying Markov chain generating the gradients, ASGD converges at nearly the same rate with Markovian gradient samples as with independent gradient samples. The only difference is a logarithmic factor that accounts for the mixing time of the Markov chain. One of the key motivations for this study is the class of complicated control problems that can be modeled by a Markov decision process and solved using reinforcement learning. We apply the accelerated method to several challenging problems in the OpenAI Gym and MuJoCo, and show that acceleration can significantly improve the performance of the classic REINFORCE algorithm.
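    A sketch of Nesterov-style accelerated SGD driven by Markovian gradient samples, illustrating why consecutive gradients are biased and dependent; `markov_grad`, the momentum value, and the step-size schedule are assumptions.

```python
import numpy as np

def accelerated_markov_sgd(markov_grad, x0, alpha=0.05, momentum=0.9, steps=5000):
    """Nesterov-style accelerated SGD with gradients drawn from a Markov chain:
    the gradient is evaluated at a look-ahead point, and because the chain state
    carries over between iterations, consecutive samples are biased and dependent.
    markov_grad(x, s) -> (gradient sample, next chain state) is an assumed oracle."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    state = 0
    for k in range(steps):
        step = alpha / np.sqrt(k + 1)                    # diminishing step size
        g, state = markov_grad(x + momentum * v, state)  # gradient at the look-ahead point
        v = momentum * v - step * g
        x = x + v
    return x

# Toy usage: minimize 0.5*||x||^2 with gradients perturbed by a slowly mixing 2-state chain.
def markov_grad(x, s):
    s_next = s if np.random.rand() < 0.9 else 1 - s      # the chain rarely switches state
    return x + (0.3 if s_next == 0 else -0.3), s_next    # state-dependent, zero-mean bias

print(accelerated_markov_sgd(markov_grad, np.ones(4)))
```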