28 research outputs found

    On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging

    This paper considers a stochastic subgradient mirror-descent method for solving constrained convex minimization problems. In particular, a stochastic subgradient mirror-descent method with weighted iterate-averaging is investigated and its per-iterate convergence rate is analyzed. The novel part of the approach is in the choice of weights that are used to construct the averages. Through the use of these weighted averages, we show that the known optimal rates can be obtained with simpler algorithms than those currently existing in the literature. Specifically, by suitably choosing the stepsize values, one can obtain the rate of the order $1/k$ for strongly convex functions, and the rate $1/\sqrt{k}$ for general convex functions (not necessarily differentiable). Furthermore, for the latter case, it is shown that a stochastic subgradient mirror-descent with iterate averaging converges (along a subsequence) to an optimal solution, almost surely, even with the stepsize of the form $1/\sqrt{1+k}$, which was not previously known. The stepsize choices that achieve the best rates are those proposed by Paul Tseng for acceleration of proximal gradient methods.
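
    As a rough illustration of the idea in this abstract, the sketch below runs a stochastic mirror-descent loop with weighted iterate averaging. Everything specific here is an assumption made for the example, not taken from the paper: the entropic mirror map on the probability simplex, the linear objective with Gaussian gradient noise, the function name `smd_weighted_average`, and the choice of averaging weights equal to the stepsizes. Only the stepsize form $1/\sqrt{1+k}$ comes from the abstract.

```python
import math
import random


def smd_weighted_average(c, steps=2000, noise=0.1, seed=0):
    """Minimize the linear cost c.x over the simplex from noisy gradients.

    Hypothetical example: entropic mirror descent, with the running
    average weighted by the stepsizes (an assumed weighting, not
    necessarily the weights analyzed in the paper).
    """
    rng = random.Random(seed)
    n = len(c)
    x = [1.0 / n] * n          # start at the center of the simplex
    avg = [0.0] * n            # weighted average of the iterates
    weight_sum = 0.0
    for k in range(steps):
        step = 1.0 / math.sqrt(1 + k)
        # Noisy gradient of the linear objective.
        g = [ci + rng.gauss(0.0, noise) for ci in c]
        # Entropic mirror step: multiplicative update, then renormalize.
        y = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(y)
        x = [yi / total for yi in y]
        # Incremental weighted mean: avg += w / W * (x - avg).
        w = step
        weight_sum += w
        avg = [a + w * (xi - a) / weight_sum for a, xi in zip(avg, x)]
    return avg


if __name__ == "__main__":
    print(smd_weighted_average([0.3, 0.1, 0.5]))
```

    With these assumed costs, the averaged point should place most of its mass on the coordinate with the smallest cost.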

    A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

    Stochastic mirror descent (SMD) is a fairly new family of algorithms that has recently found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of the classical stochastic gradient algorithm (SGD), where instead of updating the weight vector along the negative direction of the stochastic gradient, the update is performed in a "mirror domain" defined by the gradient of a (strictly convex) potential function. This potential function, and the mirror domain it yields, provides considerable flexibility in the algorithm compared to SGD. While many properties of SMD have already been obtained in the literature, in this paper we exhibit a new interpretation of SMD, namely that it is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family of distributions. The analysis also suggests a modified version of SMD, which we refer to as symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman divergence, which allow us to extend results from quadratics and Gaussians to certain convex functions and exponential families in a rather seamless way.
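
    The "mirror domain" update described in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the potential is taken to be $\psi(w) = \frac{1}{q}\|w\|_q^q$ (a common choice, not one the abstract specifies), the loss is a single-sample squared error, and the helper names `grad_potential`, `inv_grad_potential`, and `smd_step` are hypothetical. With $q = 2$ the step reduces to ordinary SGD.

```python
import math


def grad_potential(w, q):
    """Gradient of the assumed potential psi(w) = (1/q) * sum |w_i|^q."""
    return [math.copysign(abs(wi) ** (q - 1), wi) for wi in w]


def inv_grad_potential(z, q):
    """Inverse of grad_potential: maps mirror-domain points back to weights."""
    return [math.copysign(abs(zi) ** (1.0 / (q - 1)), zi) for zi in z]


def smd_step(w, x, y, lr, q=3.0):
    """One SMD step on the squared error 0.5 * (y - x.w)^2 for one sample."""
    residual = y - sum(wi * xi for wi, xi in zip(w, x))
    grad = [-residual * xi for xi in x]
    # Move to the mirror domain, take the gradient step there, map back.
    z = grad_potential(w, q)
    z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return inv_grad_potential(z, q)


if __name__ == "__main__":
    print(smd_step([1.0, -2.0], [0.5, 1.0], 1.0, 0.1, q=2.0))  # SGD case
    print(smd_step([1.0, -2.0], [0.5, 1.0], 1.0, 0.1, q=3.0))
```

    Changing `q` changes only the two mapping functions; the gradient step itself is identical, which is the flexibility the abstract attributes to the potential function.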

    Distributed relatively smooth optimization

    Smoothness conditions, either on the cost itself or its gradients, are ubiquitous in the development and study of gradient-based algorithms for optimization and learning. In the context of distributed optimization and multi-agent systems, smoothness conditions and gradient bounds are additionally central to controlling the effect of local heterogeneity. We deviate from this paradigm and study distributed learning problems in relatively smooth environments, where cost functions may grow faster than a quadratic, and gradients need not be bounded. We generalize gradient noise conditions to cover this setting, and present convergence guarantees in relatively smooth and relatively convex environments. Numerical results corroborate the findings.

    Inexact Online Proximal Mirror Descent for time-varying composite optimization

    In this paper, we consider online proximal mirror descent for solving time-varying composite optimization problems. In various applications, the algorithm naturally involves errors in the gradient and the proximal operator. We obtain sharp estimates on the dynamic regret of the algorithm when the regular part of the cost is convex and smooth. If the Bregman distance is given by the Euclidean distance, our result also improves the previous work in two ways: (i) We establish a sharper regret bound than the previous work, in the sense that our estimate does not involve the $O(T)$ term appearing in that work. (ii) We also obtain the result when the domain is the whole space $\mathbb{R}^n$, whereas the previous work was obtained only for bounded domains. We also provide numerical tests for problems involving errors in the gradient and proximal operator. Comment: 16 pages, 5 figures
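
    For the Euclidean case mentioned in this abstract, the online proximal step with an inexact gradient might look like the sketch below. The concrete setup is assumed for illustration only: a smooth part $f_t(x) = \frac{1}{2}\|x - b_t\|^2$ tracking a sequence of targets $b_t$, a nonsmooth part $\lambda\|x\|_1$ whose proximal operator is soft-thresholding, and hypothetical function names. Only the gradient error is modeled; the proximal-operator error discussed in the abstract is left out.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (exact, no proximal error)."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def inexact_online_prox_step(x, grad, grad_error, lr, lam):
    """One step: forward move along the perturbed gradient, then prox."""
    noisy = [g + e for g, e in zip(grad, grad_error)]
    forward = [xi - lr * gi for xi, gi in zip(x, noisy)]
    return soft_threshold(forward, lr * lam)


def track_targets(targets, errors, lr=0.5, lam=0.1):
    """Run the step over a time-varying sequence of targets b_t.

    Assumed smooth part: f_t(x) = 0.5 * ||x - b_t||^2, with gradient x - b_t.
    """
    x = [0.0] * len(targets[0])
    for b, e in zip(targets, errors):
        grad = [xi - bi for xi, bi in zip(x, b)]
        x = inexact_online_prox_step(x, grad, e, lr, lam)
    return x


if __name__ == "__main__":
    steps = 50
    targets = [[1.0, 0.0]] * steps       # a fixed target, for simplicity
    errors = [[0.0, 0.0]] * steps        # zero gradient error
    print(track_targets(targets, errors))
```

    With a fixed target and zero error, the iterates settle at the minimizer of the composite cost; supplying nonzero `errors` or a drifting `targets` sequence shows how the iterate lags behind the moving optimum.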