On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging
This paper considers stochastic subgradient mirror-descent method for solving
constrained convex minimization problems. In particular, a stochastic
subgradient mirror-descent method with weighted iterate-averaging is
investigated and its per-iterate convergence rate is analyzed. The novel part
of the approach is in the choice of weights that are used to construct the
averages. Through the use of these weighted averages, we show that the known
optimal rates can be obtained with simpler algorithms than those currently
existing in the literature. Specifically, by suitably choosing the stepsize
values, one can obtain the rate of the order $1/k$ for strongly convex
functions, and the rate $1/\sqrt{k}$ for general convex functions (not
necessarily differentiable). Furthermore, for the latter case, it is shown that
a stochastic subgradient mirror-descent with iterate averaging converges (along
a subsequence) to an optimal solution, almost surely, even with the stepsize of
the form $1/\sqrt{k}$, which was not previously known. The stepsize choices
that achieve the best rates are those proposed by Paul Tseng for acceleration
of proximal gradient methods.
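The scheme described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's exact algorithm: the entropic potential on the simplex, the toy objective, the $1/\sqrt{k}$ stepsize, and the linearly growing averaging weights are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropic_mirror_step(x, g, step):
    # Mirror-descent update with the negative-entropy potential on the simplex:
    # x_next is proportional to x * exp(-step * g), renormalized to sum to one.
    y = x * np.exp(-step * g)
    return y / y.sum()

# Toy problem (assumed): minimize f(x) = E_a[ |a^T x - 0.5| ] over the
# probability simplex, using noisy subgradients a * sign(a^T x - 0.5).
d, iters = 5, 2000
x = np.full(d, 1.0 / d)
avg, weight_sum = np.zeros(d), 0.0
for k in range(1, iters + 1):
    a = rng.normal(size=d)
    g = a * np.sign(a @ x - 0.5)                      # stochastic subgradient
    x = entropic_mirror_step(x, g, 1.0 / np.sqrt(k))  # stepsize ~ 1/sqrt(k)
    w = k                                             # hypothetical weight choice
    weight_sum += w
    avg = avg + (w / weight_sum) * (x - avg)          # running weighted average
```

The running-average recursion keeps `avg` a convex combination of the iterates, so it stays on the simplex without storing the whole trajectory.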
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Stochastic mirror descent (SMD) is a fairly new family of algorithms that has
recently found a wide range of applications in optimization, machine learning,
and control. It can be considered a generalization of the classical stochastic
gradient algorithm (SGD), where instead of updating the weight vector along the
negative direction of the stochastic gradient, the update is performed in a
"mirror domain" defined by the gradient of a (strictly convex) potential
function. This potential function, and the mirror domain it yields, provides
considerable flexibility in the algorithm compared to SGD. While many
properties of SMD have already been obtained in the literature, in this paper
we exhibit a new interpretation of SMD, namely that it is a risk-sensitive
optimal estimator when the unknown weight vector and additive noise are
non-Gaussian and belong to the exponential family of distributions. The
analysis also suggests a modified version of SMD, which we refer to as
symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman
divergence, which allow us to extend results from quadratics and Gaussians to
certain convex functions and exponential families in a rather seamless way.
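The "mirror domain" update described above can be sketched generically: map the weight vector through the gradient of the potential, take a gradient step there, and map back. This is a minimal sketch, assuming two standard potentials (quadratic and negative entropy) as examples; it is not tied to the paper's estimation setting.

```python
import numpy as np

def smd_step(x, g, lr, grad_psi, grad_psi_inv):
    # One stochastic mirror descent step: move to the mirror domain via
    # grad_psi, take a gradient step there, and map back via its inverse.
    return grad_psi_inv(grad_psi(x) - lr * g)

# With the quadratic potential psi(x) = ||x||^2 / 2, grad_psi is the
# identity, and SMD reduces to the classical SGD update.
identity = lambda x: x
x = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
x_sgd = smd_step(x, g, 0.1, identity, identity)

# With the negative-entropy potential on positive vectors,
# grad_psi(x) = log(x) + 1 with inverse exp(y - 1), so the step
# becomes a multiplicative (exponentiated-gradient) update.
x_pos = np.array([0.3, 0.7])
x_mul = smd_step(x_pos, g, 0.1,
                 lambda x: np.log(x) + 1.0,
                 lambda y: np.exp(y - 1.0))
```

The choice of potential is exactly the flexibility the abstract refers to: the same three-line update yields qualitatively different algorithms.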
Distributed relatively smooth optimization
Smoothness conditions, either on the cost itself or its gradients, are ubiquitous in the development and study of gradient-based algorithms for optimization and learning. In the context of distributed optimization and multi-agent systems, smoothness conditions and gradient bounds are additionally central to controlling the effect of local heterogeneity. We deviate from this paradigm and study distributed learning problems in relatively smooth environments, where cost functions may grow faster than a quadratic, and gradients need not be bounded. We generalize gradient noise conditions to cover this setting, and present convergence guarantees in relatively smooth and relatively convex environments. Numerical results corroborate the findings.
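The notion of relative smoothness invoked above admits a simple single-agent illustration (a toy sketch, not the paper's distributed algorithm): $f(x) = x^4$ is not smooth in the classical sense, since its second derivative $12x^2$ is unbounded, but it is $4$-smooth relative to the reference $h(x) = x^4/4 + x^2/2$, because $12x^2 \le 4(3x^2 + 1)$. The corresponding Bregman gradient step then converges with the stepsize $1/L$.

```python
import numpy as np

def grad_f(x):
    return 4.0 * x**3          # gradient of f(x) = x**4

def grad_h(x):
    return x**3 + x            # gradient of the reference h(x) = x**4/4 + x**2/2

def bregman_step(x, lr):
    # Bregman (mirror) gradient step: solve grad_h(x_next) = grad_h(x) - lr * grad_f(x).
    # Here this is a scalar cubic, solved by a few Newton iterations.
    target = grad_h(x) - lr * grad_f(x)
    t = x
    for _ in range(50):
        t -= (t**3 + t - target) / (3.0 * t**2 + 1.0)
    return t

x = 2.0
for _ in range(100):
    x = bregman_step(x, lr=1.0 / 4.0)   # stepsize 1/L with relative smoothness L = 4
```

Despite the unbounded Hessian of $f$, the iterates steadily approach the minimizer at the origin, which is the point of replacing Euclidean smoothness by a relative one.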
Inexact Online Proximal Mirror Descent for time-varying composite optimization
In this paper, we consider online proximal mirror descent for solving
time-varying composite optimization problems. In various applications, the
algorithm naturally involves errors in the gradient and the proximal operator.
We obtain sharp estimates on the dynamic regret of the algorithm when the
regular part of the cost is convex and smooth. If the Bregman distance is given
by the Euclidean distance, our result also improves on the previous work in two
ways: (i) we establish a sharper regret bound, in the sense that our estimate
does not involve an additional term appearing in that work; (ii) we also obtain
the result when the domain is the whole space $\mathbb{R}^n$, whereas the
previous result was obtained only for bounded domains. We also provide
numerical tests for problems involving errors in the gradient and the proximal
operator.
Comment: 16 pages, 5 figures
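In the Euclidean case the abstract refers to, the inexact update can be sketched as a proximal gradient step with errors injected into both oracles. This is a toy illustration under assumed ingredients: a drifting quadratic tracking cost with an $\ell_1$ regularizer (whose proximal operator is soft-thresholding), and small Gaussian perturbations standing in for the gradient and proximal errors.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(z, tau):
    # Proximal operator of tau * ||x||_1 under the Euclidean distance.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# Time-varying composite cost (assumed for the sketch):
#   f_t(x) = 0.5 * ||x - b_t||^2 + lam * ||x||_1,  with b_t drifting slowly.
d, T, lam, step = 4, 200, 0.1, 0.5
x = np.zeros(d)
for t in range(T):
    b_t = np.sin(0.01 * t) * np.ones(d)              # slowly moving target
    grad = (x - b_t) + 1e-3 * rng.normal(size=d)     # inexact gradient oracle
    x = soft_threshold(x - step * grad, step * lam)  # proximal step
    x += 1e-3 * rng.normal(size=d)                   # inexact proximal evaluation
```

Despite both error sources, the iterate tracks the drifting (soft-thresholded) minimizer, which is the regime the dynamic-regret analysis quantifies.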