59 research outputs found
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
In this note, we present a new averaging technique for the projected
stochastic subgradient method. By using a weighted average with a weight of t+1
for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t)
with both an easy proof and an easy implementation. The new scheme is compared
empirically to existing techniques, with similar performance behavior.
Comment: 8 pages, 6 figures. Changes from previous version: added reference to
concurrently submitted work arXiv:1212.1824v1; clarifications added; typos
corrected; title changed to 'subgradient method' as 'subgradient descent' is a
misnomer.
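The averaging technique this abstract describes is simple enough to sketch directly. Below is a hedged, illustrative implementation (not the authors' code): a projected stochastic subgradient method whose iterate w_t is given weight t+1, maintained as a running weighted mean so no iterates need to be stored. The toy objective, step-size schedule, and projection radius are assumptions for the demo.

```python
import numpy as np

def project(w, radius=10.0):
    # Euclidean projection onto the ball of the given radius
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def weighted_psgd(stoch_subgrad, w0, lam, num_iters, rng):
    # Projected stochastic subgradient method with (t+1)-weighted averaging:
    # iterate w_t receives weight t+1, kept as an online weighted mean.
    w = np.asarray(w0, dtype=float)
    avg = w.copy()
    weight_sum = 1.0                               # weight of w_0
    for t in range(1, num_iters + 1):
        step = 2.0 / (lam * (t + 1))               # step size for lam-strongly-convex f
        w = project(w - step * stoch_subgrad(w, rng))
        weight_sum += t + 1
        avg += ((t + 1) / weight_sum) * (w - avg)  # running (t+1)-weighted mean
    return avg

# Toy problem: f(w) = 0.5 * E||w - x||^2 with x ~ N(mu, I); the minimizer is mu,
# and w - x is an unbiased stochastic (sub)gradient.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
subgrad = lambda w, rng: w - (mu + rng.standard_normal(mu.size))
w_bar = weighted_psgd(subgrad, np.zeros(2), lam=1.0, num_iters=5000, rng=rng)
# w_bar lands close to mu
```

The running-mean update avoids ever forming the normalizing sum of weights explicitly, which is what makes the scheme as easy to implement as the abstract claims.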
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
O(1/sqrt(t)), the structure of the communication network only impacts a
second-order term in O(1/t), where t is the time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a d^{1/4} multiplicative factor of the
optimal convergence rate, where d is the underlying dimension.
Comment: 17 pages.
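The smoothing idea behind DRS can be illustrated on a single machine. The sketch below is a generic Gaussian-smoothing gradient estimator, not the paper's distributed DRS algorithm (which has its own sampling scheme and accelerated updates): the non-smooth f is replaced by f_gamma(w) = E[f(w + gamma*Z)], which is differentiable, and its gradient is estimated by Monte Carlo. The test function and parameters are illustrative assumptions.

```python
import numpy as np

def smoothed_grad(f, w, gamma, num_samples, rng):
    # Monte-Carlo gradient of the Gaussian smoothing f_gamma(w) = E[f(w + gamma*Z)],
    # Z ~ N(0, I), via the identity  grad f_gamma(w) = E[f(w + gamma*Z) Z] / gamma.
    # Subtracting f(w) leaves the mean unchanged (since E[Z] = 0) but acts as a
    # control variate that sharply reduces the estimator's variance.
    z = rng.standard_normal((num_samples, w.size))
    vals = f(w + gamma * z) - f(w)           # f at perturbed points, vectorized
    return (vals[:, None] * z).mean(axis=0) / gamma

# Sanity check on the non-smooth f(w) = ||w||_1: away from the kinks, the
# smoothed gradient approaches sign(w).
rng = np.random.default_rng(0)
f = lambda w: np.abs(w).sum(axis=-1)
g = smoothed_grad(f, np.array([2.0, -3.0]), gamma=0.05, num_samples=200_000, rng=rng)
# g is close to sign(w) = [1.0, -1.0]
```

The key property the abstract exploits is that the smoothed surrogate trades a small, controllable bias (governed by gamma) for differentiability, so fast methods for smooth problems become applicable.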
Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks
We study diffusion and consensus based optimization of a sum of unknown
convex objective functions over distributed networks. The only access to these
functions is through stochastic gradient oracles, each of which is only
available at a different node, and a limited number of gradient oracle calls is
allowed at each node. In this framework, we introduce a convex optimization
algorithm based on the stochastic gradient descent (SGD) updates. Particularly,
we use a carefully designed time-dependent weighted averaging of the SGD
iterates, which yields a convergence rate of O(N*sqrt(N)/T)
after T gradient updates for each node on
a network of N nodes. We then show that after T gradient oracle calls, the
average SGD iterate achieves a mean square deviation (MSD) of
O(sqrt(N)/T). This rate of convergence is optimal as it
matches the performance lower bound up to constant terms. Similar to the SGD
algorithm, the computational complexity of the proposed algorithm also scales
linearly with the dimensionality of the data. Furthermore, the communication
load of the proposed method is the same as the communication load of the SGD
algorithm. Thus, the proposed algorithm is highly efficient in terms of
complexity and communication load. We illustrate the merits of the algorithm
with respect to state-of-the-art methods over benchmark real-life data sets and
widely studied network topologies.
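The kind of scheme this abstract describes can be sketched as follows. This is a hedged, minimal illustration (not the authors' algorithm): each node takes a local SGD step on its own stochastic gradient, mixes its iterate with its neighbors' through a doubly stochastic matrix W, and keeps a time-weighted running average of its iterates. The quadratic per-node objectives, ring topology, and step-size schedule are illustrative assumptions.

```python
import numpy as np

def consensus_wsgd(mus, W, lam, num_iters, rng):
    # Node i holds f_i(w) = 0.5 * E||w - x_i||^2 with x_i ~ N(mus[i], I); the
    # network objective (1/N) sum_i f_i is minimized at the mean of the mus.
    # Per step: local stochastic-gradient update, consensus mixing through the
    # doubly stochastic matrix W, and a (t+1)-weighted running average per node.
    n, d = mus.shape
    w = np.zeros((n, d))
    avg = w.copy()
    weight_sum = 1.0
    for t in range(1, num_iters + 1):
        x = mus + rng.standard_normal((n, d))    # one fresh sample per node
        grads = w - x                            # stochastic gradients of the f_i
        step = 2.0 / (lam * (t + 1))             # step size for lam-strong convexity
        w = W @ (w - step * grads)               # local step, then neighbor mixing
        weight_sum += t + 1
        avg += ((t + 1) / weight_sum) * (w - avg)
    return avg                                   # row i: node i's weighted average

# 4-node ring: each node keeps half its mass and sends a quarter to each neighbor.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
mus = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
rng = np.random.default_rng(0)
avg = consensus_wsgd(mus, W, lam=1.0, num_iters=8000, rng=rng)
# every row of avg ends up close to the global minimizer [1.0, 1.0]
```

Because W is doubly stochastic and the step size decays, the nodes' disagreement shrinks while the network-wide average tracks a centralized SGD run, which is the intuition behind the communication load matching that of plain SGD.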
- …