1,420 research outputs found
Stochastic Optimization with Importance Sampling
Uniform sampling of training data has been commonly used in traditional
stochastic optimization algorithms such as Proximal Stochastic Gradient Descent
(prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although
uniform sampling can guarantee that the sampled stochastic quantity is an
unbiased estimate of the corresponding true quantity, the resulting estimator
may have a rather high variance, which negatively affects the convergence of
the underlying optimization procedure. In this paper we study stochastic
optimization with importance sampling, which improves the convergence rate by
reducing the stochastic variance. Specifically, we study prox-SGD (actually,
stochastic mirror descent) with importance sampling and prox-SDCA with
importance sampling. For prox-SGD, instead of adopting uniform sampling
throughout the training process, the proposed algorithm employs importance
sampling to minimize the variance of the stochastic gradient. For prox-SDCA,
the proposed importance sampling scheme aims to achieve higher expected dual
value at each dual coordinate ascent step. We provide extensive theoretical
analysis to show that the convergence rates with the proposed importance
sampling methods can be significantly improved under suitable conditions both
for prox-SGD and for prox-SDCA. Experiments are provided to verify the
theoretical analysis.Comment: 29 page
Distributed Partitioned Big-Data Optimization via Asynchronous Dual Decomposition
In this paper we consider a novel partitioned framework for distributed
optimization in peer-to-peer networks. In several important applications the
agents of a network have to solve an optimization problem with two key
features: (i) the dimension of the decision variable depends on the network
size, and (ii) cost function and constraints have a sparsity structure related
to the communication graph. For this class of problems a straightforward
application of existing consensus methods would show two inefficiencies: poor
scalability and redundancy of shared information. We propose an asynchronous
distributed algorithm, based on dual decomposition and coordinate methods, to
solve partitioned optimization problems. We show that, by exploiting the
problem structure, the solution can be partitioned among the nodes, so that
each node just stores a local copy of a portion of the decision variable
(rather than a copy of the entire decision vector) and solves a small-scale
local problem
- …