63,567 research outputs found
Distributed Stochastic Optimization of the Regularized Risk
Many machine learning algorithms minimize a regularized risk, and stochastic
optimization is widely used for this task. When working with massive data, it
is desirable to perform stochastic optimization in parallel. Unfortunately,
many existing stochastic optimization algorithms cannot be parallelized
efficiently. In this paper we show that one can rewrite the regularized risk
minimization problem as an equivalent saddle-point problem, and propose an
efficient distributed stochastic optimization (DSO) algorithm. We prove the
algorithm's rate of convergence; remarkably, our analysis shows that the
algorithm scales almost linearly with the number of processors. We also verify
with empirical evaluations that the proposed algorithm is competitive with
other parallel, general purpose stochastic and batch optimization algorithms
for regularized risk minimization
Randomized Smoothing for Stochastic Optimization
We analyze convergence rates of stochastic optimization procedures for
non-smooth convex optimization problems. By combining randomized smoothing
techniques with accelerated gradient methods, we obtain convergence rates of
stochastic optimization procedures, both in expectation and with high
probability, that have optimal dependence on the variance of the gradient
estimates. To the best of our knowledge, these are the first variance-based
rates for non-smooth optimization. We give several applications of our results
to statistical estimation problems, and provide experimental results that
demonstrate the effectiveness of the proposed algorithms. We also describe how
a combination of our algorithm with recent work on decentralized optimization
yields a distributed stochastic optimization algorithm that is order-optimal.Comment: 39 pages, 3 figure
Non-stationary Stochastic Optimization
We consider a non-stationary variant of a sequential stochastic optimization
problem, in which the underlying cost functions may change along the horizon.
We propose a measure, termed variation budget, that controls the extent of said
change, and study how restrictions on this budget impact achievable
performance. We identify sharp conditions under which it is possible to achieve
long-run-average optimality and more refined performance measures such as rate
optimality that fully characterize the complexity of such problems. In doing
so, we also establish a strong connection between two rather disparate strands
of literature: adversarial online convex optimization; and the more traditional
stochastic approximation paradigm (couched in a non-stationary setting). This
connection is the key to deriving well performing policies in the latter, by
leveraging structure of optimal policies in the former. Finally, tight bounds
on the minimax regret allow us to quantify the "price of non-stationarity,"
which mathematically captures the added complexity embedded in a temporally
changing environment versus a stationary one
Distributed Delayed Stochastic Optimization
We analyze the convergence of gradient-based optimization algorithms that
base their updates on delayed stochastic gradient information. The main
application of our results is to the development of gradient-based distributed
optimization algorithms where a master node performs parameter updates while
worker nodes compute stochastic gradients based on local information in
parallel, which may give rise to delays due to asynchrony. We take motivation
from statistical problems where the size of the data is so large that it cannot
fit on one computer; with the advent of huge datasets in biology, astronomy,
and the internet, such problems are now common. Our main contribution is to
show that for smooth stochastic problems, the delays are asymptotically
negligible and we can achieve order-optimal convergence results. In application
to distributed optimization, we develop procedures that overcome communication
bottlenecks and synchronization requirements. We show -node architectures
whose optimization error in stochastic problems---in spite of asynchronous
delays---scales asymptotically as \order(1 / \sqrt{nT}) after iterations.
This rate is known to be optimal for a distributed system with nodes even
in the absence of delays. We additionally complement our theoretical results
with numerical experiments on a statistical machine learning task.Comment: 27 pages, 4 figure
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Deep CCA is a recently proposed deep neural network extension to the
traditional canonical correlation analysis (CCA), and has been successful for
multi-view representation learning in several domains. However, stochastic
optimization of the deep CCA objective is not straightforward, because it does
not decouple over training examples. Previous optimizers for deep CCA are
either batch-based algorithms or stochastic optimization using large
minibatches, which can have high memory consumption. In this paper, we tackle
the problem of stochastic optimization for deep CCA with small minibatches,
based on an iterative solution to the CCA objective, and show that we can
achieve as good performance as previous optimizers and thus alleviate the
memory requirement.Comment: in 2015 Annual Allerton Conference on Communication, Control and
Computin
- …