Stochastic gradient descent on Riemannian manifolds
Stochastic gradient descent is a simple approach to find the local minima of
a cost function whose evaluations are corrupted by noise. In this paper, we
develop a procedure extending stochastic gradient descent algorithms to the
case where the function is defined on a Riemannian manifold. We prove that, as
in the Euclidean case, the gradient descent algorithm converges to a critical
point of the cost function. The algorithm has numerous potential applications,
and is illustrated here by four examples. In particular a novel gossip
algorithm on the set of covariance matrices is derived and tested numerically.
Comment: A slightly shorter version has been published in IEEE Transactions on
Automatic Control.
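The update this abstract describes can be sketched concretely. Below is a minimal, illustrative Riemannian SGD loop on the unit sphere (a deliberately simple manifold, not the paper's covariance-matrix example): the noisy Euclidean gradient is projected onto the tangent space at the current iterate, a step is taken, and a retraction maps the result back onto the manifold. All names here are hypothetical, not the paper's notation.

```python
import numpy as np

def tangent_project(x, g):
    # Remove the component of g normal to the sphere at x.
    return g - np.dot(g, x) * x

def retract(x, v):
    # Retraction: step in the tangent direction, then renormalise.
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_sgd(noisy_grad, x0, steps=2000, lr=0.5):
    # noisy_grad(x, rng) returns a noisy Euclidean gradient estimate.
    rng = np.random.default_rng(0)
    x = x0 / np.linalg.norm(x0)
    for t in range(steps):
        g = tangent_project(x, noisy_grad(x, rng))
        x = retract(x, -lr / np.sqrt(t + 1) * g)  # decaying step size
    return x

# Toy usage: minimise x^T A x on the sphere; the minimiser is the
# eigenvector of A with the smallest eigenvalue, here (0, 0, +-1).
A = np.diag([3.0, 2.0, 1.0])
noisy_grad = lambda x, rng: 2.0 * A @ x + 0.1 * rng.standard_normal(3)
x_star = riemannian_sgd(noisy_grad, np.ones(3))
```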
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in the
context of multi-agent decentralised non-parametric regression with the square
loss function when i.i.d. samples are assigned to agents. We show that if
agents hold sufficiently many samples with respect to the network size, then
Distributed Gradient Descent achieves optimal statistical rates with a number
of iterations that scales, up to a threshold, with the inverse of the spectral
gap of the gossip matrix divided by the number of samples owned by each agent
raised to a problem-dependent power. The presence of the threshold comes from
statistics. It encodes the existence of a "big data" regime where the number of
required iterations does not depend on the network topology. In this regime,
Distributed Gradient Descent achieves optimal statistical rates with the same
order of iterations as gradient descent run with all the samples in the
network. Provided the communication delay is sufficiently small, the
distributed protocol yields a linear speed-up in runtime compared to the
single-machine protocol. This is in contrast to decentralised optimisation
algorithms that do not exploit statistics and only yield a linear speed-up in
graphs where the spectral gap is bounded away from zero. Our results exploit
the statistical concentration of quantities held by agents and shed new light
on the interplay between statistics and communication in decentralised methods.
Bounds are given in the standard non-parametric setting with source/capacity
assumptions.
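As a concrete picture of the protocol analysed above, here is a minimal single-process simulation of Distributed Gradient Descent with a gossip matrix, written for ordinary least squares rather than the paper's non-parametric (kernel) setting. The gossip matrix P is assumed doubly stochastic and matched to the network; all names are hypothetical.

```python
import numpy as np

def distributed_gd(P, X_local, y_local, steps=300, lr=0.1):
    # P: (n_agents, n_agents) doubly stochastic gossip matrix.
    # X_local[i], y_local[i]: the i.i.d. samples held by agent i.
    n_agents, d = P.shape[0], X_local[0].shape[1]
    W = np.zeros((n_agents, d))   # one iterate per agent
    for _ in range(steps):
        W = P @ W                 # communication: gossip averaging
        for i in range(n_agents):
            resid = X_local[i] @ W[i] - y_local[i]
            # Local gradient step on agent i's square loss.
            W[i] -= lr * X_local[i].T @ resid / len(y_local[i])
    return W.mean(axis=0)
```

A complete-graph gossip matrix, `P = np.full((n, n), 1.0 / n)`, is the simplest doubly stochastic choice for experimenting with this loop.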
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Distributed deep learning is an effective way to reduce the training time of
deep learning for large datasets as well as complex models. However, the
limited scalability caused by network overheads makes it difficult to
synchronize the parameters of all workers. To resolve this problem,
gossip-based methods, which demonstrate stable scalability regardless of the
number of workers have been proposed. However, to use gossip-based methods in
general cases, the validation accuracy for a large mini-batch needs to be
verified. To verify this, we first empirically study the characteristics of
gossip methods in a large mini-batch problem and observe that the gossip
methods preserve higher validation accuracy than AllReduce-SGD (Stochastic
Gradient Descent) when the batch size is increased and the number of workers
is fixed. However, the delayed parameter propagation of the
gossip-based models decreases validation accuracy at large node scales. To cope
with this problem, we propose Crossover-SGD, which alleviates the delayed
propagation of weight parameters via segment-wise communication and a
load-balancing random network topology. We also adapt hierarchical communication to
limit the number of workers in gossip-based communication methods. To validate
the effectiveness of our proposed method, we conduct empirical experiments and
observe that our Crossover-SGD shows higher node scalability than
SGP (Stochastic Gradient Push).
Comment: Under review as a journal paper at CCP
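For intuition, the following is a schematic single-process simulation of the plain gossip-SGD pattern that Crossover-SGD refines: each worker takes a local SGD step, then averages its parameters with one randomly matched peer. The segment-wise exchange, load balancing, and hierarchical grouping of the proposed method are not reproduced here; all names are hypothetical.

```python
import numpy as np

def gossip_sgd(local_grads, w0, steps=100, lr=0.05, seed=0):
    # local_grads[i](w, rng) -> worker i's noisy mini-batch gradient at w.
    rng = np.random.default_rng(seed)
    n = len(local_grads)            # assumes an even number of workers
    W = np.tile(w0, (n, 1))         # one model replica per worker
    for _ in range(steps):
        for i in range(n):
            W[i] -= lr * local_grads[i](W[i], rng)   # local SGD step
        pairs = rng.permutation(n).reshape(-1, 2)    # random peer matching
        for i, j in pairs:
            W[i] = W[j] = 0.5 * (W[i] + W[j])        # pairwise gossip average
    return W.mean(axis=0)
```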
Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions
In decentralized networks (of sensors, connected objects, etc.), there is an
important need for efficient algorithms to optimize a global cost function, for
instance to learn a global model from the local data collected by each
computing unit. In this paper, we address the problem of decentralized
minimization of pairwise functions of the data points, where these points are
distributed over the nodes of a graph defining the communication topology of
the network. This general problem finds applications in ranking, distance
metric learning and graph inference, among others. We propose new gossip
algorithms based on dual averaging, which aim at solving such problems in both
synchronous and asynchronous settings. The proposed framework is flexible
enough to deal with constrained and regularized variants of the optimization
problem. Our theoretical analysis reveals that the proposed algorithms preserve
the convergence rate of centralized dual averaging up to an additive bias term.
We present numerical simulations on Area Under the ROC Curve (AUC) maximization
and metric learning problems which illustrate the practical interest of our
approach.
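A minimal sketch of the gossip dual-averaging template may help fix ideas: each node mixes its dual variable with its neighbours' through a doubly stochastic matrix P, adds its local (sub)gradient, and recovers a primal iterate by a projection. The pairwise structure of the losses is abstracted into the local gradient oracles, and the constrained case is illustrated with a Euclidean-ball projection; all names are hypothetical, not the paper's method verbatim.

```python
import numpy as np

def project_ball(v, radius=1.0):
    # Euclidean projection onto the ball of the given radius.
    nrm = np.linalg.norm(v)
    return v if nrm <= radius else radius * v / nrm

def gossip_dual_averaging(P, local_grads, d, steps=500, gamma=1.0):
    # P: doubly stochastic gossip matrix; local_grads[i](x) -> subgradient.
    n = P.shape[0]
    Z = np.zeros((n, d))   # dual variables (accumulated gradients)
    X = np.zeros((n, d))   # primal iterates
    for t in range(1, steps + 1):
        G = np.stack([g(X[i]) for i, g in enumerate(local_grads)])
        Z = P @ Z + G      # gossip the duals, then add local gradients
        # Primal map: scaled negative dual, projected onto the constraint set.
        X = np.stack([project_ball(-(gamma / np.sqrt(t)) * z) for z in Z])
    return X.mean(axis=0)
```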