
Asynchronous Distributed Semi-Stochastic Gradient Optimization

Authors: Kwok James T.; Zhang Ruiliang; Zheng Shuai
Publication date: 04/12/2015

With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those based on stochastic gradient descent (SGD) and its variants. However, existing algorithms either suffer from slow convergence due to the inherent variance of stochastic gradients, or have a fast linear convergence rate but at the expense of poorer solution quality. In this paper, we combine their merits by proposing a fast distributed asynchronous SGD-based algorithm with variance reduction. A constant learning rate can be used, and the algorithm is also guaranteed to converge linearly to the optimal solution. Experiments on the Google Cloud Computing Platform demonstrate that the proposed algorithm outperforms state-of-the-art distributed asynchronous algorithms in terms of both wall clock time and solution quality.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
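
The variance-reduction idea in the first abstract can be illustrated with a minimal serial sketch of an SVRG-style update (stochastic variance-reduced gradient, a standard technique substituted here; this is not the authors' asynchronous distributed algorithm). It assumes a one-dimensional least-squares objective, and the function name `svrg_least_squares` and its parameters are hypothetical. Each epoch computes a full gradient at a snapshot point; the inner steps then use the corrected gradient `grad_i(w) - grad_i(w_snap) + full_grad`, which stays unbiased while its variance shrinks near the optimum, which is why a constant learning rate can still converge.

```python
import random


def svrg_least_squares(xs, ys, lr=0.05, epochs=30, inner=None, seed=0):
    """Serial SVRG-style sketch for 1-D least squares:
    minimize F(w) = (1/n) * sum_i 0.5 * (w * x_i - y_i) ** 2."""
    rng = random.Random(seed)
    n = len(xs)
    inner = inner or 2 * n  # inner-loop length (assumed default)

    def grad_i(w, i):
        # gradient of the i-th term 0.5 * (w * x_i - y_i) ** 2
        return (w * xs[i] - ys[i]) * xs[i]

    w = 0.0
    for _ in range(epochs):
        w_snap = w  # snapshot point for this epoch
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            # variance-reduced estimate: unbiased for the full gradient at w,
            # and its variance vanishes as w and w_snap approach the optimum
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= lr * g  # constant learning rate
    return w


print(svrg_least_squares([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 2.0
```

With noise-free data y = 2x the result approaches 2.0; with noisy data such as ys = [1, 5, 5, 9] it approaches the least-squares solution 62/30 at the same constant step size, because the corrected gradient is exactly zero at the optimum. On real problems `lr` and `inner` would need tuning.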

Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers

Authors: Shroff Ness; Wierman Adam; Zhou Xingyu
Publication venue: 'Elsevier BV'
Publication date: 20/02/2020

We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of the queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy traffic, respectively. These conditions directly imply delay optimality for many previous local-memory-based policies in heavy traffic. Moreover, the results enable us to design new delay-optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework also sheds light on a recent open question on how to design optimal load balancing schemes using delayed information.

arXiv.org e-Print Archive

Caltech Authors
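
The LED framework in the load-balancing abstract can be sketched as a small discrete-time simulation. This is a hypothetical instance, not one of the policies analyzed in the paper: the class `Dispatcher`, the function `simulate`, the rule of picking the server with the smallest (estimate + 1) / rate, the assumption that a dispatcher adds its own dispatched jobs to its estimates, and the Bernoulli arrival, service, and polling probabilities are all assumptions made for illustration. The LED property it keeps is that `dispatch` reads only the local, possibly stale estimates, which are corrected only through occasional `refresh` calls.

```python
import random


class Dispatcher:
    """Hypothetical LED-style dispatcher: decides using local estimates only."""

    def __init__(self, rates):
        self.rates = rates             # assumed known per-server service rates
        self.est = [0] * len(rates)    # local, possibly stale queue-length estimates

    def dispatch(self):
        # assumed rule: smallest estimated load after adding this job, scaled by rate
        k = min(range(len(self.rates)),
                key=lambda j: (self.est[j] + 1) / self.rates[j])
        self.est[k] += 1               # assumption: count own dispatches locally
        return k

    def refresh(self, queues, server):
        # infrequent communication: learn one server's true queue length
        self.est[server] = queues[server]


def simulate(rates, n_dispatchers=3, slots=20_000, arrival_p=0.8,
             poll_p=0.05, seed=1):
    """Return the time-averaged total queue length in a discrete-time toy model.
    rates[j] doubles as server j's per-slot completion probability (<= 1)."""
    rng = random.Random(seed)
    queues = [0] * len(rates)
    dispatchers = [Dispatcher(rates) for _ in range(n_dispatchers)]
    area = 0
    for _ in range(slots):
        for d in dispatchers:
            if rng.random() < arrival_p:    # Bernoulli arrival at this dispatcher
                queues[d.dispatch()] += 1
            if rng.random() < poll_p:       # occasional poll of one random server
                d.refresh(queues, rng.randrange(len(queues)))
        for j, p in enumerate(rates):       # each busy server completes a job w.p. p
            if queues[j] > 0 and rng.random() < p:
                queues[j] -= 1
        area += sum(queues)
    return area / slots


print(simulate([0.9, 0.9, 0.6, 0.6]))
```

The call above reports the time-averaged total queue length for four heterogeneous servers. Lowering `poll_p` makes the estimates staler, which can be used to explore how communication frequency affects delay in this toy model.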

The discrete-time queue with geometrically distributed service capacities revisited

Authors: Bruneel Herwig; Claeys Dieter; Walraevens Joris; Wittevrongel Sabine
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2013

Ghent University Academic Bibliography

Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning

Authors: Avestimehr A. Salman; Guler Basak; So Jinhyun
Publication date: 24/05/2020

Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users. In particular, the overhead of the state-of-the-art protocols for secure model aggregation grows quadratically with the number of users. In this paper, we propose the first secure aggregation framework, named Turbo-Aggregate, that in a network with $N$ users achieves a secure aggregation overhead of $O(N\log N)$, as opposed to $O(N^2)$, while tolerating a user dropout rate of up to $50\%$. Turbo-Aggregate employs a multi-group circular strategy for efficient model aggregation, and leverages additive secret sharing and novel coding techniques for injecting aggregation redundancy in order to handle user dropouts while guaranteeing user privacy. We experimentally demonstrate that Turbo-Aggregate achieves a total running time that grows almost linearly in the number of users, and provides up to a $40\times$ speedup over the state-of-the-art protocols with up to $N=200$ users. Our experiments also demonstrate the impact of model size and bandwidth on the performance of Turbo-Aggregate.

arXiv.org e-Print Archive

Cryptology ePrint Archive
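
Additive secret sharing, one of the building blocks named in the Turbo-Aggregate abstract, can be sketched as follows. This covers only that primitive, not Turbo-Aggregate's multi-group circular protocol or its redundancy coding; it assumes model updates are already quantized to non-negative integers whose total stays below the modulus, and the names `P`, `share`, `reconstruct`, and `secure_sum` are hypothetical. Each user splits its value into random shares that sum to it modulo a prime, so any strict subset of a user's shares reveals nothing about the value, yet combining every party's partial total recovers the exact sum over all users.

```python
import random

P = 2**31 - 1  # hypothetical prime modulus for the share arithmetic


def share(value, n_shares, rng):
    """Split an integer into n_shares additive shares that sum to value mod P."""
    shares = [rng.randrange(P) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recover the shared value by summing all shares mod P."""
    return sum(shares) % P


def secure_sum(user_values, n_parties, seed=0):
    """Each user shares its value among n_parties; each party adds up only the
    shares it receives, and combining the parties' totals yields the overall sum."""
    rng = random.Random(seed)
    partial = [0] * n_parties
    for v in user_values:
        for k, s in enumerate(share(v, n_parties, rng)):
            partial[k] = (partial[k] + s) % P
    return reconstruct(partial)


print(secure_sum([3, 5, 7, 11], n_parties=3))  # 26
```

A real model update is a vector, so the same sharing would be applied coordinate-wise. If any party holding shares drops out, this plain scheme cannot recover the total; handling such dropouts is what the abstract attributes to Turbo-Aggregate's coding techniques.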