Search CORE

60,361 research outputs found

Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning

Author: Avestimehr A. Salman
Guler Basak
So Jinhyun
Publication venue
Publication date: 24/05/2020
Field of study

Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users. In particular, the overhead of the state-of-the-art protocols for secure model aggregation grows quadratically with the number of users. In this paper, we propose the first secure aggregation framework, named Turbo-Aggregate, that in a network with

N

users achieves a secure aggregation overhead of

O(N\log{N})

, as opposed to

O(N^2)

, while tolerating up to a user dropout rate of

50\%

. Turbo-Aggregate employs a multi-group circular strategy for efficient model aggregation, and leverages additive secret sharing and novel coding techniques for injecting aggregation redundancy in order to handle user dropouts while guaranteeing user privacy. We experimentally demonstrate that Turbo-Aggregate achieves a total running time that grows almost linear in the number of users, and provides up to

40\times

speedup over the state-of-the-art protocols with up to

N=200

users. Our experiments also demonstrate the impact of model size and bandwidth on the performance of Turbo-Aggregate

arXiv.org e-Print Archive

Cryptology ePrint Archive

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Author: Chu Xiaowen
Shi Shaohuai
Wang Qiang
Publication venue
Publication date: 20/08/2018
Field of study

Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In training deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution and stochastic gradient descent (SGD), but the running performance of different frameworks might be different even running the same deep model on the same GPU hardware. In this study, we evaluate the running performance of four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI, CNTK, MXNet, and TensorFlow) over single-GPU, multi-GPU, and multi-node environments. We first build performance models of standard processes in training DNNs with SGD, and then we benchmark the running performance of these frameworks with three popular convolutional neural networks (i.e., AlexNet, GoogleNet and ResNet-50), after that, we analyze what factors that result in the performance gap among these four frameworks. Through both analytical and experimental analysis, we identify bottlenecks and overheads which could be further optimized. The main contribution is that the proposed performance models and the analysis provide further optimization directions in both algorithmic design and system configuration.Comment: Published at DataCom'201

arXiv.org e-Print Archive

Crossref

An Algorithm for Network and Data-aware Placement of Multi-Tier Applications in Cloud Data Centers

Author: Buyya Rajkumar
Calheiros Rodrigo N.
Ferdaus Md Hasanul
Murshed Manzur
Publication venue
Publication date: 01/01/2017
Field of study

Today's Cloud applications are dominated by composite applications comprising multiple computing and data components with strong communication correlations among them. Although Cloud providers are deploying large number of computing and storage devices to address the ever increasing demand for computing and storage resources, network resource demands are emerging as one of the key areas of performance bottleneck. This paper addresses network-aware placement of virtual components (computing and data) of multi-tier applications in data centers and formally defines the placement as an optimization problem. The simultaneous placement of Virtual Machines and data blocks aims at reducing the network overhead of the data center network infrastructure. A greedy heuristic is proposed for the on-demand application components placement that localizes network traffic in the data center interconnect. Such optimization helps reducing communication overhead in upper layer network switches that will eventually reduce the overall traffic volume across the data center. This, in turn, will help reducing packet transmission delay, increasing network performance, and minimizing the energy consumption of network components. Experimental results demonstrate performance superiority of the proposed algorithm over other approaches where it outperforms the state-of-the-art network-aware application placement algorithm across all performance metrics by reducing the average network cost up to 67% and network usage at core switches up to 84%, as well as increasing the average number of application deployments up to 18%.Comment: Submitted for publication consideration for the Journal of Network and Computer Applications (JNCA). Total page: 28. Number of figures: 15 figure

arXiv.org e-Print Archive

Crossref

Federation ResearchOnline

Western Sydney ResearchDirect

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

Author: Forte Simone
Jaggi Martin
Jordan Michael I.
Ma Chenxin
Smith Virginia
Takac Martin
Publication venue
Publication date: 21/06/2017
Field of study

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

Author: Alsheikh Mohammad Abu
Lin Shaowei
Niyato Dusit
Tan Hwee-Pink
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

University of Canberra Research Repository