60,361 research outputs found
Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning
Federated learning is a distributed framework for training machine learning
models over the data residing at mobile devices, while protecting the privacy
of individual users. A major bottleneck in scaling federated learning to a
large number of users is the overhead of secure model aggregation across many
users. In particular, the overhead of the state-of-the-art protocols for secure
model aggregation grows quadratically with the number of users. In this paper,
we propose the first secure aggregation framework, named Turbo-Aggregate, that
in a network with users achieves a secure aggregation overhead of
, as opposed to , while tolerating up to a user dropout
rate of . Turbo-Aggregate employs a multi-group circular strategy for
efficient model aggregation, and leverages additive secret sharing and novel
coding techniques for injecting aggregation redundancy in order to handle user
dropouts while guaranteeing user privacy. We experimentally demonstrate that
Turbo-Aggregate achieves a total running time that grows almost linear in the
number of users, and provides up to speedup over the
state-of-the-art protocols with up to users. Our experiments also
demonstrate the impact of model size and bandwidth on the performance of
Turbo-Aggregate
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Deep learning frameworks have been widely deployed on GPU servers for deep
learning applications in both academia and industry. In training deep neural
networks (DNNs), there are many standard processes or algorithms, such as
convolution and stochastic gradient descent (SGD), but the running performance
of different frameworks might be different even running the same deep model on
the same GPU hardware. In this study, we evaluate the running performance of
four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI,
CNTK, MXNet, and TensorFlow) over single-GPU, multi-GPU, and multi-node
environments. We first build performance models of standard processes in
training DNNs with SGD, and then we benchmark the running performance of these
frameworks with three popular convolutional neural networks (i.e., AlexNet,
GoogleNet and ResNet-50), after that, we analyze what factors that result in
the performance gap among these four frameworks. Through both analytical and
experimental analysis, we identify bottlenecks and overheads which could be
further optimized. The main contribution is that the proposed performance
models and the analysis provide further optimization directions in both
algorithmic design and system configuration.Comment: Published at DataCom'201
An Algorithm for Network and Data-aware Placement of Multi-Tier Applications in Cloud Data Centers
Today's Cloud applications are dominated by composite applications comprising
multiple computing and data components with strong communication correlations
among them. Although Cloud providers are deploying large number of computing
and storage devices to address the ever increasing demand for computing and
storage resources, network resource demands are emerging as one of the key
areas of performance bottleneck. This paper addresses network-aware placement
of virtual components (computing and data) of multi-tier applications in data
centers and formally defines the placement as an optimization problem. The
simultaneous placement of Virtual Machines and data blocks aims at reducing the
network overhead of the data center network infrastructure. A greedy heuristic
is proposed for the on-demand application components placement that localizes
network traffic in the data center interconnect. Such optimization helps
reducing communication overhead in upper layer network switches that will
eventually reduce the overall traffic volume across the data center. This, in
turn, will help reducing packet transmission delay, increasing network
performance, and minimizing the energy consumption of network components.
Experimental results demonstrate performance superiority of the proposed
algorithm over other approaches where it outperforms the state-of-the-art
network-aware application placement algorithm across all performance metrics by
reducing the average network cost up to 67% and network usage at core switches
up to 84%, as well as increasing the average number of application deployments
up to 18%.Comment: Submitted for publication consideration for the Journal of Network
and Computer Applications (JNCA). Total page: 28. Number of figures: 15
figure
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach in handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial
- …