Training Keyword Spotting Models on Non-IID Data with Federated Learning
We demonstrate that a production-quality keyword-spotting model can be
trained on-device using federated learning and achieve comparable false accept
and false reject rates to a centrally-trained model. To overcome the
algorithmic constraints associated with fitting on-device data (which are
inherently non-independent and identically distributed), we conduct thorough
empirical studies of optimization algorithms and hyperparameter configurations
using large-scale federated simulations. To overcome resource constraints, we
replace memory-intensive MTR data augmentation with SpecAugment, which reduces
the false reject rate by 56%. Finally, to label examples (given the zero
visibility into on-device data), we explore teacher-student training.
Comment: Submitted to Interspeech 202
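As a rough illustration of the kind of spectrogram masking SpecAugment performs, the sketch below applies random frequency and time masks to a log-mel feature matrix; the mask widths, counts, and input shape are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_width=8,
                 num_time_masks=2, time_width=20, rng=None):
    """Apply frequency and time masking to a (time, freq) log-mel spectrogram."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_time, n_freq = out.shape
    for _ in range(num_freq_masks):
        w = rng.integers(0, freq_width + 1)
        f0 = rng.integers(0, max(1, n_freq - w))
        out[:, f0:f0 + w] = 0.0          # mask a band of mel channels
    for _ in range(num_time_masks):
        w = rng.integers(0, time_width + 1)
        t0 = rng.integers(0, max(1, n_time - w))
        out[t0:t0 + w, :] = 0.0          # mask a span of time frames
    return out

# Example: a 100-frame, 40-channel log-mel input.
augmented = spec_augment(np.random.randn(100, 40))
```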
Federated Learning for Keyword Spotting
We propose a practical approach based on federated learning to solve
out-of-domain issues with continuously running embedded speech-based models
such as wake word detectors. We conduct an extensive empirical study of the
federated averaging algorithm for the "Hey Snips" wake word based on a
crowdsourced dataset that mimics a federation of wake word users. We
empirically demonstrate that using an adaptive averaging strategy inspired by
Adam in place of standard weighted model averaging greatly reduces the number of
communication rounds required to reach our target performance. The associated
upstream communication costs per user are estimated at 8 MB, which is
reasonable in the context of smart home voice assistants. Additionally, the
dataset used for these experiments is being open sourced with the aim of
fostering further transparent research in the application of federated learning
to speech data.
Comment: Accepted for publication to ICASSP 201
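The adaptive averaging idea can be sketched as a server-side, Adam-style update applied to the averaged client delta (a FedAdam-like scheme); the class name, hyperparameters, and update details below are illustrative, not the paper's exact algorithm.

```python
import numpy as np

class AdaptiveServerAverager:
    """Server-side Adam-style update applied to the averaged client model
    (a sketch of adaptive averaging; constants are illustrative)."""

    def __init__(self, global_weights, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.w = global_weights
        self.m = np.zeros_like(global_weights)
        self.v = np.zeros_like(global_weights)
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.t = 0

    def round(self, client_weights, client_sizes):
        # Weighted average of client models, then form a pseudo-gradient.
        sizes = np.asarray(client_sizes, dtype=float)
        avg = np.average(np.stack(client_weights), axis=0, weights=sizes)
        delta = self.w - avg                      # pseudo-gradient toward the average
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * delta
        self.v = self.b2 * self.v + (1 - self.b2) * delta ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        self.w = self.w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return self.w
```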
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, the recent years have witnessed a
booming of artificial intelligence (AI) applications and services, spanning
from personal assistant to recommendation systems to video/audio surveillance.
More recently, with the proliferation of mobile computing and
Internet-of-Things (IoT), billions of mobile and IoT devices are connected to
the Internet, generating zillions of bytes of data at the network edge. Driven by
this trend, there is an urgent need to push the AI frontiers to the network
edge so as to fully unleash the potential of the edge big data. To meet this
demand, edge computing, an emerging paradigm that pushes computing tasks and
services from the network core to the network edge, has been widely recognized
as a promising solution. The resulting new interdisciplinary field, edge AI or edge
intelligence, is beginning to receive a tremendous amount of interest. However,
research on edge intelligence is still in its infancy, and a dedicated
venue for exchanging the recent advances of edge intelligence is highly desired
by both the computer system and artificial intelligence communities. To this
end, we conduct a comprehensive survey of the recent research efforts on edge
intelligence. Specifically, we first review the background and motivation for
artificial intelligence running at the network edge. We then provide an
overview of the overarching architectures, frameworks and emerging key
technologies for deep learning model training and inference at the network
edge. Finally, we discuss future research opportunities on edge intelligence.
We believe that this survey will attract escalating attention, stimulate
fruitful discussions and inspire further research ideas on edge intelligence.
Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang,
"Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge
Computing," Proceedings of the IEE
Faster Asynchronous SGD
Asynchronous distributed stochastic gradient descent methods have trouble
converging because of stale gradients. A gradient update sent to a parameter
server by a client is stale if the parameters used to calculate that gradient
have since been updated on the server. Approaches have been proposed to
circumvent this problem that quantify staleness in terms of the number of
elapsed updates. In this work, we propose a novel method that quantifies
staleness in terms of moving averages of gradient statistics. We show that this
method outperforms previous methods with respect to convergence speed and
scalability to many clients. We also discuss how an extension to this method
can be used to dramatically reduce bandwidth costs in a distributed training
context. In particular, our method allows reduction of total bandwidth usage by
a factor of 5 with little impact on cost convergence. We also describe (and
link to) a software library that we have used to simulate these algorithms
deterministically on a single machine.
Comment: 10 page
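A hedged sketch of the general idea, measuring staleness against moving averages of gradient statistics rather than elapsed-update counts, is shown below; the specific weighting rule is an illustrative stand-in, not the paper's formula.

```python
import numpy as np

class MovingAverageStaleness:
    """Down-weight asynchronous updates using a moving average of received
    gradients (illustrative weighting rule, not the paper's exact method)."""

    def __init__(self, weights, lr=0.1, decay=0.9):
        self.w = weights
        self.lr = lr
        self.decay = decay
        self.avg_grad = np.zeros_like(weights)   # EMA of received gradients

    def apply_update(self, grad):
        # Compare the incoming gradient to the moving average; a large
        # deviation suggests the gradient was computed from stale weights.
        self.avg_grad = self.decay * self.avg_grad + (1 - self.decay) * grad
        denom = np.linalg.norm(grad - self.avg_grad) + 1e-8
        scale = min(1.0, np.linalg.norm(self.avg_grad) / denom)
        self.w = self.w - self.lr * scale * grad
        return self.w
```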
Distributed Deep Learning for Question Answering
This paper is an empirical study of distributed deep learning for two
question answering subtasks: answer selection and question classification.
Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP,
DOWNPOUR and EASGD/EAMSGD algorithms have been presented. Experimental results
show that the distributed framework based on the message passing interface can
accelerate the convergence speed at a sublinear scale. This paper demonstrates
the importance of distributed training. For example, with 48 workers, a 24x
speedup is achievable for the answer selection task, reducing running time
from 138.2 hours to 5.81 hours and increasing productivity significantly.
Comment: This paper will appear in the Proceedings of the 25th ACM
International Conference on Information and Knowledge Management (CIKM 2016),
Indianapolis, US
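For context, a minimal synchronous data-parallel SGD loop over MPI (using mpi4py's Allreduce) is sketched below; the model and gradient function are placeholders, and the asynchronous DOWNPOUR/EASGD variants studied in the paper differ from this simplification.

```python
# Run with e.g.: mpiexec -n 4 python mpi_sgd.py  (file name is illustrative)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

w = np.zeros(10)                     # shared model parameters (placeholder size)
rng = np.random.default_rng(rank)

def local_gradient(w):
    """Placeholder gradient computed on this worker's data shard."""
    return w - rng.normal(size=w.shape)

for step in range(100):
    g = local_gradient(w)
    g_sum = np.empty_like(g)
    comm.Allreduce(g, g_sum, op=MPI.SUM)   # sum gradients across workers
    w -= 0.01 * (g_sum / world)            # every worker applies the averaged step
```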
From Federated Learning to Federated Neural Architecture Search: A Survey
Federated learning is a recently proposed distributed machine learning
paradigm for privacy preservation, which has found a wide range of applications
where data privacy is of primary concern. Meanwhile, neural architecture search
has become very popular in deep learning for automatically tuning the
architecture and hyperparameters of deep neural networks. While both federated
learning and neural architecture search are faced with many open challenges,
searching for optimized neural architectures in the federated learning
framework is particularly demanding. This survey paper starts with a brief
introduction to federated learning, including horizontal, vertical, and hybrid
federated learning. Then, neural architecture search approaches based on
reinforcement learning, evolutionary algorithms, and gradient-based methods are
presented. This is followed by a description of recently proposed federated
neural architecture search, which is categorized into online and offline
implementations, and single- and multi-objective search approaches.
Finally, remaining open research questions are outlined and promising research
topics are suggested.
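As a rough sketch of how an evolutionary search could be embedded in a federated setting, the loop below scores candidate architectures with a placeholder federated-training fitness function, keeps the fittest, and mutates them; the encoding, fitness stub, and constants are all illustrative.

```python
import random

def federated_fitness(arch, clients, rounds=2):
    """Placeholder: train `arch` for a few federated rounds on sampled
    clients and return validation accuracy (illustrative stub)."""
    return random.random()

def evolve(population, clients, keep=4, generations=5):
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda a: federated_fitness(a, clients),
                        reverse=True)
        parents = scored[:keep]
        # Mutate survivors to refill the population (simple point mutation).
        children = []
        for p in parents:
            child = dict(p)
            key = random.choice(list(child))
            child[key] = random.choice([16, 32, 64, 128])
            children.append(child)
        population = parents + children
    return population[0]

# Architectures encoded as layer-width dictionaries (an illustrative encoding).
pop = [{"conv1": 32, "conv2": 64, "fc": 128} for _ in range(8)]
best = evolve(pop, clients=None)
```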
Dynamic Gradient Aggregation for Federated Domain Adaptation
In this paper, a new learning algorithm for Federated Learning (FL) is
introduced. The proposed scheme is based on weighted gradient aggregation
using two-step optimization to offer a flexible training pipeline. Herein, two
different flavors of the aggregation method are presented, leading to an order
of magnitude improvement in convergence speed compared to other distributed or
FL training algorithms like BMUF and FedAvg. Further, the aggregation algorithm
acts as a regularizer of the gradient quality. We investigate the effect of our
FL algorithm in supervised and unsupervised Speech Recognition (SR) scenarios.
The experimental validation is performed on three tasks: first, the
LibriSpeech task, showing a 7x speed-up and a 6% word error rate reduction
(WERR) compared to the baseline results; second, a session adaptation task,
providing 20% WERR over a powerful LAS model; finally, our unsupervised
pipeline is applied to the conversational SR task. The proposed FL system
outperforms the baseline systems in both convergence speed and overall model
performance.
Comment: arXiv admin note: substantial text overlap with arXiv:2008.0245
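A minimal sketch of weighted gradient aggregation is given below; the weighting scheme (a softmax over negative local validation losses) is an illustrative stand-in for the paper's data-driven aggregation weights.

```python
import numpy as np

def aggregate_gradients(gradients, local_losses, temperature=1.0):
    """Combine client gradients with weights derived from local validation
    losses (softmax over negative losses); lower loss means higher weight.
    The weighting scheme is illustrative, not the paper's exact rule."""
    losses = np.asarray(local_losses, dtype=float)
    weights = np.exp(-losses / temperature)
    weights /= weights.sum()
    stacked = np.stack(gradients)                  # (num_clients, num_params)
    return np.tensordot(weights, stacked, axes=1)  # weighted sum of gradients

# Example with three clients and a 5-parameter model.
grads = [np.random.randn(5) for _ in range(3)]
agg = aggregate_gradients(grads, local_losses=[0.8, 1.2, 0.5])
```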
No Peek: A Survey of private distributed deep learning
We survey distributed deep learning models for training or inference without
accessing raw data from clients. These methods aim to protect confidential
patterns in data while still allowing servers to train models. The distributed
deep learning methods of federated learning, split learning and large batch
stochastic gradient descent are compared in addition to private and secure
approaches of differential privacy, homomorphic encryption, oblivious transfer
and garbled circuits in the context of neural networks. We study their
benefits, limitations and trade-offs with regard to computational resources,
data leakage and communication efficiency, and also share our anticipated
future trends.
Comment: 21 page
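One of the surveyed techniques, split learning, can be sketched as follows: the client runs the first layers locally and transmits only cut-layer activations, never raw data, to the server, which completes the forward pass. The layer sizes and numpy model below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Client-side layers (raw data never leaves the client).
W_client = rng.normal(size=(784, 128))
# Server-side layers (operate only on cut-layer activations).
W_server = rng.normal(size=(128, 10))

def client_forward(x):
    return np.maximum(x @ W_client, 0.0)      # ReLU activations at the cut layer

def server_forward(smashed):
    return smashed @ W_server                 # server completes the forward pass

x = rng.normal(size=(1, 784))                 # a private example held on the client
smashed = client_forward(x)                   # only this tensor is transmitted
logits = server_forward(smashed)
```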
Communication-Efficient Federated Learning via Optimal Client Sampling
Federated learning is a private and efficient framework for learning models
in settings where data is distributed across many clients. Due to the
interactive nature of the training process, frequent communication of large
amounts of information is required between the clients and the central server, which
aggregates local models. We propose a novel, simple and efficient way of
updating the central model in communication-constrained settings by determining
the optimal client sampling policy. In particular, modeling the progression of
clients' weights by an Ornstein-Uhlenbeck process allows us to derive the
optimal sampling strategy for selecting a subset of clients with significant
weight updates. The central server then collects local models from only the
selected clients and subsequently aggregates them. We propose four client
sampling strategies and test them on two federated learning benchmark tests,
namely, a classification task on EMNIST and a realistic language modeling task
using the Stackoverflow dataset. The results show that the proposed framework
provides significant reduction in communication while maintaining competitive
or achieving superior performance compared to the baseline. Our methods introduce a
new line of communication strategies orthogonal to the existing user-local
methods such as quantization or sparsification, thus complementing rather than
aiming to replace them.
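A hedged sketch of the sampling idea: the server tracks each client's recent update norms, forms a simple mean-reverting (Ornstein-Uhlenbeck-style) prediction of the next norm, and selects the clients with the largest predicted updates. The constants and update rule below are illustrative, not the paper's derivation.

```python
import numpy as np

class ClientSampler:
    """Select the clients whose weight updates are predicted to be largest,
    using a simple mean-reverting estimate of each client's update norm.
    Constants are illustrative."""

    def __init__(self, num_clients, theta=0.3):
        self.mean = np.zeros(num_clients)     # long-run mean of update norms
        self.last = np.zeros(num_clients)     # most recent observed norm
        self.theta = theta                    # mean-reversion rate

    def observe(self, client_id, update_norm):
        self.last[client_id] = update_norm
        self.mean[client_id] = 0.9 * self.mean[client_id] + 0.1 * update_norm

    def select(self, k):
        # One-step prediction: revert the last norm toward the long-run mean.
        predicted = self.last + self.theta * (self.mean - self.last)
        return np.argsort(predicted)[-k:]     # indices of the top-k predicted updates

sampler = ClientSampler(num_clients=100)
sampler.observe(7, 2.3)
chosen = sampler.select(k=10)
```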
Differentially-Private "Draw and Discard" Machine Learning
In this work, we propose a novel framework for privacy-preserving
client-distributed machine learning. It is motivated by the desire to achieve
differential privacy guarantees in the local model of privacy in a way that
satisfies all systems constraints using asynchronous client-server
communication and provides attractive model learning properties. We call it
"Draw and Discard" because it relies on random sampling of models for load
distribution (scalability), which also provides additional server-side privacy
protections and improved model quality through averaging. We present the
mechanics of client and server components of "Draw and Discard" and demonstrate
how the framework can be applied to learning Generalized Linear models. We then
analyze the privacy guarantees provided by our approach against several types
of adversaries and showcase experimental results that provide evidence for the
framework's viability in practical deployments.
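A minimal sketch consistent with the abstract's description: the server keeps several model instances, serves a randomly drawn instance to each client, and replaces a randomly chosen instance with the returned update. The instance count and model shape are illustrative.

```python
import random
import numpy as np

class DrawAndDiscardServer:
    """Maintain k model instances; serve a random instance to each client and
    replace a randomly chosen instance with the client's updated model.
    A sketch based on the abstract; k and the model shape are illustrative."""

    def __init__(self, k=20, dim=10):
        self.instances = [np.zeros(dim) for _ in range(k)]

    def draw(self):
        # Any instance may be handed out; clients never learn which one.
        return random.choice(self.instances).copy()

    def discard_and_insert(self, updated_model):
        # Discard a random instance and insert the freshly updated one.
        idx = random.randrange(len(self.instances))
        self.instances[idx] = updated_model

server = DrawAndDiscardServer()
model = server.draw()
model += 0.1 * np.random.randn(model.size)    # stand-in for a client's local update
server.discard_and_insert(model)
```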