Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition
The past decade has witnessed great progress in Automatic Speech Recognition
(ASR) due to advances in deep learning. The improvements in performance can be
attributed to both improved models and large-scale training data. Key to
training such models is the employment of efficient distributed learning
techniques. In this article, we provide an overview of distributed training
techniques for deep neural network acoustic models for ASR. Starting with the
fundamentals of data parallel stochastic gradient descent (SGD) and ASR
acoustic modeling, we will investigate various distributed training strategies
and their realizations in high performance computing (HPC) environments with an
emphasis on striking the balance between communication and computation.
Experiments are carried out on a popular public benchmark to study the
convergence, speedup and recognition performance of the investigated
strategies.

Comment: Accepted to IEEE Signal Processing Magazine
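To make the data-parallel SGD baseline concrete, the sketch below simulates synchronous data-parallel SGD in a single process on a toy least-squares problem: each worker computes a gradient on its own data shard, the gradients are averaged (the all-reduce step that real HPC realizations perform over the network), and every replica applies the same update. The worker count, learning rate, and problem sizes are illustrative assumptions, not settings from the article.

```python
import numpy as np

# Minimal single-process simulation of synchronous data-parallel SGD
# on a least-squares objective. Worker count, model size, and learning
# rate are illustrative choices, not values from the article.
rng = np.random.default_rng(0)
num_workers, dim, lr = 4, 8, 0.1

# Synthetic data: y = X @ w_true + noise, sharded evenly across workers.
X = rng.normal(size=(256, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.01 * rng.normal(size=256)
shards = np.array_split(np.arange(256), num_workers)

w = np.zeros(dim)  # model parameters, replicated on every worker
for step in range(100):
    # Each worker computes a gradient on its local shard (computation).
    grads = []
    for idx in shards:
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    # All-reduce: average the per-worker gradients (communication).
    g = np.mean(grads, axis=0)
    w -= lr * g  # identical SGD update applied on every replica

print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

The averaging step is where communication cost enters; the balance between that cost and the per-worker computation is the trade-off the article emphasizes.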
A(DP)SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy
As deep learning models are usually massive and complex, distributed learning
is essential for increasing training efficiency. Moreover, in many real-world
application scenarios like healthcare, distributed learning can also keep the
data local and protect privacy. A popular distributed learning strategy is
federated learning, in which a central server stores the global model and a
set of local computing nodes update the model parameters with their own data.
The updated model parameters are processed and transmitted to the central
server, which leads to heavy communication costs.
Recently, asynchronous decentralized distributed learning has been proposed and
shown to be a more efficient and practical strategy: there is no central
server, and each computing node communicates only with its neighbors. Although
no raw data is transmitted between local nodes, the information exchanged
during communication can still leak, giving malicious participants an opening
for attacks. In this paper, we present a differentially private version of the
asynchronous decentralized parallel SGD (ADPSGD) framework, A(DP)SGD for
short, which maintains the communication efficiency of ADPSGD while preventing
inference attacks by malicious participants.
Specifically, Rényi differential privacy is used to provide a tighter privacy
analysis for our composite Gaussian mechanisms, while the convergence rate
remains consistent with the non-private version. Theoretical analysis shows
that A(DP)SGD converges at the same optimal rate as SGD. Empirically, A(DP)SGD
achieves model accuracy comparable to the differentially private version of
synchronous SGD (SSGD) but runs much faster than SSGD in heterogeneous
computing environments.
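As a rough illustration of how the decentralized update and the Gaussian mechanism fit together, here is a minimal sketch assuming a ring topology: each node clips and noises its local gradient before taking an SGD step, then averages parameters with a single neighbor instead of a central server. The clipping norm, noise multiplier, and topology are illustrative placeholders, and the Rényi-DP accounting the paper uses to calibrate the noise is not implemented here.

```python
import numpy as np

# Sketch of A(DP)SGD-style updates on a ring of nodes. The clipping
# norm, noise multiplier, and learning rate are illustrative
# placeholders; the paper's Renyi-DP accounting is omitted.
rng = np.random.default_rng(1)
num_nodes, dim = 4, 8
lr, clip_norm, noise_mult = 0.1, 1.0, 1.0

params = [rng.normal(size=dim) for _ in range(num_nodes)]

def private_grad(local_grad):
    # Gaussian mechanism: clip the gradient to bound its sensitivity,
    # then add Gaussian noise so that what later leaves the node
    # through communication is already privatized.
    norm = np.linalg.norm(local_grad)
    clipped = local_grad / max(1.0, norm / clip_norm)
    return clipped + rng.normal(scale=noise_mult * clip_norm, size=dim)

def adpsgd_step(i, local_grad):
    # Local SGD step with the privatized gradient ...
    params[i] = params[i] - lr * private_grad(local_grad)
    # ... then average with one ring neighbor instead of a central
    # server (the decentralized, communication-light part).
    j = (i + 1) % num_nodes
    avg = 0.5 * (params[i] + params[j])
    params[i], params[j] = avg, avg.copy()

# One asynchronous round: nodes fire in arbitrary order on toy gradients.
for i in rng.permutation(num_nodes):
    adpsgd_step(i, rng.normal(size=dim))
```

Because each node only touches one neighbor per step, a slow node delays its neighbors rather than the whole system, which is why this style of update tolerates heterogeneous computing environments better than synchronous SGD.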