3 research outputs found
Realizing Petabyte Scale Acoustic Modeling
Large scale machine learning (ML) systems such as the Alexa automatic speech
recognition (ASR) system continue to improve with increasing amounts of
manually transcribed training data. Instead of scaling manual transcription to
impractical levels, we utilize semi-supervised learning (SSL) to learn acoustic
models (AM) from the vast firehose of untranscribed audio data. Learning an AM
from 1 million hours of audio presents unique ML and system design challenges.
We present the design and evaluation of a highly scalable and resource
efficient SSL system for AM. Employing the student/teacher learning paradigm,
we focus on the student learning subsystem: a scalable and robust data pipeline
that generates features and targets from raw audio, and an efficient model
pipeline, including the distributed trainer, that builds a student model. Our
evaluations show that, even without extensive hyper-parameter tuning, we obtain
relative accuracy improvements in the 10 to 20% range, with higher gains in
noisier conditions. The end-to-end processing time of this SSL system was 12
days, and several components in this system can trivially scale linearly with
more compute resources.
Comment: 2156-3357 \copyright 2019 IEEE. Personal use is permitted, but
republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information.
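As a rough illustration of the student/teacher step this abstract describes, here is a minimal sketch. The PyTorch framing, the KL-divergence loss on per-frame posteriors, and the generic teacher/student modules are assumptions for the sketch, not the paper's actual pipeline.

```python
# Minimal sketch of one student/teacher SSL training step (illustrative
# assumptions throughout; not the paper's implementation).
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, features):
    """The teacher labels a batch of untranscribed audio features;
    the student learns to match its per-frame posteriors."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(features), dim=-1)  # teacher targets
    log_probs = F.log_softmax(student(features), dim=-1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```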
Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models
We present results from Alexa speech teams on semi-supervised learning (SSL)
of acoustic models (AM) with experiments spanning over 3000 hours of GPU time,
making our study one of the largest of its kind. We discuss SSL for AMs in a
small footprint setting, showing that a smaller capacity model trained with 1
million hours of unsupervised data can outperform a baseline supervised system
by 14.3% word error rate reduction (WERR). When the supervised data is
increased seven-fold, our gains diminish to 7.1% WERR; to improve SSL
efficiency in larger supervised data regimes, we employ step-wise distillation
into a smaller model, obtaining a WERR of 14.4%. We then switch to SSL with
larger student models in low supervised data regimes; learning efficiency with
unsupervised data is higher there, and student models may even outperform their
teacher models. We develop a theoretical sketch to explain this behavior.
Comment: TSD202
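For reference, the WERR figures quoted in this abstract are relative reductions against a baseline word error rate; the helper below is a hypothetical illustration of the arithmetic, not code from the paper.

```python
def werr(wer_baseline: float, wer_new: float) -> float:
    """Relative word error rate reduction, in percent.
    E.g. a baseline WER of 10.0 reduced to 8.57 gives ~14.3% WERR,
    matching the gain quoted above."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline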
Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition
The past decade has witnessed great progress in Automatic Speech Recognition
(ASR) due to advances in deep learning. The improvements in performance can be
attributed to both improved models and large-scale training data. Key to
training such models is the employment of efficient distributed learning
techniques. In this article, we provide an overview of distributed training
techniques for deep neural network acoustic models for ASR. Starting with the
fundamentals of data parallel stochastic gradient descent (SGD) and ASR
acoustic modeling, we will investigate various distributed training strategies
and their realizations in high performance computing (HPC) environments with an
emphasis on striking the balance between communication and computation.
Experiments are carried out on a popular public benchmark to study the
convergence, speedup and recognition performance of the investigated
strategies.
Comment: Accepted to IEEE Signal Processing Magazine
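A conceptual sketch of the synchronous data-parallel SGD this article starts from: each worker computes a gradient on its own data shard, the gradients are averaged (the role an all-reduce plays in a real HPC backend), and every replica applies the identical update. Plain NumPy stands in for the communication layer; all names are illustrative assumptions.

```python
# Sketch of synchronous data-parallel SGD with simulated all-reduce.
import numpy as np

def sgd_allreduce_step(w, shards, grad_fn, lr=0.01):
    """One synchronous step across len(shards) workers."""
    local_grads = [grad_fn(w, x, y) for x, y in shards]  # per-worker compute
    g = np.mean(local_grads, axis=0)                     # all-reduce (average)
    return w - lr * g                                    # same update on every replica

# Toy usage: least-squares gradient on two data shards.
grad_fn = lambda w, x, y: 2 * x.T @ (x @ w - y) / len(y)
rng = np.random.default_rng(0)
w = np.zeros(3)
shards = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(2)]
for _ in range(100):
    w = sgd_allreduce_step(w, shards, grad_fn)
```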