Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition

Alexandridis, Anastasios; Chang, Feng-Ju; Kunzmann, Siegfried; Liu, Jing; McGowan, Ross; Mouchtaris, Athanasios; Muniyappa, Thejaswi; Rastrow, Ariya; Sahai, Saumya Y.; Sathyendra, Kanthashree M.; Strimel, Grant P.

Publication date: 4 April 2023
Abstract

We present dual-attention neural biasing, an architecture designed to boost wake word (WW) recognition and reduce inference-time latency in speech recognition tasks. The architecture dynamically switches its runtime compute path, using WW spotting to select which branch of its attention networks to execute for each input audio frame. This approach improves WW spotting accuracy while reducing runtime compute cost, measured in floating point operations (FLOPs). On an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by 90% for WW audio frames, with only a 1% increase in the number of parameters. The architecture improves WW F1 score by 16% relative and generic rare word error rate by 3% relative compared to the baselines.

Comment: Accepted to Proc. IEEE ICASSP 202
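
The per-frame branch selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the branches, so the sketch assumes each branch is single-query dot-product attention over its own memory, with a small memory for WW frames and a larger one for all other frames. The names `attention`, `dual_attention_step`, `ww_memory`, and `generic_memory`, the memory sizes, and the FLOP approximation are all hypothetical, and the resulting FLOP ratio is not a measurement from the paper.

```python
import math


def attention(query, keys, values):
    """Single-query dot-product attention; returns (context, approx_flops)."""
    d = len(query)
    # Scaled dot-product score for each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    dim_v = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
    # Rough count (assumption): multiply-adds for scores and weighted sum only.
    flops = 2 * len(keys) * d + 2 * len(keys) * dim_v
    return context, flops


def dual_attention_step(frame, is_ww_frame, ww_memory, generic_memory):
    """Execute only one attention branch, chosen by the WW-spotting flag."""
    memory = ww_memory if is_ww_frame else generic_memory
    return attention(frame, memory, memory)


# Hypothetical sizes, chosen only for illustration.
d = 4
ww_memory = [[1.0] * d for _ in range(3)]
generic_memory = [[0.1 * (i + 1)] * d for i in range(50)]
frame = [0.5] * d

ww_context, ww_flops = dual_attention_step(frame, True, ww_memory, generic_memory)
gen_context, gen_flops = dual_attention_step(frame, False, ww_memory, generic_memory)
print(f"WW branch FLOPs: {ww_flops}, generic branch FLOPs: {gen_flops}")
```

In this sketch the saving on WW frames comes purely from skipping the larger branch; how the real system detects WW frames and what each branch attends over are left open here.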

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2304.01905

Last time updated on 10/04/2023