HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
Streaming keyword spotting is a widely used solution for activating voice
assistants. Deep Neural Networks with Hidden Markov Model (DNN-HMM) based
methods have proven to be efficient and widely adopted in this space, primarily
because of the ability to detect and identify the start and end of the wake-up
word at low compute cost. However, such hybrid systems suffer from loss metric
mismatch when the DNN and HMM are trained independently. Sequence
discriminative training cannot fully mitigate the loss-metric mismatch due to
the inherent Markovian style of the operation. We propose a low-footprint CNN
model, called HEiMDaL, to detect and localize keywords in streaming conditions.
We introduce an alignment-based classification loss to detect the occurrence of
the keyword along with an offset loss to predict the start of the keyword.
HEiMDaL shows a 73% reduction in detection metrics along with equivalent
localization accuracy and with the same memory footprint as existing DNN-HMM
style models for a given wake-word.
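The pairing of a detection loss with an offset loss described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: binary cross-entropy stands in for the alignment-based classification loss, absolute error stands in for the offset loss, and the function names and the `offset_weight` parameter are hypothetical.

```python
import math

def detection_loss(p_keyword, is_keyword):
    """Binary cross-entropy on the predicted keyword probability (assumed form)."""
    p = min(max(p_keyword, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -math.log(p) if is_keyword else -math.log(1.0 - p)

def offset_loss(pred_start, true_start):
    """Absolute error on the predicted keyword start, e.g. in frames (assumed form)."""
    return abs(pred_start - true_start)

def detect_and_localize_loss(p_keyword, is_keyword, pred_start, true_start,
                             offset_weight=1.0):
    """Detection loss plus an offset term that only applies to keyword examples."""
    loss = detection_loss(p_keyword, is_keyword)
    if is_keyword:
        # A start offset is only defined when the keyword is actually present.
        loss += offset_weight * offset_loss(pred_start, true_start)
    return loss
```

In this sketch, negative examples contribute only the detection term, since a start position has no meaning when no keyword occurs.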
TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants
Wake-up word spotting in noisy environments is a critical task for an excellent user experience with voice assistants. Unwanted activation of the device is often due to the presence of noises coming from background conversations, TVs, or other domestic appliances. In this work, we propose the use of a speech enhancement convolutional autoencoder, coupled with on-device keyword spotting, aimed at improving the trigger word detection in noisy environments. The end-to-end system learns by optimizing a linear combination of losses: a reconstruction-based loss, both at the log-mel spectrogram and at the waveform level, as well as a specific task loss that accounts for the cross-entropy error reported along the keyword spotting detection. We experiment with several neural network classifiers and report that deeply coupling the speech enhancement together with a wake-up word detector, e.g., by jointly training them, significantly improves the performance in the noisiest conditions. Additionally, we introduce a new publicly available speech database recorded for Telefónica's voice assistant, Aura. The OK Aura Wake-up Word Dataset incorporates rich metadata, such as speaker demographics or room conditions, and comprises hard negative examples that were studiously selected to present different levels of phonetic similarity with respect to the trigger words 'OK Aura'.
Keywords: speech enhancement; wake-up word; keyword spotting; deep learning; convolutional neural network
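The linear combination of losses described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: mean squared error is assumed for both reconstruction terms, the weights `w_mel`, `w_wav`, and `w_task` and all function names are hypothetical, and the log-mel features and classifier probabilities are taken as precomputed inputs rather than produced by a real model.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)

def combined_loss(mel_pred, mel_clean, wav_pred, wav_clean, kws_probs, kws_label,
                  w_mel=1.0, w_wav=1.0, w_task=1.0):
    """Weighted sum of log-mel reconstruction, waveform reconstruction, and task loss.

    The weights are hypothetical; the abstract only states that the losses
    are combined linearly.
    """
    return (w_mel * mse(mel_pred, mel_clean)
            + w_wav * mse(wav_pred, wav_clean)
            + w_task * cross_entropy(kws_probs, kws_label))
```

Setting `w_task` to zero in this sketch reduces it to a pure enhancement objective, which is one way to picture the difference between training the enhancer alone and training it jointly with the detector.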
Deep Spoken Keyword Spotting: An Overview
Spoken keyword spotting (KWS) deals with the identification of keywords in
audio streams and has become a fast-growing technology thanks to the paradigm
shift introduced by deep learning a few years ago. This has allowed the rapid
embedding of deep KWS in a myriad of small electronic devices with different
purposes like the activation of voice assistants. Prospects suggest a sustained
growth in terms of social use of this technology. Thus, it is not surprising
that deep KWS has become a hot research topic among speech scientists, who
constantly look for KWS performance improvement and computational complexity
reduction. This context motivates this paper, in which we conduct a literature
review into deep spoken KWS to assist practitioners and researchers who are
interested in this technology. Specifically, this overview has a comprehensive
nature by covering a thorough analysis of deep KWS systems (which includes
speech features, acoustic modeling and posterior handling), robustness methods,
applications, datasets, evaluation metrics, performance of deep KWS systems and
audio-visual KWS. The analysis performed in this paper allows us to identify a
number of directions for future research, including directions adopted from
automatic speech recognition research and directions that are unique to the
problem of spoken KWS.