Deep Residual Learning for Small-Footprint Keyword Spotting
We explore the application of deep residual learning and dilated convolutions
to the keyword spotting task, using the recently released Google Speech
Commands Dataset as our benchmark. Our best residual network (ResNet)
implementation significantly outperforms Google's previous convolutional neural
networks in terms of accuracy. By varying model depth and width, we can achieve
compact models that also outperform previous small-footprint variants. To our
knowledge, we are the first to examine these approaches for keyword spotting,
and our results establish an open-source state-of-the-art reference to support
the development of future speech-based interfaces.
Comment: Published in ICASSP 201
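The residual-network-with-dilated-convolutions idea can be sketched as follows. This is an illustrative PyTorch model, not the paper's exact architecture: the channel count, number of blocks, dilation schedule, and input shape (40-dim log-Mel frames) are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    """Residual block with two dilated 3x3 convolutions and an
    identity shortcut (illustrative sizes, not the paper's config)."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut

class KWSResNet(nn.Module):
    """Tiny ResNet over log-Mel spectrograms for keyword classification;
    depth (n_blocks) and width (channels) trade accuracy for footprint."""
    def __init__(self, n_labels=12, channels=16, n_blocks=3):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1, bias=False)
        # Exponentially growing dilation widens the receptive field cheaply.
        self.blocks = nn.Sequential(
            *[DilatedResBlock(channels, 2 ** i) for i in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, n_labels)

    def forward(self, x):  # x: (batch, 1, time, mel)
        h = self.blocks(self.stem(x))
        return self.fc(self.pool(h).flatten(1))

model = KWSResNet()
logits = model(torch.randn(2, 1, 101, 40))  # ~1 s of 40-dim log-Mel frames
print(logits.shape)  # torch.Size([2, 12])
```

Shrinking `channels` and `n_blocks` is how such a model is made small-footprint while the residual shortcuts keep it trainable.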
Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
We propose a max-pooling based loss function for training Long Short-Term
Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
CPU, memory, and latency requirements. The max-pooling loss training can be
further guided by initializing with a cross-entropy loss trained network. A
posterior smoothing based evaluation approach is employed to measure keyword
spotting performance. Our experimental results show that LSTM models trained
using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
trained baseline feed-forward Deep Neural Network (DNN). In addition, a
max-pooling loss trained LSTM with a randomly initialized network performs
better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss
trained LSTM initialized with a cross-entropy pre-trained network performs
best, yielding a relative reduction in the Area Under the Curve (AUC) measure
compared to the baseline feed-forward DNN
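The max-pooling loss can be sketched as below. This is a minimal PyTorch illustration of the idea, assuming per-frame logits from an LSTM: for a keyword utterance, cross-entropy is back-propagated only through the frame with the highest keyword posterior, while background utterances use frame-level cross-entropy everywhere. The function name and the exact handling of negatives are assumptions for this sketch, not the paper's precise recipe.

```python
import torch
import torch.nn.functional as F

def max_pooling_loss(frame_logits, label, keyword_id):
    """Max-pooling loss for one utterance (illustrative sketch).

    frame_logits: (T, n_classes) per-frame outputs from an LSTM.
    label:        target class id for the whole utterance.
    keyword_id:   class id of the keyword.
    """
    if label == keyword_id:
        # Pick the single frame with the largest keyword posterior
        # and apply cross-entropy only there ("max-pooling" over time).
        posteriors = F.softmax(frame_logits, dim=-1)[:, keyword_id]
        t = torch.argmax(posteriors).item()
        return F.cross_entropy(frame_logits[t:t + 1],
                               torch.tensor([label]))
    # Background utterance: ordinary cross-entropy at every frame.
    targets = torch.full((frame_logits.size(0),), label, dtype=torch.long)
    return F.cross_entropy(frame_logits, targets)

logits = torch.randn(50, 3, requires_grad=True)  # 50 frames, 3 classes
loss = max_pooling_loss(logits, label=1, keyword_id=1)
loss.backward()  # gradient flows only through the selected frame
```

Supervising only the best-scoring frame spares the network from forcing high keyword posteriors on every frame of the keyword, which is what makes this loss a better fit for spotting than plain frame-wise cross-entropy.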
- …