Deep Residual Learning for Small-Footprint Keyword Spotting
We explore the application of deep residual learning and dilated convolutions
to the keyword spotting task, using the recently-released Google Speech
Commands Dataset as our benchmark. Our best residual network (ResNet)
implementation significantly outperforms Google's previous convolutional neural
networks in terms of accuracy. By varying model depth and width, we can achieve
compact models that also outperform previous small-footprint variants. To our
knowledge, we are the first to examine these approaches for keyword spotting,
and our results establish an open-source state-of-the-art reference to support
the development of future speech-based interfaces.
Comment: Published in ICASSP 201
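The core building block the abstract names, a residual unit built from dilated convolutions, can be sketched in NumPy. This is an illustrative toy under assumed shapes ("same"-padded convolution over time, equal input/output channel counts), not the paper's implementation:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded dilated 1D convolution over the time axis.

    x: (time, channels_in) feature map (e.g. log-Mel frames),
    w: (kernel, channels_in, channels_out) filter bank.
    """
    k, c_in, c_out = w.shape
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], c_out))
    for t in range(x.shape[0]):
        for j in range(k):
            # Each tap looks `dilation` frames apart, widening the
            # receptive field without adding parameters.
            out[t] += xp[t + j * dilation] @ w[j]
    return out

def residual_block(x, w, dilation):
    """Identity shortcut around a dilated convolution + ReLU."""
    return x + np.maximum(dilated_conv1d(x, w, dilation), 0.0)
```

The identity shortcut is what lets such networks grow deep while staying small-footprint: depth and width can be varied independently of the skip path.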
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of
small electronic devices that allow interaction with them via speech. Often,
KWS systems are speaker-independent, meaning that any person, user or not,
might trigger them. For applications such as KWS for hearing assistive
devices this is unacceptable, since only the user should be able to operate them.
In this paper we propose KWS for hearing assistive devices that is robust to
external speakers. A state-of-the-art deep residual network for small-footprint
KWS is regarded as a basis to build upon. By following a multi-task learning
scheme, this system is extended to jointly perform KWS and users'
own-voice/external speaker detection with a negligible increase in the number
of parameters. For the experiments, we generate, from the Google Speech
Commands Dataset, a speech corpus that emulates hearing aids as the capturing
device. Our results show that this multi-task deep residual network achieves a
relative improvement in KWS accuracy of around 32% over a system that does not
deal with external speakers.
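One way to picture the multi-task extension, purely as an assumed sketch rather than the authors' architecture: a shared embedding from the residual trunk feeds two lightweight linear heads (keyword classifier and own-voice detector), and their losses are combined with a trade-off weight `alpha`. All dimensions and the weighting scheme here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64-dim embedding from the shared residual trunk,
# 12 keyword classes, and one own-voice/external-speaker logit.
EMB, N_KEYWORDS = 64, 12
W_kws = rng.standard_normal((EMB, N_KEYWORDS)) * 0.01
W_ovd = rng.standard_normal(EMB) * 0.01

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def heads(embedding):
    """Two lightweight heads on one shared embedding:
    keyword posterior (softmax) and own-voice probability (sigmoid)."""
    p_kws = softmax(embedding @ W_kws)
    p_own = 1.0 / (1.0 + np.exp(-(embedding @ W_ovd)))
    return p_kws, p_own

def joint_loss(p_kws, kw_label, p_own, own_label, alpha=0.5):
    """Weighted sum of keyword cross-entropy and own-voice binary
    cross-entropy; `alpha` is a hypothetical trade-off weight."""
    ce_kws = -np.log(p_kws[kw_label])
    ce_own = -(own_label * np.log(p_own) + (1 - own_label) * np.log(1 - p_own))
    return (1 - alpha) * ce_kws + alpha * ce_own
```

Because both heads are single linear layers on a shared embedding, the parameter overhead relative to the trunk is negligible, which matches the abstract's claim.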