4 research outputs found

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Authors: Jensen Jesper
López-Espejo Iván
Tan Zheng-Hua
Publication venue: International Speech Communication Association
Publication date: 01/09/2019

Provided by: Crossref, VBN

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Authors: Jensen Jesper
López-Espejo Iván
Tan Zheng-Hua
Publication date: 26/06/2019

Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that users interact with via speech. KWS systems are often speaker-independent, which means that any person (user or not) might trigger them. For applications like KWS for hearing assistive devices, this is unacceptable, as only the user should be able to operate them. In this paper, we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS serves as the basis to build upon. Following a multi-task learning scheme, this system is extended to jointly perform KWS and the user's own-voice/external-speaker detection with a negligible increase in the number of parameters. For the experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as the capturing device. Our results show that this multi-task deep residual network achieves a relative KWS accuracy improvement of around 32% with respect to a system that does not deal with external speakers.
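As a rough illustration of the multi-task idea described in this abstract, the following minimal sketch (assuming NumPy) attaches two output heads to one shared embedding: a softmax keyword classifier and a sigmoid own-voice detector, trained with a weighted sum of cross-entropy losses. The residual backbone is omitted, and the names MultiTaskHeads, joint_loss and accept_keyword, the loss weight lam, the embedding size and the gating threshold are all hypothetical choices, not the authors' architecture. It does show why the second task costs few parameters: the own-voice head adds only emb_dim + 1 weights.

```python
import numpy as np


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MultiTaskHeads:
    """Hypothetical KWS + own-voice heads on top of a shared embedding."""

    def __init__(self, emb_dim, n_keywords, rng):
        # Keyword head: one logit per keyword class.
        self.W_kws = 0.01 * rng.standard_normal((n_keywords, emb_dim))
        self.b_kws = np.zeros(n_keywords)
        # Own-voice head: a single logit (user vs. external speaker).
        self.w_ov = 0.01 * rng.standard_normal(emb_dim)
        self.b_ov = 0.0

    def forward(self, emb):
        kws_probs = softmax(self.W_kws @ emb + self.b_kws)
        p_own = sigmoid(self.w_ov @ emb + self.b_ov)
        return kws_probs, p_own

    def extra_params(self):
        # Parameters added by the second task: one weight vector plus a bias.
        return self.w_ov.size + 1


def joint_loss(kws_probs, p_own, kws_label, own_label, lam=1.0, eps=1e-12):
    # Cross-entropy for the keyword class plus weighted binary cross-entropy
    # for own-voice detection (lam is an assumed task weighting).
    ce = -np.log(kws_probs[kws_label] + eps)
    bce = -(own_label * np.log(p_own + eps)
            + (1 - own_label) * np.log(1.0 - p_own + eps))
    return ce + lam * bce


def accept_keyword(kws_probs, p_own, threshold=0.5):
    # Hypothetical gating rule: report a keyword only if the own-voice
    # detector attributes the utterance to the user.
    if p_own < threshold:
        return None
    return int(np.argmax(kws_probs))
```

For example, with a 64-dimensional embedding the own-voice head adds 65 parameters, and accept_keyword(np.array([0.1, 0.7, 0.2]), 0.9) returns 1, while the same call with p_own = 0.1 returns None.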

Provided by: arXiv.org e-Print Archive, Crossref, VBN

Low-Resource Keyword Spotting for Hearing Assistive Devices

Authors: Jensen Jesper
López-Espejo Iván
Tan Zheng-Hua
Publication date: 01/01/2019

Provided by: VBN

Exploring Filterbank Learning for Keyword Spotting

Authors: Chollet
Kingma
Mittermaier
Ravanelli
Robertson
Sainath
Tan
Warden
Publication date: 30/05/2020

Despite their strong performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, optimal filterbank learning has been studied, with varying success, for different speech processing tasks. In this paper, we fill a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern KWS back-end based on a deep residual neural network. Our experimental results reveal that, in general, there are no statistically significant differences in KWS accuracy between using a learned filterbank and using handcrafted speech features. Thus, while we conclude that the latter are still a wise choice when using modern KWS back-ends, we also hypothesize that this could be a symptom of information redundancy, which opens up new research possibilities in the field of small-footprint KWS.
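The first approach named in this abstract, learning a filterbank matrix in the power spectral domain, can be sketched as follows under stated assumptions: NumPy only, a matrix W applied to a power spectrogram followed by a log, non-negativity enforced by the hypothetical parametrization W = V**2, and initialization from a standard triangular mel filterbank so that training starts from handcrafted log-mel features. The abstract does not specify these details; mel_filterbank, LearnableFilterbank and grad_V are illustrative names. grad_V returns the analytic gradient that a back-end loss would propagate into the filterbank; in a real system an autodiff framework would compute it jointly with the residual network.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters with shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_filters + 2 edge points equally spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for k in range(n_filters):
        lo, c, hi = hz_pts[k], hz_pts[k + 1], hz_pts[k + 2]
        rising = (bin_freqs - lo) / (c - lo)
        falling = (hi - bin_freqs) / (hi - c)
        fb[k] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


class LearnableFilterbank:
    """Filterbank matrix W = V**2 (assumed parametrization keeping W >= 0)."""

    def __init__(self, init_fb):
        # Start from a handcrafted filterbank so that W == init_fb initially.
        self.V = np.sqrt(init_fb)

    @property
    def W(self):
        return self.V ** 2

    def features(self, power_spec, eps=1e-6):
        # power_spec: (n_bins, n_frames) -> log energies (n_filters, n_frames).
        return np.log(self.W @ power_spec + eps)

    def grad_V(self, power_spec, upstream, eps=1e-6):
        # Gradient of sum(upstream * features) with respect to V:
        # dL/dW = (upstream / energies) @ P^T, then chain rule dW/dV = 2V.
        energies = self.W @ power_spec + eps
        dW = (upstream / energies) @ power_spec.T
        return dW * 2.0 * self.V
```

Because V starts at the square root of the mel filters, features() initially reproduces log-mel energies. One side effect of the squared parametrization is that weights initialized at zero receive zero gradient, so each filter keeps its initial support; a softplus or exponential parametrization would avoid that.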

Provided by: arXiv.org e-Print Archive, Crossref, VBN