Few-Shot Keyword Spotting With Prototypical Networks
Keyword spotting, the task of recognizing a particular command or keyword, is
widely used in voice interfaces such as Amazon's Alexa and Google Home. In
order to recognize a set of keywords, most of the recent deep learning based
approaches use a neural network trained with a large number of samples to
identify certain pre-defined keywords. This restricts the system from
recognizing new, user-defined keywords. Therefore, we first formulate this
problem as few-shot keyword spotting and approach it using metric learning.
To enable this research, we also synthesize and publish a Few-shot Google
Speech Commands dataset. We then propose a solution to the few-shot keyword
spotting problem using temporal and dilated convolutions on prototypical
networks. Our comparative experimental results demonstrate keyword spotting of
new keywords using just a small number of samples.
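
The metric-learning idea behind prototypical networks can be illustrated with a minimal sketch. It assumes the standard prototypical-network formulation: each keyword's prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype under squared Euclidean distance. The 2-D vectors, the keyword labels, and the function names are hypothetical stand-ins for the output of an embedding network; the temporal and dilated convolutional encoder described in the abstract is not reproduced here.

```python
import math


def prototypes(support):
    """Average the support embeddings of each keyword into one prototype.

    support: dict mapping a keyword label to a list of embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Softmax over negative squared distances to each prototype."""
    labels = list(protos)
    logits = [-squared_distance(query, protos[label]) for label in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Hypothetical toy embeddings: two support samples per new keyword.
support = {
    "lights": [[0.9, 0.1], [1.1, -0.1]],
    "music": [[-1.0, 0.9], [-0.8, 1.1]],
}
protos = prototypes(support)
label, probs = classify([0.8, 0.2], protos)
print(label, probs)
```

Because classification depends only on distances to prototypes, adding a user-defined keyword in this sketch only requires computing one more prototype from a few samples, with no retraining of the classifier head.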