
    Learning sound representations using trainable COPE feature extractors

    Sound analysis research has mainly focused on speech and music processing. The methodologies deployed there are not suitable for analyzing sounds with varying background noise, in many cases with a very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined from a single prototype sound pattern in an automatic configuration process, which is a form of representation learning. We construct a set of COPE feature extractors, each configured on a training pattern, and use their responses to build feature vectors that we pass to a classifier to detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund. The results we achieved (recognition rates of 91.71% on MIVIA audio events, 94% on MIVIA road events, 81.25% on ESC-10 and 94.27% on TU Dortmund) demonstrate the effectiveness of the proposed method and exceed those obtained by other existing approaches. The COPE feature extractors are highly robust to variations in SNR, and real-time performance is achieved even when a large number of features is computed. (Accepted for publication in Pattern Recognition.)
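    The configure-then-detect idea is concrete enough to sketch. The Python snippet below is a minimal, hypothetical illustration of a COPE-style extractor, not the authors' implementation: it uses a plain STFT energy map (the paper's front-end differs), stores the strongest energy peaks of a prototype relative to a reference point, then slides that peak constellation along time and combines the energies found at the stored offsets. All function names and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def energy_peaks(x, fs, n_peaks=20, nperseg=256):
    # Time-frequency energy map; a plain STFT is a simplifying
    # assumption standing in for the paper's front-end.
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    E = np.abs(Z) ** 2
    idx = np.argsort(E, axis=None)[-n_peaks:]      # strongest bins
    fi, ti = np.unravel_index(idx, E.shape)
    return E, fi, ti

def configure_cope(prototype, fs, n_peaks=20):
    # Configuration from a single prototype: store each peak as
    # (absolute frequency bin, time offset from the reference point,
    # taken here as the strongest peak).
    E, fi, ti = energy_peaks(prototype, fs, n_peaks)
    ref = np.argmax(E[fi, ti])
    return [(f, t - ti[ref]) for f, t in zip(fi, ti)]

def cope_response(signal, fs, model, nperseg=256):
    # Detection: slide the stored constellation along time and combine
    # the energies at the stored offsets (a geometric mean gives some
    # robustness to individually missing peaks).
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
    E = np.abs(Z) ** 2
    n_t = E.shape[1]
    resp = np.zeros(n_t)
    for t0 in range(n_t):
        vals = [E[f, t0 + dt] if 0 <= t0 + dt < n_t else 0.0
                for f, dt in model]
        resp[t0] = np.exp(np.log(np.asarray(vals) + 1e-12).mean())
    return resp

# Usage: configure on one prototype event, respond on a longer recording.
fs = 16000
prototype = np.random.randn(fs)        # stand-in for a labeled event
recording = np.random.randn(10 * fs)
model = configure_cope(prototype, fs)
print(cope_response(recording, fs, model).max())
```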

    How Tiny Can Analog Filterbank Features Be Made for Ultra-low-power On-device Keyword Spotting?

    Analog feature extraction is a power-efficient, re-emerging signal-processing paradigm for implementing the front-end feature extractor in on-device keyword-spotting systems. Despite this, there is little consensus on how the architectural parameters of its critical block, the analog filterbank, should be set, even though they strongly influence power consumption. Towards building consensus and approaching fundamental power-consumption limits, we find via simulation that, through careful selection of its architectural parameters, the power of a typical state-of-the-art analog filterbank can be reduced by 33.6x while sacrificing only 1.8% of downstream 10-word keyword-spotting accuracy through a back-end neural network. (Accepted as a full paper by the TinyML Research Symposium 202.)
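    To see which knobs such a sweep turns, here is a minimal behavioral model, in Python rather than analog circuitry, of a filterbank feature extractor: log-spaced band-pass channels, rectification, and per-frame envelope averaging. The channel count, quality factor and frame rate are the kind of architectural parameters the abstract refers to; all names and defaults below are assumptions for illustration, not the paper's design.

```python
import numpy as np
from scipy.signal import butter, lfilter

def filterbank_features(x, fs, n_channels=16, f_lo=100.0, f_hi=4000.0,
                        q=2.0, frame_ms=10.0):
    # Behavioral model of an analog filterbank front-end (assumption:
    # 2nd-order band-pass filters on a log-spaced grid, full-wave
    # rectification, per-frame averaging as the envelope detector).
    centers = np.geomspace(f_lo, f_hi, n_channels)
    hop = int(fs * frame_ms / 1000)
    n_frames = len(x) // hop
    feats = np.empty((n_frames, n_channels))
    for c, fc in enumerate(centers):
        bw = fc / q                       # bandwidth set by quality factor
        b, a = butter(2, [(fc - bw / 2) / (fs / 2),
                          (fc + bw / 2) / (fs / 2)], btype="band")
        env = np.abs(lfilter(b, a, x))    # band-pass + rectify
        env = env[: n_frames * hop].reshape(n_frames, hop).mean(axis=1)
        feats[:, c] = np.log(env + 1e-9)  # log compression
    return feats

fs = 16000
feats = filterbank_features(np.random.randn(fs), fs)  # (100, 16) features
```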

    Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks

    Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner-of-articulation attributes, starting from representations of speech signal frames. In this paper, the full system implementation is described: a first stage for MFCC extraction is followed by a second stage implementing a sinusoidal-based multi-layer perceptron for speech event classification. Implementation details for a Celoxica RC203 board are given.
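    As a rough software analogue of that two-stage pipeline, the sketch below extracts per-frame MFCCs and classifies each frame into one of the six manner-of-articulation attributes with a small MLP. It is a floating-point approximation under stated assumptions (librosa's MFCC defaults in place of the fixed-point FPGA extractor, scikit-learn's standard MLP in place of the sinusoidal-based perceptron); the random training data merely shows the shapes involved.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

ATTRS = ["vowel", "stop", "fricative", "approximant", "nasal", "silence"]

# Stage 1: per-frame MFCCs (n_mfcc=13 is an assumed choice).
def frame_features(y, sr, n_mfcc=13):
    import librosa
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

# Stage 2: a small MLP mapping each MFCC frame to one of the six
# manner-of-articulation attributes. Frame labels would normally come
# from phonetic alignments; random data here only demonstrates shapes.
X_train = np.random.randn(1000, 13)
y_train = np.random.randint(len(ATTRS), size=1000)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print(ATTRS[clf.predict(X_train[:1])[0]])
```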

    Ultimate Trends in Integrated Systems to Enhance Automatic Speech Recognition Performance


    A 23μW Solar-Powered Keyword-Spotting ASIC with Ring-Oscillator-Based Time-Domain Feature Extraction

    Voice-controlled interfaces on acoustic Internet-of-Things (IoT) sensor nodes and mobile devices require integrated low-power always-on wake-up functions, such as Voice Activity Detection (VAD) and Keyword Spotting (KWS), to ensure longer battery life. Most VAD and KWS ICs have focused on reducing the power of the feature extractor (FEx), as it is the most power-hungry building block. A serial Fast Fourier Transform (FFT)-based KWS chip [1] achieved 510nW; however, it suffered from a high 64ms latency and was limited to detecting only 1-to-4 keywords (2-to-5 classes). Although the analog FEx designs [2]–[3] for VAD/KWS reported 0.2μW-to-1μW power and 10ms-to-100ms latency, neither demonstrated more than 5 classes in keyword detection. In addition, their voltage-domain implementations cannot benefit from process scaling: the low supply voltage reduces signal swing, and the degradation of intrinsic gain forces transistors to have larger lengths and poor linearity.
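    The time-domain alternative encodes the signal in frequency and phase rather than voltage swing, which does scale with process. The sketch below is a minimal behavioral model of one channel of ring-oscillator-based feature extraction, an assumption-laden illustration rather than the chip's architecture: the rectified input modulates the oscillator's instantaneous frequency, and a counter sampled once per frame digitizes it, so the feature is simply an edge count.

```python
import numpy as np

def ring_osc_channel(x, fs, f0=1e6, k=5e5, frame_ms=10.0):
    # Behavioral model: the rectified input modulates a ring
    # oscillator's instantaneous frequency, f(t) = f0 + k*|x(t)|.
    # f0 (free-running frequency) and k (gain) are assumed values.
    inst_f = f0 + k * np.abs(x)
    phase = 2.0 * np.pi * np.cumsum(inst_f) / fs
    toggles = np.floor(phase / np.pi).astype(np.int64)  # one edge per half-period

    # A per-frame edge counter digitizes the frequency: the count per
    # frame is the time-domain feature.
    hop = int(fs * frame_ms / 1e3)
    n_frames = len(x) // hop
    at_frame_ends = toggles[hop - 1 : n_frames * hop : hop]
    return np.diff(np.concatenate(([0], at_frame_ends)))

fs = 16000
x = 0.1 * np.random.randn(fs)          # 1s of stand-in audio
print(ring_osc_channel(x, fs)[:5])     # edge counts per 10ms frame
```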