3,087 research outputs found

Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

Authors: Fu Gengshen; Mandal Arindam; Matsoukas Spyros; Panchapagesan Sankaran; Raju Anirudh; Strom Nikko; Sun Ming; Tucker George; Vitaladevuni Shiv
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/05/2017

We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, max-pooling loss trained LSTM with randomly initialized network performs better compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields 67.6% relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.

arXiv.org e-Print Archive

Crossref

Phonetic Searching

Publication date: 12/11/2006

An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings, linguistics, phonetics, or a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.

Georgia Tech Research Corporation

Scholarly Materials And Research @ Georgia Tech

Topic Identification for Speech without ASR

Authors: Harman Craig; Khudanpur Sanjeev; Liu Chunxi; Trmal Jan; Wiesner Matthew
Publication date: 11/07/2017

Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs. However, under resource-limited conditions, the manually transcribed speech required to develop standard ASR systems can be severely limited or unavailable. In this paper, we investigate alternative unsupervised solutions to obtaining tokenizations of speech in terms of a vocabulary of automatically discovered word-like or phoneme-like units, without depending on the supervised training of ASR systems. Moreover, using automatic phoneme-like tokenizations, we demonstrate that a convolutional neural network based framework for learning spoken document representations provides competitive performance compared to a standard bag-of-words representation, as evidenced by comprehensive topic ID evaluations on both single-label and multi-label classification tasks.

Comment: 5 pages, 2 figures; accepted for publication at Interspeech 2017

arXiv.org e-Print Archive

Crossref

Fuzzy reasoning in confidence evaluation of speech recognition

Authors: Hernández-Abrego G; Mariño Acebal José Bernardo
Publication venue: 'Baishideng Publishing Group Inc.'
Publication date: 01/01/1999

Confidence measures represent a systematic way to express the reliability of speech recognition results. A common approach to confidence measuring is to take advantage of the information that several recognition-related features offer and to combine them, through a given compilation mechanism, into a more effective way to distinguish between correct and incorrect recognition results. We propose to use a fuzzy reasoning scheme to perform the information compilation step. Our approach differs from previously proposed ones because it treats the uncertainty of recognition hypotheses in terms of

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC