468 research outputs found
Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
We propose a max-pooling based loss function for training Long Short-Term
Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
CPU, memory, and latency requirements. The max-pooling loss training can be
further guided by initializing with a cross-entropy loss trained network. A
posterior smoothing based evaluation approach is employed to measure keyword
spotting performance. Our experimental results show that LSTM models trained
using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
trained baseline feed-forward Deep Neural Network (DNN). In addition,
the max-pooling loss trained LSTM with a randomly initialized network performs
better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss
trained LSTM initialized with a cross-entropy pre-trained network shows the
best performance, yielding a relative reduction in the Area Under the Curve
(AUC) measure compared to the baseline feed-forward DNN.
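The pooling idea described above can be sketched in a few lines: per-frame keyword scores are pooled over time by taking the maximum, and a cross-entropy loss is applied to that single pooled score. This is a minimal illustrative sketch, not the paper's LSTM training code; the function name and the sigmoid/binary cross-entropy framing are assumptions.

```python
import math

def max_pooling_loss(frame_logits, is_keyword):
    """Toy sketch of a max-pooling loss for keyword spotting.

    frame_logits: per-frame scores (logits) for the keyword class.
    is_keyword:   True if the segment contains the keyword.

    For a keyword segment, only the best-scoring frame is rewarded
    (max-pooling over time); for a background segment, the pooled
    maximum is the worst offender, so it is pushed down.
    """
    peak = max(frame_logits)            # max-pool over time
    p = 1.0 / (1.0 + math.exp(-peak))   # sigmoid -> keyword posterior
    target = 1.0 if is_keyword else 0.0
    eps = 1e-12
    # binary cross-entropy on the pooled posterior
    return -(target * math.log(p + eps)
             + (1.0 - target) * math.log(1.0 - p + eps))
```

With a strong peak frame, the loss is small for a true keyword segment and large for a background segment, which is the behavior the training objective exploits.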
Fuzzy reasoning in confidence evaluation of speech recognition
Confidence measures represent a systematic way to express the reliability of speech recognition results. A common approach to confidence measurement is to take advantage of the information that several recognition-related features offer and to combine them, through a given compilation mechanism, into a more effective way to distinguish between correct and incorrect recognition results. We propose to use a fuzzy reasoning scheme to perform the information compilation step. Our approach differs from previously proposed ones in that it treats the uncertainty of recognition hypotheses in terms of …
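The feature-combination step described above can be illustrated with a minimal fuzzy-reasoning sketch. The feature names, rule base, and weights below are purely hypothetical, not the paper's actual scheme; the point is only how fuzzy AND (min) and OR (max) can compile several normalized reliability features into one confidence value.

```python
def fuzzy_confidence(acoustic_score, lm_score):
    """Illustrative fuzzy combination of two recognition-related
    features, each assumed pre-normalized to a [0, 1] membership
    degree for "reliable".  (Hypothetical rule base and weights.)
    """
    # Rule 1: IF acoustic reliable AND lm reliable THEN confident
    r1 = min(acoustic_score, lm_score)    # fuzzy AND
    # Rule 2: IF acoustic reliable OR lm reliable THEN somewhat confident
    r2 = max(acoustic_score, lm_score)    # fuzzy OR
    # Weighted-average defuzzification of the rule outputs
    return 0.7 * r1 + 0.3 * r2
```

The combined score stays bounded in [0, 1], so it can be thresholded directly to accept or reject a recognition hypothesis.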
HMM word graph based keyword spotting in handwritten document images
Line-level keyword spotting (KWS) is presented on the basis of frame-level word posterior
probabilities. These posteriors are obtained using word graphs derived from the recognition
process of a full-fledged handwritten text recognizer based on hidden Markov models
and N-gram language models. This approach has several advantages. First, since it uses
a holistic, segmentation-free technology, it does not require any kind of word or character
segmentation. Second, the use of language models allows the context of each spotted
word to be taken into account, thereby considerably increasing KWS accuracy. And third,
the proposed KWS scores are based on true posterior probabilities, taking into account
all (or most) possible word segmentations of the input image. These scores are properly
bounded and normalized. This mathematically clean formulation lends itself to smooth,
threshold-based keyword queries which, in turn, permit comfortable trade-offs between
search precision and recall. Experiments are carried out on several historic collections of
handwritten text images, as well as a well-known data set of modern English handwritten
text. According to the empirical results, the proposed approach achieves KWS results
comparable to those obtained with the recently introduced "BLSTM neural networks KWS"
approach and clearly outperforms the popular, state-of-the-art "Filler HMM" KWS method.
Overall, the results clearly support all the above-claimed advantages of the proposed approach.

This work has been partially supported by the Generalitat Valenciana under the Prometeo/2009/014 project grant ALMA-MATER, and through the EU projects HIMANIS (JPICH programme, Spanish grant Ref. PCIN-2015-068) and READ (Horizon 2020 programme, grant Ref. 674943).

Toselli, A. H.; Vidal, E.; Romero, V.; Frinken, V. (2016). HMM word graph based keyword spotting in handwritten document images. Information Sciences, 370:497-518. https://doi.org/10.1016/j.ins.2016.07.063
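The thresholded querying described in the abstract can be sketched simply: each line gets a properly bounded [0, 1] score for the queried keyword, and a single global threshold trades precision against recall. This is a minimal sketch assuming the score is the maximum frame-level posterior on the line; the paper's exact word-graph computation differs, and the function names are illustrative.

```python
def kws_score(frame_posteriors):
    """Score a keyword on one text line as the maximum frame-level
    posterior of that word (a bounded, normalized [0, 1] choice;
    an assumption, not the paper's exact word-graph formula)."""
    return max(frame_posteriors, default=0.0)

def spot(lines, threshold):
    """Return indices of lines whose keyword score passes the
    threshold.  Raising the threshold increases precision at the
    cost of recall; lowering it does the reverse."""
    return [i for i, posts in enumerate(lines) if kws_score(posts) >= threshold]
```

Because the scores are normalized, the same threshold is meaningful across queries and collections, which is what makes the smooth precision/recall trade-off possible.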
- …