Robust audiovisual speech recognition using noise-adaptive linear discriminant analysis
© 2016 IEEE. Automatic speech recognition (ASR) has become a widespread and convenient mode of human-machine interaction, but it is still not sufficiently reliable when used under highly noisy or reverberant conditions. One option for achieving far greater robustness is to include another modality that is unaffected by acoustic noise, such as video information. Currently the most successful approaches for such audiovisual ASR systems, coupled hidden Markov models (HMMs) and turbo decoding, both allow for slight asynchrony between audio and video features, and significantly improve recognition rates in this way. However, both typically still neglect residual errors in the estimation of audio features, so-called observation uncertainties. This paper compares two strategies for adding these observation uncertainties into the decoder, and shows that significant recognition rate improvements are achievable for both coupled HMMs and turbo decoding.
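The core idea behind uncertainty decoding can be sketched as follows: when evaluating a Gaussian state's likelihood, the estimated variance of each audio feature is added to the model variance, so unreliable features are automatically down-weighted. This is a minimal sketch of the general idea with diagonal covariances; the function name and argument layout are illustrative, not the paper's implementation:

```python
import numpy as np

def uncertainty_decoding_loglik(obs, obs_var, mean, state_var):
    """Gaussian state log-likelihood under uncertainty decoding:
    the estimated per-dimension observation variance is added to the
    model's state variance before evaluating the density
    (diagonal covariances; illustrative sketch only)."""
    var = state_var + obs_var
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (obs - mean) ** 2 / var)
```

With zero observation uncertainty this reduces to the standard Gaussian log-density; as the uncertainty grows, a feature that lies far from the state mean is penalized less, which is what keeps noisy audio frames from dominating the audiovisual decision.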
Real-Time Decision Fusion for Multimodal Neural Prosthetic Devices
The field of neural prosthetics aims to develop prosthetic limbs with a brain-computer interface (BCI) through which neural activity is decoded into movements. A natural extension of current research is the incorporation of neural activity from multiple modalities to more accurately estimate the user's intent. The challenge remains how to appropriately combine this information in real-time for a neural prosthetic device, i.e., fusing predictions from several single-modality decoders to produce a more accurate device state estimate. We examine two algorithms for continuous variable decision fusion: the Kalman filter and artificial neural networks (ANNs). Using simulated cortical neural spike signals, we implemented several successful individual neural decoding algorithms, and tested the capabilities of each fusion method in the context of decoding 2-dimensional endpoint trajectories of a neural prosthetic arm. Extensively testing these methods on random trajectories, we find that on average both the Kalman filter and ANNs successfully fuse the individual decoder estimates to produce more accurate predictions. Our results reveal that a fusion-based approach has the potential to improve prediction accuracy over individual decoders of varying quality, and we hope that this work will encourage multimodal neural prosthetics experiments in the future.
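The Kalman-filter fusion described here can be sketched by treating each decoder's 2-D estimate as an extra measurement of the same endpoint position under a random-walk state model. The noise parameters `q` and `r` and the constant-position model are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def fuse_kalman(decoder_estimates, q=0.01, r=0.25):
    """Fuse per-timestep 2-D position estimates from several decoders
    with a random-walk Kalman filter.

    decoder_estimates: array of shape (n_steps, n_decoders, 2).
    q, r: assumed process / measurement noise variances (illustrative).
    Returns the fused trajectory, shape (n_steps, 2)."""
    n_steps, n_dec, _ = decoder_estimates.shape
    # Stack every decoder's 2-D estimate into one measurement vector.
    H = np.tile(np.eye(2), (n_dec, 1))       # (2*n_dec, 2)
    Q = q * np.eye(2)                        # process noise
    R = r * np.eye(2 * n_dec)                # measurement noise
    x = decoder_estimates[0].mean(axis=0)    # initial state: decoder average
    P = np.eye(2)
    fused = [x.copy()]
    for t in range(1, n_steps):
        # Predict: random-walk model, the state carries over.
        P = P + Q
        # Update with the stacked decoder measurements.
        z = decoder_estimates[t].reshape(-1)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        fused.append(x.copy())
    return np.array(fused)
```

Stacking the decoders' outputs into one measurement vector lets the filter weight them jointly; if per-decoder noise estimates were available, `R` could be made block-diagonal to favor the more reliable decoders.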
An iterative multimodal framework for the transcription of handwritten historical documents
The transcription of historical documents is one of the most interesting tasks to which Handwritten Text Recognition can be applied, given its value for humanities research. One alternative for transcribing ancient manuscripts is speech dictation using Automatic Speech Recognition techniques. The two alternatives employ similar models (Hidden Markov Models and n-grams) and decoding processes (Viterbi decoding), which allows the two modalities to be combined with little difficulty. In this work, we explore the possibility of using the recognition results of one modality to restrict the decoding process of the other modality, and we apply this process iteratively. Results of these multimodal iterative alternatives are significantly better than the baseline uni-modal systems and better than the non-iterative alternatives.
2012 Elsevier B.V. All rights reserved. Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV ''Consolider Ingenio 2010'' program (CSD2007-00018), iTrans2 (TIN2009-14511) and MITTRAL (TIN2009-14633-C03-01) projects. Also supported by the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project, by the Generalitat Valenciana under grant GV/2010/067, by the UPV under project PAID-05-11-2779, and by grant UPV/2009/2851. Alabau, V.; Martínez Hinarejos, C.D.; Romero Gómez, V.; Lagarda Arroyo, A.L. (2014). An iterative multimodal framework for the transcription of handwritten historical documents. Pattern Recognition Letters, 35:195-203. https://doi.org/10.1016/j.patrec.2012.11.007
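The iterative restriction scheme described above can be sketched at a very high level: each modality keeps only the hypotheses the other modality currently ranks highly, and the process repeats before a final combined rescoring. Here hypotheses are simple strings with per-modality log-score dictionaries; the function name, `top_k`, and the score combination are all illustrative assumptions, not the paper's lattice-based method:

```python
def iterative_fusion(scores_a, scores_b, top_k=3, iters=2):
    """Iteratively restrict each modality's candidate set to the other
    modality's current top-k hypotheses, then pick the hypothesis with
    the best combined score (toy sketch over score dictionaries)."""
    cand_a, cand_b = set(scores_a), set(scores_b)
    for _ in range(iters):
        # Each modality's current top-k, used to restrict the other.
        top_a = sorted(cand_a, key=lambda h: scores_a[h], reverse=True)[:top_k]
        top_b = sorted(cand_b, key=lambda h: scores_b[h], reverse=True)[:top_k]
        # Fall back to the unrestricted set if the intersection is empty.
        cand_a = {h for h in cand_a if h in top_b} or cand_a
        cand_b = {h for h in cand_b if h in top_a} or cand_b
    # Final decision: maximize the combined score over survivors.
    cands = (cand_a & cand_b) or (cand_a | cand_b)
    return max(cands, key=lambda h: scores_a.get(h, -1e9) + scores_b.get(h, -1e9))
```

In the actual systems the restriction operates on the Viterbi search space (word graphs) rather than flat hypothesis lists, but the fixed-point flavor of the iteration is the same.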
Autoencoding the Retrieval Relevance of Medical Images
Content-based image retrieval (CBIR) of medical images is a crucial task that can contribute to a more reliable diagnosis if applied to big data. Recent advances in feature extraction and classification have enormously improved CBIR results for digital images. However, considering the increasing accessibility of big data in medical imaging, we still need to reduce both the memory requirements and the computational expense of image retrieval systems. This work proposes to exclude the features of image blocks that exhibit a low encoding error when learned by an autoencoder. We examine the histogram of autoencoding errors of image blocks for each image class to facilitate the decision of which image regions, or roughly what percentage of an image, shall be declared relevant for the retrieval task. This leads to a reduction of feature dimensionality and speeds up the retrieval process. To validate the proposed scheme, we employ local binary patterns (LBP) and support vector machines (SVM), both well-established approaches in the CBIR research community. We also use the IRMA dataset with 14,410 x-ray images as test data. The results show that the dimensionality of annotated feature vectors can be reduced by up to 50%, resulting in speedups greater than 27% at the expense of less than a 1% decrease in retrieval accuracy when validating the precision and recall of the top 20 hits.
Comment: To appear in proceedings of the 5th International Conference on Image Processing Theory, Tools and Applications (IPTA'15), Nov 10-13, 2015, Orleans, France
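The block-selection idea can be sketched as follows: blocks the model reconstructs easily carry little discriminative information and are dropped, while hard-to-encode blocks are kept. This sketch substitutes a linear autoencoder (equivalent to PCA via SVD) for the paper's autoencoder; `code_dim` and `keep_fraction` are illustrative parameters, not the paper's settings:

```python
import numpy as np

def select_relevant_blocks(blocks, code_dim=8, keep_fraction=0.5):
    """Rank image blocks by linear-autoencoder reconstruction error and
    keep the hardest-to-encode fraction.

    blocks: array of shape (n_blocks, h, w).
    Returns (keep mask, per-block reconstruction errors)."""
    X = blocks.reshape(len(blocks), -1).astype(float)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Optimal linear autoencoder: project onto the top principal components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:code_dim]                       # shared encoder/decoder weights
    recon = Xc @ W.T @ W + mu
    errors = np.mean((X - recon) ** 2, axis=1)
    # High encoding error = block poorly captured by the model,
    # so it is declared relevant for the retrieval task.
    threshold = np.quantile(errors, 1.0 - keep_fraction)
    return errors >= threshold, errors
```

In the paper the threshold is informed by per-class histograms of the errors rather than a single global quantile, but the effect is the same: only the retained blocks contribute LBP features, shrinking the descriptor before SVM classification.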