2,583 research outputs found

    Robust audiovisual speech recognition using noise-adaptive linear discriminant analysis

    Get PDF
    © 2016 IEEE.Automatic speech recognition (ASR) has become a widespread and convenient mode of human-machine interaction, but it is still not sufficiently reliable when used under highly noisy or reverberant conditions. One option for achieving far greater robustness is to include another modality that is unaffected by acoustic noise, such as video information. Currently the most successful approaches for such audiovisual ASR systems, coupled hidden Markov models (HMMs) and turbo decoding, both allow for slight asynchrony between audio and video features, and significantly improve recognition rates in this way. However, both typically still neglect residual errors in the estimation of audio features, so-called observation uncertainties. This paper compares two strategies for adding these observation uncertainties into the decoder, and shows that significant recognition rate improvements are achievable for both coupled HMMs and turbo decoding

    Real-Time Decision Fusion for Multimodal Neural Prosthetic Devices

    Get PDF
    The field of neural prosthetics aims to develop prosthetic limbs with a brain-computer interface (BCI) through which neural activity is decoded into movements. A natural extension of current research is the incorporation of neural activity from multiple modalities to more accurately estimate the user's intent. The challenge remains how to appropriately combine this information in real-time for a neural prosthetic device., i.e., fusing predictions from several single-modality decoders to produce a more accurate device state estimate. We examine two algorithms for continuous variable decision fusion: the Kalman filter and artificial neural networks (ANNs). Using simulated cortical neural spike signals, we implemented several successful individual neural decoding algorithms, and tested the capabilities of each fusion method in the context of decoding 2-dimensional endpoint trajectories of a neural prosthetic arm. Extensively testing these methods on random trajectories, we find that on average both the Kalman filter and ANNs successfully fuse the individual decoder estimates to produce more accurate predictions.Our results reveal that a fusion-based approach has the potential to improve prediction accuracy over individual decoders of varying quality, and we hope that this work will encourage multimodal neural prosthetics experiments in the future

    An iterative multimodal framework for the transcription of handwritten historical documents

    Full text link
    [EN] The transcription of historical documents is one of the most interesting tasks in which Handwritten Text Recognition can be applied, due to its interest in humanities research. One alternative for transcribing the ancient manuscripts is the use of speech dictation by using Automatic Speech Recognition techniques. In the two alternatives similar models (Hidden Markov Models and n-grams) and decoding processes (Viterbi decoding) are employed, which allows a possible combination of the two modalities with little diffi- culties. In this work, we explore the possibility of using recognition results of one modality to restrict the decoding process of the other modality, and apply this process iteratively. Results of these multimodal iterative alternatives are significantly better than the baseline uni-modal systems and better than the non-iterative alternatives. 2012 Elsevier B.V. All rights reserved.Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV ’’Consolider Ingenio 2010’’ program (CSD2007-00018), iTrans2 (TIN2009–14511) and MITTRAL (TIN2009-14633-C03–01) projects. Also supported by the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project and by the Generalitat Valenciana under grant GV/2010/067, and by the UPV under project PAID-05-11-2779 and grant UPV/2009/2851.Alabau, V.; Martínez Hinarejos, CD.; Romero Gómez, V.; Lagarda Arroyo, AL. (2014). An iterative multimodal framework for the transcription of handwritten historical documents. Pattern Recognition Letters. 35:195-203. https://doi.org/10.1016/j.patrec.2012.11.007S1952033

    Autoencoding the Retrieval Relevance of Medical Images

    Full text link
    Content-based image retrieval (CBIR) of medical images is a crucial task that can contribute to a more reliable diagnosis if applied to big data. Recent advances in feature extraction and classification have enormously improved CBIR results for digital images. However, considering the increasing accessibility of big data in medical imaging, we are still in need of reducing both memory requirements and computational expenses of image retrieval systems. This work proposes to exclude the features of image blocks that exhibit a low encoding error when learned by a n/p/nn/p/n autoencoder (p ⁣< ⁣np\!<\!n). We examine the histogram of autoendcoding errors of image blocks for each image class to facilitate the decision which image regions, or roughly what percentage of an image perhaps, shall be declared relevant for the retrieval task. This leads to reduction of feature dimensionality and speeds up the retrieval process. To validate the proposed scheme, we employ local binary patterns (LBP) and support vector machines (SVM) which are both well-established approaches in CBIR research community. As well, we use IRMA dataset with 14,410 x-ray images as test data. The results show that the dimensionality of annotated feature vectors can be reduced by up to 50% resulting in speedups greater than 27% at expense of less than 1% decrease in the accuracy of retrieval when validating the precision and recall of the top 20 hits.Comment: To appear in proceedings of The 5th International Conference on Image Processing Theory, Tools and Applications (IPTA'15), Nov 10-13, 2015, Orleans, Franc
    corecore