
    The 2005 AMI system for the transcription of speech in meetings

    In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression, and minimum word error rate decoding. We describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.
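
    As a rough illustration of one of the adaptation techniques named above, the following minimal sketch applies an MLLR-style shared affine transform to Gaussian means. The function names are hypothetical, and a pooled least-squares fit stands in for the full maximum-likelihood estimation a real system would use.

    ```python
    # Illustrative MLLR-style mean adaptation (not the AMI implementation).
    # Real MLLR estimates W by maximising the likelihood of adaptation data
    # under the HMM; here a pooled least-squares fit stands in for that step.
    import numpy as np

    def estimate_transform(means, adapted_targets):
        """Fit W = [A; b] so that [mu, 1] @ W approximates the target means."""
        X = np.hstack([means, np.ones((means.shape[0], 1))])
        W, *_ = np.linalg.lstsq(X, adapted_targets, rcond=None)
        return W

    def adapt_means(means, W):
        """Apply the shared affine transform mu' = A @ mu + b to each Gaussian."""
        X = np.hstack([means, np.ones((means.shape[0], 1))])
        return X @ W

    rng = np.random.default_rng(0)
    mu = rng.normal(size=(16, 39))   # 16 Gaussian means, 39-dim acoustic features
    targets = mu + 0.5               # toy speaker shift standing in for statistics
    mu_speaker = adapt_means(mu, estimate_transform(mu, targets))
    ```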

    Sparsity and adaptivity for the blind separation of partially correlated sources

    Blind source separation (BSS) is a very popular technique for analyzing multichannel data. In this context, the data are modeled as a linear combination of sources to be retrieved. For that purpose, standard BSS methods all rely on some discrimination principle, whether it is statistical independence or morphological diversity, to distinguish between the sources. However, real-world data reveal that such assumptions are rarely valid in practice: the signals of interest are more likely partially correlated, which generally hampers the performance of standard BSS methods. In this article, we introduce a novel sparsity-enforcing BSS method coined Adaptive Morphological Component Analysis (AMCA), which is designed to retrieve sparse and partially correlated sources. More precisely, it exploits an adaptive re-weighting scheme to favor or penalize samples based on their level of correlation. Extensive numerical experiments show that the proposed method is robust to the partial correlation of sources where standard BSS techniques fail. The AMCA algorithm is evaluated in the field of astrophysics for the separation of physical components from microwave data. Comment: submitted to IEEE Transactions on Signal Processing.
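
    To make the re-weighting idea concrete, here is a minimal sketch of the sparse linear mixing model together with one plausible sample-weighting rule; the weight definition below is an assumption chosen for illustration, not the exact scheme of the AMCA paper.

    ```python
    # Sketch of the linear mixture model X = A @ S with sparse, partially
    # correlated sources, plus an illustrative sample-weighting rule.
    import numpy as np

    rng = np.random.default_rng(1)
    n_src, n_chan, n_samp = 3, 5, 1000
    # sparse sources: Laplacian amplitudes, active ~10% of the time
    S = rng.laplace(size=(n_src, n_samp)) * (rng.random((n_src, n_samp)) < 0.1)
    A = rng.normal(size=(n_chan, n_src))   # unknown mixing matrix
    X = A @ S                              # observed multichannel data

    def sample_weights(S_est, eps=1e-9):
        """Favour samples dominated by a single source and penalise samples
        where several sources are active at once (a proxy for correlation)."""
        mag = np.abs(S_est)
        return mag.max(axis=0) / (mag.sum(axis=0) + eps)   # values in (0, 1]
    ```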

    Model of the Classification of English Vowels by Spanish Speakers

    A number of models of single-language vowel classification based on formant representations have been proposed. We propose a new model that explicitly predicts vowel perception by second-language (L2) learners based on the phonological map of their native language (L1). The model represents the vowels using polar coordinates in the F1-F2 formant space. Boundaries bisect the angles made by two adjacent category centroids. An L2 vowel is classified with the closest L1 vowel with a probability based on the angular difference between the L2 vowel and the L1 vowel boundary. The polar coordinate model is compared with other vowel classification models, such as the quadratic discriminant analysis method used by Hillenbrand and Gayvert [J. Speech Hear. Research, 36, 694-700, 1993] and the logistic regression analysis method adopted by Nearey [J. Phonetics, 18, 347-373, 1990]. All models were trained on Spanish vowel data and tested on English vowels. The results were compared with behavioral data obtained by Flege [Q. J. Exp. Psych., 43A(3), 701-731 (1991)] for Spanish monolingual speakers identifying English vowels. The polar coordinate model outperformed the other models in matching its predictions most closely with the behavioral data. National Institute on Deafness and Other Communication Disorders (R29 02852); Alfred P. Sloan Foundation.
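
    The classification rule is concrete enough to sketch in code. The toy implementation below uses invented Spanish centroid values and a deterministic nearest-angle decision, omitting the probabilistic boundary term described in the abstract.

    ```python
    # Toy polar-coordinate vowel classifier; centroid values are invented for
    # illustration and the probabilistic boundary term is omitted.
    import numpy as np

    # hypothetical Spanish (L1) vowel centroids in the (F1, F2) plane, in Hz
    centroids = {"i": (300, 2300), "e": (450, 2000), "a": (700, 1300),
                 "o": (450, 900), "u": (300, 800)}
    origin = np.mean(list(centroids.values()), axis=0)   # pole of the polar map

    def angle(point):
        f1, f2 = np.asarray(point, dtype=float) - origin
        return np.arctan2(f2, f1)

    def classify(l2_vowel):
        """Assign the closest L1 category by angular distance; the implied
        boundaries bisect the angles between adjacent centroids."""
        theta = angle(l2_vowel)
        dist = {v: abs(np.angle(np.exp(1j * (theta - angle(c)))))
                for v, c in centroids.items()}
        return min(dist, key=dist.get)

    print(classify((550, 1800)))   # an English token mapped to an L1 category
    ```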

    On adaptive decision rules and decision parameter adaptation for automatic speech recognition

    Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and for a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
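
    The core MAP update for a Gaussian mean admits a compact closed form. The sketch below shows this textbook special case (prior weight tau, state occupancies gamma); the names are chosen for illustration rather than taken from the paper.

    ```python
    # Textbook MAP adaptation of a single Gaussian mean: the estimate
    # interpolates between the prior mean and the occupancy-weighted
    # sample mean of the adaptation frames.
    import numpy as np

    def map_adapt_mean(mu_prior, frames, gamma, tau=10.0):
        """mu_MAP = (tau*mu_prior + sum_t gamma_t*x_t) / (tau + sum_t gamma_t).
        Little data keeps the estimate near the prior; abundant data drives
        it toward the maximum-likelihood sample mean."""
        gamma = np.asarray(gamma, dtype=float)[:, None]
        return (tau * mu_prior + (gamma * frames).sum(axis=0)) / (tau + gamma.sum())

    rng = np.random.default_rng(2)
    mu0 = np.zeros(13)                       # prior (speaker-independent) mean
    x = rng.normal(loc=1.0, size=(50, 13))   # adaptation frames for one state
    g = np.ones(50)                          # toy state-occupancy probabilities
    mu_map = map_adapt_mean(mu0, x, g)       # pulled from 0 toward the sample mean
    ```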