
    Language-based multimedia information retrieval

    This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by means of human language technologies. Thus, in contrast to image- or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE built on subtitles or captions as the prime language key for disclosing video fragments, OLIVE makes use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.
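
    As a minimal illustration of the time-coded indexing idea sketched above, the following Python fragment indexes transcript or subtitle segments by word and answers a text query with the matching video fragments and their time codes. All names and data are hypothetical and simplified; this is a sketch, not code from the POP-EYE, OLIVE or MUMIS systems.

from dataclasses import dataclass

# Hypothetical data model: a transcript/subtitle segment with its time codes.
@dataclass
class Segment:
    video_id: str
    start: float  # seconds
    end: float    # seconds
    text: str

def build_index(segments):
    """Map each lowercased word to the indices of the segments containing it."""
    index = {}
    for i, seg in enumerate(segments):
        for word in seg.text.lower().split():
            index.setdefault(word, set()).add(i)
    return index

def search(index, segments, query):
    """Return (video_id, start, end) for segments matching all query words."""
    hits = None
    for word in query.lower().split():
        matches = index.get(word, set())
        hits = set(matches) if hits is None else hits & matches
    return [(segments[i].video_id, segments[i].start, segments[i].end)
            for i in sorted(hits or [])]

# Toy example: two time-coded segments from one video.
segments = [
    Segment("match01.mpg", 12.0, 18.5, "a spectacular goal by the striker"),
    Segment("match01.mpg", 40.0, 47.0, "the referee shows a yellow card"),
]
index = build_index(segments)
print(search(index, segments, "goal"))  # [('match01.mpg', 12.0, 18.5)]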

    Speaker Verification Over The Telephone

    In this paper we present a study on speaker verification using telephone speech for two operational modes, i.e. text-dependent and text-independent speaker verification. A statistical modeling approach is taken: for text-independent verification the talker is viewed as a source of phones, modeled by a fully connected Markov chain, while for text-dependent verification a left-to-right HMM is built by concatenating the phone models corresponding to the transcription. A series of experiments was carried out on a large telephone corpus recorded specifically for speaker verification algorithm development, assessing performance as a function of the type and amount of data used for training and for verification. Experimental results are presented for both read and spontaneous speech. On this data, the lowest equal error rate is 1% for the text-dependent mode when 2 trials are allowed per attempt and with a minimum of 1.5 s of speech per trial.
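
    The equal error rate quoted above is the operating point at which false rejections of true speakers and false acceptances of impostors are equally frequent. The Python sketch below, using synthetic scores rather than any data from the paper, shows one simple way to estimate it from target and impostor trial scores.

import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep a decision threshold over all observed scores and return the point
    where the false rejection rate (targets below threshold) and the false
    acceptance rate (impostors at or above threshold) are closest."""
    best_gap, eer = 1.0, 1.0
    for t in np.sort(np.concatenate([target_scores, impostor_scores])):
        frr = np.mean(target_scores < t)     # genuine speakers rejected
        far = np.mean(impostor_scores >= t)  # impostors accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer

# Synthetic verification scores, illustrative only.
rng = np.random.default_rng(0)
targets = rng.normal(2.0, 1.0, 1000)     # genuine-speaker trials
impostors = rng.normal(-2.0, 1.0, 1000)  # impostor trials
print(f"EER ~ {equal_error_rate(targets, impostors):.2%}")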

    Study on Cross-Lingual Adaptation of a Czech LVCSR System towards Slovak


    The Limsi Arise System for Train Travel Information

    In the context of the LE-3 ARISE project we have been developing a dialog system for vocal access to rail travel information. The system provides schedule information for the main French intercity connections, as well as simulated fares and reservations, reductions and services. Our goal is to obtain high dialog success rates with a very open dialog structure, where the user is free to ask any question or to provide any information at any point in time. In order to improve performance with such an open dialog strategy, we make use of implicit confirmation using the caller's wording (when possible), and change to a more constrained dialog level when the dialog is not going well. In addition to our own assessment, the prototype system undergoes periodic user evaluations carried out by our partners at the French Railways. INTRODUCTION The LE-3 ARISE (Automatic Railway Information Systems for Europe) project aims at developing prototype telephone information services for rail travel infor..
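
    A minimal sketch of the fallback strategy described above, under simplifying assumptions: a hypothetical controller counts misunderstood turns, echoes the caller's own wording for implicit confirmation, and switches from open prompts to a more constrained dialog level after repeated failures. It is not the ARISE implementation; thresholds and prompt wording are invented for illustration.

class DialogController:
    """Toy dialog strategy: open prompts by default, constrained prompts
    after repeated misunderstandings (hypothetical thresholds and wording)."""

    def __init__(self, failure_threshold=2):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.constrained = False

    def next_prompt(self, slot):
        if self.constrained:
            return f"Please give only the {slot}."   # constrained, system-directed
        return "How can I help you?"                 # open: any question, any information

    def observe_turn(self, understood, confidence, user_wording=None):
        if understood and confidence >= 0.5:
            self.failures = 0
            # Implicit confirmation using the caller's wording when available.
            return f"So, {user_wording}. Anything else?" if user_wording else "Okay."
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.constrained = True
        return "Sorry, I did not understand."

# Two consecutive misunderstood turns trigger the constrained strategy.
dc = DialogController()
print(dc.next_prompt("departure city"))  # open prompt
dc.observe_turn(False, 0.2)
dc.observe_turn(False, 0.3)
print(dc.next_prompt("departure city"))  # constrained prompt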

    Developments in Large Vocabulary Dictation: The LIMSI Nov94 NAB System

    In this paper we report on our development work in large vocabulary, American English continuous speech dictation on the ARPA NAB task in preparation for the November 1994 evaluation. We have experimented with (1) alternative analyses for the acoustic front end, (2) the use of an enlarged vocabulary of 65k words so as to reduce the number of errors due to out-of-vocabulary words, (3) extensions to the lexical representation, (4) the use of additional acoustic training data, and (5) modification of the acoustic models for telephone speech. The recognizer was evaluated on Hubs 1 and 2 of the fall 1994 ARPA NAB CSR Hub and Spoke Benchmark test. Experimental results on development and evaluation test data are given, as well as an analysis of the errors on the development data. 1. Introduction Research in large vocabulary speaker-independent dictation at LIMSI [5, 6] makes use of large newspaper-based corpora such as the ARPA Wall Street Journal-based Continuous Speech Recognition corpus (..
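
    The enlarged 65k-word vocabulary mentioned above addresses out-of-vocabulary words, which a fixed-vocabulary recognizer can never get right. The short Python sketch below, with toy data and a hypothetical function name, shows how an OOV rate is measured against a word list.

def oov_rate(words, vocabulary):
    """Fraction of running words not covered by the recognizer's word list.
    Each OOV token is guaranteed to be misrecognized, so enlarging the
    vocabulary (e.g. from 20k to 65k entries) removes this class of errors."""
    vocab = set(w.lower() for w in vocabulary)
    misses = sum(1 for w in words if w.lower() not in vocab)
    return misses / len(words) if words else 0.0

# Toy example with a 5-word vocabulary.
text = "the index rose three points yesterday".split()
vocab = ["the", "index", "rose", "three", "points"]
print(f"OOV rate: {oov_rate(text, vocab):.1%}")  # 16.7% ("yesterday" is OOV)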

    Unsupervised speaker adaptation using reference speaker weighting

    Recently, we revisited the fast adaptation method called reference speaker weighting (RSW), and suggested a few modifications. We then showed that the algorithmically simplest technique actually outperformed conventional adaptation techniques like MAP and MLLR for 5- or 10-second supervised adaptation on the Wall Street Journal 5K task. In this paper, we would like to further investigate the performance of RSW in unsupervised adaptation mode, which is the more natural way of doing adaptation in practice. Moreover, various analyses were carried out on the reference speakers computed by the method. © 2006 Springer-Verlag Berlin/Heidelberg
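
    The sketch below illustrates the core idea of reference speaker weighting in a deliberately simplified form: the new speaker's mean supervector is expressed as a weighted combination of the reference speakers' supervectors. The weights are estimated here by ordinary least squares from a crude adaptation-data estimate, rather than by the maximum-likelihood criterion of the actual method; all names and data are hypothetical.

import numpy as np

def rsw_adapt(reference_supervectors, adaptation_estimate):
    """Simplified reference speaker weighting (RSW): find weights so that a
    weighted combination of reference mean supervectors matches the (noisy)
    estimate obtained from a few seconds of adaptation speech.
    Here the weights come from ordinary least squares, not maximum likelihood."""
    R = np.asarray(reference_supervectors)    # shape (num_refs, dim)
    target = np.asarray(adaptation_estimate)  # shape (dim,)
    weights, *_ = np.linalg.lstsq(R.T, target, rcond=None)
    adapted_mean = weights @ R                # adapted speaker supervector
    return weights, adapted_mean

# Toy example: 3 reference speakers with 4-dimensional mean supervectors.
refs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]])
obs = np.array([0.5, 0.3, 0.2, 0.2])  # crude estimate from limited adaptation data
weights, adapted = rsw_adapt(refs, obs)
print(weights)   # combination weights over the reference speakers
print(adapted)   # adapted mean supervector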