14 research outputs found

    Identification of Non-Linguistic Speech Features

    Over the last decade technological advances have been made which enable us to envision real-world applications of speech technologies. It is possible to foresee applications where the spoken query is to be recognized without even prior knowledge of the language being spoken, for example, information centers in public places such as train stations and airports. Other applications may require accurate identification of the speaker for security reasons, including control of access to confidential information or for telephone-based transactions. Ideally, the speaker's identity can be verified continually during the transaction, in a manner completely transparent to the user. With these views in mind, this paper presents a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. This technique is shown to be effective for text-independent language, sex, and speaker identification and can enable better and more friendly human-machine interaction. With 2s of speech, the language can be identified with better than 99% accuracy. Error in sex identification is about 1% on a per-sentence basis, and speaker identification accuracies of 98.5% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with 2 utterances for both corpora. An experiment using unsupervised adaptation for speaker identification on the 168 TIMIT speakers yielded the same identification accuracies as supervised adaptation.

    Speech Communication

    Contains table of contents for Part IV, table of contents for Section 1 and reports on five research projects. Sponsors: Apple Computer, Inc.; C.J. Lebel Fellowship; National Institutes of Health (Grant T32-NS07040); National Institutes of Health (Grant R01-NS04332); National Institutes of Health (Grant R01-NS21183); National Institutes of Health (Grant P01-NS23734); U.S. Navy / Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)

    Speech Communication

    Contains reports on five research projects. Sponsors: C.J. Lebel Fellowship; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Institutes of Health (Grant 5 R01 NS21183); National Institutes of Health (Grant 5 P01 NS13126); National Institutes of Health (Grant 1 P01-NS23734); National Science Foundation (Grant BNS 8418733); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0341); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0290); National Institutes of Health (Grant R01-NS21183), subcontract with Boston University; National Institutes of Health (Grant 1 P01-NS23734), subcontract with the Massachusetts Eye and Ear Infirmary

    Speech Recognizer Quality Assessment for Linguistic Engineering (SQALE)

    The aim of the LRE-SQALE project (Speech recognizer Quality Assessment for Linguistic Engineering) is to experiment with establishing an evaluation paradigm in Europe for the assessment of large-vocabulary, continuous speech recognition systems in a multilingual environment. This 18-month project will define and carry out the assessment experiments, paving the way for future projects with a larger scope and wider participation of European sites. The SQALE Consortium consists of a coordinator, the Institute for Human Factors at TNO, and three laboratories (CUED, LIMSI-CNRS, PHILIPS) who will evaluate their recognition systems using commonly agreed upon protocols, with the evaluation organized by the coordinating laboratory. Multiple sites will test their algorithms on the same database, so as to compare the merits of different methods, and each site will evaluate on at least two languages, so as to compare the relative difficulties of the languages, and the degree of independency of ..

    A Phone-based Approach to Non-Linguistic Speech Feature Identification

    In this paper we present a general approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. The basic idea is to process the unknown speech signal by feature-specific phone model sets in parallel, and to hypothesize the feature value associated with the model set having the highest likelihood. This technique is shown to be effective for text-independent gender, speaker, and language identification. Text-independent speaker identification accuracies of 98.8% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with 2 utterances for both corpora. Experiments in which speaker-specific models were estimated without using the phonetic transcriptions for the TIMIT speakers had the same identification accuracies as those obtained with the use of the transcriptions. French/English language identification is better than 99% with 2s of read, laboratory speech. On spontaneous teleph..
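The decision rule described in this abstract (score the utterance under each feature-specific model set, then pick the set with the highest likelihood) can be sketched roughly as follows. This is a toy illustration, not the paper's system: the 1-D Gaussian "phone models", the best-phone-per-frame scoring, and all names and numbers below are assumptions made for the example.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def utterance_loglik(frames, phone_models):
    """Score an utterance under one model set.

    Assumption: each frame is scored by its best-matching phone model,
    a crude stand-in for real phone-model decoding.
    """
    return sum(
        max(gaussian_loglik(x, mean, var) for (mean, var) in phone_models)
        for x in frames
    )

def identify(frames, model_sets):
    """Return the label whose model set gives the highest likelihood."""
    scores = {
        label: utterance_loglik(frames, models)
        for label, models in model_sets.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy model sets: two "languages", two phone models each,
# given as (mean, variance) pairs.
MODEL_SETS = {
    "french": [(0.0, 1.0), (2.0, 1.0)],
    "english": [(5.0, 1.0), (7.0, 1.0)],
}

label, scores = identify([0.1, 1.9, 0.3, 2.2], MODEL_SETS)
print(label)  # french
```

The same `identify` function would serve for gender or speaker identification by swapping in one model set per gender or per speaker; only the labels change, not the decision rule.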

    Speaker-Independent Phone Recognition Using BREF

    A series of experiments on speaker-independent phone recognition of continuous speech have been carried out using the recently recorded BREF corpus. These experiments are the first to use this large corpus, and are meant to provide a baseline performance evaluation for vocabulary-independent phone recognition of French. The HMM-based recognizer was trained with hand-verified data from 43 speakers. Using 35 context-independent phone models, a baseline phone accuracy of 60% (no phone grammar) was obtained on an independent test set of 7635 phone segments from 19 new speakers. Including phone bigram probabilities as phonotactic constraints resulted in a performance of 63.5%. A phone accuracy of 68.6% was obtained with 428 context-dependent models and the bigram phone language model. Vocabulary-independent word recognition results with no grammar are also reported for the same test data. INTRODUCTION This paper reports on a series of experiments for speaker-independent, continuous speech ..

    Cross-Lingual Experiments with Phone Recognition

    This paper presents some of the recent research on speaker-independent continuous phone recognition for both French and English. The phone accuracy is assessed on the BREF corpus for French, and on the Wall Street Journal and TIMIT corpora for English. Cross-language differences concerning language properties are presented. It was found that French is easier to recognize at the phone level (the phone error for BREF is 23.6% vs. 30.1% for WSJ), but harder to recognize at the lexical level due to the larger number of homophones. Experiments with signal analysis indicate that a 4kHz signal bandwidth is sufficient for French, whereas 8kHz is needed for English. Phone recognition is a powerful technique for language, sex, and speaker identification. With 2s of speech, the language can be identified with better than 99% accuracy. Sex identification for BREF and WSJ is error-free. Speaker identification accuracies of 98.2% on TIMIT (462 speakers) and 99.1% on BREF (57 speakers) were obtained w..

    Experiments on Speaker-Independent Phone Recognition Using BREF

    A series of experiments for speaker-independent, continuous speech phone recognition have been carried out using the recently recorded BREF corpus. Our experiments are the first to use this database, and are meant to provide a baseline performance evaluation for vocabulary-independent phone recognition. The system was trained using hand-verified data from 43 speakers. Using 35 context-independent phone models, a baseline phone accuracy of 60% (no phone grammar) has been obtained on an independent test set of 7635 phone segments from 19 speakers. Including phone bigram probabilities as phonotactic constraints results in a performance of 63.5%. A phone accuracy of 68.6% (73.3% correct) was obtained with 428 context-dependent models. INTRODUCTION We report on a series of experiments for speaker-independent, continuous speech phone recognition of French, using the recently recorded BREF corpus [3, 4]. BREF was designed to provide speech data for the development of dictation machines, the e..
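The abstract reports both a phone accuracy and a percent-correct figure. A minimal sketch of how the two typically differ, assuming the conventional definitions (percent correct ignores insertion errors, accuracy penalizes them); the abstract itself does not spell out its scoring, and the error counts below are hypothetical values chosen only to reproduce the two reported percentages.

```python
def phone_scores(n_ref, substitutions, deletions, insertions):
    """Return (percent_correct, percent_accuracy) for a phone alignment.

    Assumed conventional definitions:
      correct  = (N - S - D) / N
      accuracy = (N - S - D - I) / N
    where N is the number of reference phone segments.
    """
    hits = n_ref - substitutions - deletions
    percent_correct = 100.0 * hits / n_ref
    percent_accuracy = 100.0 * (hits - insertions) / n_ref
    return percent_correct, percent_accuracy

# Hypothetical counts per 1000 reference phones (not taken from the paper).
correct, accuracy = phone_scores(
    n_ref=1000, substitutions=200, deletions=67, insertions=47
)
print(f"{correct:.1f}% correct, {accuracy:.1f}% accuracy")
# 73.3% correct, 68.6% accuracy
```

Under these definitions, the gap between the two figures is the insertion rate relative to the number of reference phones.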

    Design Considerations and Text Selection for BREF, a large French read-speech corpus

    BREF, a large read-speech corpus in French, has been designed with several aims: to provide enough speech data to develop dictation machines, to provide data for evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and to provide a corpus of continuous speech to study phonological variations. This paper presents some of the design considerations of BREF, focusing on the text analysis and the selection of text materials. The texts to be read were selected from 4.6 million words of the French newspaper Le Monde. In total, 11,000 texts were selected, with an emphasis on maximizing the number of distinct triphones. Separate text materials were selected for training and test corpora. The goal is to obtain about 10,000 words (approximately 60-70 min.) of speech from each of 100 speakers, from different French dialects. INTRODUCTION One of the main obstacles to progress in continuous speech recognition has been the lack of sufficient speech m..

    BREF, a Large Vocabulary Spoken Corpus for French

    This paper presents some of the design considerations of BREF, a large read-speech corpus for French. BREF was designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and for the study of phonological variations. The texts to be read were selected from 5 million words of the French newspaper Le Monde. In total, 11,000 texts were selected, with selection criteria that emphasized maximizing the number of distinct triphones. Separate text materials were selected for training and test corpora. Ninety speakers have been recorded, each providing between 5,000 and 10,000 words (approximately 40-70 min.) of speech. INTRODUCTION One of the main obstacles to progress in continuous speech recognition has been the lack of sufficient speech material for the training, development, and testing of algorithms and systems, as well as for the study of speech..