5,667 research outputs found

Contextual confidence measures for continuous speech recognition

Author: Hernández-Abrego G
Mariño Acebal José Bernardo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000

This paper explores the effect of contextual information on confidence measures for continuous speech recognition results. Our approach comprises three steps: extracting confidence predictors from recognition results; compiling those predictors into confidence measures by means of a fuzzy inference system whose parameters have been estimated directly from examples with an evolutionary strategy; and, finally, upgrading the confidence measures by including contextual information. Experiments on two different continuous speech application tasks show that the context re-scoring procedure improves the ability of the confidence measures to discriminate between correct and incorrect recognition results at every threshold level, even when a rather simple method of adding contextual information is used. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Implementing a simple continuous speech recognition system on an FPGA

Author: Melnikoff Stephen Jonathan
Quigley Steven Francis
Russell Martin
Publication venue: IEEE
Publication date: 01/01/2002

Speech recognition is a computationally demanding task, particularly the stage that uses Viterbi decoding to convert pre-processed speech data into words or sub-word units. We present an FPGA implementation of the decoder based on continuous hidden Markov models (HMMs) representing monophones, and demonstrate that it can process speech 75 times real time, using 45% of the slices of a Xilinx Virtex XCV100

University of Birmingham Research Portal
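
The Viterbi decoding stage named in the abstract above can be illustrated with a minimal software sketch. This is not the authors' FPGA design: the function name `viterbi`, its argument layout, and the toy numbers are assumptions made for illustration. The sketch works in the log domain and takes per-frame state log-likelihoods as input, so it does not depend on whether the emission densities are discrete or continuous.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return (best_path, best_log_prob) for an HMM, in the log domain.

    Hypothetical argument layout (not taken from the paper):
      log_init[s]     : log P(state s at frame 0)
      log_trans[p][s] : log P(state s at frame t | state p at frame t-1)
      log_obs[t][s]   : log-likelihood of frame t under state s
    """
    n_states = len(log_init)
    # Best score of any path ending in each state after the first frame.
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []  # backptr[t-1][s] = best predecessor of state s at frame t

    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Pick the predecessor that maximises the path score into s.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state to recover the state sequence.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Toy two-state example with made-up probabilities.
log = math.log
toy_init = [log(0.6), log(0.4)]
toy_trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
toy_obs = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]
toy_path, toy_logp = viterbi(toy_init, toy_trans, toy_obs)
```

The inner loop over predecessor states is the part that dominates the cost for large models, which is consistent with the abstract's description of this stage as computationally demanding.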

Discriminative training for continuous speech recognition

Author: Reichl W.
Ruske G.
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1996

Discriminative training techniques for hidden Markov models (HMMs) were recently proposed and successfully applied to automatic speech recognition. This paper discusses the Minimum Classification Error and Maximum Mutual Information objectives. An extended reestimation formula is used for the HMM parameter update for both objective functions. The discriminative training methods were applied in speaker-independent phoneme recognition experiments, and both techniques improved the phoneme recognition rates.

Universaar
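
For orientation, the two objectives named in the abstract above are commonly written as follows. These are the usual textbook forms, stated here as an assumption; the paper's exact formulation may differ. The notation is introduced only for this sketch: $X_r$ is the $r$-th training utterance, $W_r$ its correct label, $\lambda$ the HMM parameters, $K$ the number of classes, and $\eta, \gamma > 0$ smoothing constants.

```latex
\begin{align}
% MMI: maximise the log posterior of the correct label
F_{\mathrm{MMI}}(\lambda) &= \sum_{r=1}^{R}
  \log \frac{p_\lambda(X_r \mid W_r)\,P(W_r)}
            {\sum_{W} p_\lambda(X_r \mid W)\,P(W)} \\
% MCE: misclassification measure for utterance r
d_r(\lambda) &= -\log p_\lambda(X_r \mid W_r)
  + \frac{1}{\eta}\log\Bigl[\frac{1}{K-1}
    \sum_{W \neq W_r} p_\lambda(X_r \mid W)^{\eta}\Bigr] \\
% MCE: minimise a sigmoid-smoothed error count
L_{\mathrm{MCE}}(\lambda) &= \sum_{r=1}^{R}
  \frac{1}{1 + \exp\bigl(-\gamma\, d_r(\lambda)\bigr)}
\end{align}
```

In these forms $F_{\mathrm{MMI}}$ is maximised, while $L_{\mathrm{MCE}}$ is minimised; $d_r > 0$ indicates that the competing classes outscore the correct one.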

Network Training for Continuous Speech Recognition

Author: Alphonso Issac John
Publication venue: Scholars Junction
Publication date: 11/11/2003

Spoken language processing is one of the oldest and most natural modes of information exchange between human beings. For centuries, people have tried to develop machines that can understand and produce speech the way humans do so naturally. The biggest obstacle to modeling speech with computer programs and mathematics is that language is instinctive, whereas the vocabulary and dialect used in communication are learned. Human beings are genetically equipped with the ability to learn languages, and culture imprints the vocabulary and dialect on each member of society. This thesis examines the role of pattern classification in the recognition of human speech, i.e., machine learning techniques that are currently being applied to the spoken language processing problem. The primary objective of this thesis is to create a network training paradigm that allows for direct training of multi-path models and alleviates the need for complicated systems and training recipes. A traditional trainer uses an expectation maximization (EM)-based supervised training framework to estimate the parameters of a spoken language processing system. EM-based parameter estimation for speech recognition is performed using several complicated stages of iterative reestimation. These stages are typically prone to human error. The network training paradigm reduces the complexity of the training process while retaining the robustness of the EM-based supervised training framework. The hypothesis of this thesis is that the network training paradigm can achieve recognition performance comparable to that of a traditional trainer while alleviating the need for complicated systems and training recipes for spoken language processing systems.

Mississippi State University Libraries ETD database

Scholars Junction - Mississippi State University Institutional Repository

Arabic automatic continuous speech recognition systems

Author: Abushariah Mohammad A. M.
Alqudah Assal Ali Mustafa
Khalifa Othman Omran
Zainuddin Roziati
Publication venue: 'IIUM Press'
Publication date: 01/01/2011

MSA is the current formal linguistic standard of the Arabic language, which is widely taught in schools and universities and often used in the office and the media. MSA is also considered the only acceptable form of the Arabic language for all native speakers [1]. As the research community has recently witnessed an improvement in the performance of ASR systems, the technology is increasingly widely used for several of the world's languages. Similarly, interest in Arabic ASR research has grown significantly in the past few years. Arabic ASR research is conducted not only by researchers in the Arab world but also by many others in different parts of the world, especially the Western countries

The International Islamic University Malaysia Repository

Sperry Univac speech communications technology

Author: Medress Mark F.
Publication venue
Publication date

Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

NASA Technical Reports Server