    Towards an automatic speech recognition system for use by deaf students in lectures

    According to the Royal National Institute for Deaf People, there are nearly 7.5 million hearing-impaired people in Great Britain. Human-operated machine transcription systems, such as Palantype, achieve low word error rates in real time. Their disadvantage is that they are very expensive to use because of the difficulty of training operators, making them impractical for everyday use in higher education. Existing automatic speech recognition systems also achieve low word error rates, but only for read speech in a restricted domain, and moving a system to a new domain requires a large amount of relevant data for training acoustic and language models. The adopted solution makes use of an existing continuous speech phoneme recognition system as a front-end to a word recognition sub-system. The sub-system generates a lattice of word hypotheses using dynamic programming, with robust parameter estimation obtained using evolutionary programming. Sentence hypotheses are obtained by parsing the word lattice using a beam search and contributing knowledge consisting of anti-grammar rules, which reject syntactically incorrect word sequences, and word frequency information. On an unseen spontaneous lecture taken from the Lund Corpus, and using a dictionary containing 2,637 words, the system achieved 81.5% words correct with 15% simulated phoneme error, and 73.1% words correct with 25% simulated phoneme error. The system was also evaluated on 113 Wall Street Journal sentences. The achievements of the work are: a domain-independent method, using the anti-grammar, to reduce the word lattice search space whilst allowing normal spontaneous English to be spoken; a system designed to allow integration with new sources of knowledge, such as semantics or prosody, providing a test-bench for determining the impact of different knowledge upon word lattice parsing without the need for the underlying speech recognition hardware; and the robustness of the word lattice generation, using parameters that withstand changes in vocabulary and domain.
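
    The lattice-parsing approach described above lends itself to a compact illustration. The sketch below shows beam-search parsing over a toy word lattice, with anti-grammar rules pruning extensions and unigram frequencies supplying a weak language model; the lattice format, rule set, and scoring weights are illustrative assumptions, not the thesis's actual implementation.

        import heapq

        # Toy word lattice: start time -> list of (word, end_time, acoustic log-score) arcs.
        LATTICE = {
            0: [("the", 1, -1.2), ("a", 1, -1.5)],
            1: [("cat", 2, -2.0), ("cap", 2, -2.3)],
            2: [("sat", 3, -1.8), ("sad", 3, -2.6)],
        }
        FINAL_TIME = 3

        # Hypothetical anti-grammar: word bigrams judged syntactically impossible.
        ANTI_GRAMMAR = {("a", "sat"), ("the", "sad")}

        # Hypothetical unigram log-frequencies acting as a weak language model.
        WORD_FREQ = {"the": -0.5, "a": -0.9, "cat": -1.1,
                     "cap": -2.5, "sat": -1.3, "sad": -2.0}

        def beam_search(beam_width=3):
            # A hypothesis is (cost, end_time, word_sequence); lower cost is better.
            beam = [(0.0, 0, ())]
            while not all(t == FINAL_TIME for _, t, _ in beam):
                expanded = []
                for cost, t, words in beam:
                    if t == FINAL_TIME:          # already complete; carry forward
                        expanded.append((cost, t, words))
                        continue
                    for word, end, acoustic in LATTICE.get(t, []):
                        if words and (words[-1], word) in ANTI_GRAMMAR:
                            continue             # anti-grammar rule: reject this extension
                        new_cost = cost - acoustic - WORD_FREQ.get(word, -5.0)
                        expanded.append((new_cost, end, words + (word,)))
                if not expanded:
                    return []
                beam = heapq.nsmallest(beam_width, expanded)
            return [(words, cost) for cost, _, words in beam]

        print(beam_search())   # best-first list of complete sentence hypotheses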

    Construction of Large Scale Isolated Word Speech Corpus in Bangla

    A new speech corpus of isolated words in the Bangla language has been recorded, including high-frequency words from the text corpus BdNC01. It has been specifically designed for research activities related to speaker-independent Bangla speech recognition. The database consists of speech from 100 speakers, each of them speaking 1081 words. A further 50 new speakers were employed to read the full word list to construct a test database. Every utterance was repeated 5 times on different days to account for temporal variation in speaker characteristics. The total of 400 hours of recording makes the corpus the largest of its kind in terms of size and language domain. This paper describes the motivation for the corpus and the processes undertaken in its construction, and concludes with a discussion of the usability of the corpus.
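
    As a rough consistency check on the quoted figures (the per-utterance duration below is an assumption; the abstract does not state one):

        # Back-of-the-envelope check of the corpus size described above.
        training_speakers = 100
        test_speakers = 50
        words_per_speaker = 1081
        repetitions = 5          # each utterance recorded 5 times on different days

        utterances = (training_speakers + test_speakers) * words_per_speaker * repetitions
        print(utterances)        # 810750 isolated-word utterances

        seconds_per_utterance = 1.8   # assumption: short word plus silence padding
        hours = utterances * seconds_per_utterance / 3600
        print(round(hours))           # ~405 hours, consistent with the ~400 quoted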

    Fast speaker independent large vocabulary continuous speech recognition [online]

    The 1995 ABBOT LVCSR system for multiple unknown microphones

    ABBOT is a hybrid (connectionist-hidden Markov model) large-vocabulary continuous speech recognition (LVCSR) system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes, which are used as observation probabilities within an HMM. This paper describes the system which participated in the November 1995 ARPA Hub-3 multiple unknown microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 ABBOT system, specifically to accommodate the H3 task. These include improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, and using the linear input network for speaker and environmental adaptation. Experimental results are reported for various test and development sets from the November 1994 and 1995 ARPA benchmark tests.
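
    In hybrid systems of this kind, the network's posterior estimates are conventionally converted to scaled likelihoods, dividing by the class priors, before use as HMM observation probabilities. Below is a minimal sketch of that standard step, with illustrative numbers; the scaling recipe is the general hybrid approach, not ABBOT-specific code.

        import numpy as np

        def scaled_likelihoods(posteriors, priors, floor=1e-8):
            # By Bayes' rule, p(x|q) = P(q|x) * p(x) / P(q). Since p(x) is the
            # same for every state at a given frame, P(q|x) / P(q) can stand in
            # for the observation probability p(x|q) during Viterbi decoding.
            return posteriors / np.maximum(priors, floor)

        # One frame, three phone classes (numbers are illustrative).
        posteriors = np.array([0.7, 0.2, 0.1])   # recurrent-network outputs
        priors = np.array([0.5, 0.3, 0.2])       # class frequencies from training data
        print(scaled_likelihoods(posteriors, priors))   # [1.4, 0.6667, 0.5]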

    Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation

    Over the past few years, speech recognition performance on tasks ranging from isolated digit recognition to conversational speech has improved dramatically. Performance on limited recognition tasks in noise-free environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology, along with an increase in the compute power of mobile devices, the standardization of communication protocols, and the explosion in the popularity of mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments, which are inherently noisy. In the recent past, a great deal of effort has been spent on the development of front ends based on advanced noise-robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends.
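
    For concreteness, the relative figures above refer to relative reduction in word error rate. The short check below uses hypothetical WER values chosen only to reproduce the quoted 9.6% gap, not numbers from the thesis:

        def relative_improvement(baseline_wer, new_wer):
            # Relative WER reduction, as a percentage of the baseline WER.
            return 100.0 * (baseline_wer - new_wer) / baseline_wer

        qio_wer, mfa_wer = 15.6, 14.1   # hypothetical word error rates, in percent
        print(f"{relative_improvement(qio_wer, mfa_wer):.1f}%")   # 9.6%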