Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike the popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which is clearly inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with a success rate of over 80.6% directly from speech inputs for middle-level vocabularies.
Comment: LaTeX source with a4 style, 15 pages; to be published in the Computer Processing of Oriental Languages journal
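As a rough illustration of what morpheme-level lexical decoding involves, the sketch below segments a string of phoneme hypotheses into lexicon morphemes with a Viterbi-style dynamic program. The toy lexicon, phoneme inventory, and confidence scores are invented for illustration; this is not the SKOPE implementation.

```python
# Hypothetical morpheme lexicon: phoneme sequences mapped to morphemes.
MORPHEME_LEXICON = {
    ("n", "a"): "na",          # pronoun stem "I"
    ("n", "eu", "n"): "neun",  # topic particle
}

def decode_morphemes(phonemes, scores):
    """Viterbi-style segmentation: best[i] = (score, morphemes) for phonemes[:i]."""
    best = {0: (0.0, [])}
    for i in range(1, len(phonemes) + 1):
        for j in range(max(0, i - 4), i):  # consider morphemes up to 4 phones long
            chunk = tuple(phonemes[j:i])
            if j in best and chunk in MORPHEME_LEXICON:
                cand = best[j][0] + sum(scores[j:i])
                if i not in best or cand > best[i][0]:
                    best[i] = (cand, best[j][1] + [MORPHEME_LEXICON[chunk]])
    return best.get(len(phonemes))  # None if no full segmentation exists

# Phoneme hypotheses (with recognizer confidences) for "na-neun":
print(decode_morphemes(["n", "a", "n", "eu", "n"], [0.9, 0.8, 0.7, 0.9, 0.8]))
# -> roughly (4.1, ['na', 'neun'])
```

The decoder returns the morpheme pair together with its accumulated score, which is the kind of hypothesis a symbolic phonological/morphological co-analysis could then verify.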
Developing a corpus-based grammar model within a continuous commercial speech recognition package
This paper is derived from experiments with a commercial 'off-the-shelf' continuous speech recognition system, applied to the apparently restricted domain of Air Traffic Control (ATC) for light aircraft. The system is required to transcribe key sub-phrases in a transmission from the ATC to a particular aircraft, with the commercial speech recognition system providing the main recognition component. After the development of a corpus of transmissions, it was realised that key information is often interspersed with unconstrained English. Initial attempts focused on using a wildcard mechanism for the non-key sub-phrases. The mechanism, however, proved to be valuable only in simplistic grammars due to its overgenerative nature. Our experience with the speech recognition system showed that whilst useful mechanisms such as the wildcard are provided, they tend to make over-simplistic assumptions about English grammar and dialogue structure.
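The overgeneration problem is easy to reproduce. In the sketch below, a regular expression plays the role of the package's wildcard around a key sub-phrase; the grammar, phrasing, and wildcard syntax are invented for illustration and do not reproduce the commercial system's API.

```python
import re

# Key sub-phrase: a runway clearance; ".*" plays the role of the wildcard
# standing in for unconstrained English around it.
GRAMMAR = re.compile(r".*\b(cleared to land runway (\d{2}))\b.*")

utterances = [
    "golf bravo foxtrot cleared to land runway 27 wind calm",  # plausible ATC
    "banana banana cleared to land runway 27 banana banana",   # nonsense
]

for u in utterances:
    m = GRAMMAR.match(u)
    print(u, "->", m.group(1) if m else "no parse")
```

Both utterances parse to the same key sub-phrase: the wildcard buys coverage of unconstrained English at the cost of accepting word salad, which is the overgenerative behaviour the paper reports.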
Towards an automatic speech recognition system for use by deaf students in lectures
According to the Royal National Institute for Deaf People, there are nearly 7.5 million hearing-impaired people in Great Britain. Human-operated machine transcription systems, such as Palantype, achieve low word error rates in real time. The disadvantage is that they are very expensive to use because of the difficulty of training operators, making them impractical for everyday use in higher education. Existing automatic speech recognition systems also achieve low word error rates, the disadvantage being that they work only for read speech in a restricted domain. Moving a system to a new domain requires a large amount of relevant data for training acoustic and language models.
The adopted solution makes use of an existing continuous speech phoneme recognition system as a front-end to a word recognition sub-system. The sub-system generates a lattice of word hypotheses using dynamic programming, with robust parameter estimation obtained using evolutionary programming. Sentence hypotheses are obtained by parsing the word lattice using a beam search and contributing knowledge consisting of anti-grammar rules, which check the syntactic incorrectness of word sequences, and word frequency information.
On an unseen spontaneous lecture taken from the Lund Corpus, and using a dictionary containing 2,637 words, the system achieved 81.5% words correct with 15% simulated phoneme error, and 73.1% words correct with 25% simulated phoneme error. The system was also evaluated on 113 Wall Street Journal sentences.
The achievements of the work are: a domain-independent method, using the anti-grammar, to reduce the word lattice search space whilst allowing normal spontaneous English to be spoken; a system designed to allow integration with new sources of knowledge, such as semantics or prosody, providing a test-bench for determining the impact of different knowledge upon word lattice parsing without the need for the underlying speech recognition hardware; and the robustness of the word lattice generation using parameters that withstand changes in vocabulary and domain.
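As a hedged sketch of the anti-grammar idea, the fragment below discards word-lattice hypotheses containing a forbidden word-class bigram, rather than licensing sentences with a conventional grammar. The part-of-speech tags and rules are invented for illustration; the thesis system's actual rule set and beam search are not reproduced.

```python
# Toy part-of-speech lexicon and anti-grammar: bigrams of word classes that
# cannot occur in well-formed English, used to reject hypotheses.
POS = {"the": "DET", "a": "DET", "cat": "NOUN", "sat": "VERB", "sits": "VERB"}
ANTI_RULES = {("DET", "VERB"), ("DET", "DET")}  # impossible class bigrams

def violates_anti_grammar(words):
    tags = [POS[w] for w in words]
    return any((t1, t2) in ANTI_RULES for t1, t2 in zip(tags, tags[1:]))

# Two sentence hypotheses read off a word lattice; the second is pruned.
for hyp in (["the", "cat", "sat"], ["the", "sat", "cat"]):
    print(hyp, "pruned" if violates_anti_grammar(hyp) else "kept")
```

Because the rules state only what is impossible, the filter stays domain independent and leaves normal spontaneous English unconstrained, while still shrinking the lattice search space.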
The Challenge of Spoken Language Systems: Research Directions for the Nineties
A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem-solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support, and for rapid communication among researchers. The successful development of this technology will increase the accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.
End-to-end named entity recognition for spoken Finnish
Named entity recognition is a natural language processing task in which the system tries to find named entities and classify them into predefined categories. The categories can vary depending on the domain in which they are going to be used, but some of the most common include person, location, organization, date, and product. Named entity recognition is an integral part of larger natural language processing tasks, such as information retrieval, text summarization, machine translation, and question answering.
Named entity recognition is a difficult task due to the lack of annotated data for certain languages or domains. Named entity ambiguity is another challenging aspect: a word can often represent a person, an organization, a product, or some other category, depending on the context in which it appears.
Spoken data, such as the output of a speech recognition system, imposes additional challenges on a named entity recognition system. In written text, named entities are often capitalized, and systems learn to rely on capitalization to detect them; capitalization is absent from speech recognition output.
The standard way of doing named entity recognition from speech involves a pipeline of two systems: first, a speech recognition system transcribes the speech and generates transcripts; then a named entity recognition system annotates the transcripts with named entities. Since the speech recognition system is not perfect and makes errors, those errors propagate to the named entity recognition system, which finds it hard to recover from them.
In this thesis, we present two approaches to doing named entity recognition from Finnish speech in an end-to-end manner, where one system generates both the transcripts and the annotations. We explore the strengths and weaknesses of both approaches and see how they compare to the standard pipeline approach.
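A minimal sketch of the two architectures, with hypothetical stand-in components rather than the thesis systems, makes the error-propagation argument concrete: the pipeline tagger only ever sees the lower-cased, possibly erroneous ASR transcript, while the end-to-end model emits words and entity tags jointly.

```python
def asr(audio):
    # Stand-in recognizer: real ASR output is lower-case and may contain errors.
    return "presidentti sauli niinistö puhui helsingissä"

def ner_tagger(text):
    # Stand-in tagger trained on written text; with capitalization gone it
    # plausibly misses "sauli niinistö" (person) and "helsingissä" (location).
    return [(word, "O") for word in text.split()]

def end_to_end(audio):
    # One model maps speech directly to (word, entity-tag) pairs, here in the
    # common BIO scheme.
    return [("presidentti", "O"), ("sauli", "B-PER"), ("niinistö", "I-PER"),
            ("puhui", "O"), ("helsingissä", "B-LOC")]

print(ner_tagger(asr(None)))  # pipeline: the ASR output is all the tagger sees
print(end_to_end(None))       # end-to-end: transcript and annotations jointly
```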