Natural language processing in speech understanding systems
Speech understanding systems (SUSs) came of age in late 1971 as a result of a five-year development programme instigated by the Information Processing Techniques Office of the Advanced Research Projects Agency (ARPA) of the Department of Defense in the United States. The aim of the programme was to research and develop practical man-machine communication systems. It has since been argued that the main contribution of this project was not to the development of speech science but to the development of artificial intelligence. That debate is beyond the scope of this paper, though no one would question that the field within artificial intelligence to benefit most from this programme is natural language understanding. More recent projects of a similar nature, such as those in the United Kingdom's ALVEY programme and Europe's ESPRIT programme, have added further developments to this important field. This paper presents a review of some of the natural language processing techniques used within speech understanding systems. In particular, techniques for handling syntactic, semantic and pragmatic information are discussed; they are integrated into SUSs as knowledge sources. The most common application of these systems is to provide an interface to a database. The system has to conduct a dialogue with a user who is generally unknown to it. Typical examples are train and aeroplane timetable enquiry systems, travel management systems and document retrieval systems.
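To make the knowledge-source idea concrete, here is a minimal sketch (illustrative only; the class and function names are assumptions, not taken from the paper) of how syntactic, semantic and pragmatic knowledge sources might each score a word-sequence hypothesis and be combined by a SUS control strategy:

```python
from abc import ABC, abstractmethod

class KnowledgeSource(ABC):
    """One knowledge source per linguistic level (syntax, semantics, pragmatics)."""
    @abstractmethod
    def score(self, hypothesis: list[str]) -> float:
        """Return a plausibility score for a word-sequence hypothesis."""

class SyntaxKS(KnowledgeSource):
    def score(self, hypothesis):
        # e.g. likelihood of the word sequence under a grammar; stubbed here
        return 0.0

class SemanticsKS(KnowledgeSource):
    def score(self, hypothesis):
        # e.g. whether the hypothesis maps to a well-formed database query
        return 0.0

class PragmaticsKS(KnowledgeSource):
    def score(self, hypothesis):
        # e.g. consistency with the current state of the dialogue
        return 0.0

def rank(hypotheses, sources):
    # The control strategy combines evidence from all knowledge sources
    # to rank competing hypotheses from the recogniser.
    return sorted(hypotheses,
                  key=lambda h: sum(ks.score(h) for ks in sources),
                  reverse=True)
```

Systems of the ARPA era such as Hearsay-II arranged their knowledge sources around a blackboard control structure rather than the flat scoring shown here; the sketch only illustrates the division of labour between levels.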
Voice technology and BBN
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Work in experimental psychology, control systems, and human factors engineering, which is often relevant to the proper design and operation of speech systems, is also described.
Research on speech understanding and related areas at SRI
Research capabilities in speech understanding, speech recognition, and voice control are described. Research activities are discussed, including those that involve text input rather than speech.
Sperry Univac speech communications technology
Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word-spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.
Filler model based confidence measures for spoken dialogue systems: a case study for Turkish
Because of the inadequate performance of speech recognition systems, an accurate confidence scoring mechanism should be employed to understand user requests correctly. To determine a confidence score for a hypothesis, certain confidence features are combined. The performance of filler-model based confidence features has been investigated. Five types of filler model networks were defined: a triphone network, a phone network, a phone-class network, a 5-state catch-all model and a 3-state catch-all model. First, all models were evaluated on a Turkish speech recognition task in terms of their ability to correctly tag recognition hypotheses (as recognition-error or correct). The best performance was obtained from the triphone recognition network. Then, the performance of reliable combinations of these models was investigated, and it was observed that certain combinations of filler models could significantly improve the accuracy of the confidence annotation.
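As a rough illustration of the filler-model idea (an assumption-laden simplification, not the paper's implementation), a confidence feature per filler model can be computed as a frame-normalised log-likelihood ratio between the recogniser's hypothesis and that filler network, and the features then combined, here with a simple weighted sum:

```python
def filler_confidence(hyp_loglik, filler_logliks, n_frames):
    """Per-filler-model confidence feature: frame-normalised log-likelihood
    ratio between the recogniser's hypothesis and each filler network."""
    return {name: (hyp_loglik - ll) / n_frames
            for name, ll in filler_logliks.items()}

def tag_hypothesis(features, weights, threshold):
    """One simple combination scheme: a weighted sum of confidence features,
    thresholded to tag the hypothesis as correct or recognition-error."""
    score = sum(weights[name] * value for name, value in features.items())
    return "correct" if score >= threshold else "recognition-error"

# Hypothetical acoustic log-likelihoods from a decoder over a 120-frame segment:
feats = filler_confidence(-420.0,
                          {"triphone": -455.0, "phone": -440.0}, n_frames=120)
print(tag_hypothesis(feats, {"triphone": 0.7, "phone": 0.3}, threshold=0.15))
# -> "correct"
```

The weighted-sum combiner is just one choice; the weights and threshold would be tuned on held-out tagged hypotheses.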
Semantic Processing of Out-Of-Vocabulary Words in a Spoken Dialogue System
One of the most important causes of failure in spoken dialogue systems is usually neglected: the problem of words that are not covered by the system's vocabulary (out-of-vocabulary, or OOV, words). In this paper a methodology is described for the detection, classification and processing of OOV words in an automatic train timetable information system. The extensions that had to be made to the different modules of the system, and the dialogue strategies designed as a result, are reported, along with encouraging evaluation results for the new versions of the word recogniser and the linguistic processor.
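A minimal sketch of such a detect/classify/react pipeline might look as follows; the confidence threshold, class labels and dialogue actions are illustrative assumptions, not values from the paper:

```python
def classify_oov(token):
    # Placeholder classifier: a real system might use sub-word recognition
    # results and semantic-slot context; here we key off a surface cue only.
    return "station-name" if token.istitle() else "other"

def handle_token(word, confidence, lexicon, threshold=0.5):
    """Route a recognised token: accept it, or trigger an OOV dialogue strategy."""
    if word in lexicon and confidence >= threshold:
        return "accept"
    # Low confidence or unknown token: treat as a potential OOV region.
    oov_class = classify_oov(word)
    if oov_class == "station-name":
        return "ask-user-to-spell-or-confirm-station"   # targeted clarification
    return "ask-user-to-rephrase"                       # generic recovery

lexicon = {"train", "to", "arrive", "depart"}
print(handle_token("Esslingen", 0.31, lexicon))
# -> "ask-user-to-spell-or-confirm-station"
```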
Utilizing Statistical Dialogue Act Processing in Verbmobil
In this paper, we present a statistical approach for dialogue act processing in the dialogue component of the speech-to-speech translation system Verbmobil. Statistics are used in dialogue processing to predict follow-up dialogue acts. As an application example we show how this supports repair when unexpected dialogue states occur.
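As a generic illustration of this kind of prediction (not Verbmobil's actual model; the act labels and the bigram simplification are assumptions), follow-up acts can be ranked by their conditional frequency given the preceding act:

```python
from collections import Counter, defaultdict

class DialogueActPredictor:
    """Bigram model over dialogue acts: predicts likely follow-up acts."""
    def __init__(self):
        self.bigrams = defaultdict(Counter)

    def train(self, dialogues):
        for acts in dialogues:                      # each: list of act labels
            for prev, nxt in zip(acts, acts[1:]):
                self.bigrams[prev][nxt] += 1

    def predict(self, prev_act, k=3):
        # Top-k most likely follow-up acts; unseen contexts yield [].
        return [a for a, _ in self.bigrams[prev_act].most_common(k)]

p = DialogueActPredictor()
p.train([["GREET", "SUGGEST", "ACCEPT", "BYE"],
         ["GREET", "SUGGEST", "REJECT", "SUGGEST", "ACCEPT"]])
print(p.predict("SUGGEST"))   # -> ['ACCEPT', 'REJECT']
```

When the recognised act is implausible under such predictions, the dialogue component can flag an unexpected state and trigger repair.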
Prosodic Event Recognition using Convolutional Neural Networks with Context Information
This paper demonstrates the potential of convolutional neural networks (CNNs) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient, and yields strong results not only in speaker-dependent but also in speaker-independent cases.
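A toy version of such a network, written as a PyTorch sketch with assumed feature dimensions and layer sizes, might look as follows; the extra input channel carries the position feature marking which frames belong to the current word:

```python
import torch
import torch.nn as nn

class ProsodicEventCNN(nn.Module):
    """Sketch of a frame-level CNN for pitch-accent / boundary-tone detection.
    All dimensions here are illustrative assumptions, not the paper's values."""
    def __init__(self, n_feats=6, n_classes=2):
        super().__init__()
        # +1 input channel for the position feature distinguishing the
        # current word's frames from the surrounding context frames.
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats + 1, 100, kernel_size=8), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(100, 100, kernel_size=4), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),   # pool over time to a fixed-size vector
        )
        self.out = nn.Linear(100, n_classes)

    def forward(self, acoustic, position):
        # acoustic: (batch, n_feats, frames); position: (batch, 1, frames)
        x = torch.cat([acoustic, position], dim=1)
        return self.out(self.conv(x).squeeze(-1))

model = ProsodicEventCNN()
a = torch.randn(4, 6, 300)                              # 4 word windows
pos = torch.zeros(4, 1, 300); pos[:, :, 100:200] = 1.0  # current-word mask
print(model(a, pos).shape)                              # torch.Size([4, 2])
```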
Comparing Human and Machine Errors in Conversational Speech Transcription
Recent work in automatic recognition of conversational telephone speech (CTS) has achieved accuracy levels comparable to human transcribers on the NIST 2000 CTS evaluation set, although there is some debate about how to precisely quantify human performance on this task. This raises the question of what systematic differences, if any, may be found differentiating human from machine transcription errors. In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline. We find that the most frequent substitution, deletion and insertion error types of both outputs show a high degree of overlap. The only notable exception is that the automatic recognizer tends to confuse filled pauses ("uh") and backchannel acknowledgments ("uhhuh"). Humans tend not to make this error, presumably due to the distinctive and opposing pragmatic functions attached to these words. Furthermore, we quantify the correlation between human and machine errors at the speaker level, and investigate the effect of speaker overlap between training and test data. Finally, we report on an informal "Turing test" asking humans to discriminate between automatic and human transcription error cases.
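The error-type comparison rests on the standard edit-distance alignment used in word error rate scoring. A self-contained sketch (illustrative code, not the authors' tooling) that tallies substitution, deletion and insertion types and measures their overlap between two systems:

```python
from collections import Counter

def align_errors(ref, hyp):
    """Edit-distance alignment of reference and hypothesis word lists;
    returns a Counter of (type, ...) error tokens."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1): d[i][0] = i
    for j in range(m + 1): d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),
                          d[i-1][j] + 1, d[i][j-1] + 1)
    errors, i, j = Counter(), n, m
    while i > 0 or j > 0:          # backtrace, collecting error tokens
        if i and j and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            if ref[i-1] != hyp[j-1]:
                errors[("sub", ref[i-1], hyp[j-1])] += 1
            i, j = i - 1, j - 1
        elif i and d[i][j] == d[i-1][j] + 1:
            errors[("del", ref[i-1])] += 1; i -= 1
        else:
            errors[("ins", hyp[j-1])] += 1; j -= 1
    return errors

def topk_overlap(e1, e2, k=10):
    """Fraction of the k most frequent error types shared by two systems."""
    top1 = {t for t, _ in e1.most_common(k)}
    top2 = {t for t, _ in e2.most_common(k)}
    return len(top1 & top2) / k

ref = "yeah i uh went there".split()
print(align_errors(ref, "yeah i uhhuh went there".split()))
# -> Counter({('sub', 'uh', 'uhhuh'): 1})
```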