Natural language processing in speech understanding systems
Speech understanding systems (SUSs) came of age in late 1971 as a result of a five-year development programme instigated by the Information Processing Techniques Office of the Advanced Research Projects Agency (ARPA) of the Department of Defense in the United States. The aim of the programme was to research and develop practical man-machine communication systems. It has since been argued that the main contribution of this project was not to the development of speech science but to the development of artificial intelligence. That debate is beyond the scope of this paper, though no one would question that the field within artificial intelligence to benefit most from this programme is natural language understanding. More recent projects of a similar nature, such as those in the United Kingdom's ALVEY programme and Europe's ESPRIT programme, have added further developments to this important field. This paper presents a review of some of the natural language processing techniques used within speech understanding systems. In particular, techniques for handling syntactic, semantic and pragmatic information are discussed; they are integrated into SUSs as knowledge sources. The most common application of these systems is to provide an interface to a database. The system has to conduct a dialogue with a user who is generally unknown to it. Typical examples are train and aeroplane timetable enquiry systems, travel management systems and document retrieval systems.
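To make the knowledge-source idea concrete, here is a minimal sketch (illustrative only; the class and function names are assumptions, not taken from the paper) of how syntactic, semantic and pragmatic knowledge sources might each score a word-sequence hypothesis and be combined by a SUS control strategy:

```python
from abc import ABC, abstractmethod

class KnowledgeSource(ABC):
    """One knowledge source per linguistic level (syntax, semantics, pragmatics)."""
    @abstractmethod
    def score(self, hypothesis: list[str]) -> float:
        """Return a plausibility score for a word-sequence hypothesis."""

class SyntaxKS(KnowledgeSource):
    def score(self, hypothesis):
        # e.g. likelihood of the word sequence under a grammar; stubbed here
        return 0.0

class SemanticsKS(KnowledgeSource):
    def score(self, hypothesis):
        # e.g. whether the hypothesis maps to a well-formed database query
        return 0.0

class PragmaticsKS(KnowledgeSource):
    def score(self, hypothesis):
        # e.g. consistency with the current state of the dialogue
        return 0.0

def rank(hypotheses, sources):
    # The control strategy combines evidence from all knowledge sources
    # to rank competing hypotheses from the recogniser.
    return sorted(hypotheses,
                  key=lambda h: sum(ks.score(h) for ks in sources),
                  reverse=True)
```

Systems of the ARPA era such as Hearsay-II arranged their knowledge sources around a blackboard control structure rather than the flat scoring shown here; the sketch only illustrates the division of labour between levels.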
Voice technology and BBN
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Work in experimental psychology, control systems, and human factors engineering, which is often relevant to the proper design and operation of speech systems, is also described.
Research on speech understanding and related areas at SRI
Research capabilities in speech understanding, speech recognition, and voice control are described. Research activities are discussed, including those that involve text input rather than speech.
Sperry Univac speech communications technology
Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word-spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.
Filler model based confidence measures for spoken dialogue systems: a case study for Turkish
Because of the inadequate performance of speech recognition systems, an accurate confidence scoring mechanism should be employed to understand user requests correctly. To determine a confidence score for a hypothesis, certain confidence features are combined. The performance of filler-model based confidence features has been investigated. Five types of filler model networks were defined: a triphone network, a phone network, a phone-class network, a 5-state catch-all model and a 3-state catch-all model. First, all models were evaluated on a Turkish speech recognition task in terms of their ability to correctly tag recognition hypotheses (as recognition-error or correct). The best performance was obtained from the triphone recognition network. Then, the performance of reliable combinations of these models was investigated, and it was observed that certain combinations of filler models could significantly improve the accuracy of the confidence annotation.
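As a rough illustration of the filler-model idea (an assumption-laden simplification, not the paper's implementation), a confidence feature per filler model can be computed as a frame-normalised log-likelihood ratio between the recogniser's hypothesis and that filler network, and the features then combined, here with a simple weighted sum:

```python
def filler_confidence(hyp_loglik, filler_logliks, n_frames):
    """Per-filler-model confidence feature: frame-normalised log-likelihood
    ratio between the recogniser's hypothesis and each filler network."""
    return {name: (hyp_loglik - ll) / n_frames
            for name, ll in filler_logliks.items()}

def tag_hypothesis(features, weights, threshold):
    """One simple combination scheme: a weighted sum of confidence features,
    thresholded to tag the hypothesis as correct or recognition-error."""
    score = sum(weights[name] * value for name, value in features.items())
    return "correct" if score >= threshold else "recognition-error"

# Hypothetical acoustic log-likelihoods from a decoder over a 120-frame segment:
feats = filler_confidence(-420.0,
                          {"triphone": -455.0, "phone": -440.0}, n_frames=120)
print(tag_hypothesis(feats, {"triphone": 0.7, "phone": 0.3}, threshold=0.15))
# -> "correct"
```

The weighted-sum combiner is just one choice; the weights and threshold would be tuned on held-out tagged hypotheses.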
Semantic Processing of Out-Of-Vocabulary Words in a Spoken Dialogue System
One of the most important causes of failure in spoken dialogue systems is usually neglected: the problem of words that are not covered by the system's vocabulary (out-of-vocabulary, or OOV, words). In this paper a methodology is described for the detection, classification and processing of OOV words in an automatic train timetable information system. The extensions that had to be made to the different modules of the system, and the dialogue strategies designed as a result, are reported, along with encouraging evaluation results for the new versions of the word recogniser and the linguistic processor.
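A minimal sketch of such a detect/classify/react pipeline might look as follows; the confidence threshold, class labels and dialogue actions are illustrative assumptions, not values from the paper:

```python
def classify_oov(token):
    # Placeholder classifier: a real system might use sub-word recognition
    # results and semantic-slot context; here we key off a surface cue only.
    return "station-name" if token.istitle() else "other"

def handle_token(word, confidence, lexicon, threshold=0.5):
    """Route a recognised token: accept it, or trigger an OOV dialogue strategy."""
    if word in lexicon and confidence >= threshold:
        return "accept"
    # Low confidence or unknown token: treat as a potential OOV region.
    oov_class = classify_oov(word)
    if oov_class == "station-name":
        return "ask-user-to-spell-or-confirm-station"   # targeted clarification
    return "ask-user-to-rephrase"                       # generic recovery

lexicon = {"train", "to", "arrive", "depart"}
print(handle_token("Esslingen", 0.31, lexicon))
# -> "ask-user-to-spell-or-confirm-station"
```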
Utilizing Statistical Dialogue Act Processing in Verbmobil
In this paper, we present a statistical approach for dialogue act processing in the dialogue component of the speech-to-speech translation system Verbmobil. Statistics are used in dialogue processing to predict follow-up dialogue acts. As an application example we show how this supports repair when unexpected dialogue states occur.
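As a generic illustration of this kind of prediction (not Verbmobil's actual model; the act labels and the bigram simplification are assumptions), follow-up acts can be ranked by their conditional frequency given the preceding act:

```python
from collections import Counter, defaultdict

class DialogueActPredictor:
    """Bigram model over dialogue acts: predicts likely follow-up acts."""
    def __init__(self):
        self.bigrams = defaultdict(Counter)

    def train(self, dialogues):
        for acts in dialogues:                      # each: list of act labels
            for prev, nxt in zip(acts, acts[1:]):
                self.bigrams[prev][nxt] += 1

    def predict(self, prev_act, k=3):
        # Top-k most likely follow-up acts; unseen contexts yield [].
        return [a for a, _ in self.bigrams[prev_act].most_common(k)]

p = DialogueActPredictor()
p.train([["GREET", "SUGGEST", "ACCEPT", "BYE"],
         ["GREET", "SUGGEST", "REJECT", "SUGGEST", "ACCEPT"]])
print(p.predict("SUGGEST"))   # -> ['ACCEPT', 'REJECT']
```

When the recognised act is implausible under such predictions, the dialogue component can flag an unexpected state and trigger repair.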
Prosodic Event Recognition using Convolutional Neural Networks with Context Information
This paper demonstrates the potential of convolutional neural networks (CNNs) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient, and yields strong results not only in speaker-dependent but also in speaker-independent cases.
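A toy version of such a network, written as a PyTorch sketch with assumed feature dimensions and layer sizes, might look as follows; the extra input channel carries the position feature marking which frames belong to the current word:

```python
import torch
import torch.nn as nn

class ProsodicEventCNN(nn.Module):
    """Sketch of a frame-level CNN for pitch-accent / boundary-tone detection.
    All dimensions here are illustrative assumptions, not the paper's values."""
    def __init__(self, n_feats=6, n_classes=2):
        super().__init__()
        # +1 input channel for the position feature distinguishing the
        # current word's frames from the surrounding context frames.
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats + 1, 100, kernel_size=8), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(100, 100, kernel_size=4), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),   # pool over time to a fixed-size vector
        )
        self.out = nn.Linear(100, n_classes)

    def forward(self, acoustic, position):
        # acoustic: (batch, n_feats, frames); position: (batch, 1, frames)
        x = torch.cat([acoustic, position], dim=1)
        return self.out(self.conv(x).squeeze(-1))

model = ProsodicEventCNN()
a = torch.randn(4, 6, 300)                              # 4 word windows
pos = torch.zeros(4, 1, 300); pos[:, :, 100:200] = 1.0  # current-word mask
print(model(a, pos).shape)                              # torch.Size([4, 2])
```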
Comparing Human and Machine Errors in Conversational Speech Transcription
Recent work in automatic recognition of conversational telephone speech (CTS) has achieved accuracy levels comparable to human transcribers on the NIST 2000 CTS evaluation set, although there is some debate about how to precisely quantify human performance on this task. This raises the question of what systematic differences, if any, may be found differentiating human from machine transcription errors. In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline. We find that the most frequent substitution, deletion and insertion error types of both outputs show a high degree of overlap. The only notable exception is that the automatic recognizer tends to confuse filled pauses ("uh") and backchannel acknowledgments ("uhhuh"). Humans tend not to make this error, presumably due to the distinctive and opposing pragmatic functions attached to these words. Furthermore, we quantify the correlation between human and machine errors at the speaker level, and investigate the effect of speaker overlap between training and test data. Finally, we report on an informal "Turing test" asking humans to discriminate between automatic and human transcription error cases.
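The error-type comparison rests on the standard edit-distance alignment used in word error rate scoring. A self-contained sketch (illustrative code, not the authors' tooling) that tallies substitution, deletion and insertion types and measures their overlap between two systems:

```python
from collections import Counter

def align_errors(ref, hyp):
    """Edit-distance alignment of reference and hypothesis word lists;
    returns a Counter of (type, ...) error tokens."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1): d[i][0] = i
    for j in range(m + 1): d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),
                          d[i-1][j] + 1, d[i][j-1] + 1)
    errors, i, j = Counter(), n, m
    while i > 0 or j > 0:          # backtrace, collecting error tokens
        if i and j and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            if ref[i-1] != hyp[j-1]:
                errors[("sub", ref[i-1], hyp[j-1])] += 1
            i, j = i - 1, j - 1
        elif i and d[i][j] == d[i-1][j] + 1:
            errors[("del", ref[i-1])] += 1; i -= 1
        else:
            errors[("ins", hyp[j-1])] += 1; j -= 1
    return errors

def topk_overlap(e1, e2, k=10):
    """Fraction of the k most frequent error types shared by two systems."""
    top1 = {t for t, _ in e1.most_common(k)}
    top2 = {t for t, _ in e2.most_common(k)}
    return len(top1 & top2) / k

ref = "yeah i uh went there".split()
print(align_errors(ref, "yeah i uhhuh went there".split()))
# -> Counter({('sub', 'uh', 'uhhuh'): 1})
```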