6 research outputs found

    Using Prosodic Information to Constrain Language Models for Spoken Dialogue

    Get PDF
    We present work intended to improve speech recognition performance for computer dialogue by taking into account the way that dialogue context and intonational tune interact to limit the possibilities for what an utterance might be. We report on the extra constraint achieved in a bigram language model, expressed in terms of entropy, by using separate submodels for different sorts of dialogue acts, and trying to predict which submodel to apply by analysis of the intonation of the sentence being recognise

    Empirical Studies in Discourse

    Get PDF
    Introduction Computationaltheories of discourse are concerned with the context-based interpretation or generation of discourse phenomena in text and dialogue. In the past, research in this area focused on specifying the mechanismsunderlying particular discourse phenomena; the models proposed were often motivated by a few constructed examples. While this approach led to many theoretical advances, models developed in this manner are difficult to evaluate because it is hard to tell whether they generalize beyond the particular examples used to motivate them. Recently however the field has turned to issues of robustness and the coverage of theories of particular phenomena with respect to specific types of data. This new empirical focus is supported by several recent advances: an increasing theoretical consensus on discourse models; a large amount of online dialogue and textual corpora available; and improvements in component technologies and tools for building and testing discours

    Characterizing and recognizing spoken corrections in human-computer dialog

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 103-106).Miscommunication in human-computer spoken language systems is unavoidable. Recognition failures on the part of the system necessitate frequent correction attempts by the user. Unfortunately and counterintuitively, users' attempts to speak more clearly in the face of recognition errors actually lead to decreased recognition accuracy. The difficulty of correcting these errors, in turn, leads to user frustration and poor assessments of system quality. Most current approaches to identifying corrections rely on detecting violations of task or belief models that are ineffective where such constraints are weak and recognition results inaccurate or unavailable. In contrast, the approach pursued in this thesis, in contrast, uses the acoustic contrasts between original inputs and repeat corrections to identify corrections in a more content- and context-independent fashion. This thesis quantifies and builds upon the observation that suprasegmental features, such as duration, pause, and pitch, play a crucial role in distinguishing corrections from other forms of input to spoken language systems. These features can also be used to identify spoken corrections and explain reductions in recognition accuracy for these utterances. By providing a detailed characterization of acoustic-prosodic changes in corrections relative to original inputs in a voice-only system, this thesis contributes to natural language processing and spoken language understanding. We present a treatment of systematic acoustic variability in speech recognizer input as a source of new information, to interpret the speaker's corrective intent, rather than simply as noise or user error. We demonstrate the application of a machine-learning technique, decision trees, for identifying spoken corrections and achieve accuracy rates close to human levels of performance for corrections of misrecognition errors, using acoustic-prosodic information. This process is simple and local and depends neither on perfect transcription of the recognition string nor complex reasoning based on the full conversation. We further extend the conventional analysis of speaking styles beyond a 'read' versus 'conversational' contrast to extreme clear speech, describing divergence from phonological and durational models for words in this style.by Gina-Anne Levow.Ph.D

    Bayesian Approaches to Uncertainty in Speech Processing

    Get PDF

    Statistical language modelling of dialogue material in the British national corpus.

    Get PDF
    Statistical language modelling may not only be used to uncover the patterns which underlie the composition of utterances and texts, but also to build practical language processing technology. Contemporary language applications in automatic speech recognition, sentence interpretation and even machine translation exploit statistical models of language. Spoken dialogue systems, where a human user interacts with a machine via a speech interface in order to get information, make bookings, complaints, etc., are example of such systems which are now technologically feasible. The majority of statistical language modelling studies to date have concentrated on written text material (or read versions thereof). However, it is well-known that dialogue is significantly different from written text in its lexical content and sentence structure. Furthermore, there are expected to be significant logical, thematic and lexical connections between successive turns within a dialogue, but "turns" are not generally meaningful in written text. There is therefore a need for statistical language modeling studies to be performed on dialogue, particularly with a longer-term aim to using such models in human-machine dialogue interfaces. In this thesis, I describe the studies I have carried out on statistically modelling the dialogue material within the British National Corpus (BNC) - a very large corpus of modern British English compiled during the 1990s. This thesis presents a general introductory survey of the field of automatic speech recognition. This is followed by a general introduction to some standard techniques of statistical language modelling which will be employed later in the thesis. The structure of dialogue is discussed using some perspectives from linguistic theory, and reviews some previous approaches (not necessarily statistical) to modelling dialogue. Then a qualitative description is given of the BNC and the dialogue data within it, together with some descriptive statistics relating to it and results from constructing simple trigram language models for both dialogue and text data. The main part of the thesis describes experiments on the application of statistical language models based on word caches, word "trigger" pairs, and turn clustering to the dialogue data. Several different approaches are used for each type of model. An analysis of the strengths and weaknesses of these techniques is then presented. The results of the experiments lead to a better understanding of how statistical language modelling might be applied to dialogue for the benefit of future language technologies

    The Effects of Computer Mediated Communication on the Processes of Communication and Collaboration

    Get PDF
    This thesis investigates the effects of computer-mediated communication upon collaborative problem solving. The results of three studies are presented, which explore the range of channels of communication afforded by two dissimilar forms of computer-mediated communication. The first study explores the effects of an interactive text-based form of computer-mediated communication, which provides users with a very restricted range of channels. Studies two and three examine the effects of collaborating in a video-mediated context, a technologically sophisticated communication system that affords an array of channels of communication more similar to face-to-face interactions. The effects of these communicative contexts are assessed using a multi-faceted approach. This method of evaluation is based upon analysis of task performance, the structure of the interactions, and measures of the process and content of communication. The findings show that the novice users of these computer-mediated contexts can achieve effective communication and collaboration, but the ease and pace with which this is accomplished varies with communicative context. Users of the highly constrained text-based system initially performed less well on the collaborative tasks, but with experience adapted to the context in appropriate ways. Participants in the video-mediated conditions appeared to adjust quickly to this context. Subtle differences in the structure, process and content of their interactions show that they also had to make allowances for the restraints imposed by the technologically mediated context. These results are discussed within the frame-work of a collaborative model of communication
    corecore