
    Developing attribute acquisition strategies in spoken dialogue systems via user simulation

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 159-169).

    A spoken dialogue system (SDS) is an application that supports conversational interaction with a human to perform some task. SDSs are emerging as an intuitive and efficient means of accessing information. A critical barrier to their widespread deployment is communication breakdown at strategic points in the dialogue, often when the user tries to supply a named entity from a large or open vocabulary set. For example, a weather system might know several thousand cities, but there is no easy way to inform the user of what those cities are, and the system will likely misrecognize any unknown city as some known city. A system's inability to acquire an unknown value can lead to unpredictable behavior by both the system and the user.

    This thesis presents a framework for developing attribute acquisition strategies with a simulated user, focusing specifically on the acquisition of unknown city names in a flight domain through a spell-mode subdialogue. Collecting data from real users is costly in both time and resources, and our goal is to focus on situations that occur only sporadically in real dialogues, depending on the domain and the user's experience in that domain. We therefore employ user simulation, which allows us to generate a large number of dialogues and to configure the input as desired in order to exercise specific strategies.

    We present a novel method of utterance generation for user simulation that exploits an existing corpus of real user dialogues but recombines the utterances using an example-based, template approach. Items of interest not in the corpus, such as foreign or unknown cities, can be included by splicing in synthesized speech. This method produces realistic utterances that retain the structural variety of real user utterances while introducing cities that can be resolved only via spelling.

    We also developed a model of generic dialogue management that allows a developer to quickly specify interaction properties on a per-attribute basis. This model was used to assess the effectiveness of various combinations of dialogue strategies and simulated user behavior. Current approaches to user simulation typically model simulated utterances at the intention level, assuming perfect recognition and understanding; we employ speech, so our strategies are developed in the context of errors that occur naturally from recognition and understanding.

    We use simulation to address two problems: the conflict problem requires the system to choose how to act when a new hypothesis for an attribute conflicts with its current belief, while the compliance problem requires the system to decide whether a user complied with a spelling request. Decision models were learned from simulated data and tested with real users. The learned model significantly outperformed a heuristic model in choosing the "ideal" response to the conflict problem, with accuracies of 84.1% and 52.1%, respectively, and the learned model to predict compliance achieved a respectable 96.3% accuracy. These results suggest that such models learned from simulated data can attain similar, if not better, performance in dialogues with real users.

    by Edward A. Filisko. Ph.D.
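    The example-based, template approach described in the abstract can be illustrated with a minimal text-only sketch: real-user utterances are abstracted into templates with a city slot, then recombined with cities unseen in the corpus. The templates and city names below are invented examples, and the thesis splices in synthesized *audio*, whereas this sketch operates on text alone.

    ```python
    import random

    # Hypothetical templates abstracted from real-user flight-domain utterances;
    # "CITY" marks the slot where a city name was removed.
    TEMPLATES = [
        "i want a flight from boston to CITY",
        "show me flights to CITY please",
        "what about CITY",
    ]

    # Invented example cities absent from the corpus, resolvable only via spelling.
    UNKNOWN_CITIES = ["akureyri", "ouagadougou"]

    def generate_utterance(rng: random.Random) -> str:
        """Recombine a real-utterance template with an out-of-corpus city."""
        template = rng.choice(TEMPLATES)
        city = rng.choice(UNKNOWN_CITIES)
        return template.replace("CITY", city)
    ```

    Because the structure of each utterance comes from real user speech, the generated dialogues retain realistic variety while systematically exercising the spell-mode strategies under test.
    
    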

    Recognition of spelled out words in spoken queries

    This disclosure describes techniques to enhance automated speech recognition by enabling automatic recognition of words spelled out by users. Machine learning techniques are utilized to detect explicit user intent to spell out a word, as well as to detect spelled-out words without an explicitly stated user intent. If it is determined that the user is spelling a word, a spelling mode is triggered wherein received letters are concatenated together to form a word. If the user permits, data that includes the user context, audio of the word, audio of the user spelling out the word, and the textual representation of the word are obtained and utilized for training. The trained machine learning model is utilized in subsequent processing of user speech.
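    The concatenation step of the spelling mode described above can be sketched as follows. This is an illustrative assumption about the token stream (one transcript token per spelled letter), not the disclosure's actual implementation, which uses learned models rather than the simple single-character heuristic shown here.

    ```python
    def is_single_letter(token: str) -> bool:
        """Treat a lone alphabetic character as a spelled-out letter.
        Limitation of this sketch: one-letter words like "a" or "i" would
        also match; the real system disambiguates with a learned model."""
        return len(token) == 1 and token.isalpha()

    def transcribe(tokens: list[str]) -> str:
        """Concatenate runs of single letters into words; pass other tokens through."""
        out: list[str] = []
        letters: list[str] = []
        for tok in tokens:
            if is_single_letter(tok):
                letters.append(tok)
            else:
                if letters:  # a run of spelled letters just ended
                    out.append("".join(letters))
                    letters = []
                out.append(tok)
        if letters:  # flush a trailing spelled-out word
            out.append("".join(letters))
        return " ".join(out)
    ```

    For example, the recognized token sequence `["fly", "to", "w", "o", "r", "c", "e", "s", "t", "e", "r"]` would be rendered as "fly to worcester".
    
    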

    An (An)Archive of Communication: Interactive Toys as Interlocutors


    Acquiring and Maintaining Knowledge by Natural Multimodal Dialog


    Language report for Catalan (English version)

    The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and that supports a new generation of exchange facilities for them. Peer reviewed. Preprint.

    Eighty Challenges Facing Speech Input/Output Technologies

    During the past three decades, we have witnessed remarkable progress in the development of speech input/output technologies. Despite these successes, we are far from reaching the human capability of recognizing nearly perfectly the speech spoken by many speakers, under varying acoustic environments, with an essentially unrestricted vocabulary. Synthetic speech still sounds stilted and robot-like, lacking real personality and emotion. Many challenges will remain unmet unless we can advance our fundamental understanding of human communication: how speech is produced and perceived, utilizing our innate linguistic competence. This paper outlines some of these challenges, ranging from signal presentation and lexical access to language understanding and multimodal integration, and speculates on how these challenges could be met.