6,538 research outputs found
Developing attribute acquisition strategies in spoken dialogue systems via user simulation
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 159-169). A spoken dialogue system (SDS) is an application that supports conversational interaction with a human to perform some task. SDSs are emerging as an intuitive and efficient means for accessing information. A critical barrier to their widespread deployment remains in the form of communication breakdown at strategic points in the dialogue, often when the user tries to supply a named entity from a large or open vocabulary set. For example, a weather system might know several thousand cities, but there is no easy way to inform the user about what those cities are. The system will likely misrecognize any unknown city as some known city. The inability of a system to acquire an unknown value can lead to unpredictable behavior by the system, as well as by the user. This thesis presents a framework for developing attribute acquisition strategies with a simulated user. We specifically focus on the acquisition of unknown city names in a flight domain, through a spell-mode subdialogue. Collecting data from real users is costly in both time and resources. In addition, our goal is to focus on situations that tend to occur sporadically in real dialogues, depending on the domain and the user's experience in that domain. Therefore, we chose to employ user simulation, which would allow us to generate a large number of dialogues, and to configure the input as desired in order to exercise specific strategies. We present a novel method of utterance generation for user simulation, that exploits an existing corpus of real user dialogues, but recombines the utterances using an example-based, template approach. Items of interest not in the corpus, such as foreign or unknown cities, can be included by splicing in synthesized speech.
This method allows us to produce realistic utterances by retaining the structural variety of real user utterances, while introducing cities that can only be resolved via spelling. We also developed a model of generic dialogue management, allowing a developer to quickly specify interaction properties on a per-attribute basis. This model was used to assess the effectiveness of various combinations of dialogue strategies and simulated user behavior. Current approaches to user simulation typically model simulated utterances at the intention level, assuming perfect recognition and understanding. We employ speech to develop our strategies in the context of errors that occur naturally from recognition and understanding. We use simulation to address two problems: the conflict problem requires the system to choose how to act when a new hypothesis for an attribute conflicts with its current belief, while the compliance problem requires the system to decide whether a user was compliant with a spelling request. Decision models were learned from simulated data, and were tested with real users, showing that the learned model significantly outperformed a heuristic model in choosing the "ideal" response to the conflict problem, with accuracies of 84.1% and 52.1%, respectively. The learned model to predict compliance achieved a respectable 96.3% accuracy. These results suggest that such models learned from simulated data can attain similar, if not better, performance in dialogues with real users. By Edward A. Filisko. Ph.D.
Recognition of spelled out words in spoken queries
This disclosure describes techniques to enhance automated speech recognition by enabling automatic recognition of words spelled out by users. Machine learning techniques are used to detect explicit user intent to spell out a word, as well as to detect spelled-out words when no such intent is stated. If it is determined that the user is spelling a word, a spelling mode is triggered in which received letters are concatenated to form a word. If the user permits, data that includes the user context, audio of the word, audio of the user spelling out the word, and the textual representation of the word are obtained and used for training. The trained machine learning model is used in subsequent processing of user speech.
An (An)Archive of Communication: Interactive Toys as Interlocutors
In this article, I analyze the Speak & Spell electronic toy (Texas Instruments, 1978) from the perspective of the communication that it enables. I argue that such interactive devices can be seen as archives of future communication. As a media archaeologist working with electronic toys, I often find the conceptualizations of these devices as mere tools for playing unsatisfactory. They seem to share more characteristics with archives than with instruments. Like archives, interactive toys hold in themselves a predefined choice of information and interactions, thus enabling certain modes of inquiry and discouraging others. Electronic toys that draw on algorithms and data to present the user with an imitation of human communication are especially able to offer branching paths of interlocution within their domain or topic. In the first part of this article, I offer an explanation of the toy's technical structures that store speech and spelling data and enforce certain patterns of input and output between the device and its user. Then, I propose the use of the notion of the archive, or, alternatively, anarchive, to describe the space of possibilities that defines this process of communication. In the last part, I argue that the study of such algorithmic archives of possible communication needs to be based on interactive experimentation and cannot be grounded in static recordings or descriptions alone.
Language report for Catalan (English version)
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them. Peer Reviewed. Preprint.
Eighty Challenges Facing Speech Input/Output Technologies
During the past three decades, we have witnessed remarkable progress in the development of speech input/output technologies. Despite these successes, we are far from reaching human capabilities of recognizing nearly perfectly the speech spoken by many speakers, under varying acoustic environments, with essentially unrestricted vocabulary. Synthetic speech still sounds stilted and robot-like, lacking in real personality and emotion. There are many challenges that will remain unmet unless we can advance our fundamental understanding of human communication: how speech is produced and perceived, utilizing our innate linguistic competence. This paper outlines some of these challenges, ranging from signal presentation and lexical access to language understanding and multimodal integration, and speculates on how these challenges could be met.