604 research outputs found

    Developing attribute acquisition strategies in spoken dialogue systems via user simulation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 159-169).

    A spoken dialogue system (SDS) is an application that supports conversational interaction with a human to perform some task. SDSs are emerging as an intuitive and efficient means for accessing information. A critical barrier to their widespread deployment remains in the form of communication breakdown at strategic points in the dialogue, often when the user tries to supply a named entity from a large or open vocabulary set. For example, a weather system might know several thousand cities, but there is no easy way to inform the user about what those cities are. The system will likely misrecognize any unknown city as some known city. The inability of a system to acquire an unknown value can lead to unpredictable behavior by the system, as well as by the user.

    This thesis presents a framework for developing attribute acquisition strategies with a simulated user. We specifically focus on the acquisition of unknown city names in a flight domain, through a spell-mode subdialogue. Collecting data from real users is costly in both time and resources. In addition, our goal is to focus on situations that tend to occur sporadically in real dialogues, depending on the domain and the user's experience in that domain. Therefore, we chose to employ user simulation, which allows us to generate a large number of dialogues and to configure the input as desired in order to exercise specific strategies. We present a novel method of utterance generation for user simulation that exploits an existing corpus of real user dialogues, but recombines the utterances using an example-based, template approach. Items of interest not in the corpus, such as foreign or unknown cities, can be included by splicing in synthesized speech. This method allows us to produce realistic utterances by retaining the structural variety of real user utterances, while introducing cities that can only be resolved via spelling.

    We also developed a model of generic dialogue management, allowing a developer to quickly specify interaction properties on a per-attribute basis. This model was used to assess the effectiveness of various combinations of dialogue strategies and simulated user behavior. Current approaches to user simulation typically model simulated utterances at the intention level, assuming perfect recognition and understanding. We instead employ speech, developing our strategies in the context of errors that occur naturally from recognition and understanding. We use simulation to address two problems: the conflict problem requires the system to choose how to act when a new hypothesis for an attribute conflicts with its current belief, while the compliance problem requires the system to decide whether a user was compliant with a spelling request. Decision models were learned from simulated data and were tested with real users, showing that the learned model significantly outperformed a heuristic model in choosing the "ideal" response to the conflict problem, with accuracies of 84.1% and 52.1%, respectively. The learned model to predict compliance achieved a respectable 96.3% accuracy. These results suggest that such models learned from simulated data can attain similar, if not better, performance in dialogues with real users.

    by Edward A. Filisko. Ph.D.
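    The conflict problem described above lends itself to a small illustration. The Python sketch below trains a trivial decision stump on toy simulated conflict events and compares it against a naive always-accept heuristic. Everything here is invented for illustration: the feature names, the simulator's rule, and the stump learner are not the thesis's actual features or models, only a caricature of learning a decision model from simulated dialogues.

```python
import random

random.seed(0)

def simulate_conflict(n):
    """Generate toy (features, ideal_action) pairs from a simulated user.

    A 'conflict' event: the recognizer proposes a new value for an attribute
    (e.g., a departure city) that contradicts the system's current belief.
    """
    data = []
    for _ in range(n):
        new_conf = random.random()   # ASR confidence in the new hypothesis
        old_conf = random.random()   # confidence in the current belief
        # The simulator's hidden rule: keep the belief unless the new
        # hypothesis is clearly more reliable.
        action = "accept_new" if new_conf > old_conf + 0.1 else "keep_old"
        data.append(((new_conf, old_conf), action))
    return data

def train_stump(data):
    """Learn a 1-D threshold on the confidence difference (a decision stump)."""
    best_t, best_acc = 0.0, 0.0
    for t in [i / 100 for i in range(-50, 51)]:
        acc = sum((f[0] - f[1] > t) == (a == "accept_new")
                  for f, a in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

train = simulate_conflict(2000)
threshold = train_stump(train)

def learned_policy(new_conf, old_conf):
    return "accept_new" if new_conf - old_conf > threshold else "keep_old"

def heuristic_policy(new_conf, old_conf):
    return "accept_new"   # naive baseline: always trust the latest hypothesis

test = simulate_conflict(500)
for name, policy in [("learned", learned_policy), ("heuristic", heuristic_policy)]:
    acc = sum(policy(*f) == a for f, a in test) / len(test)
    print(f"{name}: {acc:.2f}")
```

    On held-out simulated data the learned stump recovers the simulator's rule and beats the heuristic by a wide margin, which is the qualitative pattern the thesis reports when the learned model is carried over to real users.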

    Toward Widely-Available and Usable Multimodal Conversational Interfaces

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 159-166).

    Multimodal conversational interfaces, which allow humans to interact with a computer using a combination of spoken natural language and a graphical interface, offer the potential to transform the manner in which humans communicate with computers. While researchers have developed myriad such interfaces, none have made the transition out of the laboratory and into the hands of a significant number of users. This thesis makes progress toward overcoming two intertwined barriers preventing more widespread adoption: availability and usability.

    Toward addressing the problem of availability, this thesis introduces a new platform for building multimodal interfaces that makes it easy to deploy them to users via the World Wide Web. One consequence of this work is City Browser, the first multimodal conversational interface made publicly available to anyone with a web browser and a microphone. City Browser serves as a proof of concept that significant amounts of usage data can be collected in this way, allowing a glimpse of how users interact with such interfaces outside of a laboratory environment.

    City Browser, in turn, has served as the primary platform for deploying and evaluating three new strategies aimed at improving usability. The most pressing usability challenge for conversational interfaces is their limited ability to accurately transcribe and understand spoken natural language. The three strategies developed in this thesis - context-sensitive language modeling, response confidence scoring, and user behavior shaping - each attack the problem from a different angle, but they are linked in that each critically integrates information from the conversational context.

    by Alexander Gruenstein. Ph.D.
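    As a rough illustration of the context-sensitive language modeling idea, the Python sketch below interpolates a generic unigram model with a uniform distribution over words currently visible in the interface (e.g., place names shown on a map). The tiny corpus, the on-screen vocabulary, and the interpolation weight are all invented for this example and are not taken from City Browser itself.

```python
from collections import Counter

# Toy "generic" language model estimated from a miniature corpus.
generic_corpus = "show me restaurants near the park show me the map".split()
generic = Counter(generic_corpus)
total = sum(generic.values())

def unigram_prob(word, context_words, lam=0.5):
    """Interpolate a generic unigram LM with a uniform model over the
    words currently in the visual context. `lam` controls how strongly
    the context biases recognition."""
    p_generic = generic[word] / total
    p_context = (1 / len(context_words)) if context_words and word in context_words else 0.0
    return (1 - lam) * p_generic + lam * p_context

on_screen = {"oleana", "rangzen", "park"}   # hypothetical names visible on the map
# A word visible on screen gains probability mass even if it is rare
# (or entirely absent) in the generic training corpus.
print(unigram_prob("oleana", on_screen))
print(unigram_prob("show", on_screen))
```

    The design point this caricatures: rather than one static vocabulary, the recognizer's expectations shift with the conversational and visual context, so out-of-corpus names become recognizable exactly when the user is likely to say them.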

    Language technologies in speech-enabled second language learning games : from reading to dialogue

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 237-244).

    Second language learning has become an important societal need over the past decades. Given that the number of language teachers is far below demand, computer-aided language learning software is becoming a promising supplement to traditional classroom learning, as well as potentially enabling new opportunities for self-learning. The use of speech technologies is especially attractive for offering students unlimited chances at speaking exercises. To create helpful and intelligent speaking exercises on a computer, it is necessary for the computer not only to recognize the acoustics, but also to understand the meaning and give appropriate responses. Nevertheless, most existing speech-enabled language learning software focuses only on speech recognition and pronunciation training. Very few systems have emphasized exercising the student's composition and comprehension abilities, or have adopted language technologies to enable free-form conversation emulating a real human tutor.

    This thesis investigates the critical functionalities of a computer-aided language learning system, and presents a generic framework as well as various language- and domain-independent modules for building complex speech-based language learning systems. Four games have been designed and implemented using the framework and the modules to demonstrate their usability and flexibility, with emphasis on dynamic content creation, automatic assessment, and automatic assistance. The four games - reading, translation, question-answering and dialogue - offer different activities with gradually increasing difficulty, and involve a wide range of language processing techniques, such as language understanding, language generation, question generation, context resolution, dialogue management and user simulation. User studies with real subjects show that the systems were well received and judged to be helpful.

    by Yushi Xu. Ph.D.
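    The "automatic assessment" component can be illustrated with a deliberately simple scorer: the Python sketch below rates a learner's answer against a reference sentence by unigram-overlap F1. This stands in for the much richer linguistic comparison a real tutor system would perform; the example sentences and the scoring scheme are invented for illustration.

```python
from collections import Counter

def overlap_f1(hypothesis, reference):
    """Score a learner's sentence against a reference by unigram overlap.

    Computes precision and recall over bag-of-words counts, then their F1.
    A crude proxy for assessing a translation or question-answering turn.
    """
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    common = sum((hyp & ref).values())   # multiset intersection of word counts
    if common == 0:
        return 0.0
    prec = common / sum(hyp.values())
    rec = common / sum(ref.values())
    return 2 * prec * rec / (prec + rec)

print(overlap_f1("i want to buy a ticket", "i want to buy a ticket"))  # perfect match
print(overlap_f1("i buy ticket", "i want to buy a ticket"))            # partial credit
```

    A graded score like this, rather than a binary right/wrong, is what lets a game give partial credit and targeted assistance as exercise difficulty increases.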

    Harnessing Evolution of Multi-Turn Conversations for Effective Answer Retrieval

    With the improvements in speech recognition and voice generation technologies over the last few years, many companies have sought to develop conversation understanding systems that run on mobile phones or smart home devices through natural language interfaces. Conversational assistants, such as Google Assistant and Microsoft Cortana, can help users complete various types of tasks. This requires an accurate understanding of the user's information need as the conversation evolves over multiple turns. Finding relevant context in a conversation's history is challenging because of the complexity of natural language and the evolution of the user's information need.

    In this work, we present an extensive analysis of the language, relevance, and dependency of user utterances in multi-turn information-seeking conversations. To this end, we have annotated the relevant utterances in the conversations released by the TREC CAsT 2019 track. The annotation labels indicate which of the previous utterances in a conversation can be used to improve the current one. Furthermore, we propose a neural utterance relevance model based on BERT fine-tuning, which outperforms competitive baselines. We study and compare the performance of multiple retrieval models, utilizing different strategies to incorporate the user's context. The experimental results on both classification and retrieval tasks show that our proposed approach can effectively identify and incorporate the conversation context. We show that processing the current utterance using the predicted relevant utterance leads to a 38% relative improvement in terms of nDCG@20. Finally, to foster research in this area, we have released the dataset of annotations.

    Comment: To appear in ACM CHIIR 2020, Vancouver, BC, Canada
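    For reference, nDCG@k, the metric behind the reported 38% relative improvement, can be computed as in the minimal Python sketch below; the example relevance grades are invented, and a full evaluation would of course use the track's official judgments and tooling.

```python
import math

def dcg(rels, k):
    """Discounted cumulative gain over the top-k graded relevances,
    given in rank order (rank 1 first)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg(rels, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

# A ranking that places highly relevant passages first scores higher:
print(ndcg([3, 2, 0, 1], k=4))
print(ndcg([0, 1, 2, 3], k=4))
```

    Because the measure is normalized per query, a "38% relative improvement in nDCG@20" means the rewritten (context-enriched) utterances move relevant passages substantially higher in the top 20 results, averaged over queries.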

    The audio-graphical interface to a personal integrated telecommunications system

    Thesis (M.S.V.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1984. Includes bibliographical references (leaves 80-88).

    The telephone is proposed as an environment for exploring conversational computer systems. A personal communications system is developed which supports multi-modal access to multi-media mail. It is a testbed for developing novel methods of interactive information retrieval that are as intuitive and useful as the spoken word. A personalized telecommunications management system that handles both voice and electronic mail messages through a unified user interface is described. Incoming voice messages are gathered via a conversational answering machine. Known callers are identified with a speech recognition unit so they can receive personal outgoing recordings. The system's owner accesses messages over the telephone by voice, using natural language queries, or with the telephone keypad. Electronic mail messages and system status are transmitted by a text-to-speech synthesizer. Local access is provided by a touch-sensitive screen and color raster display. Text and digitized voice messages are randomly accessible through graphical ideograms. A Rolodex-style directory permits dialing-by-name and the creation of outgoing recordings for individuals or mailing lists.

    Note: A 3/4 inch color U-matic video cassette accompanies this thesis; it is five minutes in length and has an English narrative.

    by Barry Michael Arons. M.S.V.S.

    The use of belief networks in natural language understanding and dialog modeling

    Wai, Chi Man Carmen. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 129-136). Abstracts in English and Chinese.

    Table of contents:

    Chapter 1 Introduction --- p.1
        1.1 Overview --- p.1
        1.2 Natural Language Understanding --- p.3
        1.3 BNs for Handling Speech Recognition Errors --- p.4
        1.4 BNs for Dialog Modeling --- p.5
        1.5 Thesis Goals --- p.8
        1.6 Thesis Outline --- p.8
    Chapter 2 Background --- p.10
        2.1 Natural Language Understanding --- p.11
            2.1.1 Rule-based Approaches --- p.12
            2.1.2 Stochastic Approaches --- p.13
            2.1.3 Phrase-Spotting Approaches --- p.16
        2.2 Handling Recognition Errors in Spoken Queries --- p.17
        2.3 Spoken Dialog Systems --- p.19
            2.3.1 Finite-State Networks --- p.21
            2.3.2 The Form-based Approaches --- p.21
            2.3.3 Sequential Decision Approaches --- p.22
            2.3.4 Machine Learning Approaches --- p.24
        2.4 Belief Networks --- p.27
            2.4.1 Introduction --- p.27
            2.4.2 Bayesian Inference --- p.29
            2.4.3 Applications of the Belief Networks --- p.32
        2.5 Chapter Summary --- p.33
    Chapter 3 Belief Networks for Natural Language Understanding --- p.34
        3.1 The ATIS Domain --- p.35
        3.2 Problem Formulation --- p.36
        3.3 Semantic Tagging --- p.37
        3.4 Belief Networks Development --- p.38
            3.4.1 Concept Selection --- p.39
            3.4.2 Bayesian Inferencing --- p.40
            3.4.3 Thresholding --- p.40
            3.4.4 Goal Identification --- p.41
        3.5 Experiments on Natural Language Understanding --- p.42
            3.5.1 Comparison between Mutual Information and Information Gain --- p.42
            3.5.2 Varying the Input Dimensionality --- p.44
            3.5.3 Multiple Goals and Rejection --- p.46
            3.5.4 Comparing Grammars --- p.47
        3.6 Benchmark with Decision Trees --- p.48
        3.7 Performance on Natural Language Understanding --- p.51
        3.8 Handling Speech Recognition Errors in Spoken Queries --- p.52
            3.8.1 Corpus Preparation --- p.53
            3.8.2 Enhanced Belief Network Topology --- p.54
            3.8.3 BNs for Handling Speech Recognition Errors --- p.55
            3.8.4 Experiments on Handling Speech Recognition Errors --- p.60
            3.8.5 Significance Testing --- p.64
            3.8.6 Error Analysis --- p.65
        3.9 Chapter Summary --- p.67
    Chapter 4 Belief Networks for Mixed-Initiative Dialog Modeling --- p.68
        4.1 The CU FOREX Domain --- p.69
            4.1.1 Domain-Specific Constraints --- p.69
            4.1.2 Two Interaction Modalities --- p.70
        4.2 The Belief Networks --- p.70
            4.2.1 Informational Goal Inference --- p.72
            4.2.2 Detection of Missing / Spurious Concepts --- p.74
        4.3 Integrating Two Interaction Modalities --- p.78
        4.4 Incorporating Out-of-Vocabulary Words --- p.80
            4.4.1 Natural Language Queries --- p.80
            4.4.2 Directed Queries --- p.82
        4.5 Evaluation of the BN-based Dialog Model --- p.84
        4.6 Chapter Summary --- p.87
    Chapter 5 Scalability and Portability of Belief Network-based Dialog Model --- p.88
        5.1 Migration to the ATIS Domain --- p.89
        5.2 Scalability of the BN-based Dialog Model --- p.90
            5.2.1 Informational Goal Inference --- p.90
            5.2.2 Detection of Missing / Spurious Concepts --- p.92
            5.2.3 Context Inheritance --- p.94
        5.3 Portability of the BN-based Dialog Model --- p.101
            5.3.1 General Principles for Probability Assignment --- p.101
            5.3.2 Performance of the BN-based Dialog Model with Hand-Assigned Probabilities --- p.105
            5.3.3 Error Analysis --- p.108
        5.4 Enhancements for Discourse Query Understanding --- p.110
            5.4.1 Combining Trained and Handcrafted Probabilities --- p.110
            5.4.2 Handcrafted Topology for BNs --- p.111
            5.4.3 Performance of the Enhanced BN-based Dialog Model --- p.117
        5.5 Chapter Summary --- p.120
    Chapter 6 Conclusions --- p.122
        6.1 Summary --- p.122
        6.2 Contributions --- p.126
        6.3 Future Work --- p.127
    Bibliography --- p.129
    Appendix A The Two Original SQL Query --- p.137
    Appendix B "The Two Grammars, GH and GsA" --- p.139
    Appendix C Probability Propagation in Belief Networks --- p.149
        C.1 Computing the a posteriori probability of P*(G) based on input concepts --- p.151
        C.2 Computing the a posteriori probability of P*(Cj) by backward inference --- p.154
    Appendix D Total 23 Concepts for the Handcrafted BN --- p.15
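    The informational goal inference that this thesis builds belief networks for can be caricatured in a few lines: one naive-Bayes-style network per goal, with a prior P(G) and per-concept conditionals P(concept | G), combined by Bayes' rule over the concepts spotted in the query. All the goal names, concept names, and probabilities below are invented for illustration; the thesis's networks are richer (thresholding, missing/spurious concept detection, backward inference), but the core posterior computation has this shape.

```python
# One hypothetical network per informational goal.
goals = {
    "flight_info": {"prior": 0.6,
                    "concepts": {"city": 0.9, "date": 0.8, "exchange_rate": 0.05}},
    "forex_query": {"prior": 0.4,
                    "concepts": {"city": 0.1, "date": 0.3, "exchange_rate": 0.9}},
}

def posterior(observed):
    """P*(G | concepts) for binary concept observations, via Bayes' rule.

    Each concept contributes P(concept | G) if it was spotted in the query,
    and 1 - P(concept | G) if it was not; scores are then normalized.
    """
    scores = {}
    for g, spec in goals.items():
        p = spec["prior"]
        for concept, p_on in spec["concepts"].items():
            p *= p_on if concept in observed else (1 - p_on)
        scores[g] = p
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

# Observing "exchange_rate" (and no travel concepts) strongly favors the
# forex goal; in the thesis a threshold on this posterior also allows
# rejection or multiple-goal hypotheses.
print(posterior({"exchange_rate"}))
```

    The same machinery run "backward" (inferring which concepts should have been present given an inferred goal) is what supports the detection of missing or spurious concepts for mixed-initiative dialog.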