4 research outputs found

    Toward Widely-Available and Usable Multimodal Conversational Interfaces

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 159-166).
    Multimodal conversational interfaces, which allow humans to interact with a computer using a combination of spoken natural language and a graphical interface, offer the potential to transform the manner by which humans communicate with computers. While researchers have developed myriad such interfaces, none have made the transition out of the laboratory and into the hands of a significant number of users. This thesis makes progress toward overcoming two intertwined barriers preventing more widespread adoption: availability and usability. Toward addressing the problem of availability, this thesis introduces a new platform for building multimodal interfaces that makes it easy to deploy them to users via the World Wide Web. One consequence of this work is City Browser, the first multimodal conversational interface made publicly available to anyone with a web browser and a microphone. City Browser serves as a proof-of-concept that significant amounts of usage data can be collected in this way, allowing a glimpse of how users interact with such interfaces outside of a laboratory environment. City Browser, in turn, has served as the primary platform for deploying and evaluating three new strategies aimed at improving usability. The most pressing usability challenge for conversational interfaces is their limited ability to accurately transcribe and understand spoken natural language. The three strategies developed in this thesis (context-sensitive language modeling, response confidence scoring, and user behavior shaping) each attack the problem from a different angle, but they are linked in that each critically integrates information from the conversational context.
    by Alexander Gruenstein. Ph.D.

    Multimodal speech interfaces for map-based applications

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 71-73).
    This thesis presents the development of multimodal speech interfaces for mobile and vehicle systems. Multimodal interfaces have been shown to increase input efficiency in comparison with their purely speech or text-based counterparts. To date, much of the existing work has focused on desktop or large tablet-sized devices. The advent of the smartphone and its ability to handle both speech and touch inputs in combination with a screen display has created a compelling opportunity for deploying multimodal systems on smaller-sized devices. We introduce a multimodal user interface designed for mobile and vehicle devices, and system enhancements for a dynamically expandable point-of-interest database. The mobile system is evaluated using Amazon Mechanical Turk, and the vehicle-based system is analyzed through in-lab usability studies. Our experiments show encouraging results for multimodal speech adoption.
    by Sean Liu. M.Eng.

    Understanding user state and preferences for robust spoken dialog systems and location-aware assistive technology

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, (S.M. in Technology and Policy)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 119-125).
    This research focuses on improving the performance of spoken dialog systems (SDS) in the domain of assistive technology for people with disabilities. Automatic speech recognition (ASR) has compelling potential applications as a means of enabling people with physical disabilities to enjoy greater levels of independence and participation. This thesis describes the development and evaluation of a spoken dialog system modeled as a partially observable Markov decision process (SDS-POMDP). The SDS-POMDP can understand commands related to making phone calls and providing information about weather, activities, and menus in a specialized-care residence setting. Labeled utterance data was used to train observation and utterance confidence models. With a user simulator, the SDS-POMDP reward function parameters were optimized, and the SDS-POMDP is shown to outperform simpler threshold-based dialog strategies. These simulations were validated in experiments with human participants, with the SDS-POMDP resulting in more successful dialogs and faster dialog completion times, particularly for speakers with high word-error rates. This thesis also explores the social and ethical implications of deploying location-based assistive technology in specialized-care settings. These technologies could have substantial potential benefit to residents and caregivers in such environments, but they may also raise issues related to user safety, independence, autonomy, or privacy. As one example, location-aware mobile devices are potentially useful to increase the safety of individuals in a specialized-care setting who may be at risk of unknowingly wandering, but they raise important questions about privacy and informed consent. This thesis provides a survey of U.S. legislation related to the participation of individuals who have questionable capacity to provide informed consent in research studies. Overall, it seeks to precisely describe and define the key issues that arise as a result of new, unforeseen technologies that may have both benefits and costs to the elderly and people with disabilities.
    by William Li. S.M. in Technology and Policy; S.M.

    Language technologies in speech-enabled second language learning games : from reading to dialogue

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 237-244).
    Second language learning has become an important societal need over the past decades. Given that the number of language teachers is far below demand, computer-aided language learning software is becoming a promising supplement to traditional classroom learning, as well as potentially enabling new opportunities for self-learning. The use of speech technologies is especially attractive to offer students unlimited chances for speaking exercises. To create helpful and intelligent speaking exercises on a computer, it is necessary for the computer to not only recognize the acoustics, but also to understand the meaning and give appropriate responses. Nevertheless, most existing speech-enabled language learning software focuses only on speech recognition and pronunciation training. Very few have emphasized exercising the student's composition and comprehension abilities and adopting language technologies to enable free-form conversation emulating a real human tutor. This thesis investigates the critical functionalities of a computer-aided language learning system, and presents a generic framework as well as various language- and domain-independent modules to enable building complex speech-based language learning systems. Four games have been designed and implemented using the framework and the modules to demonstrate their usability and flexibility, where dynamic content creation, automatic assessment, and automatic assistance are emphasized. The four games, reading, translation, question-answering and dialogue, offer different activities with gradually increasing difficulty, and involve a wide range of language processing techniques, such as language understanding, language generation, question generation, context resolution, dialogue management and user simulation. User studies with real subjects show that the systems were well received and judged to be helpful.
    by Yushi Xu. Ph.D.