2 research outputs found

    A framework for multi-modal input in a pervasive computing environment

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (leaves 51-53). This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections.

    In this thesis, we propose a framework that uses multiple domains and multi-modal techniques to disambiguate a variety of natural human input modes. The system is based on the input needs of pervasive computing users and extends the Galaxy architecture developed by the Spoken Language Systems group at MIT. Just as speech recognition disambiguates an input waveform by using a grammar to find the best matching phrase, we use the same mechanism to disambiguate other input forms, T9 in particular. A skeleton version of the framework was implemented to show that the framework is feasible and to explore some of the issues that might arise. The system currently works for both T9 and speech modes, and the framework can accommodate any other type of input for which a recognizer can be built, such as Graffiti input.

    by Shalini Agarwal, M.Eng.
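    The abstract does not spell out how the T9 recognizer works, but its speech-recognition analogy suggests matching a digit sequence against a weighted vocabulary, with the grammar supplying the ranking. The sketch below is a minimal, hypothetical illustration of that idea: the keypad mapping is standard T9, while the vocabulary and its weights are invented stand-ins for the grammar the thesis would provide via the Galaxy architecture.

```python
# Minimal sketch of grammar-based T9 disambiguation, following the
# abstract's analogy to speech recognition. The vocabulary and weights
# are illustrative stand-ins, not the thesis's actual grammar.
from collections import defaultdict

# Standard T9 keypad: each digit covers a set of letters.
T9_KEYS = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}
LETTER_TO_DIGIT = {ch: d for d, letters in T9_KEYS.items() for ch in letters}

def encode(word: str) -> str:
    """Map an alphabetic word to its T9 digit sequence."""
    return ''.join(LETTER_TO_DIGIT[ch] for ch in word.lower())

class T9Recognizer:
    """Disambiguate a digit sequence against a weighted vocabulary,
    the way a speech recognizer matches a waveform against a grammar."""

    def __init__(self, weighted_vocab: dict[str, float]):
        # Index candidate words by their shared digit sequence.
        self.index: dict[str, list[tuple[float, str]]] = defaultdict(list)
        for word, weight in weighted_vocab.items():
            self.index[encode(word)].append((weight, word))

    def best_matches(self, digits: str) -> list[str]:
        """Return candidate words, best-scoring first."""
        return [w for _, w in sorted(self.index[digits], reverse=True)]

# Toy vocabulary; the weights stand in for grammar scores.
vocab = {'good': 0.9, 'home': 0.7, 'gone': 0.5, 'hood': 0.2}
recognizer = T9Recognizer(vocab)
print(recognizer.best_matches('4663'))  # ['good', 'home', 'gone', 'hood']
```

    All four toy words share the digit sequence 4663, so the ranking step is what resolves the ambiguity; in the thesis's framework that role is played by the grammar, just as in the speech mode.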

    Deixis and Conjunction in Multimodal Systems

    In order to realize their full potential, multimodal interfaces need to support not just input from multiple modes, but single commands optimally distributed across the available input modes. A multimodal language processing architecture is needed to integrate semantic content from the different modes. Johnston 1998a proposes a modular approach to multimodal language processing in which spoken language parsing is completed before multimodal parsing. In this paper, I will demonstrate the difficulties this approach faces as the spoken language parsing component is expanded to provide a compositional analysis of deictic expressions. I propose an alternative architecture in which spoken and multimodal parsing are tightly interleaved. This architecture greatly simplifies the spoken language parsing grammar and enables predictive information from spoken language parsing to drive the application of multimodal parsing and gesture combination rules. I also propose a treatment of deictic numeral expressions that supports the broad range of pen gesture combinations that can be used to refer to collections of objects in the interface.
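    The abstract does not reproduce the paper's grammar formalism, but its central idea, that a spoken deictic numeral constrains which pen-gesture combinations may resolve it, can be sketched. The dataclasses and the combination rule below are illustrative assumptions for exposition, not Johnston's actual rules.

```python
# Hypothetical sketch of a gesture-combination rule driven by a deictic
# numeral expression (e.g. "these three files" plus two pen gestures).
# The types and the rule are assumptions, not the paper's formalism.
from dataclasses import dataclass

@dataclass
class DeicticNP:
    """Partial spoken parse: a deictic phrase awaiting gesture input."""
    numeral: int        # cardinality from the numeral expression
    entity_type: str    # semantic type constraint from the noun

@dataclass
class Gesture:
    """One pen gesture selecting a set of on-screen objects."""
    selected: list[str]  # ids of the objects under the gesture
    entity_type: str

def combine(np: DeicticNP, gestures: list[Gesture]) -> list[str] | None:
    """Pool the referents of a sequence of gestures, succeeding only if
    the types match and the pooled cardinality equals the spoken numeral."""
    pooled: list[str] = []
    for g in gestures:
        if g.entity_type != np.entity_type:
            return None
        pooled.extend(g.selected)
    return pooled if len(pooled) == np.numeral else None

# "these three files" resolved by one gesture on two files plus one on a third.
np = DeicticNP(numeral=3, entity_type='file')
gs = [Gesture(['f1', 'f2'], 'file'), Gesture(['f3'], 'file')]
print(combine(np, gs))  # ['f1', 'f2', 'f3']
```

    In the interleaved architecture the abstract argues for, the partially parsed deictic phrase would be available early enough for its numeral to predictively drive rules like this one, rather than being reconciled with gestures only after spoken parsing completes.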