Search CORE

4 research outputs found

Eighty Challenges Facing Speech Input/Output Technologies

Author: Victor Zue
Publication venue
Publication date: 24/04/2020
Field of study

ABSTRACT During the past three decades, we have witnessed remarkable progress in the development of speech input/output technologies. Despite these successes, we are far from reaching human capabilities of recognizing nearly perfectly the speech spoken by many speakers, under varying acoustic environments, with essentially unrestricted vocabulary. Synthetic speech still sounds stilted and robot-like, lacking in real personality and emotion. There are many challenges that will remain unmet unless we can advance our fundamental understanding of human communication -how speech is produced and perceived, utilizing our innate linguistic competence. This paper outlines some of these challenges, ranging from signal presentation and lexical access to language understanding and multimodal integration, and speculates on how these challenges could be met

CiteSeerX

A framework for multi-modal input in a pervasive computing environment

Author: Agarwal Shalini, 1979-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2002
Field of study

Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.Includes bibliographical references (leaves 51-53).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.In this thesis, we propose a framework that uses multiple-domains and multi-modal techniques to disambiguate a variety of natural human input modes. This system is based on the input needs of pervasive computing users. The work extends the Galaxy architecture developed by the Spoken Language Systems group at MIT. Just as speech recognition disambiguates an input wave form by using a grammar to find the best matching phrase, we use the same mechanism to disambiguate other input forms, T9 in particular. A skeleton version of the framework was implemented to show this framework is possible and to explore some of the issues that might arise. The system currently works for both T9 and Speech modes. The framework also includes potential for any other type of input for which a recognizer can be built such as graffiti input.by Shalini Agarwal.M.Eng

DSpace@MIT