
    System Response Delay and User Strategy Selection

    Computer Science Department

    Goal-directed Speech in a Spoken Language System

    Computer Science Department

    Speech Interface Guidelines

    This document provides an overview of speech interface design principles as applied to the range of applications that have been developed at Carnegie Mellon. For the most part these are workstation-based applications built on spoken language understanding technology. Nevertheless, the guidelines should be applicable to a wider range of applications.

    Language Modeling with Limited Domain Data

    Generic recognition systems contain language models which are representative of a broad corpus. In actual practice, however, recognition is usually on a coherent text covering a single topic, suggesting that knowledge of the topic at hand can be used to advantage. A base model can be augmented with information from a small sample of domain-specific language data to significantly improve recognition performance. Good performance may be obtained by merging in only those n-grams that include words that are out of vocabulary with respect to the base model.
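
    The merging strategy lends itself to a compact illustration. Below is a minimal sketch, assuming simple count-based bigram dictionaries and an illustrative merge_oov_ngrams helper (not the paper's actual implementation): only domain n-grams containing at least one word outside the base vocabulary are folded into the base model.

    ```python
    # Sketch: augment a base n-gram model with domain n-grams that contain
    # out-of-vocabulary (OOV) words. Names and data structures are
    # illustrative assumptions, not the paper's actual implementation.

    def merge_oov_ngrams(base_counts, domain_counts, base_vocab):
        """Add to the base model only those domain n-grams that include
        at least one word missing from the base vocabulary."""
        merged = dict(base_counts)
        for ngram, count in domain_counts.items():
            if any(word not in base_vocab for word in ngram):
                merged[ngram] = merged.get(ngram, 0) + count
        return merged

    base_vocab = {"the", "flight", "to", "boston"}
    base_counts = {("the", "flight"): 12, ("flight", "to"): 9}
    domain_counts = {("the", "flight"): 3,          # all in-vocabulary: skipped
                     ("to", "wilkes-barre"): 2}     # contains an OOV word: merged in

    merged = merge_oov_ngrams(base_counts, domain_counts, base_vocab)
    print(merged)  # the OOV bigram is added; in-vocabulary counts are unchanged
    ```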

    A Comparison of Speech vs Typed Input

    We conducted a series of empirical experiments in which users were asked to enter digit strings into the computer by voice or keyboard. Two different ways of verifying and correcting the spoken input were examined. Extensive timing analyses were performed to determine which aspects of the interface were critical to speedy completion of the task. The results show that speech is preferable for strings that require more than a few keystrokes. The results emphasize the need for fast and accurate speech recognition, but also demonstrate how error correction and input validation are crucial for an effective speech interface.

    Finding Recurrent Out-of-Vocabulary Words

    Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for estimating the pronunciation or the part-of-speech (POS) tag of the word. But in a conventional OOV word detection system, each OOV word is recognized and treated individually. We therefore investigated how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic and contextual features were collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also found that the phonetic feature is better than the acoustic and contextual features, and the best performance is achieved when combining all features.
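
    As a rough illustration of the bottom-up clustering idea, the sketch below greedily merges OOV candidates whose phonetic distance falls under a threshold. The phone strings, the Levenshtein metric standing in for the paper's phonetic feature, and the threshold value are all illustrative assumptions.

    ```python
    # Sketch: bottom-up (agglomerative) clustering of OOV candidates using
    # phone-sequence edit distance as a stand-in for the phonetic feature.

    def edit_distance(a, b):
        """Levenshtein distance between two phone sequences."""
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
        return d[m][n]

    def cluster_oov(candidates, threshold=2):
        """Single-linkage, bottom-up clustering of OOV phone sequences."""
        clusters = [[c] for c in candidates]
        merged = True
        while merged:
            merged = False
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    dist = min(edit_distance(a, b)
                               for a in clusters[i] for b in clusters[j])
                    if dist <= threshold:
                        clusters[i].extend(clusters[j])
                        del clusters[j]
                        merged = True
                        break
                if merged:
                    break
        return clusters

    # Two instances of the same unseen name plus an unrelated candidate.
    candidates = [("W", "IH", "L", "K", "S"), ("W", "IH", "L", "K", "Z"),
                  ("B", "AO", "S", "T", "AH", "N")]
    print(cluster_oov(candidates))  # the two similar instances end up together
    ```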

    Improving the Performance of an LVCSR System Through Ensembles of Acoustic Models

    This paper describes our work on applying ensembles of acoustic models to the problem of large vocabulary continuous speech recognition (LVCSR). We propose three algorithms for constructing ensembles. The first two have their roots in bagging algorithms; however, instead of randomly sampling examples, our algorithms construct training sets based on the word error rate. The third is a boosting-style algorithm. Unlike other boosting methods, which demand large resources for computation and storage, our method presents a more efficient solution suitable for acoustic model training. We also investigate a method that seeks an optimal combination of models. We report experimental results on a large real-world corpus collected from the Carnegie Mellon Communicator dialog system. Significant improvements in system performance are observed, with up to a 15.56% relative reduction in word error rate.
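
    A minimal sketch of the WER-driven sampling idea behind the first two algorithms follows; the proportional sampling scheme and the weight floor are illustrative assumptions, not the exact construction from the paper.

    ```python
    # Sketch of the bagging variant: rather than sampling utterances uniformly,
    # bias each model's training set toward utterances the current system
    # recognizes poorly. The sampling scheme is an illustrative assumption.
    import random

    def wer_weighted_training_sets(utterances, wers, n_models, set_size, seed=0):
        """Build n_models training sets, sampling utterances in proportion
        to their word error rate (with a floor so clean utterances survive)."""
        rng = random.Random(seed)
        weights = [max(w, 0.05) for w in wers]  # floor keeps every utterance eligible
        return [rng.choices(utterances, weights=weights, k=set_size)
                for _ in range(n_models)]

    utts = [f"utt{i}" for i in range(10)]
    wers = [0.0, 0.1, 0.5, 0.0, 0.8, 0.2, 0.0, 0.9, 0.1, 0.3]
    for i, ts in enumerate(wer_weighted_training_sets(utts, wers, n_models=3, set_size=10)):
        print(f"model {i}:", ts)  # high-WER utterances appear more often
    ```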

    Investigations of Issues for Using Multiple Acoustic Models to Improve Continuous Speech Recognition

    This paper investigates two important issues in constructing and combining ensembles of acoustic models for reducing recognition errors. First, we investigate the applicability of the AnyBoost algorithm for acoustic model training. AnyBoost is a generalized boosting method that allows the use of an arbitrary loss function as the training criterion to construct an ensemble of classifiers. We choose the MCE discriminative objective function for our experiments. Initial test results on a real-world meeting recognition corpus show that AnyBoost is a competitive alternative to the standard AdaBoost algorithm. Second, we investigate ROVER-based combination, focusing on the technique for selecting correct hypothesized words from the aligned word transition network (WTN). We propose a neural-network-based insertion detection and word scoring scheme for this. Our approach consistently outperforms the current voting technique used by ROVER in our experiments.
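
    For context, the sketch below shows the conventional ROVER voting step that the proposed neural-network scorer is meant to replace: at each slot of the aligned WTN, the winning word maximizes a blend of vote frequency and confidence. The alpha blend is the standard baseline formulation; the slot representation here is an illustrative assumption.

    ```python
    # Sketch of ROVER's standard voting over an aligned word transition
    # network (WTN): each slot holds the words hypothesized by the component
    # systems, and the winner maximizes a blend of vote frequency and
    # confidence. The paper replaces this linear score with a learned scorer.

    def rover_vote(slot, alpha=0.7):
        """slot: list of (word, confidence) pairs, one per component system.
        '@' denotes a null (deletion) arc, as in ROVER's alignment."""
        n = len(slot)
        words = {}
        for word, conf in slot:
            freq, max_conf = words.get(word, (0, 0.0))
            words[word] = (freq + 1, max(max_conf, conf))

        def score(item):
            word, (freq, conf) = item
            return alpha * (freq / n) + (1 - alpha) * conf

        return max(words.items(), key=score)[0]

    # Three systems disagree on one slot of the aligned WTN.
    slot = [("boston", 0.9), ("austin", 0.6), ("boston", 0.4)]
    print(rover_vote(slot))  # -> boston (two votes plus high confidence)
    ```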

    Acquiring Domain-Specific Dialog Information from Task-Oriented Human-Human Interaction through an Unsupervised Learning

    We describe an approach for acquiring the domain-specific dialog knowledge required to configure a task-oriented dialog system, using human-human interaction data. The key aspects of this problem are the design of a dialog information representation and a learning approach that supports capture of domain information from in-domain dialogs. To represent a dialog for learning purposes, we based our representation, the form-based dialog structure representation, on an observable structure. We show that this representation is sufficient for modeling phenomena that occur regularly in several dissimilar task-oriented domains, including information access and problem solving. With the goal of ultimately reducing human annotation effort, we examine the use of unsupervised learning techniques in acquiring the components of the form-based representation (i.e., task, subtask, and concept). These techniques include statistical word clustering based on mutual information and Kullback-Leibler distance, TextTiling, HMM-based segmentation, and bisecting K-means document clustering. With some modifications to make these algorithms more suitable for inferring the structure of a spoken dialog, the unsupervised learning algorithms show promise.
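
    As one concrete example of the techniques listed above, here is a minimal sketch of bisecting K-means: repeatedly split the largest cluster in two with plain 2-means until the target number of clusters is reached. The toy bag-of-words vectors and the fallback split rule are illustrative assumptions, not the paper's setup.

    ```python
    # Sketch of bisecting K-means over bag-of-words vectors: keep splitting
    # the largest cluster with 2-means until k clusters exist.
    import random

    def two_means(points, iters=20, seed=0):
        rng = random.Random(seed)
        centers = rng.sample(points, 2)
        halves = (points[: len(points) // 2], points[len(points) // 2 :])  # fallback
        for _ in range(iters):
            new = ([], [])
            for p in points:
                d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
                new[d.index(min(d))].append(p)
            if not new[0] or not new[1]:
                break  # degenerate split: keep the previous (or fallback) halves
            halves = new
            centers = [tuple(sum(col) / len(h) for col in zip(*h)) for h in halves]
        return halves

    def bisecting_kmeans(points, k):
        clusters = [points]
        while len(clusters) < k:
            clusters.sort(key=len)
            largest = clusters.pop()
            clusters.extend(half for half in two_means(largest) if half)
        return clusters

    # Toy bag-of-words vectors for six dialog segments.
    segments = [(3, 0, 0), (2, 1, 0), (0, 3, 0), (0, 2, 1), (0, 0, 3), (1, 0, 2)]
    for i, c in enumerate(bisecting_kmeans(segments, 3)):
        print(f"cluster {i}:", c)
    ```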