1,913 research outputs found

    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    Get PDF
    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed

    Phoneme Based Speaker Verification System Based on Two Stage Self-Organizing Map Design

    Get PDF
    Speaker verification is one of the pattern recognition task that authenticate a person by his or her voice. This thesis deals with a relatively new technique of classification that is the self-organizing map (SOM). Self-organizing map, as an unsupervised learning artificial neural network, rarely used as final classification step in pattern recognition task due to its relatively low accuracy. A two-stage self-organizing map design has been implemented in this thesis and showed improved results over conventional single stage design. For speech features extraction, this thesis does not introduce any new technique. A well study method that is the linear prediction analysis (LP A) has been used. Linear predictive analysis derived coefficients are extracted from segmented raw speech signal to train and test the front stage self-organizing map. Unlike other multistage or hierarchical self-organizing map designs, this thesis utilized residual vectors generated from front stage self-organizing map to train and test the second stage selforganizing map. The results showed that by breaking the classification tasks into two level or more detail resolution, an improvement of more than 5% can be obtained. Moreover, the computation time is also reduced greatly

    Dialogue Act Recognition Approaches

    Get PDF
    This paper deals with automatic dialogue act (DA) recognition. Dialogue acts are sentence-level units that represent states of a dialogue, such as questions, statements, hesitations, etc. The knowledge of dialogue act realizations in a discourse or dialogue is part of the speech understanding and dialogue analysis process. It is of great importance for many applications: dialogue systems, speech recognition, automatic machine translation, etc. The main goal of this paper is to study the existing works about DA recognition and to discuss their respective advantages and drawbacks. A major concern in the DA recognition domain is that, although a few DA annotation schemes seem now to emerge as standards, most of the time, these DA tag-sets have to be adapted to the specificities of a given application, which prevents the deployment of standardized DA databases and evaluation procedures. The focus of this review is put on the various kinds of information that can be used to recognize DAs, such as prosody, lexical, etc., and on the types of models proposed so far to capture this information. Combining these information sources tends to appear nowadays as a prerequisite to recognize DAs
    corecore