8,697 research outputs found

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    Get PDF
    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models -- for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200

    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    Get PDF
    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed

    Comparison of statistical algorithms for detecting homogeneous river reaches along a longitudinal continuum

    Get PDF
    International audienceSeven methods designed to delineate homogeneous river segments, belonging to four families, namely -- tests of homogeneity, contrast enhancing, spatially constrained classification, and hidden Markov models -- are compared, firstly on their principles, then on a case study, and on theoretical templates. These templates contain patterns found in the case study but not considered in the standard assumptions of statistical methods, such as gradients and curvilinear structures. The influence of data resolution, noise and weak satisfaction of the assumptions underlying the methods is investigated. The control of the number of reaches obtained in order to achieve meaningful comparisons is discussed. No method is found that outperforms all the others on all trials. However, the methods with sequential algorithms (keeping at order n + 1 all breakpoints found at order n) fail more often than those running complete optimisation at any order. The Hubert-Kehagias method and Hidden Markov Models are the most successful at identifying subpatterns encapsulated within the templates. Ergodic Hidden Markov Models are, moreover, liable to exhibit transition areas
    corecore