4 research outputs found

    Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures

    This paper presents experiments assessing how well different machine learning methods identify disfluencies and their distinct structural regions in speech data. The methods applied are Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48, and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods at identifying the distinct structural regions of a disfluency. The reported experiments are based on audio segmentation and prosodic features calculated from a corpus of university lectures in European Portuguese, containing about 32 hours of speech and about 7.7% disfluencies. The features, extracted automatically from the force-aligned corpus, proved discriminant of the regions that make up a disfluency. This work shows that, using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs; the best results achieved are 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.
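The three figures reported above are consistent with one another if the F-measure is the balanced harmonic mean of precision and recall (F1). The abstract does not say which variant was used, so that is an assumption here. A minimal sketch of the arithmetic, with `f_measure` as a hypothetical helper name:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Best CART figures quoted in the abstract: 81.5% precision, 27.6% recall.
# Assumption: the reported F-measure is the balanced F1 (beta = 1).
score = f_measure(0.815, 0.276)
print(f"F-measure: {score:.1%}")
```

Because the harmonic mean is pulled toward the smaller value, the low recall dominates the score even though precision is high.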

    Identification and modeling of word fragments in spontaneous speech.

    This paper presents a novel approach to handling disfluencies, word fragments, and self-interruption points in Cantonese conversational speech. We train a classifier that exploits lexical and acoustic information to automatically identify disfluencies during training of a speech recognition system on conversational speech, and then use this classifier to augment the reference annotations used for acoustic model training. We experiment with approaches to modeling disfluencies in the pronunciation dictionary, and examine their effect on polyphonic decision tree clustering. We achieve automatic detection of disfluencies with 88% accuracy, which leads to a reduction in character error rate of 1.9% absolute. While the high baseline error rates are due to the task we are currently working on, we demonstrate that this approach works well on the Switchboard corpus, for which the conversational nature of speech is also a major problem.

    Detection of Word Fragments in Mandarin Telephone Conversation

    We describe preliminary work on the detection of word fragments in Mandarin conversational telephone speech. We extracted prosodic, voice quality, and lexical features, and trained Decision Tree and SVM classifiers. Previous research shows that glottalization features are instrumental in English fragment detection. However, we show that Mandarin fragments are quite different from English fragments: 90% of Mandarin fragments are followed immediately by a repetition of the fragmentary word. These repetition fragments are not glottalized, and they have a very specific distribution: the 12 most frequent words ("you", "I", "that", "have", "then", etc.) cover 50% of the tokens of these fragments. Thus, rather than glottalization, we found that the most useful feature for Mandarin fragment detection was the identity of the neighboring character (word or morpheme). In an oracle experiment using the true (reference) neighboring words as well as prosodic and voice quality features, we achieved 80% accuracy in Mandarin fragment detection.
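The coverage statistic quoted above (the 12 most frequent words accounting for 50% of fragment tokens) is a top-k token coverage. The sketch below shows one way such a figure could be computed. The function name `top_k_coverage` and the toy token list are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
from typing import Iterable


def top_k_coverage(tokens: Iterable[str], k: int) -> float:
    """Fraction of all tokens accounted for by the k most frequent word types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Hypothetical toy data (English glosses), NOT the paper's corpus.
toy_tokens = ["you", "I", "you", "that", "have", "you", "I", "then"]
print(top_k_coverage(toy_tokens, 2))
```

A high coverage at small k is what makes the identity of the neighboring word an informative feature: a short word list already accounts for a large share of the fragments.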