
    Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

    We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figures.
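    The abstract does not spell out how the two knowledge sources are merged. One simple way to combine them, assumed here purely for illustration, is to linearly interpolate a lexical boundary posterior (for example, from an HMM over words) with a prosodic boundary posterior (for example, from a decision tree) at each candidate boundary and then threshold the result. The function name, the `weight` and `threshold` parameters, and the input lists are hypothetical; this is a sketch, not the authors' method.

```python
from typing import List


def combine_boundary_posteriors(
    lexical: List[float],
    prosodic: List[float],
    weight: float = 0.5,
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of candidate positions labelled as topic boundaries.

    Hypothetical inputs (assumptions, not from the paper):
      lexical[i]  -- P(boundary at i | words), e.g. from a word-based HMM
      prosodic[i] -- P(boundary at i | prosody), e.g. from a decision tree
      weight      -- interpolation weight given to the lexical posterior
      threshold   -- combined posterior above which a boundary is hypothesized
    """
    if len(lexical) != len(prosodic):
        raise ValueError("posterior lists must be the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    boundaries = []
    for i, (p_lex, p_pros) in enumerate(zip(lexical, prosodic)):
        # Linear interpolation of the two posteriors at this candidate position.
        combined = weight * p_lex + (1.0 - weight) * p_pros
        if combined > threshold:
            boundaries.append(i)
    return boundaries
```

    For instance, with lexical posteriors [0.1, 0.7, 0.4, 0.9], prosodic posteriors [0.2, 0.5, 0.3, 0.3], and equal weights, the combined scores are 0.15, 0.6, 0.35, and 0.6, so positions 1 and 3 are returned. In practice the weight would be tuned on held-out data.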

    Intonation


    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed)
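    Treating dialogue acts as hidden states with a dialogue act n-gram as the transition model means the most likely act sequence for a conversation can be found with Viterbi decoding. The sketch below assumes a bigram (n = 2) over acts and per-utterance likelihoods P(evidence | act) that are simply given as numbers; in the abstract these would come from the word n-gram, decision-tree, and neural-network models. The function and argument names are hypothetical, and this is an illustration of standard HMM decoding, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical inputs for illustration:
#   start[a]          -- P(first dialogue act = a)
#   bigram[p][a]      -- P(act a | previous act p), a dialogue act bigram
#   likelihoods[t][a] -- P(evidence of utterance t | act a)
# All probabilities must be strictly positive (math.log(0) raises ValueError).


def viterbi_dialogue_acts(
    likelihoods: Sequence[Dict[str, float]],
    bigram: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Return the most probable dialogue act sequence under a first-order HMM."""
    acts = list(start)
    # Log-score of the best path ending in each act after the first utterance.
    scores = {a: math.log(start[a]) + math.log(likelihoods[0][a]) for a in acts}
    backpointers: List[Dict[str, str]] = []
    for obs in likelihoods[1:]:
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for a in acts:
            # Best predecessor for act a: path score plus log transition probability.
            prev = max(acts, key=lambda p: scores[p] + math.log(bigram[p][a]))
            new_scores[a] = scores[prev] + math.log(bigram[prev][a]) + math.log(obs[a])
            pointers[a] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best-scoring final act.
    path = [max(scores, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

    With start probabilities {Statement: 0.6, Question: 0.4}, a bigram that favors a Statement after a Question (0.8), and a first utterance whose evidence strongly suggests a Question, the decoder returns ["Question", "Statement"] for a two-utterance exchange whose second utterance looks more like a Statement. Working in log space avoids numerical underflow on long conversations.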

    Speech and Prosody Characteristics of Adolescents and Adults With High-Functioning Autism and Asperger Syndrome

    Speech and prosody-voice profiles for 15 male speakers with High-Functioning Autism (HFA) and 15 male speakers with Asperger syndrome (AS) were compared to one another and to profiles for 53 typically developing male speakers in the same 10- to 50-year age range. Compared to the typically developing speakers, significantly more participants in both the HFA and AS groups had residual articulation distortion errors, uncodable utterances due to discourse constraints, and utterances coded as inappropriate in the domains of phrasing, stress, and resonance. Speakers with AS were significantly more voluble than speakers with HFA, but otherwise there were few statistically significant differences between the two groups of speakers with pervasive developmental disorders. Discussion focuses on perceptual-motor and social sources of differences in the prosody-voice findings for individuals with Pervasive Developmental Disorders as compared with findings for typical speakers, including comment on the grammatical, pragmatic, and affective aspects of prosody.

    Multipoint genome-wide linkage scan for nonword repetition in a multigenerational family further supports chromosome 13q as a locus for verbal trait disorders

    Verbal trait disorders encompass a wide range of conditions and are marked by deficits in five domains that impair a person’s ability to communicate: speech, language, reading, spelling, and writing. Nonword repetition is a robust endophenotype for verbal trait disorders that is sensitive to cognitive processes critical to verbal development, including auditory processing, phonological working memory, and motor planning and programming. Here we present a six-generation extended pedigree with a history of verbal trait disorders. Using genome-wide multipoint variance component linkage analysis of nonword repetition, we identified a region on chromosome 13q14–q21 with LOD = 4.45 between 52 and 55 cM, spanning approximately 5.5 Mb. This region overlaps with SLI3, a locus implicated in reading disability in families with a history of specific language impairment. Our study of a large multigenerational family with verbal trait disorders further implicates the SLI3 region in verbal trait disorders. Future studies will further refine the specific causal genetic factors in this locus on chromosome 13q that contribute to language traits.

    Filled pauses in Hungarian: Their phonetic form and function

    Filled pauses are natural occurrences in spontaneous speech; they may turn up at any level of the speech planning process and serve a number of functions. The aim of this paper is to determine whether the diverse functions of filled pauses correlate with diverse articulations resulting in diverse acoustic structures. Spontaneous narratives are used as research material. The duration of the filled pauses and the frequency values of their first two formants are analyzed. The most frequent form, schwa, shows function-dependent realizations, as confirmed by the durational values and by the second formant values of these vowel-like sounds.