553 research outputs found
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.
Comment: 27 pages, 8 figures
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed)
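As an illustration of the sequence model this abstract describes, here is a minimal sketch of Viterbi decoding over dialogue act labels. It rests on several assumptions that are not taken from the paper: the labels are treated as hidden states, a bigram supplies the transition probabilities, and each utterance comes with a precomputed likelihood of its lexical and prosodic evidence under each label. The function name `viterbi` and every probability below are hypothetical toy values, not results from the Switchboard experiments.

```python
import math


def viterbi(log_init, log_trans, log_lik):
    """Return the most likely label sequence for a conversation.

    log_init[s]        : log prior of label s for the first utterance
    log_trans[prev][s] : log bigram probability of s following prev
    log_lik[t][s]      : log likelihood of utterance t's evidence under s
    """
    labels = list(log_init)
    # best[s] holds the best log score of any path ending in label s.
    best = {s: log_init[s] + log_lik[0][s] for s in labels}
    back = []  # one dict of back-pointers per utterance after the first
    for obs in log_lik[1:]:
        new_best, pointers = {}, {}
        for s in labels:
            prev = max(labels, key=lambda p: best[p] + log_trans[p][s])
            new_best[s] = best[prev] + log_trans[prev][s] + obs[s]
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log space."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Toy numbers, invented purely for illustration.
init = to_log({"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1})
trans = to_log({
    "Statement":   {"Statement": 0.5, "Question": 0.2, "Backchannel": 0.3},
    "Question":    {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
})
evidence = [to_log(u) for u in (
    {"Statement": 0.2, "Question": 0.7, "Backchannel": 0.1},
    {"Statement": 0.8, "Question": 0.1, "Backchannel": 0.1},
    {"Statement": 0.3, "Question": 0.1, "Backchannel": 0.6},
)]

decoded = viterbi(init, trans, evidence)
print(decoded)
```

In a fuller system, the per-utterance likelihoods would come from the word n-gram, decision tree, or neural network models mentioned in the abstract; here they are simply given as inputs.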
Speech and Prosody Characteristics of Adolescents and Adults With High-Functioning Autism and Asperger Syndrome
Speech and prosody-voice profiles for 15 male speakers with High-Functioning Autism (HFA) and 15 male speakers with Asperger syndrome (AS) were compared to one another and to profiles for 53 typically developing male speakers in the same 10- to 50-year age range. Compared to the typically developing speakers, significantly more participants in both the HFA and AS groups had residual articulation distortion errors, uncodable utterances due to discourse constraints, and utterances coded as inappropriate in the domains of phrasing, stress, and resonance. Speakers with AS were significantly more voluble than speakers with HFA, but otherwise there were few statistically significant differences between the two groups of speakers with pervasive developmental disorders. Discussion focuses on perceptual-motor and social sources of differences in the prosody-voice findings for individuals with Pervasive Developmental Disorders as compared with findings for typical speakers, including comment on the grammatical, pragmatic, and affective aspects of prosody.
Prosodic modulation in the babble of cochlear implanted and normally hearing infants: a perceptual study using a visual analogue scale
This study investigates prosodic modulation in the spontaneous canonical babble of congenitally deaf infants with cochlear implants (CI) and normally hearing (NH) infants. Research has shown that the acoustic cues to prominence are less modulated in CI babble. However, acoustic measurements of individual cues to prominence give incomplete information about prosodic modulation. In the present study, raters are asked to judge prominence since they simultaneously take into account all prosodic cues. Disyllabic utterances produced by CI and NH infants were presented to naive adult raters who had to indicate the degree and direction of prosodic modulation between syllables on a visual analogue scale. The results show that the babble of infants with CI is rated as having less prosodic modulation. Moreover, segmentally more variegated babble is rated as having more prosodic modulation. Raters do not perceive the babble to be predominantly trochaic, which indicates that the predominant stress pattern of Dutch is not yet apparent in the children’s productions.
Multipoint genome-wide linkage scan for nonword repetition in a multigenerational family further supports chromosome 13q as a locus for verbal trait disorders
Verbal trait disorders encompass a wide range of conditions and are marked by deficits in five domains that impair a person’s ability to communicate: speech, language, reading, spelling, and writing. Nonword repetition is a robust endophenotype for verbal trait disorders that is sensitive to cognitive processes critical to verbal development, including auditory processing, phonological working memory, and motor planning and programming. In the present study, we present a six-generation extended pedigree with a history of verbal trait disorders. Using genome-wide multipoint variance component linkage analysis of nonword repetition, we identified a region spanning chromosome 13q14–q21 with LOD = 4.45 between 52 and 55 cM, spanning approximately 5.5 Mb on chromosome 13. This region overlaps with SLI3, a locus implicated in reading disability in families with a history of specific language impairment. Our study of a large multigenerational family with verbal trait disorders further implicates the SLI3 region in verbal trait disorders. Future studies will further refine the specific causal genetic factors in this locus on chromosome 13q that contribute to language traits
Filled pauses in Hungarian: Their phonetic form and function
Filled pauses are natural occurrences in spontaneous speech and they may turn up at any level of the speech planning process and in a number of functions. The aim of this paper is to find out whether the diverse functions of filled pauses correlate with diverse articulations resulting in diverse acoustic structures. Spontaneous narratives are used as research material. The duration of the filled pauses and the frequency values of their first two formants are analyzed. The most frequent form, schwa, shows function-dependent realizations as confirmed by the durational values and by the second formant values of these vowel-like sounds