
    Increase Apparent Public Speaking Fluency By Speech Augmentation

    Fluent and confident speech is desirable to every speaker, but delivering professional speech requires a great deal of experience and practice. In this paper, we propose a speech stream manipulation system that can help non-professional speakers produce fluent, professional-like speech content, in turn contributing to better listener engagement and comprehension. We propose to achieve this by manipulating the disfluencies in human speech, such as the sounds 'uh' and 'um', filler words, and awkward long silences. Given any unrehearsed speech, we segment and silence the filled pauses and doctor the duration of the imposed silence, as well as of other long ('disfluent') pauses, using a predictive model learned from a professional speech dataset. Finally, we output an audio stream in which the speaker sounds more fluent, confident and practiced than in the original recording. According to our quantitative evaluation, we significantly increase the fluency of speech by reducing the rate of pauses and fillers.
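
    A minimal sketch of the pause-doctoring step described above, assuming the filler time spans come from an upstream disfluency detector and using a fixed MAX_PAUSE_MS target in place of the paper's learned predictive model; the file names and spans below are hypothetical.

```python
# Sketch only: filler_spans (ms) are assumed to come from a disfluency
# detector, and MAX_PAUSE_MS stands in for the paper's learned pause model.
from pydub import AudioSegment
from pydub.silence import detect_silence

MAX_PAUSE_MS = 600                              # assumed target pause length
filler_spans = [(2300, 2750), (8100, 8400)]     # hypothetical 'uh'/'um' spans

audio = AudioSegment.from_wav("unrehearsed_talk.wav")

# 1. Replace detected fillers with silence of the same length
#    (overall length is unchanged, so later spans stay valid).
for start, end in filler_spans:
    silence = AudioSegment.silent(duration=end - start, frame_rate=audio.frame_rate)
    audio = audio[:start] + silence + audio[end:]

# 2. Find long pauses and shorten each one to the target duration.
pauses = detect_silence(audio, min_silence_len=MAX_PAUSE_MS, silence_thresh=-40)
out = AudioSegment.empty()
cursor = 0
for start, end in pauses:
    out += audio[cursor:start]
    out += AudioSegment.silent(duration=MAX_PAUSE_MS, frame_rate=audio.frame_rate)
    cursor = end
out += audio[cursor:]

out.export("fluent_talk.wav", format="wav")
```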

    The effect of informational load on disfluencies in interpreting: a corpus-based regression analysis

    This article attempts to measure the cognitive or informational load in interpreting by modelling the occurrence rate of the speech disfluency uh(m). In a corpus of 107 interpreted and 240 non-interpreted texts, informational load is operationalized in terms of four measures: delivery rate, lexical density, percentage of numerals, and average sentence length. The occurrence rate of the disfluency was modelled using a rate model. Interpreted texts are analyzed on the basis of the interpreter's output and compared with non-interpreted texts, and the effect of source-text features is measured. The results demonstrate that interpreters produce significantly more uh(m)s than non-interpreters and that this difference is mainly due to the effect of lexical density on the output side. The main source predictor of uh(m)s in the target text was shown to be the delivery rate of the source text. On a more general level of significance, the second analysis also revealed an increasing effect of numerals in the source texts and a decreasing effect of numerals in the target texts.
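
    As an illustration of how such a rate model can be fit, the sketch below uses a negative binomial GLM with a log word-count offset; the file uhm_corpus.csv and its column names (uhm_count, interpreted, delivery_rate, lexical_density, pct_numerals, avg_sentence_length, word_count) are invented for the example, and the article's exact model specification may differ.

```python
# Illustrative rate model for uh(m) counts; data layout and column names
# are assumptions, not the article's actual dataset.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("uhm_corpus.csv")   # hypothetical one-row-per-text export

model = smf.glm(
    formula="uhm_count ~ interpreted + delivery_rate + lexical_density"
            " + pct_numerals + avg_sentence_length",
    data=df,
    family=sm.families.NegativeBinomial(),
    offset=np.log(df["word_count"]),   # turns counts into an occurrence rate
).fit()

print(model.summary())
```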

    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, that attachment decisions are most improved, and that transcription errors obscure gains from prosody. Comment: Accepted in NAACL HLT 201
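
    The sketch below illustrates the general feature-combination idea (a CNN over energy/pitch trajectories whose pooled output is concatenated with word embeddings and fed to a recurrent encoder); the layer sizes, pooling choice, and the LexicalProsodicEncoder class are assumptions made for illustration, not the authors' exact architecture.

```python
# Rough PyTorch sketch of combining lexical and acoustic-prosodic features;
# dimensions and layers are illustrative assumptions.
import torch
import torch.nn as nn

class LexicalProsodicEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, prosody_channels=2,
                 conv_filters=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # CNN over the energy/pitch trajectory aligned to each word
        self.conv = nn.Conv1d(prosody_channels, conv_filters,
                              kernel_size=5, padding=2)
        self.rnn = nn.LSTM(emb_dim + conv_filters, hidden,
                           batch_first=True, bidirectional=True)

    def forward(self, tokens, prosody):
        # tokens:  (batch, words)
        # prosody: (batch, words, channels, frames) energy/pitch per word
        b, w, c, f = prosody.shape
        conv_out = self.conv(prosody.reshape(b * w, c, f))       # (b*w, filters, frames)
        prosody_feat = conv_out.max(dim=-1).values.reshape(b, w, -1)  # pool over time
        x = torch.cat([self.embed(tokens), prosody_feat], dim=-1)
        encoded, _ = self.rnn(x)   # would feed an attention-based parser/decoder
        return encoded
```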

    Are language production problems apparent in adults who no longer meet diagnostic criteria for attention-deficit/hyperactivity disorder?

    In this study, we examined sentence production in a sample of adults (N = 21) who had had attention-deficit/hyperactivity disorder (ADHD) as children but as adults no longer met DSM-IV diagnostic criteria (APA, 2000). This “remitted” group was assessed on a sentence production task. On each trial, participants saw two objects and a verb. Their task was to construct a sentence using the objects as arguments of the verb. Results showed more ungrammatical and disfluent utterances with one particular type of verb (i.e., participles). In a second set of analyses, we compared the remitted group to both control participants and a “persistent” group, who had ADHD as children and as adults. Results showed that remitters were more likely to produce ungrammatical utterances and to make repair disfluencies compared to controls, and they patterned more similarly to ADHD participants. Conclusions focus on language output in remitted ADHD and the role of executive functions in language production.

    The relation between pitch and gestures in a story-telling task

    Anecdotal evidence suggests that both pitch range and gestures contribute to the perception of speakers' liveliness in speech. However, the relation between speakers' pitch range and gestures has received little attention. It is possible that variations in pitch range are accompanied by variations in gestures, and vice versa. In second language speech, the relation between pitch range and gestures might also be affected by speakers' difficulty in speaking the L2. In this pilot study we compare global pitch range and gesture rate in the speech of 3 native Italian speakers telling the same story once in Italian and twice in English as part of an in-class oral presentation task. The hypothesis tested is that contextual factors, such as speakers' nervousness with the task, cause speakers to use a narrow pitch range and limited gestures, whereas greater ease with the task, due to its repetition, causes speakers to use a wider pitch range and more gestures. This experimental hypothesis is partially confirmed by the results of this study.
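
    A minimal sketch of how global pitch range per retelling could be quantified, assuming one WAV file per narration; the pyin tracker, the 5th–95th percentile summary in semitones, and the file names are choices made for this example rather than the study's actual procedure.

```python
# Illustrative pitch-range measurement per recording; files are hypothetical.
import numpy as np
import librosa

def pitch_range_semitones(path):
    y, sr = librosa.load(path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[voiced_flag & ~np.isnan(f0)]          # keep voiced frames only
    lo, hi = np.percentile(f0, [5, 95])           # robust range estimate
    return 12 * np.log2(hi / lo)                  # express the range in semitones

for recording in ["italian.wav", "english_1.wav", "english_2.wav"]:
    print(recording, round(pitch_range_semitones(recording), 2), "st")
```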

    Alcohol Language Corpus

    The Alcohol Language Corpus (ALC) is the first publicly available speech corpus comprising intoxicated and sober speech of 162 female and male German speakers. Recordings were made in the automotive environment to allow for the development of automatic alcohol detection and to ensure a consistent acoustic environment for the intoxicated and the sober recordings. The recorded speech covers a variety of contents and speech styles. Breath and blood alcohol concentration measurements are provided for all speakers. A transcription according to SpeechDat/Verbmobil standards, disfluency tagging, and an automatic phonetic segmentation are part of the corpus. An Emu version of ALC allows easy access to basic speech parameters as well as the use of R for statistical analysis of selected parts of ALC. ALC is available without restriction for scientific or commercial use at the Bavarian Archive for Speech Signals.