1,318 research outputs found

Increase Apparent Public Speaking Fluency By Speech Augmentation

Author: Das Sagnik
Gandhi Nisha
Naik Tejas
Shilkrot Roy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/08/2019

Fluent and confident speech is desirable to every speaker. However, professional speech delivery requires a great deal of experience and practice. In this paper, we propose a speech stream manipulation system which can help non-professional speakers to produce fluent, professional-like speech content, in turn contributing towards better listener engagement and comprehension. We propose to achieve this task by manipulating the disfluencies in human speech, like the sounds 'uh' and 'um', the filler words and awkward long silences. Given any unrehearsed speech, we segment and silence the filled pauses and doctor the duration of the imposed silence as well as other long ('disfluent') pauses using a predictive model learned from a professional speech dataset. Finally, we output an audio stream in which the speaker sounds more fluent, confident and practiced compared to the original speech they recorded. According to our quantitative evaluation, we significantly increase the fluency of speech by reducing the rate of pauses and fillers.
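The pipeline described in this abstract (silence the filled pauses, then adjust pause durations) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the `smooth_pauses` function, and the fixed `max_pause` cap are all hypothetical, and the cap is only a stand-in for the learned duration model the abstract mentions. Segmentation is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str        # "speech", "filler", or "silence" (assumed labels)
    duration: float  # length in seconds


def smooth_pauses(segments, max_pause=0.5):
    """Silence fillers, merge adjacent silences, and cap pause length."""
    merged = []
    for seg in segments:
        # Treat a filled pause ("uh", "um") as silence of the same length.
        kind = "silence" if seg.kind == "filler" else seg.kind
        if merged and kind == "silence" and merged[-1].kind == "silence":
            # Extend the previous pause instead of adding a new one.
            merged[-1].duration += seg.duration
        else:
            merged.append(Segment(kind, seg.duration))
    # A fixed cap stands in for the predictive duration model (assumption).
    for seg in merged:
        if seg.kind == "silence":
            seg.duration = min(seg.duration, max_pause)
    return merged


talk = [
    Segment("speech", 2.0),
    Segment("filler", 0.4),
    Segment("silence", 1.5),
    Segment("speech", 3.0),
]
result = smooth_pauses(talk)
print([(s.kind, s.duration) for s in result])
```

Here the filler and the long silence collapse into a single pause that is shortened to the cap, while the speech segments are left untouched.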

arXiv.org e-Print Archive

Crossref

The effect of informational load on disfluencies in interpreting: a corpus-based regression analysis

Author: Defrancq Bart
Plevoets Koen
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2016

This article attempts to measure the cognitive or informational load in interpreting by modelling the occurrence rate of the speech disfluency uh(m). In a corpus of 107 interpreted and 240 non-interpreted texts, informational load is operationalized in terms of four measures: delivery rate, lexical density, percentage of numerals, and average sentence length. The occurrence rate of the indicated speech disfluency was modelled using a rate model. Interpreted texts are analyzed on the basis of the interpreter's output and compared with the input of non-interpreted texts; a second analysis measures the effect of source text features. The results demonstrate that interpreters produce significantly more uh(m)s than non-interpreters and that this difference is mainly due to the effect of lexical density on the output side. The main source predictor of uh(m)s in the target text was shown to be the delivery rate of the source text. On a more general level of significance, the second analysis also revealed an increasing effect of the numerals in the source texts and a decreasing effect of the numerals in the target texts.
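The abstract does not give the exact form of its rate model. One common formulation for count data is a Poisson regression with an exposure offset, shown here purely as an assumption. The symbols are illustrative: $y_i$ is the number of uh(m)s in text $i$, $n_i$ is the length of that text in words, and $x_{i1},\dots,x_{i4}$ are the four load measures (delivery rate, lexical density, percentage of numerals, average sentence length).

```latex
\[
  \log \mathbb{E}[y_i] \;=\; \log n_i + \beta_0 + \sum_{j=1}^{4} \beta_j x_{ij}
  \quad\Longleftrightarrow\quad
  \frac{\mathbb{E}[y_i]}{n_i} \;=\; \exp\Bigl(\beta_0 + \sum_{j=1}^{4} \beta_j x_{ij}\Bigr)
\]
```

Under this form, a positive $\beta_j$ means the expected number of uh(m)s per word rises with measure $j$, which is how an "increasing effect" or "decreasing effect" of a predictor would be read.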

Ghent University Academic Bibliography

Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

Author: Goldwater Sharon
Jurafsky Dan
Manning Christopher D.
Publication date: 01/06/2008

Edinburgh Research Explorer

Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

Author: Bansal Mohit
Gimpel Kevin
Livescu Karen
Ostendorf Mari
Toshniwal Shubham
Tran Trang
Publication date: 01/01/2018

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody. Comment: Accepted in NAACL HLT 201

arXiv.org e-Print Archive

Crossref

Are language production problems apparent in adults who no longer meet diagnostic criteria for attention-deficit/hyperactivity disorder?

Author: Paul E. Engelhardt
Sean N. Veld
Joel T. Nigg
Fernanda Ferreira
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2012

In this study, we examined sentence production in a sample of adults (N = 21) who had had attention-deficit/hyperactivity disorder (ADHD) as children, but as adults no longer met DSM-IV diagnostic criteria (APA, 2000). This “remitted” group was assessed on a sentence production task. On each trial, participants saw two objects and a verb. Their task was to construct a sentence using the objects as arguments of the verb. Results showed more ungrammatical and disfluent utterances with one particular type of verb (i.e., participle). In a second set of analyses, we compared the remitted group to both control participants and a “persistent” group, who had ADHD as children and as adults. Results showed that remitters were more likely to produce ungrammatical utterances and to make repair disfluencies compared to controls, and they patterned more similarly to ADHD participants. Conclusions focus on language output in remitted ADHD, and the role of executive functions in language production.

Northumbria University Research Portal

Crossref

PubMed Central

University of East Anglia digital repository

The relation between pitch and gestures in a story-telling task

Author: Brugnerotto S.
Busà Maria Grazia
Publication venue: University of Nantes, France
Publication date: 01/01/2015

Anecdotal evidence suggests that both pitch range and gestures contribute to the perception of speakers’ liveliness in speech. However, the relation between speakers’ pitch range and gestures has received little attention. It is possible that variations in pitch range might be accompanied by variations in gestures, and vice versa. In second language speech, the relation between pitch range and gestures might also be affected by speakers’ difficulty in speaking the L2. In this pilot study we compare global pitch range and gesture rate in the speech of 3 native Italian speakers, telling the same story once in Italian and twice in English as part of an in-class oral presentation task. The hypothesis tested is that contextual factors, such as speakers’ nervousness with the task, cause speakers to use a narrow pitch range and limited gestures, whereas a greater ease with the task, due to its repetition, causes speakers to use a wider pitch range and more gestures. This experimental hypothesis is partially confirmed by the results of this study.

Archivio istituzionale della ricerca - Università di Padova

Alcohol Language Corpus

Author: Barfüßer Sabine
Heinrich Christian
Schiel Florian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The Alcohol Language Corpus (ALC) is the first publicly available speech corpus comprising intoxicated and sober speech of 162 female and male German speakers. Recordings are done in an automotive environment to allow for the development of automatic alcohol detection and to ensure a consistent acoustic environment for the alcoholized and the sober recording. The recorded speech covers a variety of contents and speech styles. Breath and blood alcohol concentration measurements are provided for all speakers. A transcription according to SpeechDat/Verbmobil standards and disfluency tagging as well as an automatic phonetic segmentation are part of the corpus. An Emu version of ALC allows easy access to basic speech parameters as well as the use of R for statistical analysis of selected parts of ALC. ALC is available without restriction for scientific or commercial use at the Bavarian Archive for Speech Signals.

CiteSeerX

Crossref

Open Access LMU