
355 research outputs found

Increase Apparent Public Speaking Fluency By Speech Augmentation

Author: Das Sagnik
Gandhi Nisha
Naik Tejas
Shilkrot Roy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/08/2019

Fluent and confident speech is desirable to every speaker, but delivering professional speech requires a great deal of experience and practice. In this paper, we propose a speech stream manipulation system that can help non-professional speakers produce fluent, professional-sounding speech, in turn contributing to better listener engagement and comprehension. We achieve this by manipulating the disfluencies in human speech, such as the sounds 'uh' and 'um', filler words, and awkward long silences. Given any unrehearsed speech, we segment and silence the filled pauses, then adjust the duration of the imposed silences as well as other long ('disfluent') pauses using a predictive model learned from a professional speech dataset. Finally, we output an audio stream in which the speaker sounds more fluent, confident, and practiced than in the original recording. According to our quantitative evaluation, we significantly increase the fluency of speech by reducing the rate of pauses and fillers.
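The pipeline described in this abstract (silence the filled pauses, then adjust the length of the resulting and other long silences) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says pause durations come from a predictive model learned from professional speech, whereas this sketch swaps in a fixed cap, `max_pause`, purely as an assumption. The segment labels and the function name `smooth_disfluencies` are also hypothetical, and the sketch works on labelled durations rather than on audio samples.

```python
from typing import List, Tuple

# A segment is (label, duration in seconds).
# Labels "speech", "filler" and "pause" are illustrative assumptions.
Segment = Tuple[str, float]


def smooth_disfluencies(segments: List[Segment], max_pause: float = 0.5) -> List[Segment]:
    """Turn fillers into silence, merge adjacent silences, and cap their length.

    max_pause is a fixed stand-in for the learned pause-duration model
    mentioned in the abstract (assumption, not the paper's method).
    """
    merged: List[Segment] = []
    for label, duration in segments:
        # Step 1: a filled pause ('uh', 'um') is replaced by silence.
        if label == "filler":
            label = "pause"
        # Step 2: consecutive silences are merged into one pause.
        if label == "pause" and merged and merged[-1][0] == "pause":
            merged[-1] = ("pause", merged[-1][1] + duration)
        else:
            merged.append((label, duration))
    # Step 3: long pauses are shortened to the cap; speech is left untouched.
    return [
        (label, min(duration, max_pause) if label == "pause" else duration)
        for label, duration in merged
    ]


example = [("speech", 1.0), ("filler", 0.4), ("pause", 0.8), ("speech", 2.0)]
result = smooth_disfluencies(example)
print(result)
```

In this example the filler and the following pause are merged into a single silence and shortened to the cap, so the two speech segments end up separated by one short pause.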

arXiv.org e-Print Archive

Crossref

Comparing Different Methods for Disfluency Structure Detection

Author: Batista Fernando
Medeiros Henrique
Moniz Helena
Nunes Luis
Trancoso Isabel
Publication venue: OASIcs - OpenAccess Series in Informatics. 2nd Symposium on Languages, Applications and Technologies
Publication date: 01/01/2013

This paper presents a number of experiments assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions in speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods on the identification of the distinct structural disfluent regions. The reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese containing about 32 hours of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced-alignment corpus proved to be discriminant of the regions contained in the production of a disfluency. This work shows that, using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs, where the best results achieved correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.
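As a quick consistency check on the figures quoted in this abstract, the F-measure can be recomputed from the reported precision and recall. The sketch below assumes the balanced F1 definition (harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract: 81.5% precision, 27.6% recall.
f1 = f_measure(0.815, 0.276)
print(f"{f1 * 100:.1f}%")
```

Under that assumption the result rounds to the 41.2% given in the abstract, which also shows how the low recall pulls the score well below the precision.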

Repositório Institucional do ISCTE-IUL

Dagstuhl Research Online Publication Server

Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning

Author: Hough J
Rohanian M
Publication venue: The 28th International Conference on Computational Linguistics
Publication date: 01/01/2021

Queen Mary Research Online