Promoting Increased Pitch Variation in Oral Presentations with Transient Visual Feedback
This paper investigates learner response to a novel kind of intonation feedback generated from speech analysis. Instead of displaying pitch curves, the feedback our system produces is flashing lights of different colors, which show how much pitch variation the speaker has produced rather than an absolute measure of frequency. The variable used to generate the feedback is the standard deviation of fundamental frequency (measured in semitones) over the previous ten seconds of speech. Flat or monotone speech causes the system to show yellow lights, while more expressive speech that has used pitch to give focus to any part of an utterance generates green lights. The system is designed to be used with free, rather than modeled, speech. Participants in the study were 14 Chinese-native students of English at intermediate and advanced levels. A group that received feedback was compared with a group that received no feedback other than the ability to listen to recordings of their speech, with the hypothesis that the feedback would stimulate the development of a speaking style with more pitch variation. Pitch variation was measured at four stages of our study: in a baseline oral presentation; in the first and second halves of roughly three hours of training; and finally in the production of a new oral presentation. Both groups increased their pitch variation with training, and the effect lasted after the training had ended. The test group showed a significantly greater increase than the control group, indicating that the feedback is effective. These positive results imply that the feedback could beneficially be used in a system for practicing oral presentations.
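The feedback computation described in the abstract (standard deviation of F0 in semitones over the last ten seconds of speech, mapped to yellow for monotone and green for expressive speech) could be sketched roughly as below. The 100 Hz reference frequency and the 2-semitone threshold are illustrative assumptions; the abstract does not specify the exact cut-off used.

```python
import math
import statistics

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an F0 value in Hz to semitones relative to a
    reference frequency (the 100 Hz reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def feedback_colour(f0_track_hz, threshold_st=2.0):
    """Map the pitch variation of an F0 track (voiced frames only,
    in Hz, covering roughly the previous ten seconds of speech) to
    a feedback colour. The 2-semitone threshold is illustrative,
    not a value reported in the paper."""
    if len(f0_track_hz) < 2:
        return "yellow"
    semitones = [hz_to_semitones(f) for f in f0_track_hz]
    spread = statistics.stdev(semitones)
    return "green" if spread >= threshold_st else "yellow"
```

Under these assumptions, a perfectly flat F0 track yields a spread of zero semitones and hence a yellow light, while speech alternating between, say, 100 Hz and 150 Hz (about seven semitones apart) comfortably exceeds the threshold.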
Backchannel relevance spaces
This contribution introduces backchannel relevance spaces – intervals where it is relevant for a listener in a conversation to produce a backchannel. By annotating and comparing actual visual and vocal backchannels with potential backchannels established using a group of subjects acting as third-party listeners, we show (i) that visual-only backchannels represent a substantial proportion of all backchannels; and (ii) that there are more opportunities for backchannels (i.e. potential backchannels or backchannel relevance spaces) than there are actual vocal and visual backchannels. These findings indicate that backchannel relevance spaces enable more accurate acoustic, prosodic, lexical (et cetera) descriptions of backchannel-inviting cues than descriptions based on the context of actual vocal backchannels only.
Evolution of the human tongue and emergence of speech biomechanics
The tongue is one of the organs most central to human speech. Here, the evolution and species-unique properties of the human tongue are traced, via reference to the apparent articulatory behavior of extant non-human great apes and fossil findings from early hominids, from the point of view of articulatory phonetics, the science of human speech production. Increased lingual flexibility provided the possibility of mapping articulatory targets, possibly via exaptation of the manual-gestural mapping capacities evident in extant great apes. The emergence of the human-specific tongue, its properties, and its morphology were crucial to the evolution of human articulate speech.
Very Short Utterances in Conversation
Faced with the difficulties of finding an operationalized definition of backchannels, we have previously proposed an intermediate, auxiliary unit – the very short utterance (VSU) – which is defined operationally and is automatically extractable from recorded or ongoing dialogues. Here, we extend that work in the following ways: (1) we test the extent to which the VSU/NONVSU distinction corresponds to backchannels/non-backchannels in a different data set that is manually annotated for backchannels – the Columbia Games Corpus; (2) we examine the extent to which VSUs capture other short utterances with a vocabulary similar to backchannels; (3) we propose a VSU method for better managing turn-taking and barge-ins in spoken dialogue systems based on detection of backchannels; and (4) we attempt to detect backchannels with better precision by training a backchannel classifier using durations and inter-speaker relative loudness differences as features. The results show that VSUs indeed capture a large proportion of backchannels – large enough that VSUs can be used to improve spoken dialogue system turn-taking – and that building a reliable backchannel classifier working in real time is feasible.
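A minimal sketch of the operational VSU idea and a decision rule over the two feature types the abstract trains on (duration and inter-speaker relative loudness) might look as follows. All names and thresholds here are hypothetical illustrations, not the paper's actual definition or trained classifier.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    duration_s: float        # utterance duration in seconds
    rel_loudness_db: float   # speaker loudness minus interlocutor loudness (dB)

def is_vsu(utt, max_duration_s=1.0):
    """Operational VSU test by duration alone; the 1-second
    cut-off is an assumption, not the paper's exact value."""
    return utt.duration_s <= max_duration_s

def looks_like_backchannel(utt, max_duration_s=1.0, max_rel_loudness_db=-3.0):
    """Toy rule combining the two feature types: backchannels tend
    to be short and quieter than the interlocutor. Thresholds are
    illustrative; the paper trains a classifier rather than fixing
    hand-picked cut-offs."""
    return is_vsu(utt, max_duration_s) and utt.rel_loudness_db <= max_rel_loudness_db
```

In a dialogue system, such a rule could run in real time on incremental duration and loudness estimates, suppressing turn-yielding when an incoming utterance is classified as a backchannel rather than a barge-in.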
Towards Metadata Descriptions for Multimodal Corpora of Natural Communication Data
Freigang F, Bergmann K. Towards Metadata Descriptions for Multimodal Corpora of Natural Communication Data. In: Edlund J, Heylen D, Paggio P, eds. Proceedings of the Workshop on Multimodal Corpora 2013: Multimodal Corpora: Beyond Audio and Video. 2013.
Metadata play an important role for successful corpus management and reusability of corpora. For linguistic resources, a large number of metadata descriptions and metadata schemes already exist. However, little work has yet been done to develop metadata for the particular structure of multimodal corpora. In this paper we provide a review of existing metadata profiles for multimodal data. We discuss to what extent these are adequate for describing multimodal resources and point out conclusions for future efforts.
Swedish CLARIN activities
In: Domeij R, Koskenniemi K, Krauwer S, Maegaard B, Rögnvaldsson E, de Smedt K, eds. Proceedings of the NODALIDA 2009 Workshop: Nordic Perspectives on the CLARIN Infrastructure of Language Resources. NEALT Proceedings Series, Vol. 5 (2009), 1-5. © 2009 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/9207