Cracking the social code of speech prosody using reverse correlation

Author: Aucouturier Jean-Julien
Belin Pascal
Burred Juan José
Ponsot Emmanuel
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2018

Human listeners excel at forming high-level social representations about each other, even from the briefest of utterances. In particular, pitch is widely recognized as the auditory dimension that conveys most of the information about a speaker's traits, emotional states, and attitudes. While past research has primarily looked at the influence of mean pitch, almost nothing is known about how intonation patterns, i.e., finely tuned pitch trajectories around the mean, may determine social judgments in speech. Here, we introduce an experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation and show that two of the most important dimensions of social judgments, a speaker's perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word "Hello," which remained remarkably stable whether male or female listeners judged male or female speakers. These findings reveal a unique communicative adaptation that enables listeners to infer social traits regardless of speakers' physical characteristics, such as sex and mean pitch. By characterizing how any given individual's mental representations may differ from this generic code, the method introduced here opens avenues to explore dysprosody and social-cognitive deficits in disorders like autism spectrum and schizophrenia. In addition, once derived experimentally, these prototypes can be applied to novel utterances, thus providing a principled way to modulate personality impressions in arbitrary speech signals

Crossref

HAL AMU

Enlighten
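
The reverse-correlation idea in the abstract above can be illustrated with a minimal sketch. It assumes a two-interval task in which a listener hears two randomly pitch-perturbed versions of an utterance and picks one; the kernel is then estimated as the mean contour of chosen stimuli minus the mean contour of rejected stimuli. The simulated listener, the `template` vector, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def simulate_trials(template, n_trials, rng):
    """Simulate a hypothetical listener who picks, on each trial, the
    random pitch contour that projects more strongly onto `template`."""
    a = rng.normal(size=(n_trials, template.size))
    b = rng.normal(size=(n_trials, template.size))
    pick_a = a @ template > b @ template
    chosen = np.where(pick_a[:, None], a, b)
    rejected = np.where(pick_a[:, None], b, a)
    return chosen, rejected


def reverse_correlation_kernel(chosen, rejected):
    """First-order kernel: mean chosen contour minus mean rejected contour."""
    return chosen.mean(axis=0) - rejected.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical falling pitch template over six time points.
    template = np.array([1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
    chosen, rejected = simulate_trials(template, 2000, rng)
    kernel = reverse_correlation_kernel(chosen, rejected)
    print(np.round(kernel, 2))
```

With enough trials, the estimated kernel should be roughly proportional to the hidden template, which is what makes the approach usable for recovering a listener's internal pitch prototype.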

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Author: Choi Yeunju
Jung Myunghun
Jung Youngmoon
Kim Hoirin
Kye Seong Min
Publication venue: 'International Speech Communication Association'
Publication date: 06/08/2020

Currently, the most widely used approach for speaker verification is deep speaker embedding learning. In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor. Multi-scale aggregation (MSA), which utilizes multi-scale features from different layers of the feature extractor, has recently been introduced and shows superior performance for variable-duration utterances. To increase robustness when dealing with utterances of arbitrary duration, this paper improves MSA by using a feature pyramid module. The module enhances speaker-discriminative information of features from multiple layers via a top-down pathway and lateral connections. We extract speaker embeddings using the enhanced features, which contain rich speaker information at different time scales. Experiments on the VoxCeleb dataset show that the proposed module improves on previous MSA methods with a smaller number of parameters. It also achieves better performance than state-of-the-art approaches for both short and long utterances. Comment: Accepted to Interspeech 202

arXiv.org e-Print Archive

Crossref
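
The top-down pathway and lateral connections mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes one-dimensional (channels x time) feature maps, a plain linear projection as the lateral connection, nearest-neighbour upsampling along time, and mean pooling before concatenation. All function names and shapes are hypothetical.

```python
import numpy as np


def lateral(feature, weight):
    """Lateral connection: project (c_in, time) to (c_out, time)."""
    return weight @ feature


def upsample_time(feature, target_len):
    """Nearest-neighbour upsampling along the time axis."""
    idx = np.arange(target_len) * feature.shape[1] // target_len
    return feature[:, idx]


def feature_pyramid(features, weights):
    """Enhance multi-layer features via a top-down pathway.

    `features` is ordered from shallow (long time axis) to deep (short
    time axis); each deeper, enhanced map is upsampled and added to the
    lateral projection of the next shallower map.
    """
    laterals = [lateral(f, w) for f, w in zip(features, weights)]
    enhanced = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        top_down = upsample_time(enhanced[0], lat.shape[1])
        enhanced.insert(0, lat + top_down)
    return enhanced


def embed(enhanced):
    """Mean-pool each enhanced map over time and concatenate."""
    return np.concatenate([e.mean(axis=1) for e in enhanced])
```

Because every enhanced map is pooled over time, the resulting embedding has a fixed length regardless of utterance duration, which is the property that matters for variable-duration inputs.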

Wavelet-based voice morphing

Author: Moroz I. M.
Orphanidou C.
Roberts S. J.
Publication date: 01/01/2004

This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity

Oxford University Research Archive

A Phonetic Study on Phrasing in Seoul Korean

Author: Jeon Hae-Sung
Publication date: 01/01/2008

CLoK

From Monologue to Dialogue: Natural Language Generation in OVIS

Author: Theune Mariët
Publication date: 01/01/2003

This paper describes how a language generation system that was originally designed for monologue generation has been adapted for use in the OVIS spoken dialogue system. To meet the requirement that, in a dialogue, the system's utterances should make up a single, coherent dialogue turn, several modifications had to be made to the system. The paper also discusses the influence of dialogue context on information status, and its consequences for the generation of referring expressions and accentuation.

CiteSeerX

University of Twente Research Information

Fostering reflection in the training of speech-receptive action

Author: Henninger Michael
Mandl Heinz
Publication date: 01/02/2003

This article discusses the training of communicative skills by fostering reflection on one's own speech-receptive action, and the opportunities for using computer-supported learning environments for this purpose. Most frameworks for the training of communicative behavior focus on fostering the observable speech-productive action (i.e., speaking); the individual cognitive processes underlying speech-receptive action (hearing and understanding utterances) are often neglected. The usual justification is that speech-receptive action is difficult to access in a communicative situation and that fostering its individual processes is very time-consuming. Computer-supported learning environments employed as cognitive tools can help to foster speech-receptive action. The central learning principle, reflection on one's own linguistic-communicative action, is discussed from different perspectives. Against the background of these models of reflection, the computer-supported learning environment CaiMan© is presented and described, and seven success factors are derived from empirical research on it. The article concludes with two empirical studies examining opportunities to foster reflection.

Open Access LMU