Cracking the social code of speech prosody using reverse correlation
Human listeners excel at forming high-level social representations about each other, even from the briefest of utterances. In particular, pitch is widely recognized as the auditory dimension that conveys most of the information about a speaker's traits, emotional states, and attitudes. While past research has primarily looked at the influence of mean pitch, almost nothing is known about how intonation patterns, i.e., finely tuned pitch trajectories around the mean, may determine social judgments in speech. Here, we introduce an experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation and show that two of the most important dimensions of social judgments, a speaker's perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word "Hello," which remained remarkably stable whether male or female listeners judged male or female speakers. These findings reveal a unique communicative adaptation that enables listeners to infer social traits regardless of speakers' physical characteristics, such as sex and mean pitch. By characterizing how any given individual's mental representations may differ from this generic code, the method introduced here opens avenues to explore dysprosody and social-cognitive deficits in disorders like autism spectrum and schizophrenia. In addition, once derived experimentally, these prototypes can be applied to novel utterances, thus providing a principled way to modulate personality impressions in arbitrary speech signals
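The reverse-correlation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the listener is simulated, the two-interval choice task is an assumption, and `template`, `reverse_correlation_kernel`, and `cosine_similarity` are hypothetical names introduced here. Random pitch contours are presented in pairs, the simulated listener picks the one closer to a hidden preference, and the kernel is the mean chosen contour minus the mean rejected contour.

```python
import math
import random


def reverse_correlation_kernel(template, n_trials=2000, sigma=1.0, seed=0):
    """Estimate a pitch kernel from simulated two-interval choices.

    template: hypothetical hidden contour the simulated listener prefers
              (one value per time point; an assumption for illustration).
    Returns mean(chosen contours) - mean(rejected contours) per time point.
    """
    rng = random.Random(seed)
    n_points = len(template)
    chosen_sum = [0.0] * n_points
    rejected_sum = [0.0] * n_points
    for _ in range(n_trials):
        # Two random pitch perturbations around the mean pitch.
        a = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        b = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        # The simulated listener picks the contour that best matches the template.
        score_a = sum(x * t for x, t in zip(a, template))
        score_b = sum(x * t for x, t in zip(b, template))
        chosen, rejected = (a, b) if score_a >= score_b else (b, a)
        for i in range(n_points):
            chosen_sum[i] += chosen[i]
            rejected_sum[i] += rejected[i]
    return [(c - r) / n_trials for c, r in zip(chosen_sum, rejected_sum)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm
```

With enough trials the recovered kernel points in the same direction as the hidden template, which is the sense in which the method exposes a listener's internal prototype.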
Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances
Currently, the most widely used approach for speaker verification is deep
speaker embedding learning. In this approach, we obtain a speaker embedding
vector by pooling single-scale features that are extracted from the last layer
of a speaker feature extractor. Multi-scale aggregation (MSA), which utilizes
multi-scale features from different layers of the feature extractor, has
recently been introduced and shows superior performance for variable-duration
utterances. To increase robustness when dealing with utterances of arbitrary
duration, this paper improves MSA by using a feature pyramid module. The
module enhances speaker-discriminative information of features from multiple
layers via a top-down pathway and lateral connections. We extract speaker
embeddings using the enhanced features that contain rich speaker information
with different time scales. Experiments on the VoxCeleb dataset show that the
proposed module improves on previous MSA methods while using fewer
parameters. It also outperforms state-of-the-art approaches for both short
and long utterances. Comment: Accepted to Interspeech 202
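The top-down pathway and lateral connections described in this abstract can be sketched in a heavily simplified form. This is an illustration under stated assumptions, not the paper's model: each layer's feature map is reduced to a single-channel sequence over time, the lateral connection is a scalar weight standing in for a learned projection, and `upsample_nearest`, `feature_pyramid`, and `speaker_embedding` are hypothetical names.

```python
def upsample_nearest(seq, target_len):
    """Stretch a coarse sequence to target_len by nearest-neighbour repetition."""
    return [seq[i * len(seq) // target_len] for i in range(target_len)]


def feature_pyramid(features, lateral_weights):
    """Enhance multi-scale features with a top-down pathway.

    features: list of sequences ordered from fine (long) to coarse (short).
    lateral_weights: one scalar per level (assumed stand-in for a learned
                     lateral projection).
    """
    enhanced = [None] * len(features)
    top = None
    # Walk from the coarsest level down to the finest one.
    for level in range(len(features) - 1, -1, -1):
        lateral = [lateral_weights[level] * x for x in features[level]]
        if top is not None:
            # Add the upsampled coarser map to the lateral projection.
            up = upsample_nearest(top, len(lateral))
            lateral = [l + u for l, u in zip(lateral, up)]
        enhanced[level] = lateral
        top = lateral
    return enhanced


def speaker_embedding(enhanced):
    """Average-pool each enhanced map over time and concatenate the results."""
    return [sum(level) / len(level) for level in enhanced]
```

Because every level is pooled over time before concatenation, the embedding has the same length regardless of utterance duration, which is what makes this style of aggregation usable for variable-duration input.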
Wavelet-based voice morphing
This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity
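Subband-wise morphing of the kind this abstract describes can be sketched as: split the signal into subbands, convert each subband separately, then resynthesize. The abstract does not name the wavelet, so the one-level Haar transform below is an assumption, and the per-coefficient `convert_approx` and `convert_detail` callables are hypothetical placeholders for the trained Radial Basis Function networks.

```python
import math


def haar_analysis(signal):
    """One-level Haar transform of an even-length signal."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def morph_subbands(signal, convert_approx, convert_detail):
    """Apply a separate conversion to each subband, then resynthesize.

    convert_approx / convert_detail: placeholder per-coefficient mappings
    (assumed here; the paper models the conversion with RBF networks).
    """
    approx, detail = haar_analysis(signal)
    new_approx = [convert_approx(a) for a in approx]
    new_detail = [convert_detail(d) for d in detail]
    return haar_synthesis(new_approx, new_detail)
```

Passing identity mappings for both subbands returns the input unchanged, which is a quick check that any change in the output comes from the conversion step and not from the decomposition.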
From Monologue to Dialogue: Natural Language Generation in OVIS
This paper describes how a language generation system originally designed for monologue generation has been adapted for use in the OVIS spoken dialogue system. To meet the requirement that the system's utterances in a dialogue make up a single, coherent dialogue turn, several modifications had to be made to the system. The paper also discusses the influence of dialogue context on information status, and its consequences for the generation of referring expressions and accentuation
Fostering reflection in the training of speech-receptive action
This article discusses the opportunities and problems of training communicative skills by fostering reflection on one's own speech-receptive action, and the use of computer-supported learning environments for this purpose. Most communication training focuses on observable speech-productive action (i.e., speaking); the individual cognitive processes underlying speech-receptive action (hearing and understanding utterances) are often neglected. This neglect is usually justified on the grounds that speech-receptive action is hard to access in a communicative situation and that fostering its underlying individual processes is very time-consuming. Computer-supported learning environments employed as cognitive tools can help to foster speech-receptive action. The central learning principle, reflection on one's own communicative action, is discussed from different perspectives. Against the background of these models of reflection, the computer-supported learning environment CaiMan© is presented and described. Seven success factors for integrating software into the training of soft skills are then derived from empirical research on CaiMan©. The article concludes with two empirical studies examining opportunities to foster reflection