Norm-based coding of voice identity in human auditory cortex
Listeners exploit small interindividual variations around a generic acoustical structure to discriminate and identify individuals from their voice, a key requirement for social interactions. The human brain contains temporal voice areas (TVA) [1] involved in an acoustic-based representation of voice identity [2-6], but the underlying coding mechanisms remain unknown. Indirect evidence suggests that identity representation in these areas could rely on a norm-based coding mechanism [4, 7-11]. Here, we show using fMRI that voice identity is coded in the TVA as a function of acoustical distance to two internal voice prototypes (one male, one female), approximated here by averaging a large number of same-gender voices using morphing [12]. Voices more distant from their prototype are perceived as more distinctive and elicit greater neuronal activity in voice-sensitive cortex than closer voices, a phenomenon not merely explained by neuronal adaptation [13, 14]. Moreover, explicit manipulations of distance-to-mean by morphing voices toward (or away from) their prototype elicit reduced (or enhanced) neuronal activity. These results indicate that voice-sensitive cortex integrates relevant acoustical features into a complex representation referenced to idealized male and female voice prototypes. More generally, they shed light on remarkable similarities in the cerebral representations of facial and vocal identity.
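The distance-to-prototype logic described above can be sketched in a few lines. This is a hypothetical illustration, not the study's pipeline: the feature space, voice counts, and morph fractions are invented for the demo, with the prototype approximated as the mean of many same-gender voices, as in the abstract.

```python
import numpy as np

# Hypothetical sketch of norm-based coding (feature values are illustrative,
# not the study's acoustic measurements): a voice's predicted distinctiveness
# scales with its acoustic distance from a same-gender prototype.

rng = np.random.default_rng(0)
male_voices = rng.normal(size=(50, 3))   # 50 voices x 3 acoustic features
prototype = male_voices.mean(axis=0)     # averaged voice approximates the norm

def distance_to_prototype(voice):
    """Euclidean distance from a voice to the prototype in feature space."""
    return float(np.linalg.norm(voice - prototype))

# Morphing a voice toward the prototype shrinks its distance (predicting
# reduced activity on the norm-based account); caricaturing away grows it.
voice = male_voices[0]
toward = prototype + 0.5 * (voice - prototype)   # 50% morph toward the mean
away = prototype + 1.5 * (voice - prototype)     # 150% caricature
```

The ordering of the three distances (toward < original < away) mirrors the reduced versus enhanced responses reported for morphs toward versus away from the prototype.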
Phonetic content influences voice discriminability
We present results from an experiment showing that voice perception is influenced by the phonetic content of speech. Dutch listeners were presented with thirteen speakers pronouncing CVC words with systematically varying segmental content and had to discriminate the speakers' voices. Results show that certain segments help listeners discriminate voices more than other segments do. Voice information can be extracted from every segmental position of a monosyllabic word and is processed rapidly. We also show that although relative discriminability within a closed set of voices appears to be a stable property of a voice, it is also influenced by segmental cues; that is, the perceived uniqueness of a voice depends on what that voice says.
How do you say "hello"? Personality impressions from brief novel voices
On hearing a novel voice, listeners readily form personality impressions of that speaker. Accurate or not, these impressions are known to affect subsequent interactions; yet the underlying psychological and acoustical bases remain poorly understood. Furthermore, studies to date have focused on extended speech rather than the instantaneous impressions we obtain from first experience. In this paper, through a mass online rating experiment, 320 participants rated 64 sub-second vocal utterances of the word "hello" on one of 10 personality traits. We show that: (1) personality judgements of brief utterances from unfamiliar speakers are consistent across listeners; (2) a two-dimensional "social voice space" with axes mapping Valence (Trust, Likeability) and Dominance, each driven by differing combinations of vocal acoustics, adequately summarises ratings in both male and female voices; and (3) a positive combination of Valence and Dominance results in increased perceived male vocal Attractiveness, whereas perceived female vocal Attractiveness is largely controlled by increasing Valence. Results are discussed in relation to the rapid evaluation of personality and, in turn, the intent of others, as being driven by survival mechanisms via approach or avoidance behaviours. These findings provide empirical bases for predicting personality impressions from acoustical analyses of short utterances and for generating desired personality impressions in artificial voices.
Cracking the social code of speech prosody using reverse correlation
Human listeners excel at forming high-level social representations about each other, even from the briefest of utterances. In particular, pitch is widely recognized as the auditory dimension that conveys most of the information about a speaker's traits, emotional states, and attitudes. While past research has primarily looked at the influence of mean pitch, almost nothing is known about how intonation patterns, i.e., finely tuned pitch trajectories around the mean, may determine social judgments in speech. Here, we introduce an experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation and show that two of the most important dimensions of social judgments, a speaker's perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word "Hello", which remained remarkably stable whether male or female listeners judged male or female speakers. These findings reveal a unique communicative adaptation that enables listeners to infer social traits regardless of speakers' physical characteristics, such as sex and mean pitch. By characterizing how any given individual's mental representations may differ from this generic code, the method introduced here opens avenues to explore dysprosody and social-cognitive deficits in disorders like autism spectrum and schizophrenia. In addition, once derived experimentally, these prototypes can be applied to novel utterances, thus providing a principled way to modulate personality impressions in arbitrary speech signals.
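The reverse-correlation paradigm named above has a simple computational core that can be sketched as follows. This is a hedged toy version, not the authors' method: the rising "internal template", the simulated listener, and all trial counts are assumptions made for the demo; the estimator (mean chosen minus mean rejected stimulus, the classification image) is the standard reverse-correlation recipe.

```python
import numpy as np

# Toy reverse-correlation sketch (all quantities illustrative): present
# random pitch contours, collect binary judgments, and estimate the
# listener's internal template as the difference between the average
# "chosen" and average "rejected" contour (the classification image).

rng = np.random.default_rng(1)
n_trials, n_points = 2000, 6                       # 2000 contours, 6 f0 samples each
contours = rng.normal(size=(n_trials, n_points))   # random z-scored f0 perturbations

# Simulated listener (assumption for the demo): responds "trustworthy"
# when a contour correlates with a hidden rising pitch template.
template = np.linspace(-1.0, 1.0, n_points)
judgments = contours @ template > 0

# Classification image: mean chosen contour minus mean rejected contour.
kernel = contours[judgments].mean(axis=0) - contours[~judgments].mean(axis=0)

# With enough trials, the recovered kernel tracks the hidden template.
recovery = float(np.corrcoef(kernel, template)[0, 1])
```

In the real paradigm the "simulated listener" is replaced by human responses to voice-transformed utterances, and the recovered kernel is the pitch trajectory driving the social judgment.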
Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady-State Vowel Categorization
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
How Do I Address You? Modelling addressing behavior based on an analysis of a multi-modal corpus of conversational discourse
Addressing is a special kind of referring, and thus principles of multi-modal referring-expression generation will also be fundamental to the generation of address terms and addressing gestures for conversational agents. Addressing is a special kind of referring because of the different role that the referent has in the interaction (second person instead of object). Based on an analysis of addressing behaviour in multi-party face-to-face conversations (meetings, TV discussions, and theater plays), we present the outlines of a model for generating multi-modal verbal and non-verbal addressing behaviour for agents in multi-party interactions.
The sound of trustworthiness: acoustic-based modulation of perceived voice personality
When we hear a new voice we automatically form a "first impression" of the voice owner's personality; a single word is sufficient to yield ratings highly consistent across listeners. Past studies have shown correlations between personality ratings and acoustical parameters of voice, suggesting a potential acoustical basis for voice personality impressions, but its nature and extent remain unclear. Here we used data-driven voice computational modelling to investigate the link between acoustics and perceived trustworthiness in the single word "hello". Two prototypical voice stimuli were generated based on the acoustical features of voices rated low or high in perceived trustworthiness, respectively, as well as a continuum of stimuli interpolated and extrapolated between these two prototypes. Five hundred listeners provided trustworthiness ratings on the stimuli via an online interface. We observed an extremely tight relationship between trustworthiness ratings and position along the trustworthiness continuum (r = 0.99). Not only were trustworthiness ratings higher for the high- than for the low-trustworthiness prototype, but the difference could be modulated quasi-linearly by reducing or exaggerating the acoustical difference between the prototypes, resulting in a strong caricaturing effect. The f0 trajectory, or intonation, appeared to be a parameter of particular relevance: hellos rated high in trustworthiness were characterized by a high starting f0, then a marked decrease at mid-utterance, finishing on a strong rise. These results demonstrate a strong acoustical basis for voice personality impressions, opening the door to multiple potential applications.
The evolution of auditory contrast
This paper reconciles the standpoint that language users do not aim at improving their sound systems with the observation that languages seem to improve their sound systems. Computer simulations of inventories of sibilants show that Optimality-Theoretic learners who optimize their perception grammars automatically introduce a so-called prototype effect, i.e. the phenomenon that the learner's preferred auditory realization of a certain phonological category is more peripheral than the average auditory realization of this category in her language environment. In production, however, this prototype effect is counteracted by an articulatory effect that limits the auditory form to something that is not too difficult to pronounce. If the prototype effect and the articulatory effect are of a different size, the learner must end up with an auditorily different sound system from that of her language environment. The computer simulations show that, independently of the initial auditory sound system, a stable equilibrium is reached within a small number of generations. In this stable state, the dispersion of the sibilants of the language strikes an optimal balance between articulatory ease and auditory contrast. The important point is that this is derived within a model without any goal-oriented elements such as dispersion constraints.
How many voices did you hear? Natural variability disrupts identity perception from unfamiliar voices
Our voices sound different depending on the context (laughing vs. talking to a child vs. giving a speech), making within-person variability an inherent feature of human voices. When perceiving speaker identities, listeners therefore need to not only "tell people apart" (perceiving exemplars from two different speakers as separate identities) but also "tell people together" (perceiving different exemplars from the same speaker as a single identity). In the current study, we investigated how such natural within-person variability affects voice identity perception. Using voices from a popular TV show, listeners, who were either familiar or unfamiliar with this show, sorted naturally varying voice clips from two speakers into clusters to represent perceived identities. Across three independent participant samples, unfamiliar listeners perceived more identities than familiar listeners and frequently mistook exemplars from the same speaker for different identities. These findings point towards a selective failure in "telling people together". Our study highlights within-person variability as a key feature of voices that has striking effects on (unfamiliar) voice identity perception. Our findings not only open up a new line of enquiry in the field of voice perception but also call for a re-evaluation of theoretical models to account for natural variability during identity perception.