74 research outputs found
Cerebral correlates and statistical criteria of cross-modal face and voice integration
Perception of faces and voices plays a prominent role in human social interaction, making multisensory integration of cross-modal speech a topic of great interest in cognitive neuroscience. How to define potential sites of multisensory integration using functional magnetic resonance imaging (fMRI) is currently under debate, with three statistical criteria frequently used (the super-additive, max, and mean criteria). In the present fMRI study, 20 participants were scanned in a block design under three stimulus conditions: dynamic unimodal face, unimodal voice, and bimodal face-voice. Using this single dataset, we examine all three statistical criteria in an attempt to define loci of face-voice integration. While the super-additive and mean criteria essentially revealed regions in which one of the unimodal responses was a deactivation, the max criterion appeared stringent and only highlighted the left hippocampus as a potential site of face-voice integration. Psychophysiological interaction analysis showed that connectivity between occipital and temporal cortices increased during bimodal compared to unimodal conditions. We conclude that, when investigating multisensory integration with fMRI, these criteria should be used in conjunction with manipulation of stimulus signal-to-noise ratio and/or cross-modal congruency.
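For reference, the three criteria are commonly formalized as follows in the multisensory fMRI literature (standard definitions; the abstract itself does not spell them out), where $\beta_A$, $\beta_V$, and $\beta_{AV}$ denote the responses to the unimodal auditory, unimodal visual, and bimodal conditions:

```latex
\begin{aligned}
\text{super-additive:}\quad & \beta_{AV} > \beta_A + \beta_V \\
\text{max:}\quad            & \beta_{AV} > \max(\beta_A,\, \beta_V) \\
\text{mean:}\quad           & \beta_{AV} > \tfrac{1}{2}\,(\beta_A + \beta_V)
\end{aligned}
```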
Dissociating task difficulty from incongruence in face-voice emotion integration
In the everyday environment, affective information is conveyed by both the face and the voice. Studies have demonstrated that a concurrently presented voice can alter the way an emotional facial expression is perceived, and vice versa, leading to emotional conflict when the information in the two modalities is mismatched. Additionally, evidence suggests that incongruence of emotional valence activates cerebral networks involved in conflict monitoring and resolution. However, it is currently unclear whether this is due to task difficulty (incongruent stimuli being harder to categorize) or simply to the detection of mismatching information in the two modalities. The aim of the present fMRI study was to examine the neurophysiological correlates of processing incongruent emotional information, independent of task difficulty. Subjects were scanned while judging the emotion of face-voice affective stimuli. Both the face and the voice were parametrically morphed between anger and happiness and then paired in all audiovisual combinations, resulting in stimuli each defined by two separate values: the degree of incongruence between the face and voice, and the degree of clarity of the combined face-voice information. Given the specific morphing procedure used, we hypothesized that the clarity value, rather than the incongruence value, would better reflect task difficulty. Behavioral data revealed that participants integrated face and voice affective information, and that the clarity value, as opposed to the incongruence value, correlated with categorization difficulty. Cerebrally, incongruence was associated with activity in the superior temporal region, an effect that emerged after task difficulty had been accounted for. Overall, our results suggest that activation in the superior temporal region in response to incongruent information cannot be explained simply by task difficulty, and may instead be due to detection of mismatching information between the two modalities.
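One plausible operationalization of the two stimulus values (an assumption for illustration; the abstract does not give formulas), with face and voice morph levels $m_f, m_v \in [0, 1]$ running from anger to happiness:

```latex
\text{incongruence} = \lvert m_f - m_v \rvert, \qquad
\text{clarity} = \left\lvert \frac{m_f + m_v}{2} - \frac{1}{2} \right\rvert
```

Under this reading, clarity indexes the distance of the combined signal from the category boundary, which is why it, rather than incongruence, would track categorization difficulty.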
Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study
Background
In recent years, analyses of event-related potentials/fields have moved from the selection of a few components and peaks to a mass-univariate approach in which the whole data space is analyzed. Such extensive testing increases the number of false positives, and correction for multiple comparisons is needed.
Method
Here we review cluster-based corrections for multiple comparisons (cluster-height, cluster-size, cluster-mass, and threshold-free cluster enhancement, TFCE), in conjunction with two computational approaches (permutation and bootstrap).
Results
Data-driven Monte Carlo simulations comparing two conditions within subjects (two-sample Student's t-test) showed that, on average, all cluster-based methods, using permutation and bootstrap alike, control the family-wise error rate (FWER) well, with a few caveats.
Conclusions
(i) A minimum of 800 iterations is necessary to obtain stable results; (ii) below 50 trials, bootstrap methods are too conservative; (iii) for low critical family-wise error rates (e.g., p = 1%), permutations can be too liberal; (iv) TFCE controls the Type I error rate best with an attenuated extent parameter (i.e., power < 1).
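To make the approach being simulated concrete, below is a minimal sketch of a cluster-mass permutation test for a two-condition within-subject design, using sign-flipping of subject-wise difference waves. All names and the sign-flipping scheme are illustrative assumptions, not the authors' code; the 1,000 permutations are consistent with the minimum of 800 reported above.

```python
import numpy as np
from scipy import stats

def cluster_masses(tvals, threshold):
    """Sum of |t| within each contiguous supra-threshold run (cluster mass)."""
    masses, current = [], 0.0
    for t in np.abs(tvals):
        if t > threshold:
            current += t
        elif current:
            masses.append(current)
            current = 0.0
    if current:
        masses.append(current)
    return masses

def cluster_mass_permutation(diff, n_perm=1000, p_thresh=0.05, seed=0):
    """Cluster-mass permutation test on subject-wise condition differences.

    diff: (n_subjects, n_timepoints) array of condition A minus condition B.
    """
    rng = np.random.default_rng(seed)
    n_sub = diff.shape[0]
    t_crit = stats.t.ppf(1 - p_thresh / 2, df=n_sub - 1)

    t_obs = stats.ttest_1samp(diff, 0.0, axis=0).statistic
    obs_masses = cluster_masses(t_obs, t_crit)

    null_max = np.empty(n_perm)
    for i in range(n_perm):
        # Randomly flip the sign of each subject's difference wave,
        # the standard permutation scheme for a within-subject contrast.
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        t_perm = stats.ttest_1samp(diff * signs, 0.0, axis=0).statistic
        null_max[i] = max(cluster_masses(t_perm, t_crit), default=0.0)

    # Cluster-level p-value: fraction of permutations whose largest
    # cluster mass meets or exceeds the observed cluster's mass.
    return [(m, (null_max >= m).mean()) for m in obs_masses]
```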
How do you say 'hello'? Personality impressions from brief novel voices
On hearing a novel voice, listeners readily form personality impressions of that speaker. Accurate or not, these impressions are known to affect subsequent interactions; yet the underlying psychological and acoustical bases remain poorly understood. Furthermore, studies have hitherto focussed on extended speech rather than the instantaneous impressions we obtain from first exposure. In this paper, through a mass online rating experiment, 320 participants rated 64 sub-second vocal utterances of the word 'hello' on one of 10 personality traits. We show that: (1) personality judgements of brief utterances from unfamiliar speakers are consistent across listeners; (2) a two-dimensional 'social voice space' with axes mapping Valence (Trust, Likeability) and Dominance, each driven by differing combinations of vocal acoustics, adequately summarises ratings in both male and female voices; and (3) a positive combination of Valence and Dominance results in increased perceived male vocal Attractiveness, whereas perceived female vocal Attractiveness is largely driven by increasing Valence. Results are discussed in relation to the rapid evaluation of personality and, in turn, of the intent of others, as driven by survival mechanisms favouring approach or avoidance behaviours. These findings provide an empirical basis for predicting personality impressions from acoustical analyses of short utterances and for generating desired personality impressions in artificial voices.
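To make the dimensionality-reduction step concrete, here is a minimal sketch of how a two-dimensional space can be recovered from trait ratings via PCA. This is a generic illustration under assumed data shapes, not the authors' analysis pipeline; the abstract reports 64 utterances rated on 10 traits.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: rows are voices, columns are mean trait ratings.
# Real analyses would load the averaged listener ratings here.
ratings = np.random.rand(64, 10)

pca = PCA(n_components=2)
coords = pca.fit_transform(ratings)   # each voice as a point (PC1, PC2)

# If the first two components resemble Valence and Dominance, the
# loadings show which traits (e.g., Trust, Likeability) drive each axis.
print(pca.explained_variance_ratio_)
print(pca.components_)
```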
Listeners form average-based representations of individual voice identities.
Models of voice perception propose that identities are encoded relative to an abstracted average or prototype. While there is some evidence for norm-based coding when learning to discriminate different voices, little is known about how the representation of an individual's voice identity is formed through variable exposure to that voice. In two experiments, we show evidence that participants form abstracted, average-based representations of individual voice identities, despite never having been exposed to these averages during learning. We created three perceptually distinct voice identities, fully controlling their within-person variability. Listeners first learned to recognise these identities from ring-shaped distributions located around the perimeter of within-person voice spaces; crucially, these distributions were missing their centres. At test, listeners' accuracy for old/new judgements was higher for stimuli located on an untrained distribution nested around the centre of each ring-shaped distribution than for stimuli on the trained ring-shaped distribution.
Cerebral processing of voice gender studied using a continuous carryover fMRI design
Normal listeners effortlessly determine a person's gender by voice, but the cerebral mechanisms underlying this ability remain unclear. Here, we demonstrate two stages of cerebral processing during voice gender categorization. Using voice morphing along with an adaptation-optimized functional magnetic resonance imaging design, we found that secondary auditory cortex, including the anterior part of the temporal voice areas in the right hemisphere, responded primarily to acoustical distance from the previously heard stimulus. In contrast, a network of bilateral regions involving inferior prefrontal and anterior and posterior cingulate cortex reflected perceived stimulus ambiguity. These findings suggest that voice gender recognition involves neuronal populations along the auditory ventral stream responsible for auditory feature extraction, operating together with the prefrontal cortex in voice gender perception.
The effects of stimulus complexity on the preattentive processing of self-generated and nonself voices: an ERP study
The ability to differentiate one's own voice from the voice of somebody else plays a critical role in successful verbal self-monitoring and in communication. However, most existing studies have focused only on the sensory correlates of self-generated voice processing, whereas the effects of attentional demands and stimulus complexity on self-generated voice processing remain largely unknown. In this study, we investigated the effects of stimulus complexity on the preattentive processing of self and nonself voice stimuli. Event-related potentials (ERPs) were recorded from 17 healthy males who watched a silent movie while ignoring prerecorded self-generated (SGV) and nonself (NSV) voice stimuli, consisting of a vocalization (vocalization category condition: VCC) or of a disyllabic word (word category condition: WCC). All voice stimuli were presented as standard and deviant events in four distinct oddball sequences. The mismatch negativity (MMN) ERP component peaked earlier for NSV than for SGV stimuli. Moreover, compared with SGV stimuli, P3a amplitude was increased for NSV stimuli in the VCC only, whereas in the WCC no significant differences were found between the two voice types. These findings suggest differences in the time course of automatic detection of a change in voice identity. In addition, they suggest that stimulus complexity modulates the magnitude of the orienting response to SGV and NSV stimuli, extending previous findings on self-voice processing. This work was supported by Grant Numbers IF/00334/2012, PTDC/PSI-PCL/116626/2010, and PTDC/MHN-PCN/3606/2012, funded by the Fundação para a Ciência e a Tecnologia (FCT, Portugal) and the Fundo Europeu de Desenvolvimento Regional through the European programmes Quadro de Referência Estratégico Nacional and Programa Operacional Factores de Competitividade, awarded to A.P.P., and by FCT Doctoral Grant Number SFRH/BD/77681/2011, awarded to T.C.
The Glasgow Voice Memory Test: Assessing the ability to memorize and recognize unfamiliar voices
One thousand one hundred and twenty subjects, as well as a developmental phonagnosic subject (KH) and age-matched controls, performed the Glasgow Voice Memory Test, which assesses the ability to encode and immediately recognize, through an old/new judgment, both unfamiliar voices (delivered as vowels, keeping language requirements minimal) and bell sounds. The inclusion of non-vocal stimuli allows the detection of significant dissociations between the two categories (vocal vs. non-vocal stimuli). The distributions of accuracy and sensitivity scores (d′) reflected a wide range of individual differences in voice recognition performance in the population. As expected, KH showed a dissociation between the recognition of voices and bell sounds, her performance being significantly poorer than that of matched controls for voices but not for bells. By providing normative data from a large sample and by testing a developmental phonagnosic subject, we demonstrate that the Glasgow Voice Memory Test, available online and accessible from all over the world, can be a valid screening tool (~5 min) for the preliminary detection of potential cases of phonagnosia and of 'super recognizers' for voices.
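For reference, the sensitivity scores (d′) reported above are computed from hit and false-alarm rates in the old/new task. A minimal sketch follows; the log-linear correction for extreme rates is an assumed choice, not necessarily the one used in this study.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity for an old/new recognition task."""
    n_old = hits + misses
    n_new = false_alarms + correct_rejections
    # Log-linear correction keeps rates away from 0 and 1, where the
    # inverse-normal transform is undefined (one of several standard fixes).
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: 20 hits / 4 misses on old items, 6 false alarms / 18 correct
# rejections on new items.
print(d_prime(hits=20, misses=4, false_alarms=6, correct_rejections=18))
```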
Are super-face-recognisers also super-voice-recognisers? Evidence from cross-modal identification tasks
Individual differences in face identification ability range from prosopagnosia to super-recognition. The current study examined whether face identification ability predicts voice identification ability (N = 529). Superior-face-identifiers (exceptional at face memory and matching), superior-face-recognisers (exceptional at face memory only), superior-face-matchers (exceptional at face matching only), and controls completed the Bangor Voice Matching Test, the Glasgow Voice Memory Test, and a Famous Voice Recognition Test. In line with predictions, those possessing exceptional face memory and matching skills outperformed typical-range face groups at voice memory and voice matching respectively. Proportionally more superior-face-identifiers also met our super-voice-recogniser criteria on two or more tests. Underlying cross-modality (voices vs. faces) and cross-task (memory vs. perception) mechanisms may therefore drive superior performance. Dissociations between Glasgow Voice Memory Test voice and bell recognition also suggest voice-specific effects, matching those found with faces. These findings have applied implications for policing, particularly in cases where only suspect voice clips are available.
Summary statistics in auditory perception
Sensory signals are transduced at high resolution, but their structure must be stored in a more compact format. Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics. We measured discrimination of 'sound textures' characterized by particular statistical properties, of the sort that normally results from the superposition of many acoustic features in auditory scenes. When listeners discriminated examples of different textures, performance improved with excerpt duration. In contrast, when listeners discriminated different examples of the same texture, performance declined with duration, a paradoxical result given that the information available for discrimination grows with duration. These results indicate that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics, which, for different examples of the same texture, converge to the same values with increasing duration. Such statistical representations produce good categorical discrimination, but limit the ability to discern temporal detail.
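To make the notion of time-averaged statistics concrete, here is a toy sketch that reduces a sound to moments of its band envelopes. The band edges, filter choices, and statistics are illustrative assumptions, not the authors' texture model.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt

def texture_statistics(x, fs, bands=((100, 400), (400, 1600), (1600, 6400))):
    """Time-averaged statistics of band envelopes: a toy stand-in for the
    texture representation described above (band edges are assumptions)."""
    out = []
    for lo, hi in bands:
        sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfilt(sos, x)))   # amplitude envelope
        m, s = env.mean(), env.std()
        skew = ((env - m) ** 3).mean() / (s ** 3 + 1e-12)
        out.extend([m, s, skew])                 # time-averaged moments
    return np.array(out)

# Two excerpts of the same texture should yield increasingly similar
# statistics as duration grows, even though their waveforms differ
# sample by sample.
fs = 16000
noise = np.random.randn(fs * 2)
print(texture_statistics(noise, fs))
```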
- …