The role of HG in the analysis of temporal iteration and interaural correlation
Emotion recognition in the human face and voice
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.

At a perceptual level, faces and voices consist of very different sensory inputs, and information processing in one modality can therefore be independent of processing in the other (Adolphs & Tranel, 1999). However, there may also be a shared neural emotion network that processes stimuli independently of modality (Peelen, Atkinson, & Vuilleumier, 2010), or emotions may be processed at a more abstract cognitive level, based on meaning rather than on perceptual signals. This thesis therefore aimed to examine emotion recognition across two separate modalities in a within-subject design, comprising a cognitive study (Chapter 1) with 45 British adults, a developmental study (Chapter 2) with 54 British children, and a cross-cultural study (Chapter 3) with 98 German and British children and 78 German and British adults. Intensity ratings, choice reaction times, and correlations of confusion analyses of emotions across modalities were analysed throughout. A further ERP chapter investigated the time course of emotion recognition across the two modalities. Highly correlated rating profiles of emotions in faces and voices were found, suggesting a similarity in emotion recognition across modalities. Emotion recognition in primary-school children improved with age for both modalities, although young children relied mainly on faces. British and German participants showed comparable patterns when rating basic emotions, but subtle differences were also noted: Germans perceived emotions as less intense than British participants did. Overall, the behavioural results reported in the present thesis are consistent with the idea of a general, more abstract level of emotion processing which may act independently of modality.
This could be based, for example, on a shared emotion brain network or on more general, higher-level cognitive processes which are activated across a range of modalities. Although emotion recognition abilities are already evident during childhood, this thesis argued for a contribution of 'nurture' to emotion mechanisms, as recognition was influenced by external factors such as development and culture. Funded by the Economic and Social Research Council.
Sound context modulates perceived vocal emotion
Many animal vocalizations contain nonlinear acoustic phenomena as a consequence of physiological arousal. In humans, nonlinear features are processed early in the auditory system and are used to efficiently detect alarm calls and other urgent signals. Yet high-level emotional and semantic contextual factors likely guide the perception and evaluation of roughness features in vocal sounds. Here we examined the relationship between perceived vocal arousal and auditory context. We presented listeners with nonverbal vocalizations (yells of a single vowel) at varying levels of portrayed vocal arousal, in two musical contexts (clean guitar, distorted guitar) and one non-musical context (modulated noise). As predicted, vocalizations with higher levels of portrayed vocal arousal were judged as more negative and more emotionally aroused than the same voices produced with low vocal arousal. Moreover, both the perceived valence and the emotional arousal of vocalizations were significantly affected by both musical and non-musical contexts. These results show the importance of auditory context in judging emotional arousal and valence in voices and music, and suggest that nonlinear features in music are processed similarly to communicative vocal signals.
The integration of paralinguistic information from the face and the voice
We live in a world which bombards us with a huge amount of sensory information, even if we are not always aware of it. To successfully navigate, function and ultimately survive in our environment we use all of the cues available to us. Furthermore, we actually combine this information: doing so allows us not only to construct a richer percept of the objects around us, but actually increases the reliability of our decisions and sensory estimates. However, at odds with our naturally multisensory awareness of our surroundings, the literature addressing unisensory processes has always far exceeded that which examines the multimodal nature of perception.
Arguably the most salient and relevant stimuli in our environment are other people. Our species is not designed to operate alone, and so we have evolved to be especially skilled in all those things which enable effective social interaction – engaging in conversation, but equally well recognising a family member, or understanding the current emotional state of a friend and adjusting our behaviour appropriately. In particular, the face and the voice both provide us with a wealth of highly relevant social information – linguistic, but also non-linguistic. In line with work conducted in other fields of multisensory perception, research on face and voice perception has mainly concentrated on each of these modalities independently, particularly face perception. Furthermore, the work that has addressed the integration of these two sources has by and large concentrated on the audiovisual nature of speech perception.
The work in this thesis is based on a theoretical model of voice perception which not only proposed a serial processing pathway of vocal information, but also emphasised the similarities between face and voice processing, suggesting that this information may interact. Significantly, these interactions were not just confined to speech processing, but rather encompassed all forms of information processing, whether this was linguistic or paralinguistic. Therefore, in this thesis, I concentrate on the interactions between, and integration of face-voice paralinguistic information.
In Chapter 3 we conducted a general investigation of neural face-voice integration. A number of studies have attempted to identify the cerebral regions in which information from the face and voice combines; however, in addition to a large number of regions being proposed as integration sites, it is not known whether these regions are selective in the binding of these socially relevant stimuli. We first identified regions in the bilateral superior temporal sulcus (STS) which showed an increased response to person-related information – whether this was faces, voices, or faces and voices combined – in comparison to information from objects. A subsection of this region in the right posterior superior temporal sulcus (pSTS) also produced a significantly stronger response to audiovisual as compared to unimodal information. We therefore propose this as a potential people-selective, integrative region. Furthermore, a large portion of the right pSTS was also observed to be people-selective and heteromodal: that is, both auditory and visual information provoked a significant response above baseline. These results underline the importance of the STS region in social communication.
Chapter 4 moved on to study the audiovisual perception of gender. Using a set of novel stimuli – which were not only dynamic but also morphed in both modalities – we investigated whether different combinations of gender information in the face and voice could affect participants' perception of gender. We found that participants indeed combined both sources of information when categorising gender, with their decision reflecting information contained in both modalities. However, this combination was not entirely equal: in this experiment, gender information from the voice appeared to dominate over that from the face, exerting a stronger modulating effect on categorisation. This result was supported by the findings from conditions which directed attention, where we observed that participants were able to ignore face but not voice information, and also by reaction time results, where latencies were generally a reflection of the voice morph. Overall, these results support interactions between face and voice in gender perception, but demonstrate that (due to a number of probable factors) one modality can exert more influence than another.
Finally, in Chapter 5 we investigated the proposed interactions between affective content in the face and voice. Specifically, we used a 'continuous carry-over' design – again in conjunction with dynamic, morphed stimuli – which allowed us to investigate not only 'direct' effects of different sets of audiovisual stimuli (e.g., congruent, incongruent), but also adaptation effects (in particular, the effect of emotion expressed in one modality upon the response to emotion expressed in another modality). In parallel with behavioural results, which showed that the crossmodal context affected the time taken to categorise emotion, we observed a significant crossmodal effect in the right pSTS, which was independent of any within-modality adaptation. We propose that this result provides strong evidence that this region may be composed of genuinely multisensory neurons, as opposed to two sets of interdigitated neurons each responsive to information from one modality or the other. Furthermore, an analysis investigating stimulus congruence showed that the degree of incongruence modulated activity across the right STS, further suggesting that the neural response in this region can be altered depending on the particular combination of affective information contained within the face and voice. Overall, both the behavioural and the cerebral results from this study suggested that participants integrated emotion from the face and voice.