2,931 research outputs found
A unified coding strategy for processing faces and voices
Both faces and voices are rich in socially-relevant information, which humans are remarkably adept at extracting, including a person's identity, age, gender, affective state, personality, etc. Here, we review accumulating evidence from behavioral, neuropsychological, electrophysiological, and neuroimaging studies which suggest that the cognitive and neural processing mechanisms engaged by perceiving faces or voices are highly similar, despite the very different nature of their sensory input. The similarity between the two mechanisms likely facilitates the multi-modal integration of facial and vocal information during everyday social interactions. These findings emphasize a parsimonious principle of cerebral organization, where similar computational problems in different modalities are solved using similar solutions
Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex
In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, for silent reading, the representational consequences of this distinction are still unclear. Although many of us share the intuition of an "inner voice," particularly during silent reading of direct speech statements in text, there has been little direct empirical confirmation of this experience so far. Combining fMRI with eye tracking in human volunteers, we show that silent reading of direct versus indirect speech engenders differential brain activation in voice-selective areas of the auditory cortex. This suggests that readers are indeed more likely to engage in perceptual simulations (or spontaneous imagery) of the reported speaker's voice when reading direct speech as opposed to meaning-equivalent indirect speech statements as part of a more vivid representation of the former. Our results may be interpreted in line with embodied cognition and form a starting point for more sophisticated interdisciplinary research on the nature of auditory mental simulation during reading
Anatomo-functional correspondence in the superior temporal sulcus
The superior temporal sulcus (STS) is an intriguing region both for its complex anatomy and for the multiple functions that it hosts. Unfortunately, most studies have explored either the functional organization or the anatomy of the STS only. Here, we link these two aspects by investigating anatomo-functional correspondences between the voice-sensitive cortex (Temporal Voice Areas) and the STS depth. To do so, anatomical and functional scans of 116 subjects were processed so as to generate individual surface maps on which both depth and functional voice activity can be analyzed. Individual depth profiles of the manually drawn STS and functional profiles from voice localizer (voice > non-voice) maps were extracted and compared to assess anatomo-functional correspondences. Three major results were obtained: first, the STS exhibits a highly significant rightward depth asymmetry in its middle part. Second, there is an anatomo-functional correspondence between the location of the voice-sensitive peak and the deepest point inside this asymmetrical region bilaterally. Finally, we showed that this correspondence was independent of gender and, using a machine learning approach, that it existed at the individual level. These findings offer new perspectives for the understanding of anatomo-functional correspondences in this complex cortical region
Dissociating task difficulty from incongruence in face-voice emotion integration
In the everyday environment, affective information is conveyed by both the face and the voice. Studies have demonstrated that a concurrently presented voice can alter the way that an emotional face expression is perceived, and vice versa, leading to emotional conflict if the information in the two modalities is mismatched. Additionally, evidence suggests that incongruence of emotional valence activates cerebral networks involved in conflict monitoring and resolution. However, it is currently unclear whether this is due to task difficulty (incongruent stimuli being harder to categorize) or simply to the detection of mismatching information in the two modalities. The aim of the present fMRI study was to examine the neurophysiological correlates of processing incongruent emotional information, independent of task difficulty. Subjects were scanned while judging the emotion of face-voice affective stimuli. Both the face and voice were parametrically morphed between anger and happiness and then paired in all audiovisual combinations, resulting in stimuli each defined by two separate values: the degree of incongruence between the face and voice, and the degree of clarity of the combined face-voice information. Because of the specific morphing procedure used, we hypothesized that the clarity value, rather than the incongruence value, would better reflect task difficulty. Behavioral data revealed that participants integrated face and voice affective information, and that the clarity value, as opposed to the incongruence value, correlated with categorization difficulty. Cerebrally, incongruence was associated with activity in the superior temporal region, which emerged after task difficulty had been accounted for. Overall, our results suggest that activation in the superior temporal region in response to incongruent information cannot be explained simply by task difficulty, and may rather be due to the detection of mismatching information between the two modalities
Effects of emotional valence and arousal on the voice perception network
Several theories conceptualise emotions along two main dimensions: valence (a continuum from negative to positive) and arousal (a continuum from low to high). These dimensions are typically treated as independent in many neuroimaging experiments, yet recent behavioural findings suggest that they are actually interdependent. This result has implications for neuroimaging design, analysis and theoretical development. We were interested in determining the extent of this interdependence both behaviourally and neuroanatomically, as well as in teasing apart any activation that is specific to each dimension. While we found extensive overlap in activation for each dimension in traditional emotion areas (bilateral insulae, orbitofrontal cortex, amygdalae), we also found activation specific to each dimension, with characteristic relationships between modulations of these dimensions and BOLD signal change. Increases in arousal ratings were related to increased activations predominantly in voice-sensitive cortices after variance explained by valence had been removed. In contrast, emotions of extreme valence were related to increased activations in bilateral voice-sensitive cortices, hippocampi, anterior and mid-cingulum, and medial orbito- and superior frontal regions after variance explained by arousal had been accounted for. Our results therefore do not support a complete segregation of the brain structures underpinning the processing of affective dimensions
A neural marker for social bias towards in-group accents
Accents provide information about the speaker's geographical, socio-economic, and ethnic background. Research in applied psychology and sociolinguistics suggests that we generally prefer our own accent to other varieties of our native language and attribute more positive traits to it. Despite the widespread influence of accents on social interactions and on educational and work settings, the neural underpinnings of this social bias toward our own accent, and what may drive it, are unexplored. We measured brain activity while participants from two different geographical backgrounds listened passively to 3 English accent types embedded in an adaptation design. Cerebral activity in several regions, including bilateral amygdalae, revealed a significant interaction between the participants' own accent and the accent they listened to: while repetition of the participants' own accent elicited an enhanced neural response, repetition of the other group's accent resulted in reduced responses classically associated with adaptation. Our findings suggest that increased social relevance of, or greater emotional sensitivity to, in-group accents may underlie the own-accent bias. Our results provide a neural marker for the bias associated with accents and show, for the first time, that the neural response to speech is partly shaped by the geographical background of the listener
Low-cost image annotation for supervised machine learning. Application to the detection of weeds in dense culture
An open problem in robotized agriculture is to detect weeds in dense culture. This problem can be addressed with computer vision and machine learning, but the bottleneck of supervised approaches lies in the manual annotation of training images. We propose two different approaches for annotating weed positions that speed up this process. The first approach uses synthetic images and eye-tracking to annotate images [4], and is at least 30 times faster than manual annotation by an expert; the second approach is based on real RGB and depth images collected with a Kinect v2 sensor.
We generated a data set of 150 synthetic images in which weeds were randomly positioned. The images were viewed by two observers, and an eye tracker sampled eye position during the execution of this task [5, 6]. Areas of interest were recorded as rectangular patches, and a patch is considered to include weeds if the average fixation time in this patch exceeds 1.04 seconds. The quality of the visual annotation by eye-tracking is assessed in two ways. First, direct comparison of the visual annotation with the ground truth shows that, on average, 94.7% of all fixations on an image fell within ground-truth bounding boxes. Second, as shown in Fig. 1, the eye-tracked annotated data are used as a training data set for four machine learning approaches and the recognition rate is compared with that obtained using the ground truth.
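As an illustration of the fixation-based labelling rule described above, the following Python sketch labels a rectangular patch as containing weeds when the average fixation time recorded in it exceeds the 1.04-second threshold. The `Patch` structure, its field names, and the example values are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List

FIXATION_THRESHOLD_S = 1.04  # average fixation time threshold reported in the text

@dataclass
class Patch:
    x: int                        # top-left corner of the rectangular patch (pixels)
    y: int
    width: int
    height: int
    fixation_times: List[float]   # durations (s) of the fixations that landed in the patch

def contains_weeds(patch: Patch) -> bool:
    """Label the patch as 'weeds' if its average fixation time exceeds the threshold."""
    if not patch.fixation_times:
        return False
    mean_fixation = sum(patch.fixation_times) / len(patch.fixation_times)
    return mean_fixation > FIXATION_THRESHOLD_S

# Example: a patch that received two fixations of 0.9 s and 1.3 s (mean 1.1 s > 1.04 s)
patch = Patch(x=120, y=40, width=64, height=64, fixation_times=[0.9, 1.3])
print("weeds" if contains_weeds(patch) else "background")
```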
These four machine learning methods are tested in order to assess the quality of the visual annotation. The methods correspond to handcrafted features adapted to texture characterization, followed by a linear support vector machine binary classifier. Table 1 gives the average accuracy and standard deviation. The experimental results show that the visual eye-tracked annotations are almost the same as the in-silico ground truth, and that the performance of supervised machine learning on eye-tracked annotated data is very close to that obtained with the ground truth
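A minimal sketch of this evaluation pipeline (handcrafted texture features followed by a linear SVM binary classifier) is given below. The choice of uniform local binary patterns as the texture descriptor, the toy data, and the cross-validation setup are assumptions made for illustration; the original work used its own feature sets, the eye-tracked or ground-truth patch labels, and reports the resulting accuracies in Table 1.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def texture_features(patch: np.ndarray, points: int = 8, radius: float = 1.0) -> np.ndarray:
    """Histogram of uniform LBP codes: a simple handcrafted texture descriptor."""
    lbp = local_binary_pattern(patch, points, radius, method="uniform")
    # 'uniform' LBP yields integer codes in [0, points + 1]
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2), density=True)
    return hist

# Toy data standing in for 32x32 grayscale patches labelled weeds (1) / background (0)
rng = np.random.default_rng(0)
patches = (rng.random((200, 32, 32)) * 255).astype(np.uint8)
labels = rng.integers(0, 2, size=200)

# Texture features followed by a linear SVM, evaluated with 5-fold cross-validation
X = np.stack([texture_features(p) for p in patches])
scores = cross_val_score(LinearSVC(), X, labels, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In practice the same pipeline would be trained twice, once on the eye-tracked labels and once on the ground-truth labels, and the two recognition rates compared as in Table 1.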
- …