
    Sensorimotor experience in virtual environments

    The goal of rehabilitation is to reduce impairment and provide functional improvements that result in quality participation in activities of life. Plasticity and motor learning principles provide inspiration for therapeutic interventions, including movement repetition in a virtual reality environment. The objective of this research was to investigate function-specific measurements (kinematic, behavioral) and neural correlates of the motor experience of hand gesture activities in virtual environments (VE) that stimulate sensory experience, using a hand agent model. The fMRI-compatible Virtual Environment Sign Language Instruction (VESLI) system was designed and developed to provide a number of rehabilitation and measurement features, to identify optimal learning conditions for individuals, and to track changes in performance over time. The therapies and measurements incorporated into VESLI target and track specific impairments underlying dysfunction. The goal of improved measurement is to develop targeted interventions embedded in higher-level tasks and to accurately track specific gains, in order to understand responses to treatment and the impact those responses may have upon higher-level function such as participation in life. To further clarify the biological model of motor experience, and to understand the added value and role of virtual sensory stimulation and feedback, which includes seeing one's own hand movement, functional brain mapping was conducted with simultaneous kinematic analysis in healthy controls and in stroke subjects. It is believed that an understanding of these neural activations will make possible rehabilitation strategies that take advantage of the principles of plasticity and motor learning. The present research assessed practice conditions that successfully promote gesture learning behavior in the individual. For the first time, functional imaging experiments mapped the neural correlates of human interactions with complex virtual reality hand avatars moving synchronously with the subject's own hands. Findings indicate that healthy control subjects learned intransitive gestures in virtual environments using first- and third-person avatars, picture and text definitions, and while viewing visual feedback of their own hands, virtual hand avatars, or, in the control condition, hidden hands. Moreover, exercise in a virtual environment with a first-person hand avatar recruited insular cortex activation over time, which may indicate that this activation is associated with a sense of agency. Sensory augmentation in virtual environments modulated activations of important brain regions associated with action observation and action execution. The quality of the visual feedback was manipulated, and brain areas were identified in which the amount of activation was positively or negatively correlated with the visual feedback. When subjects moved the right hand and saw an unexpected response (the left virtual avatar hand moved), neural activation increased in the motor cortex ipsilateral to the moving hand. This visual modulation might provide a helpful rehabilitation therapy for people with paralysis of a limb through visual augmentation of skills. A model was developed to study the effects of sensorimotor experience in virtual environments, together with findings on the effect of that experience upon brain activity and related behavioral measures.
The research model represents a significant contribution to neuroscience research and translational engineering practice. A model of neural activations correlated with kinematics and behavior can profoundly influence the delivery of rehabilitative services in the coming years by giving clinicians a framework for engaging patients in a sensorimotor environment that can optimally facilitate neural reorganization.
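
The abstract does not specify the analysis pipeline, but the reported relationship between feedback quality and brain activation can be illustrated with a minimal sketch: a per-trial correlation between a hypothetical visual-feedback level and a region-of-interest activation estimate. All data and variable names below are synthetic assumptions, not the VESLI analysis itself.

```python
# Hedged sketch: correlate per-trial visual-feedback quality with ROI activation.
# The data here are synthetic; the actual analysis pipeline is not specified in
# the abstract, so this only illustrates the kind of positive or negative
# correlation reported (feedback level vs. brain activation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_trials = 60
feedback_level = rng.uniform(0.0, 1.0, n_trials)                       # hypothetical feedback quality per trial
roi_activation = 0.8 * feedback_level + rng.normal(0, 0.3, n_trials)   # hypothetical ROI activation estimates

r, p = stats.pearsonr(feedback_level, roi_activation)
print(f"feedback vs. activation: r = {r:.2f}, p = {p:.3g}")
```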

    Adults imitate to send a social signal

    Humans are prolific imitators, even when copying may not be efficient. A variety of explanations have been advanced for this phenomenon, ranging from imitation being a side-effect of learning, to it arising from a lack of understanding of causality, to it being a mechanism to boost affiliation. This thesis systematically develops and tests the hypothesis that imitation is a social signal sent between interacting partners, by testing whether our propensity to imitate is modulated by the social availability of the interaction partner (i.e., whether our interaction partner is watching us or not). I developed a dyadic block-moving paradigm that allowed us to test this hypothesis in a naturalistic manner in four behavioural and neuroimaging studies using functional near-infrared spectroscopy (fNIRS). I found that imitative fidelity was modulated by whether the interaction partner was watching the participant make their move or not, and this effect replicated across all four studies, in both neurotypical and autistic participants. I also examined the neural correlates of responding to irrational actions and of being watched. I found that being watched led to a robust deactivation in the right parietal cortex in both neurotypical participants (in two studies) and autistic participants (in one study). Among autistic participants we also found strong engagement of the left superior temporal sulcus (STS) when being watched. For responding to irrational actions, in one study of neurotypical participants we found greater deactivation in the right superior parietal lobule (SPL) when making more irrational responses. In another study of autistic and neurotypical participants we found deactivation in the bilateral inferior parietal lobule (IPL) in neurotypical participants when responding to irrational actions, while this deactivation appeared confined to the left IPL for autistic participants. Autistic participants also showed differentially higher engagement in left occipitotemporal regions when responding to irrational actions. This thesis supports the social-signalling hypothesis of imitation and is accompanied by suggestions for future directions to explore this theory in more detail.

    Mapping the development of visual information use for facial expression recognition

    In this thesis, I aimed to map the development of facial expression recognition from early childhood up to adulthood by identifying, for the first time in the literature, the quantity and quality of visual information needed to recognise the six 'basic' emotions. Using behavioural and eye-tracking measures, the original contributions of this thesis include: 1. An unbiased fine-grained mapping of the continued development of facial expression recognition for the six basic emotions, with the introduction of a psychophysical measure to the literature; 2. The identification of two main phases in the development of facial expression recognition, ranging from 5 to 12 years old and from 13 years old to adulthood; 3. The quantity of signal and intensity information needed to recognise the six basic emotions across development; 4. The finding that the processing of signal and intensity information becomes more discriminative during development, as less information is needed with age to recognise anger, disgust, surprise and sadness; 5. A novel analysis of response profiles (the sequence of responses across trials), which revealed subtle but important changes in the sequence of responses along a continuum of age: profiles become more similar with age due to less random erroneous categorizations; 6. The comparison of two recognition measures across the same cohort, revealing that two types of stimuli commonly used in facial emotion processing studies (expressions at full intensity vs. expressions of varying intensities) cannot be straightforwardly compared during development; 7. Novel eye movement analyses that revealed the age at which perceptual strategies for the recognition of facial expressions of emotion become mature. An initial review of the literature revealed several less studied areas of the development of facial expression recognition, which I chose to focus on for my thesis. Firstly, at the outset of this thesis there were no studies of the continued development of facial expression recognition from early childhood up to adulthood. Similarly, there were no studies which examined all six of what are termed the 'basic emotions' and a neutral expression within the same paradigm. Therefore, the objective of the first study was to provide a fine-grained mapping of the continued development for all six basic expressions and neutral from the age of 5 up to adulthood by introducing a novel psychophysical method to the developmental literature. The psychophysical adaptive staircase procedure provided a precise measure of recognition performance across development. Using linear regression, we then charted the developmental trajectories for recognition of each of the 6 basic emotions and neutral.
This mapping of recognition across development revealed expressions that showed a steep improvement with age (disgust, neutral, and anger); expressions that showed a more gradual improvement with age (sadness, surprise); and those that remained stable from early childhood (happiness and fear), indicating that the coding for these expressions is already mature by 5 years of age. Two main phases were identified in the development of facial expression recognition, as recognition thresholds were most similar between the ages of 5 to 12 and 13 to adulthood. In the second study we aimed to take this fine-grained mapping of the development of facial expression recognition further by quantifying how much visual information is needed to recognise an expression across development, comparing two measures of visual information: signal and intensity. Again using a psychophysical approach, this time with a repeated-measures design, the quantity of signal and intensity needed to recognise sad, angry, disgusted, and surprised expressions decreased with age. Therefore, the processing of both types of visual information becomes more discriminative during development, as less information is needed with age to recognise these expressions. Mutual information analysis revealed that intensity and signal processing are similar only during adulthood and, therefore, expressions at full intensity (as in the signal condition) and expressions of varying intensities (as in the intensity condition) cannot be straightforwardly compared during development. While the first two studies of this thesis addressed how much visual information is needed to recognise an expression across development, the aim of the third study was to investigate which information is used across development to recognise an expression, using eye-tracking. We recorded the eye movements of children from the age of 5 up to adulthood during recognition of the six basic emotions, using natural viewing and gaze-contingent conditions. Multivariate statistical analysis of the eye movement data across development revealed the age at which perceptual strategies for the recognition of facial expressions of emotion become mature. The eye movement strategies of the oldest adolescent group, 17- to 18-year-olds, were most similar to those of adults for all expressions. A developmental dip in strategy similarity to adults was found for each emotional expression between 11 and 14 years of age, and slightly earlier, at 7 to 8 years, for happiness. Finally, recognition accuracy for happy, angry, and sad expressions did not differ across age groups, but eye movement strategies diverged, indicating that diverse approaches are possible for reaching optimal performance. In sum, the studies map the intricate and non-uniform trajectories of the development of facial expression recognition by comparing visual information use from early childhood up to adulthood. The studies chart not only how well recognition of facial expressions develops with age, but also how facial expression recognition is achieved throughout development, by establishing whether perceptual strategies are similar across ages and at what stage they can be considered mature. The studies aimed to provide the basis of an understanding of the continued development of facial expression recognition which was previously lacking from the literature.
Future work aims to further this understanding by investigating how facial expression recognition develops in relation to other aspects of cognitive and emotional processing, and by examining the potential neurodevelopmental basis of the developmental dip found in fixation-strategy similarity.
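
Two methodological ingredients recur in this abstract: an adaptive psychophysical staircase that estimates recognition thresholds, and a linear regression of thresholds against age. The sketch below illustrates both under stated assumptions; the specific 1-up/2-down rule, the simulated observer, and the synthetic age data are illustrative and not the thesis's actual procedure.

```python
# Hedged sketch of the two analysis ingredients described above:
# (1) an adaptive staircase converging on a recognition threshold, and
# (2) a linear regression of thresholds on age.
# The staircase rule and the simulated observer are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def run_staircase(true_threshold, n_trials=80, step=0.05):
    """Simple 1-up/2-down staircase over stimulus intensity in [0, 1]."""
    intensity, correct_streak, history = 0.9, 0, []
    for _ in range(n_trials):
        p_correct = 1.0 / (1.0 + np.exp(-(intensity - true_threshold) / 0.05))
        correct = rng.random() < p_correct
        history.append(intensity)
        if correct:
            correct_streak += 1
            if correct_streak == 2:              # two correct -> harder (lower intensity)
                intensity = max(0.0, intensity - step)
                correct_streak = 0
        else:                                    # one error -> easier (higher intensity)
            intensity = min(1.0, intensity + step)
            correct_streak = 0
    return float(np.mean(history[-20:]))         # threshold estimate: mean of final trials

# Hypothetical developmental data: thresholds tend to fall with age.
ages = np.arange(5, 19)
thresholds = [run_staircase(0.6 - 0.02 * (a - 5)) for a in ages]

slope, intercept = np.polyfit(ages, thresholds, 1)
print(f"threshold ~ {intercept:.2f} + {slope:.3f} * age")
```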

    Trennung und Schätzung der Anzahl von Audiosignalquellen mit Zeit- und Frequenzüberlappung

    Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For these mixtures, automatically separating the sources or estimating their number is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that the sources do not overlap completely. In this work, however, we consider cases where the overlap is severe, for instance when instruments play the same note (unison) or when many people speak concurrently ("cocktail party"), highlighting the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNNs). We first address the source separation problem for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation for overlapped and modulated sources in unison mixtures, and also improves vocal and accompaniment separation when used as the input to a DNN model. We then focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study of how humans address this task, which led us to conduct listening experiments confirming that humans are only able to estimate the number of sources correctly up to four. To answer the question of whether machines can perform similarly, we present a DNN architecture trained to estimate the number of concurrent speakers. Our results show improvements over other methods, and the model even outperformed humans on the same task. In both the source separation and source count estimation tasks, the key contribution of this thesis is the concept of "modulation", which is important for computationally mimicking human performance. Our proposed Common Fate Transform is an adequate representation to disentangle overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it learns modulation-like intermediate features.
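
As a rough illustration of the Common Fate idea described above, the sketch below computes a spectrogram and then takes a second Fourier transform over local time-frequency patches, so that sources sharing a modulation pattern separate in the resulting tensor. The patch size, window settings, and the toy unison signal are assumptions, not the parameters used in the thesis.

```python
# Hedged sketch of a Common Fate-style representation: an STFT followed by a
# second (2D) Fourier transform over local time-frequency patches, so that
# sources sharing the same modulation pattern ("common fate") become separable.
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
# Two "unison" tones at the same carrier frequency but with different vibrato rates.
x = np.sin(2 * np.pi * (440 * t + 3 * np.sin(2 * np.pi * 5 * t))) \
  + np.sin(2 * np.pi * (440 * t + 3 * np.sin(2 * np.pi * 7 * t)))

f, frames, X = stft(x, fs=fs, nperseg=1024, noverlap=768)   # complex spectrogram

# Tile the spectrogram into (freq x time) patches and take a 2D FFT per patch.
pf, pt = 16, 16                                             # patch size (assumption)
n_f, n_t = (X.shape[0] // pf) * pf, (X.shape[1] // pt) * pt
patches = X[:n_f, :n_t].reshape(n_f // pf, pf, n_t // pt, pt).transpose(0, 2, 1, 3)
cft = np.fft.fft2(patches, axes=(-2, -1))                   # modulation spectra per patch

print("spectrogram:", X.shape, "-> common-fate tensor:", cft.shape)
```

In a representation like this, the two vibrato rates land at different modulation frequencies within each patch, which is the property the separation methods above exploit.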

    The role of decision confidence in advice-taking and trust formation

    In a world where ideas flow freely between people across multiple platforms, we often find ourselves relying on others' information without an objective standard to judge whether those opinions are accurate. The present study tests an agreement-in-confidence hypothesis of advice perception, which holds that internal metacognitive evaluations of decision confidence play an important functional role in the perception and use of social information, such as peers' advice. We propose that confidence can be used, computationally, to estimate advisors' trustworthiness and advice reliability. Specifically, these processes are hypothesized to be particularly important in situations where objective feedback is absent or difficult to acquire. Here, we use a judge-advisor system paradigm to precisely manipulate the profiles of virtual advisors whose opinions are provided to participants performing a perceptual decision making task. We find that when advisors' and participants' judgments are independent, people are able to discriminate subtle advice features, like confidence calibration, whether or not objective feedback is available. However, when observers' judgments (and judgment errors) are correlated - as is the case in many social contexts - predictable distortions can be observed between feedback and feedback-free scenarios. A simple model of advice reliability estimation, endowed with metacognitive insight, is able to explain key patterns of results observed in the human data. We use agent-based modeling to explore implications of these individual-level decision strategies for network-level patterns of trust and belief formation
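
A minimal sketch of the agreement-in-confidence idea, not the authors' actual model, is an advisor-reliability estimate updated from agreement with one's own judgment, weighted by one's own confidence, so that learning about an advisor remains possible without objective feedback. The update rule, learning rate, and simulated accuracies below are assumptions.

```python
# Hedged sketch of agreement-in-confidence advice evaluation: without objective
# feedback, the judge treats their own (confidence-weighted) judgment as a proxy
# for the truth and updates an advisor-reliability estimate from agreement.
import numpy as np

rng = np.random.default_rng(2)

def update_trust(trust, own_choice, own_confidence, advisor_choice, lr=0.1):
    """Move trust toward 1 when a confident judge agrees with the advisor,
    toward 0 when a confident judge disagrees; low confidence dilutes the update."""
    agree = 1.0 if own_choice == advisor_choice else 0.0
    return trust + lr * own_confidence * (agree - trust)

trust = 0.5                                    # prior reliability estimate
truth = rng.integers(0, 2, 200)                # hidden correct answers
for s in truth:
    own = s if rng.random() < 0.7 else 1 - s   # judge is 70% accurate (assumption)
    conf = rng.uniform(0.5, 1.0)               # judge's confidence on this trial
    adv = s if rng.random() < 0.8 else 1 - s   # advisor is 80% accurate (assumption)
    trust = update_trust(trust, own, conf, adv)

print(f"estimated advisor reliability: {trust:.2f}")
```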

    Neural reflections of meaning in gesture, language, and action


    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other applications that can use the output of automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and speech processing applications able to operate in real-world environments such as mobile communication services and smart homes.
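
As one concrete example of the feature-extraction stage mentioned above, the sketch below computes mel-frequency cepstral coefficients (MFCCs) and their deltas with librosa; this is a common front end for recognition systems, not a method prescribed by the book, and the synthetic tone stands in for a real recording.

```python
# Hedged sketch of a common speech front end (MFCC features), illustrating the
# speech-feature extraction stage described above. A real system would load a
# recording (e.g., with librosa.load); a synthetic tone keeps the example self-contained.
import librosa

sr = 16000
y = librosa.tone(440.0, sr=sr, duration=2.0)              # placeholder audio signal
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # 13 coefficients per frame
delta = librosa.feature.delta(mfcc)                       # first-order dynamics

print("MFCC matrix:", mfcc.shape, "delta:", delta.shape)
```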

    Sequential grouping constraints on across-channel auditory processing


    Features and Functions: Decomposing the Neural and Cognitive Bases of Semantic Composition

    In this dissertation, I present a suite of studies investigating the neural and cognitive bases of semantic composition. First, I motivate why a theory of semantic combinatorics is a fundamental desideratum of the cognitive neuroscience of language. I then introduce a possible typology of semantic composition: one which involves contrasting feature-based composition with function-based composition. Having outlined several different ways we might operationalize such a distinction, I proceed to detail two studies using univariate and multivariate fMRI measures, each examining different dichotomies along which the feature-vs.-function distinction might cleave. I demonstrate evidence that activity in the angular gyrus indexes certain kinds of function-/relation-based semantic operations and may be involved in processing event semantics. These results provide the first targeted comparison of feature- and function-based semantic composition, particularly in the brain, and delineate what proves to be a productive typology of semantic combinatorial operations. The final study investigates a different question regarding semantic composition: namely, how automatic the interpretation of plural events is, and what information the processor uses when committing to either a distributive plural event (comprising separate events) or a collective plural event (consisting of a single joint event).
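
One way to make the feature-vs.-function contrast concrete, purely as an illustration and not as the dissertation's operationalization, is the distributional-semantics distinction between element-wise combination of word vectors (feature-based) and applying a matrix-valued function to an argument vector (function-based). The toy vectors below are random and hypothetical.

```python
# Hedged sketch contrasting two composition schemes over toy word vectors:
# feature-based composition as element-wise combination, and function-based
# composition as applying a matrix (a function) to its argument vector.
import numpy as np

rng = np.random.default_rng(3)
dim = 4

red = rng.normal(size=dim)        # modifier as a feature vector
boat = rng.normal(size=dim)       # head noun as a feature vector

# Feature-based: intersective, element-wise combination of the two vectors.
feature_based = red * boat

# Function-based: the modifier realized as a matrix mapping noun vectors to phrase vectors.
red_as_function = rng.normal(size=(dim, dim))
function_based = red_as_function @ boat

print("feature-based:", np.round(feature_based, 2))
print("function-based:", np.round(function_based, 2))
```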

    The integration of paralinguistic information from the face and the voice

    We live in a world which bombards us with a huge amount of sensory information, even if we are not always aware of it. To successfully navigate, function and ultimately survive in our environment, we use all of the cues available to us. Furthermore, we actually combine this information: doing so not only allows us to construct a richer percept of the objects around us, but actually increases the reliability of our decisions and sensory estimates. However, at odds with our naturally multisensory awareness of our surroundings, the literature addressing unisensory processes has always far exceeded that which examines the multimodal nature of perception. Arguably the most salient and relevant stimuli in our environment are other people. Our species is not designed to operate alone, and so we have evolved to be especially skilled in all those things which enable effective social interaction: this could be engaging in conversation, but equally recognising a family member, or understanding the current emotional state of a friend and adjusting our behaviour appropriately. In particular, the face and the voice both provide us with a wealth of hugely relevant social information, linguistic but also non-linguistic. In line with work conducted in other fields of multisensory perception, research on face and voice perception has mainly concentrated on each of these modalities independently, particularly face perception. Furthermore, the work that has addressed integration of these two sources has by and large concentrated on the audiovisual nature of speech perception. The work in this thesis is based on a theoretical model of voice perception which not only proposed a serial processing pathway of vocal information, but also emphasised the similarities between face and voice processing, suggesting that this information may interact. Significantly, these interactions were not confined to speech processing, but rather encompassed all forms of information processing, whether linguistic or paralinguistic. Therefore, in this thesis, I concentrate on the interactions between, and integration of, face and voice paralinguistic information. In Chapter 3 we conducted a general investigation of neural face-voice integration. A number of studies have attempted to identify the cerebral regions in which information from the face and voice combines; however, in addition to a large number of regions being proposed as integration sites, it is not known whether these regions are selective in the binding of these socially relevant stimuli. We first identified regions in the bilateral superior temporal sulcus (STS) which showed an increased response to person-related information, whether this was faces, voices, or faces and voices combined, in comparison to information from objects. A subsection of this region in the right posterior superior temporal sulcus (pSTS) also produced a significantly stronger response to audiovisual as compared to unimodal information. We therefore propose this as a potential people-selective, integrative region. Furthermore, a large portion of the right pSTS was also observed to be people-selective and heteromodal: that is, both auditory and visual information provoked a significant response above baseline. These results underline the importance of the STS region in social communication. Chapter 4 moved on to study the audiovisual perception of gender.
Using a set of novel stimuli, which were not only dynamic but also morphed in both modalities, we investigated whether different combinations of gender information in the face and voice could affect participants' perception of gender. We found that participants indeed combined both sources of information when categorising gender, with their decisions reflecting information contained in both modalities. However, this combination was not entirely equal: in this experiment, gender information from the voice appeared to dominate over that from the face, exerting a stronger modulating effect on categorisation. This result was supported by the findings from conditions which directed attention, where we observed that participants were able to ignore face but not voice information, and also by reaction-time results, where latencies were generally a reflection of the voice morph. Overall, these results support interactions between face and voice in gender perception, but demonstrate that (due to a number of probable factors) one modality can exert more influence than the other. Finally, in Chapter 5 we investigated the proposed interactions between affective content in the face and voice. Specifically, we used a ‘continuous carry-over’ design, again in conjunction with dynamic, morphed stimuli, which allowed us to investigate not only ‘direct’ effects of different sets of audiovisual stimuli (e.g., congruent, incongruent), but also adaptation effects (in particular, the effect of emotion expressed in one modality upon the response to emotion expressed in the other modality). In parallel with behavioural results, which showed that the crossmodal context affected the time taken to categorise emotion, we observed a significant crossmodal effect in the right pSTS which was independent of any within-modality adaptation. We propose that this result provides strong evidence that this region may be composed of genuinely multisensory neurons, as opposed to two sets of interdigitated neurons each responsive to information from one modality or the other. Furthermore, an analysis investigating stimulus congruence showed that the degree of incongruence modulated activity across the right STS, further suggesting that the neural response in this region can be altered depending on the particular combination of affective information contained within the face and voice. Overall, both the behavioural and cerebral results from this study suggest that participants integrated emotion from the face and voice.
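
A simple way to express the voice dominance reported above, offered as an illustration rather than the thesis's model, is weighted cue combination, in which each modality contributes graded gender evidence in proportion to a weight. The weight value and morph levels below are assumptions.

```python
# Hedged sketch of weighted audiovisual cue combination for gender categorization:
# each modality supplies graded evidence (0 = clearly male, 1 = clearly female),
# and the voice receives a larger weight, mirroring the voice dominance reported above.

def combined_gender_estimate(face_morph, voice_morph, w_voice=0.7):
    """Weighted average of face and voice evidence; w_voice > 0.5 means the voice dominates."""
    return w_voice * voice_morph + (1.0 - w_voice) * face_morph

# Incongruent example: a somewhat male-looking face paired with a clearly female voice.
face, voice = 0.3, 0.9
estimate = combined_gender_estimate(face, voice)
decision = "female" if estimate > 0.5 else "male"
print(f"combined evidence = {estimate:.2f} -> categorized as {decision}")
```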