Perception of emotion in social interactions from body movement and voice

Abstract

The central theme of this thesis was to examine the observation and judgement of emotions from the body movement and voice of two actors engaged in social interaction. Four major goals followed from this theme. The first was to create a novel stimulus set for the study of emotional social interactions. The second was to validate the stimulus set by examining emotion perception in ways comparable to studies of single-actor displays. The third was to examine the effect of degrading visual and auditory information on the perception of emotional social interactions. The fourth was to examine the multimodal integration of emotional signals from body movement and voice.

A stimulus set was first created that captured body movement and dialogue between two actors in brief, natural interactions that were happy, angry, or neutral at different levels of intensity. The stimuli were recorded with a Vicon motion capture system together with voice recording, using a group of nine professional and non-professional actors, and resulted in a corpus of 756 dyadic, multimodal, emotional interactions. A series of experiments was conducted in which participants were presented with visual point-light displays, auditory dialogues, or combined audiovisual displays. Observers could accurately identify happy and angry interactions from both the dyadic point-light displays and the voices. The intensity of the expressions influenced identification accuracy, but only for angry, not happy, displays.

After validation of the stimulus set, a subset was selected for further studies. Various methods of visual and auditory distortion were tested separately for each modality to examine their effect on the recognition of emotions from body movement and voice. Results for dyadic point-light displays paralleled findings from single-actor displays: inversion and scrambling decreased the overall accuracy of emotion judgements. An effect of viewpoint was also found, with emotions easier to detect when interactions were viewed from a side viewpoint than from an oblique viewpoint. For voice, the addition of brown noise and low-pass filtering both degraded emotion identification. With both visual and auditory distortions, however, participants were still able to identify emotions above chance, suggesting high sensitivity to emotional cues in a social context.

In the final set of studies, the stimulus set was used in a multimodal context to examine the perception of emotion from movement and voice in dyadic social interactions. Voice repeatedly dominated body movement as a cue to emotion when observing social interactions. Participants were less accurate and slower at discriminating emotions from body movement alone than when movement was combined with dialogue or when dialogue was presented on its own. Even when participants watched emotionally mismatched displays combining movement and voice, they predominantly based their responses on the voice rather than the movement. This auditory dominance persisted even when the reliability of the auditory signal was degraded with brown noise or low-pass filtering, although visual information had some effect on emotion judgements when combined with a degraded auditory signal.
These results suggest that, when judging emotions from observed social interactions, observers rely primarily on vocal cues from the conversation rather than on visual cues from body movement.