Social Interactions in Immersive Virtual Environments: People, Agents, and Avatars

Abstract

Immersive virtual environments (IVEs) have grown in popularity, with applications in many fields. IVEs aim to approximate real environments and to make users react similarly to how they would in everyday life. An important use case is the interaction between users and virtual characters (VCs). We interact with other people every day, and we expect others to act and behave appropriately, both verbally and non-verbally (e.g., pitch, proximity, gaze, turn-taking). These expectations also apply to interactions with VCs in IVEs, and this thesis tackles some of these aspects. We present three projects that inform the area of social interactions with VCs in IVEs, focusing on non-verbal behaviours.

In our first study, on interactions between people, we collaborated with the Social Neuroscience group at the Institute of Cognitive Neuroscience, UCL, on a multi-modal dyadic interaction. This study aims to understand conversation dynamics, focusing on gaze and turn-taking. The results show that people change their gaze (from averted to direct and vice versa) more frequently when they are being looked at than when they are not. When they are not being looked at, they also direct their gaze towards their partners more often. Another contribution of this work is an automated method for annotating speech and gaze data.

Next, we consider agents' higher-level non-verbal behaviours, covering social attitudes. We present a pipeline to collect data and train a machine learning (ML) model that detects social attitudes in user-VC interactions. Here we collaborated with two game studios: Dream Reality Interaction and Maze Theory. We present a case study of the ML pipeline on social engagement recognition for Maze Theory's narrative VR game Peaky Blinders. We use a reinforcement learning algorithm with imitation learning rewards and a temporal memory element. The results show that the model trained with raw data does not generalise and performs worse (60% accuracy) than the one trained with socially meaningful data (83% accuracy).

In IVEs, people embody avatars, and avatar appearance can impact social interactions. In collaboration with Microsoft Research, we report a longitudinal mixed-reality study on avatar appearance in real-world meetings between co-workers, comparing personalised full-body realistic and cartoon avatars. The results imply that when participants use realistic avatars first, they may have higher expectations and perceive their colleagues' emotional states less accurately. Participants may also become more accustomed to cartoon avatars over time, and the overall use of avatars may lead to less accurate perception of negative emotions.

The work presented here contributes towards the field of detecting and generating non-verbal cues for VCs in IVEs. These are also important building blocks for creating autonomous agents for IVEs. Additionally, this work contributes to the games industry and to workplace collaboration through an immersive ML pipeline for detecting social attitudes and through insights into using different avatar styles over time in real-world meetings.
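To make the automated speech and gaze annotation concrete, the following minimal sketch (illustrative only, not the thesis code) labels each frame from two assumed input signals: the angle between a participant's gaze ray and the direction to the partner's face, and frame-level audio energy. The threshold values and variable names are hypothetical.

    # Illustrative sketch, not the thesis implementation: automatic
    # per-frame annotation of gaze (direct/averted) and speech
    # (speaking/silent). Thresholds below are hypothetical.
    import numpy as np

    GAZE_DIRECT_DEG = 10.0  # gaze within 10 degrees of partner counts as "direct"
    SPEECH_RMS = 0.02       # audio energy above this counts as "speaking"

    def annotate(gaze_angle_deg: np.ndarray, audio_rms: np.ndarray):
        """Label each frame, then count gaze changes (averted <-> direct)."""
        direct = gaze_angle_deg < GAZE_DIRECT_DEG   # bool per frame
        speaking = audio_rms > SPEECH_RMS
        gaze_changes = int(np.sum(direct[1:] != direct[:-1]))
        return direct, speaking, gaze_changes

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        angles = rng.uniform(0.0, 40.0, size=600)   # 600 synthetic frames
        energy = rng.uniform(0.0, 0.05, size=600)
        direct, speaking, changes = annotate(angles, energy)
        print(f"direct-gaze frames: {direct.sum()}, "
              f"speaking frames: {speaking.sum()}, gaze changes: {changes}")

In practice such labels would be derived from eye-tracker and microphone streams; the point of the sketch is only that thresholding the two signals yields the per-frame annotations from which gaze-change frequencies can be counted.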
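The modelling approach named above (reinforcement learning with imitation learning rewards and a temporal memory element) can likewise be sketched. The following is a minimal PyTorch illustration under stated assumptions, not the thesis implementation: a GRU provides the temporal memory, the per-step reward mixes a task signal with an imitation bonus for matching a demonstrator's labels, and the policy is updated with REINFORCE. The feature size, horizon, and reward weights are hypothetical placeholders.

    # Illustrative sketch: recurrent policy + imitation-shaped reward.
    import torch
    import torch.nn as nn

    FEATURES, HIDDEN, ACTIONS = 16, 32, 2   # e.g. engaged / not engaged
    GAMMA, IMITATION_WEIGHT = 0.99, 0.5     # hypothetical values

    class RecurrentPolicy(nn.Module):
        """GRU gives the policy a temporal memory over behaviour features."""
        def __init__(self):
            super().__init__()
            self.gru = nn.GRU(FEATURES, HIDDEN, batch_first=True)
            self.head = nn.Linear(HIDDEN, ACTIONS)

        def forward(self, seq):             # seq: (1, T, FEATURES)
            out, _ = self.gru(seq)
            return torch.distributions.Categorical(logits=self.head(out))

    policy = RecurrentPolicy()
    optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # One synthetic episode: behaviour features plus demonstrator labels.
    T = 50
    obs = torch.randn(1, T, FEATURES)
    demo_actions = torch.randint(0, ACTIONS, (T,))

    dist = policy(obs)
    actions = dist.sample().squeeze(0)              # (T,)
    task_reward = torch.zeros(T)                    # placeholder task signal
    imitation = (actions == demo_actions).float()   # bonus for matching demos
    rewards = task_reward + IMITATION_WEIGHT * imitation

    # Discounted returns, then the REINFORCE loss.
    returns = torch.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t].item() + GAMMA * running
        returns[t] = running
    loss = -(dist.log_prob(actions.unsqueeze(0)).squeeze(0) * returns).mean()

    optim.zero_grad()
    loss.backward()
    optim.step()

The design choice the sketch mirrors is the one stated in the abstract: imitation rewards supply dense supervision from demonstrations, while the recurrent memory lets the model condition on the history of non-verbal behaviour rather than a single frame.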
