22 research outputs found

    Social Perception of Pedestrians and Virtual Agents Using Movement Features

    In many tasks such as navigation in a shared space, humans explicitly or implicitly estimate social information related to the emotions, dominance, and friendliness of other humans around them. This social perception is critical in predicting others’ motions or actions and deciding how to interact with them. Therefore, modeling social perception is an important problem for robotics, autonomous vehicle navigation, and VR and AR applications. In this thesis, we present novel, data-driven models for the social perception of pedestrians and virtual agents based on their movement cues, including gaits, gestures, gazing, and trajectories. We use deep learning techniques (e.g., LSTMs) along with biomechanics to compute the gait features and combine them with local motion models to compute the trajectory features. Furthermore, we compute the gesture and gaze representations using psychological characteristics. We describe novel mappings between these computed gaits, gestures, gazing, and trajectory features and the various components (emotions, dominance, friendliness, approachability, and deception) of social perception. Our resulting data-driven models can identify the dominance, deception, and emotion of pedestrians from videos with an accuracy of more than 80%. We also release new datasets to evaluate these methods. We apply our data-driven models to socially-aware robot navigation and the navigation of autonomous vehicles among pedestrians. Our method generates robot movement based on pedestrians’ dominance levels, resulting in higher rapport and comfort. We also apply our data-driven models to simulate virtual agents with desired emotions, dominance, and friendliness. We perform user studies and show that our data-driven models significantly increase the user’s sense of social presence in VR and AR environments compared to the baseline methods. (Doctor of Philosophy thesis)
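
    As a rough illustration of the kind of model the abstract describes (an LSTM over per-frame gait features feeding a classifier for a perceived social attribute), the following Python/PyTorch sketch is a hypothetical simplification; the feature dimensions, layer sizes, and class labels are assumptions for illustration, not the thesis's actual architecture.

        # Hypothetical sketch: an LSTM over per-frame gait (pose) features that
        # predicts a perceived social attribute such as emotion or dominance.
        # Dimensions and class labels are illustrative assumptions.
        import torch
        import torch.nn as nn

        class GaitPerceptionLSTM(nn.Module):
            def __init__(self, pose_dim=48, hidden_dim=128, num_classes=4):
                super().__init__()
                # Encode the sequence of per-frame skeletal pose features.
                self.lstm = nn.LSTM(pose_dim, hidden_dim, num_layers=2, batch_first=True)
                # Map the final hidden state to perceived-attribute classes
                # (e.g., happy / angry / sad / neutral for perceived emotion).
                self.classifier = nn.Linear(hidden_dim, num_classes)

            def forward(self, gait_sequence):
                # gait_sequence: (batch, num_frames, pose_dim)
                _, (h_n, _) = self.lstm(gait_sequence)
                return self.classifier(h_n[-1])  # logits over perceived classes

        # Example: a batch of 8 gait clips, 60 frames each, 16 joints x 3 coordinates.
        model = GaitPerceptionLSTM(pose_dim=48)
        logits = model(torch.randn(8, 60, 48))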

    LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos

    Analyzing the interactions between humans and objects in a video requires identifying the relationships between the humans and the objects present in it. It can be thought of as a specialized version of Visual Relationship Detection, wherein one of the objects must be a human. While traditional methods formulate the problem as inference on a sequence of video segments, we present a hierarchical approach, LIGHTEN, that learns visual features to effectively capture spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN avoids using ground-truth data such as depth maps or 3D human pose, thus also improving generalization to non-RGBD datasets. Furthermore, we achieve this using only visual features, instead of the commonly used hand-crafted spatial features. We achieve state-of-the-art results on the human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120, and competitive results on image-based HOI detection on the V-COCO dataset, setting a new benchmark for visual-feature-based approaches. Code for LIGHTEN is available at https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI
    Comment: 9 pages, 6 figures, ACM Multimedia Conference 2020
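
    To make the graph-plus-temporal idea concrete, here is a hypothetical Python/PyTorch sketch in the spirit of the approach described above: per-frame visual features attached to human and object nodes are mixed by a simple graph layer and then aggregated over frames by a recurrent layer. The node counts, feature sizes, and mean-pooling graph step are illustrative assumptions; the authors' actual implementation is at the linked repository.

        # Hypothetical sketch of a spatial-graph + temporal model for HOI in videos.
        # Sizes and the mean-pooling graph step are assumptions, not the released code.
        import torch
        import torch.nn as nn

        class SpatioTemporalHOI(nn.Module):
            def __init__(self, feat_dim=512, hidden_dim=256, num_interactions=10):
                super().__init__()
                # Per-node spatial update: combine a node's visual feature with the
                # mean of all node features in the frame (a very simple graph layer).
                self.node_update = nn.Linear(2 * feat_dim, hidden_dim)
                # Temporal aggregation over frames with a GRU.
                self.temporal = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
                self.classifier = nn.Linear(hidden_dim, num_interactions)

            def forward(self, node_feats):
                # node_feats: (batch, frames, nodes, feat_dim) visual features for
                # the human and object nodes detected in each frame.
                neighbour_mean = node_feats.mean(dim=2, keepdim=True).expand_as(node_feats)
                updated = torch.relu(self.node_update(torch.cat([node_feats, neighbour_mean], dim=-1)))
                # Pool nodes per frame, then model frame-to-frame dynamics.
                frame_feats = updated.mean(dim=2)      # (batch, frames, hidden_dim)
                _, h_n = self.temporal(frame_feats)
                return self.classifier(h_n[-1])        # interaction logits

        # Example: 4 clips, 30 frames, 1 human + 3 object nodes, 512-d features.
        model = SpatioTemporalHOI()
        logits = model(torch.randn(4, 30, 4, 512))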