Social Perception of Pedestrians and Virtual Agents Using Movement Features
In many tasks such as navigation in a shared space, humans explicitly or implicitly estimate social information related to the emotions, dominance, and friendliness of other humans around them. This social perception is critical in predicting others’ motions or actions and deciding how to interact with them. Therefore, modeling social perception is an important problem for robotics, autonomous vehicle navigation, and VR and AR applications. In this thesis, we present novel, data-driven models for the social perception of pedestrians and virtual agents based on their movement cues, including gaits, gestures, gazing, and trajectories. We use deep learning techniques (e.g., LSTMs) along with biomechanics to compute gait features, and combine them with local motion models to compute trajectory features. Furthermore, we compute the gesture and gaze representations using psychological characteristics. We describe novel mappings between these computed gait, gesture, gaze, and trajectory features and the various components (emotions, dominance, friendliness, approachability, and deception) of social perception. Our resulting data-driven models can identify the dominance, deception, and emotion of pedestrians from videos with an accuracy of more than 80%. We also release new datasets to evaluate these methods. We apply our data-driven models to socially-aware robot navigation and the navigation of autonomous vehicles among pedestrians. Our method generates robot movement based on pedestrians’ dominance levels, resulting in higher rapport and comfort. We also apply our data-driven models to simulate virtual agents with desired emotions, dominance, and friendliness. We perform user studies and show that our data-driven models significantly increase the user’s sense of social presence in VR and AR environments compared to the baseline methods.
Doctor of Philosophy
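As a rough illustration of the gait-to-perception mapping described above, here is a minimal sketch, assuming a PyTorch-style setup: an LSTM that maps a sequence of per-frame gait features to a perceived-emotion label. The class `GaitPerceptionLSTM`, the feature dimension, hidden size, and class set are illustrative assumptions, not the thesis's actual architecture.

```python
# Minimal sketch (illustrative, not the thesis's model): an LSTM that maps a
# sequence of per-frame gait features (e.g., joint angles and velocities) to a
# perceived-emotion label.
import torch
import torch.nn as nn

class GaitPerceptionLSTM(nn.Module):
    def __init__(self, feat_dim=48, hidden_dim=128, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)  # e.g., happy / angry / sad / neutral (assumed classes)

    def forward(self, gait_seq):            # gait_seq: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(gait_seq)   # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])           # logits: (batch, num_classes)

# Usage: classify two walks of 90 frames each, with 48-D gait features per frame.
model = GaitPerceptionLSTM()
logits = model(torch.randn(2, 90, 48))
print(logits.shape)  # torch.Size([2, 4])
```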
LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos
Analyzing the interactions between humans and objects in a video requires identifying the relationships between the humans and the objects present in it. It can be thought of as a specialized version of Visual Relationship Detection, wherein one of the objects must be a human. While traditional methods formulate the problem as inference on a sequence of video segments, we present a hierarchical approach, LIGHTEN, that learns visual features to effectively capture spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN avoids using ground-truth data such as depth maps or 3D human pose, thus increasing generalization across non-RGBD datasets as well. Furthermore, we do so using only visual features, instead of the commonly used hand-crafted spatial features. We achieve state-of-the-art results on the human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120, and competitive results on image-based HOI detection on the V-COCO dataset, setting a new benchmark for approaches based on visual features. Code for LIGHTEN is available at
https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI
Comment: 9 pages, 6 figures, ACM Multimedia Conference 2020
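To make the graph-plus-temporal pattern described in the abstract concrete, here is a minimal PyTorch sketch: one round of message passing over human/object node features within each frame, followed by a GRU over pooled frame embeddings. The classes `FrameGraph` and `GraphTemporalHOI`, the layer sizes, mean pooling, and the clip-level classification head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a graph-then-temporal model for HOI in videos (illustrative,
# not LIGHTEN's actual architecture).
import torch
import torch.nn as nn

class FrameGraph(nn.Module):
    """One round of message passing over a fully connected human/object graph."""
    def __init__(self, node_dim=512, hidden_dim=256):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim, hidden_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + hidden_dim, hidden_dim), nn.ReLU())

    def forward(self, nodes):                               # nodes: (batch, N, node_dim)
        B, N, D = nodes.shape
        send = nodes.unsqueeze(2).expand(B, N, N, D)        # sender features per edge
        recv = nodes.unsqueeze(1).expand(B, N, N, D)        # receiver features per edge
        messages = self.edge_mlp(torch.cat([send, recv], dim=-1)).mean(dim=2)  # aggregate over senders
        return self.node_mlp(torch.cat([nodes, messages], dim=-1))             # (batch, N, hidden_dim)

class GraphTemporalHOI(nn.Module):
    def __init__(self, node_dim=512, hidden_dim=256, num_classes=10):
        super().__init__()
        self.graph = FrameGraph(node_dim, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, video_nodes):                         # (batch, frames, N, node_dim)
        B, T, N, D = video_nodes.shape
        frame_feats = self.graph(video_nodes.view(B * T, N, D)).mean(dim=1)  # pool nodes per frame
        out, _ = self.gru(frame_feats.view(B, T, -1))
        return self.head(out[:, -1])                        # interaction logits for the clip

# Usage: 2 clips, 8 frames each, 1 human + 3 object nodes with 512-D visual features.
model = GraphTemporalHOI()
print(model(torch.randn(2, 8, 4, 512)).shape)               # torch.Size([2, 10])
```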