
    MetaSpace II: Object and full-body tracking for interaction and navigation in social VR

    MetaSpace II (MS2) is a social Virtual Reality (VR) system where multiple users can not only see and hear but also interact with each other, grasp and manipulate objects, walk around in space, and get tactile feedback. MS2 allows walking in physical space by tracking each user's skeleton in real time, and allows users to feel by employing passive haptics, i.e., when users touch or manipulate an object in the virtual world, they simultaneously touch or manipulate a corresponding object in the physical world. To enable these elements in VR, MS2 creates a correspondence in spatial layout and object placement by building the virtual world on top of a 3D scan of the real world. Through the association between the real and virtual worlds, users are able to walk freely while wearing a head-mounted device, avoid obstacles like walls and furniture, and interact with people and objects. Most current VR environments are designed for a single-user experience where interactions with virtual objects are mediated by hand-held input devices or hand gestures. Additionally, users are only shown a representation of their hands in VR, floating in front of the camera as seen from a first-person perspective. We believe that representing each user as a full-body avatar controlled by the natural movements of the person in the real world (see Figure 1d) can greatly enhance believability and a user's sense of immersion in VR.
    Comment: 10 pages, 9 figures. Video: http://living.media.mit.edu/projects/metaspace-ii

    Detecting and tracking multiple interacting objects without class-specific models

    We propose a framework for detecting and tracking multiple interacting objects from a single, static, uncalibrated camera. The number of objects is variable and unknown, and object-class-specific models are not available. We use background subtraction results as measurements for object detection and tracking. Given these constraints, the main challenge is to associate pixel measurements with (possibly interacting) object targets. We first track clusters of pixels and note when they merge or split. We then build an inference graph representing relations between the tracked clusters. Using this graph and a generic object model based on spatial connectedness and coherent motion, we label the tracked clusters as whole objects, fragments of objects, or groups of interacting objects. The outputs of our algorithm are entire tracks of objects, which may include corresponding tracks from groups of objects during interactions. Experimental results on multiple video sequences are shown.

    SAVASA project @ TRECVID 2012: interactive surveillance event detection

    In this paper we describe our participation in the interactive surveillance event detection task at TRECVid 2012. The system we developed comprised individual classifiers brought together behind a simple video search interface that enabled users to select relevant segments based on downsampled animated GIFs. Two types of user -- 'experts' and 'end users' -- performed the evaluations. Due to time constraints we focussed on three events -- ObjectPut, PersonRuns and Pointing -- and two of the five available cameras (1 and 3). Results from the interactive runs, as well as a discussion of the performance of the underlying retrospective classifiers, are presented.

    Tracking of Individuals in Very Long Video Sequences


    3D Object Reconstruction from Hand-Object Interactions

    Recent advances have enabled 3D object reconstruction approaches using a single off-the-shelf RGB-D camera. Although these approaches are successful for a wide range of object classes, they rely on stable and distinctive geometric or texture features. Many objects like mechanical parts, toys, household or decorative articles, however, are textureless and characterized by minimalistic shapes that are simple and symmetric. Existing in-hand scanning systems and 3D reconstruction techniques fail for such symmetric objects in the absence of highly distinctive features. In this work, we show that extracting 3D hand motion for in-hand scanning effectively facilitates the reconstruction of even featureless and highly symmetric objects, and we present an approach that fuses the rich additional information of hands into a 3D reconstruction pipeline, significantly contributing to the state of the art of in-hand scanning.
    Comment: International Conference on Computer Vision (ICCV) 2015, http://files.is.tue.mpg.de/dtzionas/In-Hand-Scannin

    Learning to Refine Human Pose Estimation

    Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist many difficult cases where even state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks (MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking) and report systematic improvement over the state of the art.
    Comment: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP

    Automatic Video-based Analysis of Human Motion
