23,321 research outputs found
MetaSpace II: Object and full-body tracking for interaction and navigation in social VR
MetaSpace II (MS2) is a social Virtual Reality (VR) system where multiple
users can not only see and hear but also interact with each other, grasp and
manipulate objects, walk around in space, and get tactile feedback. MS2 allows
walking in physical space by tracking each user's skeleton in real-time and
allows users to feel by employing passive haptics i.e., when users touch or
manipulate an object in the virtual world, they simultaneously also touch or
manipulate a corresponding object in the physical world. To enable these
elements in VR, MS2 creates a correspondence in spatial layout and object
placement by building the virtual world on top of a 3D scan of the real world.
Through the association between the real and virtual world, users are able to
walk freely while wearing a head-mounted device, avoid obstacles like walls and
furniture, and interact with people and objects. Most current virtual reality
(VR) environments are designed for a single user experience where interactions
with virtual objects are mediated by hand-held input devices or hand gestures.
Additionally, users are only shown a representation of their hands in VR
floating in front of the camera as seen from a first person perspective. We
believe, representing each user as a full-body avatar that is controlled by
natural movements of the person in the real world (see Figure 1d), can greatly
enhance believability and a user's sense immersion in VR.Comment: 10 pages, 9 figures. Video:
http://living.media.mit.edu/projects/metaspace-ii
Detecting and tracking multiple interacting objects without class-specific models
We propose a framework for detecting and tracking multiple interacting objects from a single, static, uncalibrated camera. The number of objects is variable and unknown, and object-class-specific models are not available. We use background subtraction results as measurements for object detection and tracking. Given these constraints, the main challenge is to associate pixel measurements with (possibly interacting) object targets. We first track clusters of pixels, and note when they merge or split. We then build an inference graph, representing relations between the tracked clusters. Using this graph and a generic object model based on spatial connectedness and coherent motion, we label the tracked clusters as whole objects, fragments of objects or groups of interacting objects. The outputs of our algorithm are entire tracks of objects, which may include corresponding tracks from groups of objects during interactions. Experimental results on multiple video sequences are shown
SAVASA project @ TRECVID 2012: interactive surveillance event detection
In this paper we describe our participation in the interactive surveillance event detection task at TRECVid 2012. The system we developed was comprised of individual classifiers brought together behind a simple video search interface that enabled users to select relevant segments based on down~sampled animated gifs. Two types of user -- `experts' and `end users' -- performed the evaluations. Due to time constraints we focussed on three events -- ObjectPut, PersonRuns and Pointing -- and two of the five available cameras (1 and 3). Results from the interactive runs as well as discussion of the performance of the underlying retrospective classifiers are presented
3D Object Reconstruction from Hand-Object Interactions
Recent advances have enabled 3d object reconstruction approaches using a
single off-the-shelf RGB-D camera. Although these approaches are successful for
a wide range of object classes, they rely on stable and distinctive geometric
or texture features. Many objects like mechanical parts, toys, household or
decorative articles, however, are textureless and characterized by minimalistic
shapes that are simple and symmetric. Existing in-hand scanning systems and 3d
reconstruction techniques fail for such symmetric objects in the absence of
highly distinctive features. In this work, we show that extracting 3d hand
motion for in-hand scanning effectively facilitates the reconstruction of even
featureless and highly symmetric objects and we present an approach that fuses
the rich additional information of hands into a 3d reconstruction pipeline,
significantly contributing to the state-of-the-art of in-hand scanning.Comment: International Conference on Computer Vision (ICCV) 2015,
http://files.is.tue.mpg.de/dtzionas/In-Hand-Scannin
Learning to Refine Human Pose Estimation
Multi-person pose estimation in images and videos is an important yet
challenging task with many applications. Despite the large improvements in
human pose estimation enabled by the development of convolutional neural
networks, there still exist a lot of difficult cases where even the
state-of-the-art models fail to correctly localize all body joints. This
motivates the need for an additional refinement step that addresses these
challenging cases and can be easily applied on top of any existing method. In
this work, we introduce a pose refinement network (PoseRefiner) which takes as
input both the image and a given pose estimate and learns to directly predict a
refined pose by jointly reasoning about the input-output space. In order for
the network to learn to refine incorrect body joint predictions, we employ a
novel data augmentation scheme for training, where we model "hard" human pose
cases. We evaluate our approach on four popular large-scale pose estimation
benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack
Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement
over the state of the art.Comment: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans
in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP
- …