The Importance of Hand Motions for Communication and Interaction in Virtual Reality
Virtual reality (VR) is a growing method of communication and play. Recent advances have enabled hand-tracking technologies for consumer VR headsets, allowing virtual hands to mimic a user's real hand movements in real-time. A growing number of users now utilize hand-tracking when using VR to manipulate objects or to create gestures when interacting with others. As VR grows as a tool and communication platform, it is important to understand how the rising prevalence of hand-tracking technology might affect users' experiences.
The goal of this dissertation is to investigate, through a series of experiments, how using hand motions in VR influences our experience when we communicate with others or interact with the environment. In our daily lives, hand motions play a major role in interpersonal communication. Our hands can help emphasize or clarify our speech, or even replace words entirely. When interacting with the world, hands are our primary tool for manipulating objects and performing dexterous tasks. Bringing these capabilities into VR, a space that has so far been lacking in such detailed expression and interaction, may have unexpected effects.
Overall, we show that using hand-tracking and hand motions in VR improves many of the metrics used to measure the quality of experiences in virtual environments. When using accurate hand motions, people feel more comfortable and embodied within their virtual avatars, or they feel more socially present. We recommend tracking and displaying hand motions in virtual environments if embodiment or communication are the most important criteria.
Overview of Bayesian sequential Monte Carlo methods for group and extended object tracking
This work presents the current state-of-the-art in techniques for tracking a number of objects moving in a coordinated and interacting fashion. Groups are structured objects characterized by particular motion patterns. A group can comprise a small number of interacting objects (e.g. pedestrians, sport players, convoy of cars) or hundreds or thousands of components, such as crowds of people. Group object tracking is closely linked with extended object tracking but at the same time has particular features which differentiate it from extended objects. Extended objects, such as in maritime surveillance, are characterized by their kinematic states and their size or volume. Both group and extended objects give rise to a varying number of measurements and require trajectory maintenance. Emphasis is given here to sequential Monte Carlo (SMC) methods and their variants. Methods for small groups and for large groups are presented, including Markov Chain Monte Carlo (MCMC) methods, the random matrices approach and Random Finite Set Statistics methods. Efficient real-time implementations are discussed which are able to deal with the high dimensionality and provide high accuracy. Future trends and avenues are traced. © 2013 Elsevier Inc. All rights reserved.
A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects
Tracking humans that are interacting with other subjects or the environment
remains unsolved in visual tracking, because the visibility of the human of
interest in videos is unknown and might vary over time. In particular, it is
still difficult for state-of-the-art human trackers to recover complete human
trajectories in crowded scenes with frequent human interactions. In this work,
we consider the visibility status of a subject as a fluent variable, whose
change is mostly attributed to the subject's interaction with the surroundings,
e.g., crossing behind another object, entering a building, or getting into a
vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the
causal-effect relations between an object's visibility fluent and its
activities, and develop a probabilistic graph model to jointly reason about the
visibility fluent change (e.g., from visible to invisible) and track humans in
videos. We formulate this joint task as an iterative search for a feasible
causal graph structure that admits fast search algorithms, e.g., dynamic
programming. We apply the proposed method to challenging video sequences
to evaluate its ability to estimate visibility fluent changes of
subjects and to track subjects of interest over time. Results with comparisons
demonstrate that our method outperforms the alternative trackers and can
recover complete trajectories of humans in complicated scenarios with frequent
human interactions. Comment: accepted by CVPR 201
Multi-Object Tracking with Interacting Vehicles and Road Map Information
In many applications, tracking multiple objects is crucial for
perceiving the current environment. Most present multi-object
tracking algorithms assume that objects move independently of other
dynamic objects as well as of the static environment. Since in many traffic
situations objects interact with each other and in addition there are
restrictions due to drivable areas, the assumption of an independent object
motion is not fulfilled. This paper proposes an approach that adapts a
multi-object tracking system to model the interaction between vehicles and the
current road geometry. To this end, the prediction step of a Labeled
Multi-Bernoulli filter is extended to facilitate modeling interaction between
objects using the Intelligent Driver Model. Furthermore, to consider road map
information, an approximation of a highly precise road map is used. The results
show that in scenarios where the assumption of a standard motion model is
violated, the tracking system adapted with the proposed method achieves higher
accuracy and robustness in its track estimates.
MetaSpace II: Object and full-body tracking for interaction and navigation in social VR
MetaSpace II (MS2) is a social Virtual Reality (VR) system where multiple
users can not only see and hear but also interact with each other, grasp and
manipulate objects, walk around in space, and get tactile feedback. MS2 allows
walking in physical space by tracking each user's skeleton in real-time and
allows users to feel by employing passive haptics, i.e., when users touch or
manipulate an object in the virtual world, they simultaneously also touch or
manipulate a corresponding object in the physical world. To enable these
elements in VR, MS2 creates a correspondence in spatial layout and object
placement by building the virtual world on top of a 3D scan of the real world.
Through the association between the real and virtual world, users are able to
walk freely while wearing a head-mounted device, avoid obstacles like walls and
furniture, and interact with people and objects. Most current virtual reality
(VR) environments are designed for a single-user experience in which interactions
with virtual objects are mediated by hand-held input devices or hand gestures.
Additionally, users are only shown a representation of their hands in VR
floating in front of the camera as seen from a first-person perspective. We
believe that representing each user as a full-body avatar controlled by the
natural movements of the person in the real world (see Figure 1d) can greatly
enhance believability and a user's sense of immersion in VR. Comment: 10 pages, 9 figures. Video:
http://living.media.mit.edu/projects/metaspace-ii
3D Tracking Using Multi-view Based Particle Filters
Visual surveillance and monitoring of indoor environments using multiple cameras has become a field of great activity in computer vision. Usual 3D tracking and positioning systems rely on several independent 2D tracking modules applied over individual camera streams, fused using geometrical relationships across cameras. As 2D tracking systems suffer inherent difficulties due to point-of-view limitations (perceptually similar foreground and background regions causing fragmentation of moving objects, occlusions), 3D tracking based on partially erroneous 2D tracks is likely to fail when handling multiple-people interaction. To overcome this problem, this paper proposes a Bayesian framework for combining 2D low-level cues from multiple cameras directly into the 3D world through 3D Particle Filters. This method makes it possible to estimate the probability of a certain volume being occupied by a moving object, and thus to segment and track multiple people across the monitored area. The proposed method is developed on the basis of simple, binary 2D moving region segmentation on each camera, considered as different state observations. In addition, the method proves well suited for integrating additional 2D low-level cues to increase system robustness to occlusions: along these lines, a naïve color-based (HSI) appearance model has been integrated, resulting in clear performance improvements when dealing with complex scenarios.
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes. Comment: 8 pages excluding references (CVPR style)
Saying What You're Looking For: Linguistics Meets Video Search
We present an approach to searching large video corpora for video clips which
depict a natural-language query in the form of a sentence. This approach uses
compositional semantics to encode subtle meaning that is lost in other systems,
such as the difference between two sentences which have identical words but
entirely different meanings: "The person rode the horse" vs. "The horse
rode the person". Given a video-sentence pair and a natural-language parser,
along with a grammar that describes the space of sentential queries, we produce
a score which indicates how well the video depicts the sentence. We produce
such a score for each video clip in a corpus and return a ranked list of clips.
Furthermore, this approach addresses two fundamental problems simultaneously:
detecting and tracking objects, and recognizing whether those tracks depict the
query. Because both tracking and object detection are unreliable, our approach uses
knowledge about the intended sentential query to focus the tracker on the
relevant participants and to ensure that the resulting tracks are described by
the sentential query. While earlier work was limited to single-word queries
which correspond to either verbs or nouns, we show how one can search for
complex queries which contain multiple phrases, such as prepositional phrases,
and modifiers, such as adverbs. We demonstrate this approach by searching for
141 queries involving people and horses interacting with each other in 10
full-length Hollywood movies. Comment: 13 pages, 8 figures
A Dose of Reality: Overcoming Usability Challenges in VR Head-Mounted Displays
We identify usability challenges facing consumers adopting Virtual Reality (VR) head-mounted displays (HMDs) in a survey of 108 VR HMD users. Users reported significant issues in interacting with, and being aware of, their real-world context when using an HMD. Building upon existing work on blending real and virtual environments, we performed three design studies to address these usability concerns. In a typing study, we show that augmenting VR with a view of reality significantly corrected the performance impairment of typing in VR. We then investigated how much reality should be incorporated and when, so as to preserve users' sense of presence in VR. For interaction with objects and peripherals, we found that selectively presenting reality as users engaged with it was optimal in terms of performance and users' sense of presence. Finally, we investigated how this selective, engagement-dependent approach could be applied in social environments, to support the user's awareness of the proximity and presence of others.