Planar Object Tracking in the Wild: A Benchmark
Planar object tracking is an actively studied problem in vision-based robotic
applications. While several benchmarks have been constructed for evaluating
state-of-the-art algorithms, there is a lack of video sequences captured in the
wild rather than in a constrained laboratory environment. In this paper, we
present a carefully designed planar object tracking benchmark containing 210
videos of 30 planar objects sampled in the natural environment. In particular,
for each object, we shoot seven videos involving various challenging factors,
namely scale change, rotation, perspective distortion, motion blur, occlusion,
out-of-view, and unconstrained. The ground truth is carefully annotated
semi-manually to ensure the quality. Moreover, eleven state-of-the-art
algorithms are evaluated on the benchmark using two evaluation metrics, with
detailed analysis provided for the evaluation results. We expect the proposed
benchmark to benefit future studies on planar object tracking. Comment: Accepted by ICRA 201
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes. Comment: 8 pages excluding references (CVPR style
Constructing a gazebo: supporting teamwork in a tightly coupled, distributed task in virtual reality
Many tasks require teamwork. Team members may work concurrently, but there must be some occasions of coming together. Collaborative virtual environments (CVEs) allow distributed teams to come together across distance to share a task. Studies of CVE systems have tended to focus on the sense of presence or copresence with other people. They have avoided studying close interaction between users, such as the shared manipulation of objects, because CVEs suffer from inherent network delays and often have cumbersome user interfaces. Little is known about the effectiveness of collaboration in tasks requiring various forms of object sharing and, in particular, the concurrent manipulation of objects.
This paper investigates the effectiveness of supporting teamwork among a geographically distributed group in a task that requires the shared manipulation of objects. To complete the task, users must share objects through concurrent manipulation of both the same and distinct attributes. The effectiveness of teamwork is measured in terms of time taken to achieve each step, as well as the impression of users. The effect of interface is examined by comparing various combinations of walk-in cubic immersive projection technology (IPT) displays and desktop devices.
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for egocentric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features perform better compared to traditional features, (4)
as opposed to hand regions, the manipulation point is a strong influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, manipulation point results in the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects. Comment: presented at WACV 201