2,749 research outputs found
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes the advantages of the color image that provides appearance information of an object and also the depth image that is immune to the variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance to benchmark the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D-simultaneous localization and mapping, and pose estimation. We provide the insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description about the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms
LOMo: Latent Ordinal Model for Facial Analysis in Videos
We study the problem of facial analysis in videos. We propose a novel weakly
supervised learning method that models the video event (expression, pain etc.)
as a sequence of automatically mined, discriminative sub-events (eg. onset and
offset phase for smile, brow lower and cheek raise for pain). The proposed
model is inspired by the recent works on Multiple Instance Learning and latent
SVM/HCRF- it extends such frameworks to model the ordinal or temporal aspect in
the videos, approximately. We obtain consistent improvements over relevant
competitive baselines on four challenging and publicly available video based
facial analysis datasets for prediction of expression, clinical pain and intent
in dyadic conversations. In combination with complimentary features, we report
state-of-the-art results on these datasets.Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (eg. onset and offset phase for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to model
the ordinal aspect in the videos, approximately. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style
- …