On using gait to enhance frontal face extraction
Visual surveillance finds increasing deployment for monitoring urban environments. Operators need to be able to determine identity from surveillance images and often use face recognition for this purpose. In surveillance environments, it is necessary to handle pose variation of the human head, low frame rate, and low resolution input images. We describe the first use of gait to enable face acquisition and recognition, by analysis of 3-D head motion and gait trajectory, with super-resolution analysis. We use region- and distance-based refinement of head pose estimation. We develop a direct mapping to relate the 2-D image with a 3-D model. In gait trajectory analysis, we model the looming effect so as to obtain the correct face region. Based on head position and the gait trajectory, we can reconstruct high-quality frontal face images which are demonstrated to be suitable for face recognition. The contributions of this research include the construction of a 3-D model for pose estimation from planar imagery and the first use of gait information to enhance the face extraction process, allowing for deployment in surveillance scenarios.
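The looming effect mentioned in this abstract can be illustrated with a minimal sketch. It assumes an ideal pinhole camera and a subject walking straight toward it at constant speed; the abstract does not give the authors' actual model, so the function names, the head height of 0.24 m, and the focal length of 800 px are all hypothetical values chosen for illustration.

```python
def apparent_size_px(real_size_m, distance_m, focal_px):
    """Projected size in pixels under a pinhole model: size = f * H / Z."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_px * real_size_m / distance_m


def face_box_sizes(start_distance_m, speed_m_per_s, fps, n_frames,
                   head_height_m=0.24, focal_px=800.0):
    """Expected head height in pixels for each frame of an approaching walker.

    All default values are illustrative assumptions, not taken from the paper.
    """
    sizes = []
    for i in range(n_frames):
        # Distance shrinks linearly as the subject approaches the camera.
        z = start_distance_m - speed_m_per_s * i / fps
        if z <= 0:
            break  # subject has reached the camera plane
        sizes.append(apparent_size_px(head_height_m, z, focal_px))
    return sizes


if __name__ == "__main__":
    # At a low frame rate (5 fps), the face region grows noticeably per frame.
    for frame, size in enumerate(face_box_sizes(8.0, 1.4, 5, 10)):
        print(f"frame {frame}: head height ~ {size:.1f} px")
```

Under these assumptions the face region grows from frame to frame, so a fixed-size crop would cut off or under-fill the face; scaling the crop with the predicted size is one simple way to compensate.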
Intrinsic Dynamic Shape Prior for Fast, Sequential and Dense Non-Rigid Structure from Motion with Detection of Temporally-Disjoint Rigidity
While dense non-rigid structure from motion (NRSfM) has been extensively studied from the perspective of the reconstructability problem in recent years, almost no attempts have been undertaken to bring it into the practical realm. The reasons for the slow dissemination are the severe ill-posedness, high sensitivity to motion and deformation cues and the difficulty of obtaining reliable point tracks in the vast majority of practical scenarios. To fill this gap, we propose a hybrid approach that extracts prior shape knowledge from an input sequence with NRSfM and uses it as a dynamic shape prior for sequential surface recovery in scenarios with recurrence. Our Dynamic Shape Prior Reconstruction (DSPR) method can be combined with existing dense NRSfM techniques while its energy functional is optimised with stochastic gradient descent at real-time rates for new incoming point tracks. The proposed versatile framework with a new core NRSfM approach outperforms several other methods in the ability to handle inaccurate and noisy point tracks, provided we have access to a representative (in terms of the deformation variety) image sequence. Comprehensive experiments highlight convergence properties and the accuracy of DSPR under different disturbing effects. We also perform a joint study of tracking and reconstruction and show applications to shape compression and heart reconstruction under occlusions. We achieve state-of-the-art metrics (accuracy and compression ratios) in different scenarios.
CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images
With the power of convolutional neural networks (CNNs), CNN-based face
reconstruction has recently shown promising performance in reconstructing
detailed face shape from 2D face images. The success of CNN-based methods
relies on a large amount of labeled data. The state of the art synthesizes such
data using a coarse morphable face model, which, however, has difficulty
generating detailed photo-realistic images of faces (with wrinkles). This paper
presents a novel face data generation method. Specifically, we render a large
number of photo-realistic face images with different attributes based on
inverse rendering. Furthermore, we construct a fine-detailed face image dataset
by transferring different scales of details from one image to another. We also
construct a large number of video-type adjacent frame pairs by simulating the
distribution of real video data. With these nicely constructed datasets, we
propose a coarse-to-fine learning framework consisting of three convolutional
networks. The networks are trained for real-time detailed 3D face
reconstruction from monocular video as well as from a single image. Extensive
experimental results demonstrate that our framework can produce high-quality
reconstruction but with much less computation time compared to the
state-of-the-art. Moreover, our method is robust to pose, expression and
lighting due to the diversity of data.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style)
Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks
Automatic video analysis is one of the most complex problems in the fields
of computer vision and machine learning. A significant part of this research
deals with (human) activity recognition (HAR) since humans, and the activities
that they perform, generate most of the video semantics. Video-based HAR has
applications in various domains, but one of the most important and challenging
is HAR in sports videos. Some of the major issues include high inter- and
intra-class variations, large class imbalance, the presence of both group
actions and single player actions, and recognizing simultaneous actions, i.e.,
the multi-label learning problem. Keeping in mind these challenges and the
recent success of CNNs in solving various computer vision problems, in this
work, we implement a 3D CNN based multi-label deep HAR system for multi-label
class-imbalanced action recognition in hockey videos. We test our system for
two different scenarios: an ensemble of k binary networks vs. a single
k-output network, on a publicly available dataset. We also compare our
results with the system that was originally designed for the chosen dataset.
Experimental results show that the proposed approach performs better than the
existing solution.
Comment: Accepted to IEEE/ACIS SNPD 2018, 6 pages, 3 figures
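To make the multi-label, class-imbalanced setting in this abstract concrete, here is a minimal sketch of one common way to train a single multi-output classifier: one sigmoid per label with a binary cross-entropy loss whose positive term is up-weighted for rare classes. This is an assumption for illustration only; the abstract does not state which loss or weighting the authors use, and the helper names below are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def positive_weights(labels):
    """Per-class weight = (#negatives / #positives), computed from 0/1 label rows.

    Classes with no positive examples fall back to a weight of 1.0.
    """
    n = len(labels)
    k = len(labels[0])
    weights = []
    for j in range(k):
        pos = sum(row[j] for row in labels)
        neg = n - pos
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights


def weighted_bce(logits, targets, pos_weight):
    """Mean weighted binary cross-entropy over the labels of one sample.

    Each label is treated as an independent binary decision, so several
    actions can be active in the same clip (the multi-label setting).
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y, w in zip(logits, targets, pos_weight):
        p = sigmoid(z)
        total += -(w * y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(logits)


if __name__ == "__main__":
    # Toy label matrix: 4 clips, 2 action classes, each class positive once.
    train_labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
    w = positive_weights(train_labels)
    print("positive weights:", w)
    print("loss:", weighted_bce([0.0, 0.0], [1, 0], w))
```

An ensemble of binary networks would apply the same per-label loss to separate models, one per action class, whereas the single-network variant shares features across all outputs.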