Video-to-Video Pose and Expression Invariant Face Recognition using Volumetric Directional Pattern
Face recognition in video has attracted attention as a covert method of human identification in surveillance systems. In this paper, we propose an end-to-end video face recognition system that addresses the difficult problem of identifying human faces in video in the presence of large variations in facial pose and expression and poor video resolution. The proposed descriptor, named Volumetric Directional Pattern (VDP), is an oriented, multi-scale volumetric descriptor that extracts and fuses information from multiple frames, temporal (dynamic) information, and multiple poses and expressions of faces in the input video to produce feature vectors, which are matched against all the videos in the database. To keep the approach computationally simple and easy to extend, a key-frame extraction method is employed, so that only the frames containing important information are processed further instead of analyzing every frame in the video. The proposed VDP algorithm is evaluated on a publicly available database (the YouTube Celebrities dataset), where it achieves promising recognition rates.
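The abstract does not say which key-frame criterion is used, so the following is only a minimal sketch of one common approach: keep a frame when it differs enough from the last kept frame. The function name `select_key_frames`, the mean-absolute-difference measure, and the `threshold` value are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def select_key_frames(frames, threshold=10.0):
    """Return indices of frames that differ noticeably from the last kept frame.

    frames: array-like of shape (T, H, W), grayscale frames (assumed layout).
    threshold: hypothetical cut-off on the mean absolute pixel difference.
    """
    frames = np.asarray(frames, dtype=float)
    if len(frames) == 0:
        return []
    keep = [0]  # always keep the first frame as a reference
    for i in range(1, len(frames)):
        # Compare against the most recently kept frame, not the previous frame,
        # so slow drift eventually triggers a new key frame.
        diff = np.mean(np.abs(frames[i] - frames[keep[-1]]))
        if diff > threshold:
            keep.append(i)
    return keep
```

Only the frames at the returned indices would then be passed on to descriptor extraction; a real system would likely use a criterion tuned to face pose or expression change rather than raw pixel differences.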
High Order Volumetric Directional Pattern for Video-Based Face Recognition
Describing dynamic textures has attracted growing attention in the field of computer vision and pattern recognition. In this paper, a novel approach for recognizing dynamic textures, namely the high order volumetric directional pattern (HOVDP), is proposed. It is an extension of the volumetric directional pattern (VDP), which extracts and fuses the temporal information (dynamic features) from three consecutive frames. HOVDP combines movement and appearance features by considering the nth order volumetric directional variation patterns of all neighboring pixels from three consecutive frames. In experiments with two challenging video face databases, YouTube Celebrities and Honda/UCSD, HOVDP clearly outperformed a set of state-of-the-art approaches.
Modeling geometric-temporal context with directional pyramid co-occurrence for action recognition
In this paper, we present a new geometric-temporal representation for visual action recognition based on local spatio-temporal features. First, we propose a modified covariance descriptor under the log-Euclidean Riemannian metric to represent the spatio-temporal cuboids detected in the video sequences. Compared with previously proposed covariance descriptors, our descriptor can be measured and clustered in Euclidean space. Second, to capture the geometric-temporal contextual information, we construct a directional pyramid co-occurrence matrix (DPCM) to describe the spatio-temporal distribution of the vector-quantized local feature descriptors extracted from a video. DPCM characterizes the co-occurrence statistics of local features as well as the spatio-temporal positional relationships among the co-occurring features. These statistics provide strong descriptive power for action recognition. To use DPCM for action recognition, we propose a directional pyramid co-occurrence matching kernel to measure the similarity of videos. The proposed method achieves state-of-the-art performance and improves on the recognition performance of bag-of-visual-words (BOVW) models by a large margin on six public data sets. For example, on the KTH data set, it achieves 98.78% accuracy while the BOVW approach only achieves 88.06%. On both the Weizmann and UCF CIL data sets, the highest possible accuracy of 100% is achieved.
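To illustrate why a log-Euclidean covariance descriptor can be compared and clustered in Euclidean space, here is a minimal sketch of the standard log-Euclidean mapping: take the matrix logarithm of a covariance matrix and flatten it to a vector. The function name `log_euclidean_descriptor`, the regularizer `eps`, and the choice of input features are assumptions; the paper's specific modification of the descriptor is not given in the abstract and is not reproduced here.

```python
import numpy as np

def log_euclidean_descriptor(features, eps=1e-6):
    """Map per-pixel features of one cuboid to a Euclidean vector.

    features: array of shape (n_samples, d) with d >= 2 (assumed layout).
    eps: small ridge term (an assumption) keeping the covariance positive definite.
    """
    features = np.asarray(features, dtype=float)
    d = features.shape[1]
    cov = np.cov(features, rowvar=False) + eps * np.eye(d)
    # Matrix logarithm of a symmetric positive definite matrix
    # via its eigendecomposition: U diag(log w) U^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # Keep the upper triangle; scale off-diagonal entries by sqrt(2) so the
    # vector's Euclidean norm equals the Frobenius norm of log_cov.
    rows, cols = np.triu_indices(d)
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_cov[rows, cols] * weights
```

Because the output is an ordinary vector, descriptors from different cuboids can be fed directly to k-means or any other Euclidean clustering method to build the vector-quantized vocabulary.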
Sparsity in Dynamics of Spontaneous Subtle Emotions: Analysis & Application
Spontaneous subtle emotions are expressed through micro-expressions, which
are tiny, sudden and short-lived dynamics of facial muscles, and thus pose a great
challenge for visual recognition. The abrupt but significant dynamics for the
recognition task are temporally sparse while the rest, irrelevant dynamics, are
temporally redundant. In this work, we analyze and enforce sparsity constraints
to learn significant temporal and spectral structures while eliminating
irrelevant facial dynamics of micro-expressions, which would ease the challenge
in the visual recognition of spontaneous subtle emotions. The hypothesis is
confirmed through experimental results of automatic spontaneous subtle emotion
recognition with several sparsity levels on CASME II and SMIC, the only two
publicly available spontaneous subtle emotion databases. The overall
performances of the automatic subtle emotion recognition are boosted when only
significant dynamics are preserved from the original sequences. Comment: IEEE Transactions on Affective Computing (2016)
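As a rough illustration of temporal sparsity, and not the method of the paper (whose formulation is not given in the abstract), the sketch below keeps only the `k` largest frame-to-frame changes in a sequence and zeroes the rest, then rebuilds the sequence. The function name `keep_sparse_dynamics` and the energy measure are hypothetical choices.

```python
import numpy as np

def keep_sparse_dynamics(sequence, k):
    """Rebuild a sequence from its k strongest frame-to-frame changes.

    sequence: array of shape (T, ...) with T >= 2 (assumed layout).
    k: number of temporal changes to preserve (the sparsity level).
    """
    seq = np.asarray(sequence, dtype=float)
    diffs = np.diff(seq, axis=0)  # dynamics between consecutive frames
    energy = np.abs(diffs).reshape(len(diffs), -1).sum(axis=1)
    k = max(0, min(k, len(diffs)))
    top = np.argsort(energy)[len(energy) - k:]  # indices of the k largest changes
    mask = np.zeros(len(diffs), dtype=bool)
    mask[top] = True
    # Broadcast the per-step mask over the spatial dimensions.
    mask = mask.reshape((-1,) + (1,) * (diffs.ndim - 1))
    sparse_diffs = np.where(mask, diffs, 0.0)
    # Integrate the surviving changes back from the first frame.
    rebuilt = seq[0] + np.cumsum(sparse_diffs, axis=0)
    return np.concatenate([seq[:1], rebuilt], axis=0)
```

With a small `k`, slow or negligible motion is suppressed and only the abrupt changes remain, which is the intuition behind preserving only significant dynamics.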
MVF-Net: Multi-View 3D Face Morphable Model Regression
We address the problem of recovering the 3D geometry of a human face from a
set of facial images in multiple views. While recent studies have shown
impressive progress in 3D Morphable Model (3DMM) based facial reconstruction,
the settings are mostly restricted to a single view. There is an inherent
drawback in the single-view setting: the lack of reliable 3D constraints can
cause unresolvable ambiguities. In this paper, we explore 3DMM-based shape
recovery in a different setting, where a set of multi-view facial images is
given as input. A novel approach is proposed to regress 3DMM parameters from
multi-view inputs with an end-to-end trainable Convolutional Neural Network
(CNN). Multiview geometric constraints are incorporated into the network by
establishing dense correspondences between different views leveraging a novel
self-supervised view alignment loss. The main ingredient of the view alignment
loss is a differentiable dense optical flow estimator that can backpropagate
the alignment errors between an input view and a synthetic rendering from
another input view, which is projected to the target view through the 3D shape
to be inferred. Through minimizing the view alignment loss, better 3D shapes
can be recovered such that the synthetic projections from one view to another
can better align with the observed image. Extensive experiments demonstrate the
superiority of the proposed method over other 3DMM methods. Comment: 2019 Conference on Computer Vision and Pattern Recognition