Covariance of Motion and Appearance Features for Spatio Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis
focused on practical scenarios and built on theoretical foundations from
sparse representation, including a novel descriptor for general-purpose video
analysis. In our approach, we compute kinematic features from optical flow, and
first- and second-order derivatives of intensities, to represent motion and
appearance, respectively. These features are then used to construct covariance
matrices, which capture the joint statistics of both low-level motion and
appearance features extracted from a video. Using an over-complete dictionary of
covariance-based descriptors built from labeled training samples, we formulate
low-level event recognition as a sparse linear approximation problem. Within
this formulation, we pose the sparse decomposition of a covariance matrix, which
also belongs to the space of positive semi-definite matrices, as a determinant
maximization problem. Moreover, since covariance matrices lie on non-linear
Riemannian manifolds, we compare this approach with a sparse linear
approximation alternative that is suitable for equivalent vector spaces of
covariance matrices. This is done by searching for the best projection of the
query data onto a dictionary using an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains, namely low-level event recognition in unconstrained scenarios and
gesture recognition using one-shot learning. Our experiments provide promising
insights into large-scale video analysis.
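The covariance descriptor and the Orthogonal Matching Pursuit alternative described in this abstract might be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-pixel feature set (intensity gradients, second derivatives, raw flow components, plus flow divergence and vorticity as example kinematic features), the `eps` regularization, the Log-Euclidean mapping used to obtain a vector space, and the residual-based classification rule are all assumptions. All function names are hypothetical. The determinant-maximization formulation is not shown.

```python
import numpy as np

# Hypothetical helpers; the feature choices and names are illustrative assumptions.

def pixel_features(frame, u, v):
    """Stack per-pixel appearance and motion features into an (H*W, 8) array.
    frame: grayscale image (H, W); u, v: optical-flow components (H, W)."""
    Iy, Ix = np.gradient(frame)          # first-order intensity derivatives
    Iyy = np.gradient(Iy, axis=0)        # second-order derivative along rows
    Ixx = np.gradient(Ix, axis=1)        # second-order derivative along columns
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    div = ux + vy                        # flow divergence (assumed kinematic feature)
    curl = vx - uy                       # flow vorticity (assumed kinematic feature)
    feats = [Ix, Iy, Ixx, Iyy, u, v, div, curl]
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

def covariance_descriptor(features, eps=1e-6):
    """d x d covariance of the feature rows, plus eps*I so it is strictly SPD."""
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

def log_euclidean_vector(cov):
    """Map an SPD matrix to a vector via its matrix logarithm. Off-diagonal
    entries are scaled by sqrt(2) so the vector's Euclidean norm equals the
    Frobenius norm of log(cov)."""
    w, V = np.linalg.eigh(cov)
    log_cov = (V * np.log(w)) @ V.T      # V diag(log w) V^T
    rows, cols = np.triu_indices(cov.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_cov[rows, cols] * scale

def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal Matching Pursuit over a dictionary D with unit-norm columns."""
    coef = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit y on all selected atoms (the "orthogonal" step).
        x, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x
        coef[:] = 0.0
        coef[support] = x
    return coef

def classify_by_residual(D, labels, y, n_nonzero):
    """Assign y to the class whose atoms best reconstruct it (assumed rule)."""
    coef = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    errors = [np.linalg.norm(y - D[:, labels == c] @ coef[labels == c])
              for c in classes]
    return classes[int(np.argmin(errors))]
```

In use, each column of `D` would be the normalized `log_euclidean_vector` of one labeled training clip's descriptor, `labels` a NumPy array of their classes, and `y` the normalized vector of a query clip.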
Deep Covariance Descriptors for Facial Expression Recognition
In this paper, covariance matrices are exploited to encode deep
convolutional neural network (DCNN) features for facial expression
recognition. The space of covariance matrices has the geometry of the manifold
of Symmetric Positive Definite (SPD) matrices. By classifying facial
expressions with a Gaussian kernel on the SPD manifold, we show that
covariance descriptors computed on DCNN features are more efficient than
standard classification with fully connected layers and softmax. By
implementing our approach with the VGG-face and ExpNet architectures and
conducting extensive experiments on the Oulu-CASIA and SFEW datasets, we show
that the proposed approach achieves state-of-the-art performance for facial
expression recognition.
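The covariance encoding and kernel classification described in this abstract might be sketched in NumPy as follows. This is a hedged illustration, not the authors' code. Three choices are assumptions: treating each spatial position of a convolutional feature map as one sample, the `eps` regularization, and the Log-Euclidean distance inside the Gaussian kernel. The regularization is needed because a raw covariance is singular when a map has fewer spatial positions than channels. Function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating covariance pooling of CNN activations
# and a Gaussian kernel built on the Log-Euclidean distance between SPD matrices.

def deep_covariance(feature_map, eps=1e-4):
    """feature_map: (H, W, C) activations from a convolutional layer.
    Each of the H*W spatial positions is treated as one C-dimensional sample."""
    h, w, c = feature_map.shape
    samples = feature_map.reshape(h * w, c)
    cov = np.cov(samples, rowvar=False)
    return cov + eps * np.eye(c)         # keep the matrix strictly SPD

def spd_log(m):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(m)
    return (V * np.log(w)) @ V.T

def log_euclidean_gaussian_kernel(covs_a, covs_b, sigma=1.0):
    """K[i, j] = exp(-||log A_i - log B_j||_F^2 / (2 sigma^2))."""
    logs_a = [spd_log(m) for m in covs_a]
    logs_b = [spd_log(m) for m in covs_b]
    K = np.empty((len(logs_a), len(logs_b)))
    for i, la in enumerate(logs_a):
        for j, lb in enumerate(logs_b):
            d2 = np.sum((la - lb) ** 2)  # squared Frobenius distance
            K[i, j] = np.exp(-d2 / (2.0 * sigma ** 2))
    return K
```

The resulting Gram matrix could then be passed to a kernel classifier that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Computing each matrix logarithm once per descriptor, rather than once per pair, keeps the cost dominated by the eigendecompositions.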