Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories
In this paper, we propose a new approach for facial expression recognition
using deep covariance descriptors. The solution is based on the idea of
encoding local and global Deep Convolutional Neural Network (DCNN) features
extracted from still images, in compact local and global covariance
descriptors. The space geometry of the covariance matrices is that of Symmetric
Positive Definite (SPD) matrices. By conducting the classification of static
facial expressions using Support Vector Machine (SVM) with a valid Gaussian
kernel on the SPD manifold, we show that deep covariance descriptors are more
effective than the standard classification with fully connected layers and
softmax. In addition, we propose a completely new and original solution to model
the temporal dynamics of facial expressions as deep trajectories on the SPD
manifold. As an extension of the classification pipeline of covariance
descriptors, we apply SVM with valid positive definite kernels derived from
global alignment for deep covariance trajectories classification. By performing
extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that
both the proposed static and dynamic approaches achieve state-of-the-art
performance for facial expression recognition, outperforming many recent
approaches.
Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A,
Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial
Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018,
Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159."
arXiv admin note: substantial text overlap with arXiv:1805.0386
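The static pipeline described in this abstract (local features summarized as a covariance matrix, then compared with a Gaussian kernel on the SPD manifold) can be illustrated with a minimal sketch. The abstract does not say which kernel is used; the log-Euclidean Gaussian kernel below is an assumption chosen because it is known to be positive definite, and the feature shapes, the ridge term `eps`, and `gamma` are hypothetical.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-5):
    """Summarize (n_locations, d) local features as a (d, d) SPD matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    # Small ridge (an assumption) keeps the matrix strictly positive definite.
    return cov + eps * np.eye(features.shape[1])

def spd_log(spd):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_gaussian_kernel(a, b, gamma=0.1):
    """exp(-gamma * ||log(a) - log(b)||_F^2), assumed here as the SPD kernel."""
    diff = spd_log(a) - spd_log(b)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

# Random stand-ins for DCNN feature maps: 49 spatial locations, 8 channels.
rng = np.random.default_rng(0)
x = covariance_descriptor(rng.normal(size=(49, 8)))
y = covariance_descriptor(rng.normal(size=(49, 8)))
k_xx = log_euclidean_gaussian_kernel(x, x)
k_xy = log_euclidean_gaussian_kernel(x, y)
k_yx = log_euclidean_gaussian_kernel(y, x)
```

A matrix of such kernel values could be passed to an SVM that accepts precomputed kernels; that step is omitted here.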
Initial Classifier Weights Replay for Memoryless Class Incremental Learning
Incremental Learning (IL) is useful when artificial systems need to deal with
streams of data and do not have access to all data at all times. The most
challenging setting requires a constant complexity of the deep model and an
incremental model update without access to a bounded memory of past data. In
this setting, the representations of past classes are strongly affected by catastrophic
forgetting. To mitigate its negative effect, an adapted fine tuning which
includes knowledge distillation is usually deployed. We propose a different
approach based on a vanilla fine tuning backbone. It leverages initial
classifier weights which provide a strong representation of past classes
because they are trained with all class data. However, the magnitude of
classifiers learned in different states varies and normalization is needed for
a fair handling of all classes. Normalization is performed by standardizing the
initial classifier weights, which are assumed to be normally distributed. In
addition, a calibration of prediction scores is done by using state level
statistics to further improve classification fairness. We conduct a thorough
evaluation with four public datasets in a memoryless incremental learning
setting. Results show that our method outperforms existing techniques by a
large margin for large-scale datasets.
Comment: Accepted in BMVC202
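The normalization step in this abstract can be sketched as follows. This is a minimal illustration, assuming that each class's initial weight vector is standardized with its own mean and standard deviation; the array shapes and the two simulated incremental states are hypothetical, and the state-level score calibration mentioned in the abstract is not shown.

```python
import numpy as np

def standardize_class_weights(weights):
    """Standardize each row of a (n_classes, d) classifier weight matrix."""
    mean = weights.mean(axis=1, keepdims=True)
    std = weights.std(axis=1, keepdims=True)
    return (weights - mean) / std

rng = np.random.default_rng(0)
# Hypothetical initial weights learned in two incremental states whose
# magnitudes differ, which is the imbalance the abstract describes.
state1 = rng.normal(0.0, 0.5, size=(3, 16))
state2 = rng.normal(0.1, 2.0, size=(3, 16))
replayed = standardize_class_weights(np.vstack([state1, state2]))

# Score a batch of (random stand-in) features against all six classes.
features = rng.normal(size=(4, 16))
scores = features @ replayed.T
```

After standardization every class vector has zero mean and unit variance, so no state dominates the scores purely through weight magnitude.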
Emergence of Object Segmentation in Perturbed Generative Models
We introduce a novel framework to build a model that can learn how to segment
objects from a collection of images without any human annotation. Our method
builds on the observation that the location of object segments can be perturbed
locally relative to a given background without affecting the realism of a
scene. Our approach is to first train a generative model of a layered scene.
The layered representation consists of a background image, a foreground image
and the mask of the foreground. A composite image is then obtained by
overlaying the masked foreground image onto the background. The generative
model is trained in an adversarial fashion against a discriminator, which
forces the generative model to produce realistic composite images. To force the
generator to learn a representation where the foreground layer corresponds to
an object, we perturb the output of the generative model by introducing a
random shift of both the foreground image and mask relative to the background.
Because the generator is unaware of the shift before computing its output, it
must produce layered representations that are realistic for any such random
perturbation. Finally, we learn to segment an image by defining an autoencoder
consisting of an encoder, which we train, and the pre-trained generator as the
decoder, which we freeze. The encoder maps an image to a feature vector, which
is fed as input to the generator to give a composite image matching the
original input image. Because the generator outputs an explicit layered
representation of the scene, the encoder learns to detect and segment objects.
We demonstrate this framework on real images of several object categories.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Spotlight presentation
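The layered composition and the random shift described in this abstract can be sketched on toy arrays. This is an illustration under stated assumptions: images are 2D grayscale arrays, the mask is binary, and the shift is implemented with `np.roll`, which wraps around at the borders; the abstract does not specify how shifts are handled at image boundaries.

```python
import numpy as np

def composite(background, foreground, mask):
    """Overlay the masked foreground onto the background."""
    return mask * foreground + (1.0 - mask) * background

def shifted_composite(background, foreground, mask, dy, dx):
    """Shift foreground and mask together, then composite.

    np.roll wraps pixels around the border (a simplifying assumption).
    """
    fg = np.roll(foreground, shift=(dy, dx), axis=(0, 1))
    m = np.roll(mask, shift=(dy, dx), axis=(0, 1))
    return composite(background, fg, m)

background = np.zeros((6, 6))
foreground = np.ones((6, 6))
mask = np.zeros((6, 6))
mask[1:3, 1:3] = 1.0  # a 2x2 "object"

image = composite(background, foreground, mask)
shifted = shifted_composite(background, foreground, mask, dy=2, dx=1)
```

In the adversarial setup, a discriminator would see the shifted composite, so the generator is rewarded only when the foreground layer remains plausible wherever it lands.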
Balanced Sparsity for Efficient DNN Inference on GPU
In trained deep neural networks, unstructured pruning can reduce redundant
weights to lower storage cost. However, it requires customized hardware to
speed up practical inference. Another trend accelerates sparse model inference
on general-purpose hardware by adopting coarse-grained sparsity to prune or
regularize consecutive weights for efficient computation, but this method often
sacrifices model accuracy. In this paper, we propose a novel fine-grained
sparsity approach, balanced sparsity, to achieve high model accuracy
efficiently on commercial hardware. Our approach adapts to the high-parallelism
property of GPUs, showing incredible potential for sparsity in the wide
deployment of deep learning services. Experimental results show that balanced
sparsity achieves up to 3.1x practical speedup for model inference on GPU while
retaining the same high model accuracy as fine-grained sparsity.
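The abstract does not define balanced sparsity. One plausible reading, assumed here, is that each weight-matrix row is split into equal-sized blocks and the same number of largest-magnitude weights is kept in every block, so that parallel GPU threads receive equal work. The function name, `block_size`, and `keep_per_block` below are hypothetical.

```python
import numpy as np

def balanced_prune(weights, block_size, keep_per_block):
    """Keep the top-k magnitudes in every block of every row; zero the rest."""
    rows, cols = weights.shape
    if cols % block_size != 0:
        raise ValueError("cols must be divisible by block_size")
    blocks = weights.reshape(rows, cols // block_size, block_size)
    # Indices of entries sorted by decreasing magnitude within each block.
    order = np.argsort(-np.abs(blocks), axis=-1)
    keep = order[..., :keep_per_block]
    mask = np.zeros(blocks.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return (blocks * mask).reshape(rows, cols)

rng = np.random.default_rng(0)
dense = rng.normal(size=(4, 8))
pruned = balanced_prune(dense, block_size=4, keep_per_block=2)
nonzeros_per_block = (pruned.reshape(4, 2, 4) != 0).sum(axis=-1)
```

Every block ends up with the same nonzero count, in contrast to unstructured pruning, where nonzeros may cluster in a few rows.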
Gram Matrices Formulation of Body Shape Motion: An Application for Depression Severity Assessment
We propose an automatic method to measure depression severity from body movement dynamics in participants undergoing treatment for depression. Participants were recorded in clinical interviews (Hamilton Rating Scale for Depression, HRSD) at seven-week intervals over a period of 21 weeks. A Gram matrix formulation was used to represent body shape and trajectories from each video interview. Kinematic features were then extracted and encoded for video-based representation using Gaussian Mixture Models (GMM) and Fisher vector encoding. Finally, a multi-class SVM was used to classify the encoded body movement dynamics into three levels of depression severity: moderate to severely depressed, mildly depressed, and remitted. Accuracy was highest for moderate to severe depression (68%), followed by mild depression (56%) and remitted (37.93%). The obtained results suggest that automatic detection of depression severity from body movement is feasible.
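The Gram matrix representation of a body shape can be sketched for a single frame. This is a minimal illustration, assuming the shape is a set of 2D landmark points that are centered before the Gram matrix is formed; the landmark coordinates are made up, and the kinematic features, GMM and Fisher vector encoding, and SVM stages are not shown.

```python
import numpy as np

def gram_matrix(landmarks):
    """Gram matrix P P^T of centered (n_points, d) landmarks."""
    centered = landmarks - landmarks.mean(axis=0, keepdims=True)
    return centered @ centered.T

# Hypothetical 2D body landmarks for one video frame.
pose = np.array([[0.0, 0.0], [1.0, 0.2], [0.5, 1.5], [-0.4, 0.9]])

# The same pose after an in-plane rotation and a translation.
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
moved = pose @ rotation.T + np.array([3.0, -2.0])

g_pose = gram_matrix(pose)
g_moved = gram_matrix(moved)
```

Because the two matrices coincide, the representation depends on the body configuration rather than on where or at what angle the person appears in the frame; a video then becomes a trajectory of such matrices.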