Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories
In this paper, we propose a new approach for facial expression recognition
using deep covariance descriptors. The solution is based on the idea of
encoding local and global Deep Convolutional Neural Network (DCNN) features
extracted from still images, in compact local and global covariance
descriptors. The space geometry of the covariance matrices is that of Symmetric
Positive Definite (SPD) matrices. By conducting the classification of static
facial expressions using Support Vector Machine (SVM) with a valid Gaussian
kernel on the SPD manifold, we show that deep covariance descriptors are more
effective than the standard classification with fully connected layers and
softmax. In addition, we propose a new solution that models
the temporal dynamics of facial expressions as deep trajectories on the SPD
manifold. As an extension of the classification pipeline of covariance
descriptors, we apply SVM with valid positive definite kernels derived from
global alignment for deep covariance trajectories classification. By performing
extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that
both the proposed static and dynamic approaches achieve state-of-the-art
performance for facial expression recognition, outperforming many recent
approaches.
Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A,
Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial
Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018,
Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159."
arXiv admin note: substantial text overlap with arXiv:1805.0386
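To make the covariance-descriptor idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the local DCNN features are already available as an `(n, d)` array, and it uses a Gaussian kernel built on the log-Euclidean distance as one example of a kernel that is positive definite on SPD matrices; the abstract does not say which kernel the paper uses. The function names, the `eps` regularizer, and `gamma` are illustrative choices.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-5):
    """Covariance of n feature vectors (rows of an (n, d) array).

    A small multiple of the identity is added so the result is strictly
    positive definite (an assumption made here for numerical safety).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(features.shape[0] - 1, 1)
    return cov + eps * np.eye(features.shape[1])

def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_gaussian_kernel(a, b, gamma=0.1):
    """exp(-gamma * ||log(a) - log(b)||_F^2) for SPD matrices a and b."""
    diff = spd_log(a) - spd_log(b)
    return float(np.exp(-gamma * np.sum(diff * diff)))
```

A Gram matrix built from this kernel could then be passed to an SVM that accepts precomputed kernels.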
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
The task of open-vocabulary object-centric image retrieval involves the
retrieval of images containing a specified object of interest, delineated by an
open-set text query. As working on large image datasets becomes standard,
solving this task efficiently has gained significant practical importance.
Applications include targeted performance analysis of retrieved images using
ad-hoc queries and hard example mining during training. Recent advancements in
contrastive-based open vocabulary systems have yielded remarkable
breakthroughs, facilitating large-scale open vocabulary image retrieval.
However, these approaches use a single global embedding per image, thereby
constraining the system's ability to retrieve images containing relatively
small object instances. Alternatively, incorporating local embeddings from
detection pipelines faces scalability challenges, making it unsuitable for
retrieval from large databases.
In this work, we present a simple yet effective approach to object-centric
open-vocabulary image retrieval. Our approach aggregates dense embeddings
extracted from CLIP into a compact representation, essentially combining the
scalability of image retrieval pipelines with the object identification
capabilities of dense detection methods. We show the effectiveness of our
scheme for this task by achieving significantly better results than global
feature approaches on three datasets, increasing accuracy by up to 15 mAP
points. We further integrate our scheme into a large scale retrieval framework
and demonstrate our method's advantages in terms of scalability and
interpretability.
Comment: BMVC 202
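The contrast between a single global embedding and a few aggregated dense embeddings can be illustrated with a short sketch. The abstract does not specify how aggregation is done, so the code below substitutes a simple spherical k-means with farthest-point initialization as a stand-in. The arrays are assumed to be patch embeddings and a text embedding in the same space (for example, from CLIP), and all function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length, guarding against division by zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, 1e-12)

def aggregate_dense(patches, k=4, iters=10):
    """Compress (n, d) dense embeddings into k unit vectors (requires k <= n).

    Stand-in aggregation: spherical k-means with farthest-point init.
    """
    patches = l2_normalize(np.asarray(patches, dtype=float))
    chosen = [0]
    for _ in range(k - 1):
        # Pick the patch least similar to every centre chosen so far.
        sim = (patches @ patches[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(sim)))
    centers = patches[chosen].copy()
    for _ in range(iters):
        assign = np.argmax(patches @ centers.T, axis=1)
        for j in range(k):
            members = patches[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)
    return centers

def retrieval_score(query, aggregated):
    """Best cosine similarity between the query and any aggregated vector."""
    return float(np.max(aggregated @ l2_normalize(query)))
```

With this scheme, an image in which only a handful of patches match the query can still score highly, whereas the mean of all patch embeddings would be dominated by the background.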
RELLISUR: A Real Low-Light Image Super-Resolution Dataset
The RELLISUR dataset contains real low-light low-resolution images paired with normal-light high-resolution reference image counterparts. This dataset aims to fill the gap between low-light image enhancement and low-resolution image enhancement (Super-Resolution (SR)), which are currently addressed only separately in the literature, even though the visibility of real-world images is often limited by both low light and low resolution. The dataset contains 12750 paired images of different resolutions and degrees of low-light illumination, to facilitate learning of deep-learning-based models that can perform a direct mapping from degraded images with low visibility to high-quality, detail-rich images of high resolution.
Emergence of Object Segmentation in Perturbed Generative Models
We introduce a novel framework to build a model that can learn how to segment
objects from a collection of images without any human annotation. Our method
builds on the observation that the location of object segments can be perturbed
locally relative to a given background without affecting the realism of a
scene. Our approach is to first train a generative model of a layered scene.
The layered representation consists of a background image, a foreground image
and the mask of the foreground. A composite image is then obtained by
overlaying the masked foreground image onto the background. The generative
model is trained in an adversarial fashion against a discriminator, which
forces the generative model to produce realistic composite images. To force the
generator to learn a representation where the foreground layer corresponds to
an object, we perturb the output of the generative model by introducing a
random shift of both the foreground image and mask relative to the background.
Because the generator is unaware of the shift before computing its output, it
must produce layered representations that are realistic for any such random
perturbation. Finally, we learn to segment an image by defining an autoencoder
consisting of an encoder, which we train, and the pre-trained generator as the
decoder, which we freeze. The encoder maps an image to a feature vector, which
is fed as input to the generator to give a composite image matching the
original input image. Because the generator outputs an explicit layered
representation of the scene, the encoder learns to detect and segment objects.
We demonstrate this framework on real images of several object categories.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Spotlight presentatio
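The layered composition and the random shift described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's code: images are `(H, W, 3)` float arrays, the mask is `(H, W, 1)` with values in [0, 1], and the shift is implemented as a circular `np.roll`, which may differ from how the authors handle image borders. The function names and `max_shift` are hypothetical.

```python
import numpy as np

def composite(background, foreground, mask):
    """Overlay the masked foreground onto the background."""
    return mask * foreground + (1.0 - mask) * background

def shifted_composite(background, foreground, mask, max_shift, rng):
    """Shift foreground and mask together by a random offset, then composite.

    The same (dy, dx) is applied to both layers so the mask keeps
    covering the object; the background stays fixed.
    """
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    fg = np.roll(foreground, (dy, dx), axis=(0, 1))
    m = np.roll(mask, (dy, dx), axis=(0, 1))
    return composite(background, fg, m), (dy, dx)
```

In the training setup the abstract describes, the discriminator would see such shifted composites, so the generator is rewarded only when the foreground layer remains plausible at any nearby position.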
ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning
The state of the art in semantic segmentation is steadily increasing in
performance, resulting in more precise and reliable segmentations in many
different applications. However, progress is limited by the cost of generating
labels for training, which sometimes requires hours of manual labor for a
single image. Because of this, semi-supervised methods have been applied to
this task, with varying degrees of success. A key challenge is that common
augmentations used in semi-supervised classification are less effective for
semantic segmentation. We propose a novel data augmentation mechanism called
ClassMix, which generates augmentations by mixing unlabelled samples, using the
network's predictions to respect object boundaries. We
evaluate this augmentation technique on two common semi-supervised semantic
segmentation benchmarks, showing that it attains state-of-the-art results.
Lastly, we also provide extensive ablation studies comparing different design
decisions and training regimes.
Comment: This paper has been accepted to WACV202
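A mixing step of the kind this abstract describes might look like the sketch below. It assumes that per-pixel class predictions (argmax maps) for two unlabelled images are already available, and that roughly half of the classes predicted in the first image are chosen to form the binary mask; that selection rule and the function name are assumptions of this sketch rather than details given in the abstract.

```python
import numpy as np

def classmix(image_a, image_b, pred_a, pred_b, rng):
    """Paste pixels of selected predicted classes from image A onto image B.

    image_a, image_b: (H, W, C) arrays; pred_a, pred_b: (H, W) integer
    class maps predicted by the network. The mask follows predicted class
    regions, so pasted areas follow predicted object boundaries.
    """
    classes = np.unique(pred_a)
    n_pick = max(len(classes) // 2, 1)
    picked = rng.choice(classes, size=n_pick, replace=False)
    mask = np.isin(pred_a, picked)
    mixed_image = np.where(mask[..., None], image_a, image_b)
    mixed_label = np.where(mask, pred_a, pred_b)
    return mixed_image, mixed_label, mask
```

The mixed image and its mixed pseudo-label can then serve as an extra training pair for the unlabelled part of a semi-supervised loss.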