Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction
Cued Speech is a specific linguistic code for hearing-impaired people, based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project), we work on automatic Cued Speech translation. In this paper, we address only the problem of automatic Cued Speech manual gesture recognition. Such a gesture recognition problem is common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bio-inspired process called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images that summarize the whole sequence. Only the key images are then studied from a temporal point of view, with lighter computation than for the complete sequence.
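The early reduction idea above can be sketched as a key-frame selector. This is a hypothetical illustration, not the paper's exact method: the motion measure (mean absolute frame difference) and the threshold are assumptions, chosen because held hand poses in Cued Speech produce low inter-frame motion.

```python
import numpy as np

def early_reduction(frames, motion_threshold=5.0):
    """Select key images from a sequence by keeping frames with low
    inter-frame motion (held gestures), so that only a restricted
    number of frames needs full analysis.

    frames: iterable of 2-D numpy arrays (grayscale images).
    Returns the list of selected key frames.
    """
    keys = []
    prev = None
    for frame in frames:
        if prev is not None:
            # Mean absolute pixel difference as a crude motion estimate.
            motion = np.abs(frame.astype(float) - prev.astype(float)).mean()
            if motion < motion_threshold:
                keys.append(frame)
        prev = frame
    return keys
```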
Low Level Features for Quality Assessment of Facial Images
An automated system that provides feedback about the aesthetic quality of facial pictures could be of great interest for editing or selecting photos. Although image aesthetic quality assessment is a challenging task that requires understanding subjective notions, the proposed work shows that facial image quality can be estimated using low-level features only. This paper provides a method that can predict aesthetic quality scores of facial images. Fifteen features that depict technical aspects of images, such as contrast, sharpness or colorfulness, are computed on different image regions (face, eyes, mouth), and a machine learning algorithm is used to perform classification and scoring. Relevant features and facial image areas are selected by a feature ranking technique, increasing both classification and regression performance. Results are compared with recent works, and it is shown that the proposed low-level feature set obtains the best state-of-the-art results.
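Three of the kinds of low-level statistics mentioned above (contrast, sharpness, colorfulness) can be sketched as follows. The exact definitions are assumptions of this sketch, not the paper's 15-feature set; the colorfulness measure follows the common opponent-channel formulation.

```python
import numpy as np

def lowlevel_features(rgb):
    """Compute a few illustrative low-level features on an image region.

    rgb: H x W x 3 float array with values in [0, 1].
    Returns a dict of scalar features.
    """
    gray = rgb.mean(axis=2)
    # Contrast: standard deviation of luminance.
    contrast = gray.std()
    # Sharpness: mean magnitude of the image gradient.
    gy, gx = np.gradient(gray)
    sharpness = np.hypot(gx, gy).mean()
    # Colorfulness: spread and offset of opponent color channels.
    rg = rgb[..., 0] - rgb[..., 1]
    yb = 0.5 * (rgb[..., 0] + rgb[..., 1]) - rgb[..., 2]
    colorfulness = np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())
    return {"contrast": contrast, "sharpness": sharpness, "colorfulness": colorfulness}
```

In the paper's pipeline such features would be computed separately on the face, eye and mouth regions before feature ranking.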
Biologically Inspired Processing for Lighting Robust Face Recognition
ISBN 978-953-307-489-4, hard cover, 314 pages. No abstract available.
Fully automated facial picture evaluation using high level attributes
People automatically and quickly judge a facial picture from its appearance. Thus, developing tools that can reproduce human judgments may help consumers in their picture selection process. Previous work mostly studied the positions of facial keypoints to make predictions about specific traits: trustworthiness, likability, competence, etc. In this work, high-level attributes (e.g. gender, age, smile) are automatically extracted using three different tools and are used to build models adapted to each trait. Models are validated on a set of synthetic images, and it is shown that using attributes significantly increases the correlation between human and algorithmic evaluations. Then, a new dataset of 140 images is presented and used to demonstrate the relevance of high-level attributes for evaluating faces with respect to likability and competence. A model combining both facial keypoints and attributes is finally proposed and applied to picture selection: which picture depicts the most likable face for a given person?
How to predict the global instantaneous feeling induced by a facial picture?
Picture selection is a time-consuming task for humans and a real challenge for machines, which have to retrieve complex and subjective information from image pixels. An automated system that infers human feelings from digital portraits would be of great help for profile picture selection, photo album creation or photo editing. In this work, two models of facial picture evaluation are defined. The first predicts the overall aesthetic quality of a facial image, and the second answers the question "Among a set of facial pictures of a given person, on which picture does the person look the most friendly?". Aesthetic quality is evaluated by computing 15 features that encode low-level statistics in different image regions (face, eyes, mouth). Relevant features are automatically selected by a feature ranking technique, and the outputs of 4 learning algorithms are fused in order to make a robust and accurate prediction of the image quality. Results are compared with recent works, and the proposed algorithm obtains the best performance. The same pipeline is considered to evaluate the likability of a facial picture, with the difference that the estimation is based on high-level attributes such as gender, age and smile. The performance of these attributes is compared with previous techniques that mostly rely on facial keypoint positions, and it is shown that it is possible to obtain likability predictions that are close to human perception. Finally, a combination of both models that selects a likable facial image of good quality for a given person is described.
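The fusion step above (combining the outputs of 4 learning algorithms) can be sketched as a simple score-fusion rule. A plain weighted average is assumed here for illustration; the paper does not necessarily use this exact rule.

```python
import numpy as np

def fused_score(predictions, weights=None):
    """Fuse the scores of several learners into one prediction.

    predictions: sequence of scalar scores, one per learner.
    weights: optional per-learner weights (defaults to uniform).
    Returns the fused scalar score.
    """
    preds = np.asarray(predictions, dtype=float)
    if weights is None:
        weights = np.ones(len(preds)) / len(preds)
    return float(np.dot(weights, preds))
```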
Addressing Neural Network Robustness with Mixup and Targeted Labeling Adversarial Training
Despite their performance, Artificial Neural Networks are not reliable enough for most industrial applications. They are sensitive to noise, rotations, blurs and adversarial examples. There is a need to build defenses that protect against a wide range of perturbations, covering the most traditional common corruptions as well as adversarial examples. We propose a new data augmentation strategy called M-TLAT, designed to address robustness in a broad sense. Our approach combines the Mixup augmentation with a new adversarial training algorithm called Targeted Labeling Adversarial Training (TLAT). The idea of TLAT is to interpolate the target labels of adversarial examples with the ground-truth labels. We show that M-TLAT can increase the robustness of image classifiers against nineteen common corruptions and five adversarial attacks, without reducing the accuracy on clean samples.
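The two ingredients described above can be sketched in isolation: Mixup interpolates pairs of samples and labels, and the TLAT labeling rule mixes the ground-truth label with the attack-target label. The interpolation weight `eps` and the function names are assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Standard Mixup: convex combination of two samples and their labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def tlat_label(y_true, y_target, eps=0.1):
    """Targeted Labeling: an adversarial example crafted toward a target
    class is labeled with an interpolation of the ground-truth and the
    attack-target one-hot labels, instead of the ground truth alone."""
    return (1 - eps) * y_true + eps * y_target
```

In M-TLAT the two would be combined during training: each minibatch is mixed, then targeted adversarial examples are generated and labeled with `tlat_label`.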
How far generated data can impact Neural Networks performance?
The success of deep learning models depends on the size and quality of the dataset used to solve a given task. Here, we explore how far generated data can aid real data in improving the performance of Neural Networks. We consider facial expression recognition, since it requires challenging data generation at the level of local regions such as the mouth and eyebrows, rather than simple augmentation. Generative Adversarial Networks (GANs) provide an alternative method for generating such local deformations, but they need further validation. To answer our question, we consider non-complex Convolutional Neural Network (CNN) classifiers for recognizing Ekman emotions. For the data generation process, we generate facial expressions (FEs) by relying on two GANs: the first generates a random identity, while the second imposes facial deformations on top of it. We train the CNN classifier using FEs from real faces, from GAN-generated faces, and finally from a combination of real and GAN-generated faces. We determine an upper bound on the quantity of generated data to be mixed with the real data that contributes the most to enhancing FER accuracy. In our experiments, we find that adding 5 times more synthetic data to the real FEs dataset increases accuracy by 16%.

Comment: Conference publication in Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, 10 pages.
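The mixing experiment above can be sketched as follows. The function name and the truncation behavior are assumptions for illustration; only the 5:1 synthetic-to-real ratio comes from the abstract.

```python
import random

def mixed_training_set(real, synthetic, ratio=5, seed=0):
    """Build a training set combining real samples with `ratio` times as
    many synthetic ones (the reported best FER accuracy gain is at a
    5:1 synthetic-to-real ratio). If the synthetic pool is too small,
    it is truncated rather than oversampled."""
    n_syn = min(len(synthetic), ratio * len(real))
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed
```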