    Face Captioning Using Prominent Feature Recognition

    Humans rely on prominent feature recognition to correctly identify and describe previously seen faces. Despite this, there is little existing work investigating how prominent facial features can be automatically recognized and used to create natural language face descriptions. Facial attribute prediction, a more commonly studied problem in computer vision, has previously been used for this task. However, the evaluation metrics and baseline models currently used to compare attribute prediction methods are insufficient for determining which approaches are best at classifying highly imbalanced attributes. We also show that CelebA, the largest and most widely used facial attribute dataset, is too poorly labeled to be suitable for prominent feature recognition. To address these issues, we propose a method for generating weak prominent feature labels using semantic segmentation and show that these labels can be used to improve attribute-based face description.
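    As a purely illustrative sketch of how weak prominence labels might be derived from a face-parsing map: mark a facial part as prominent when its area, relative to the whole face region, is unusually large compared to dataset statistics. The class ids, the population statistics, and the z-score rule below are assumptions, not the paper's procedure.

        import numpy as np

        # Illustrative label ids for a face-parsing map (assumed encoding).
        FACE_PARTS = {"nose": 1, "eyes": 2, "eyebrows": 3, "lips": 4, "hair": 5}

        def weak_prominence_labels(seg_map: np.ndarray,
                                   part_area_stats: dict,
                                   z_thresh: float = 1.0) -> dict:
            """Weakly label a part as prominent when its relative area is
            unusually large. part_area_stats maps part name -> (mean, std)
            of relative area, estimated beforehand over the training set."""
            face_area = np.count_nonzero(seg_map)  # pixels assigned to any part
            labels = {}
            for part, class_id in FACE_PARTS.items():
                rel_area = np.count_nonzero(seg_map == class_id) / max(face_area, 1)
                mean, std = part_area_stats[part]
                # z-score against the population: a weak, noisy notion of prominence
                labels[part] = (rel_area - mean) / max(std, 1e-8) > z_thresh
            return labels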

    Learning Explainable Facial Features from Noisy Unconstrained Visual Data

    Attributes are semantic features of objects, people, and activities. They allow computers to describe people and things in the way humans would, which makes them very useful for recognition. Facial attributes (gender, hair color, makeup, eye color, etc.) are useful for a variety of tasks, including face verification and recognition, user interface applications, and surveillance, to name a few. The problem of predicting facial attributes is still relatively new in computer vision, and because it has not been studied for long, a lack of publicly available data is a major challenge. As with many problems in computer vision, a large portion of facial attribute research is dedicated to improving performance on benchmark datasets. However, it has been shown that progress on a benchmark dataset does not necessarily translate to a genuine solution for the problem. This dissertation focuses on learning models for facial attributes that are robust to changes in data, i.e., models that perform well on unseen data. We do this by taking cues from human recognition and translating these ideas into deep learning techniques for robust facial attribute recognition. Towards this goal, we introduce several techniques for learning from noisy unconstrained visual data: utilizing relationships among attributes, a selective learning approach for multi-label balancing, a temporal coherence constraint and a motion-attention mechanism for recognizing attributes in video, and parsing faces according to attributes for improved localization.

    We know that facial attributes are related, e.g., heavy makeup and wearing lipstick, or male and goatee. Humans are capable of recognizing and taking advantage of these relationships. For example, if the face of a subject is occluded but facial hair can be seen, then the likelihood that the subject is male should increase. We introduce several methods for implicitly and explicitly utilizing attribute relationships for improved prediction.

    Some attributes are more common than others in the real world, e.g., male vs. bald. These disparities are even more pronounced in datasets consisting of posed celebrities on the red carpet (i.e., there are very few celebrities not wearing makeup). Such imbalances can cause a facial attribute model to learn the bias in the dataset rather than a true representation of the attribute. To alleviate this problem, we introduce selective learning, a method of balancing each batch in a deep learning algorithm with respect to each attribute according to a target distribution (a sketch of this idea appears after this abstract). Selective learning lets a deep learning algorithm learn from a balanced set of data at each iteration during training, removing the bias caused by the label imbalance.

    Learning a facial attribute model from image data and testing on video data gives unexpected results (e.g., gender changing between frames). When working with video, it is important to account for the temporal and motion aspects of the data. In order to stabilize attribute predictions in video, we utilize weakly-labeled data and introduce time and motion constraints in the model learning process. Introducing temporal coherence and motion-attention constraints during the learning of an attribute model allows the use of weakly-labeled data, which is essential when working with video.
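    A minimal sketch of a temporal coherence penalty of the kind just described, assuming a simple frame-to-frame consistency term on sigmoid outputs that needs no attribute labels; the dissertation's exact constraint may differ:

        import torch

        def temporal_coherence_loss(frame_logits: torch.Tensor) -> torch.Tensor:
            """frame_logits: (n_frames, n_attrs) logits for consecutive frames
            of one face track. Penalizes frame-to-frame prediction drift,
            which requires no labels (so it suits weakly-labeled video)."""
            probs = torch.sigmoid(frame_logits)
            return (probs[1:] - probs[:-1]).abs().mean()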
    By framing the problem of facial attribute recognition as one of semantic segmentation, where the goal is to predict attributes at each pixel, we are able to reduce the effect of unwanted relationships between attributes (e.g., high cheekbones and smiling). Robust facial attribute recognition algorithms are necessary for improving the applications that use these attributes. Given limited data for training, we develop several methods for learning explainable facial features from noisy unconstrained visual data, introducing several new datasets labeled with facial attributes and improving over the state-of-the-art.
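    The selective learning idea referenced above can be sketched as per-attribute loss re-weighting toward a target positive rate. The function below is an illustrative assumption, not the dissertation's exact scheme:

        import torch

        def selective_weights(labels: torch.Tensor, target_pos: float = 0.5) -> torch.Tensor:
            """labels: (batch, n_attrs) binary attribute labels.
            Returns (batch, n_attrs) loss weights so that, per attribute,
            positives and negatives contribute according to target_pos."""
            batch = labels.shape[0]
            pos = labels.sum(dim=0).clamp(min=1)          # positives per attribute
            neg = (batch - pos).clamp(min=1)              # negatives per attribute
            w_pos = target_pos * batch / pos              # up/down-weight positives
            w_neg = (1 - target_pos) * batch / neg        # up/down-weight negatives
            return torch.where(labels > 0, w_pos, w_neg)  # broadcast per example

        # Usage inside a training step (with torch.nn.functional as F):
        # logits = model(images)                          # (batch, n_attrs)
        # loss = F.binary_cross_entropy_with_logits(
        #     logits, labels.float(), weight=selective_weights(labels))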

    A multi-agent system for the classification of gender and age from images

    The automatic classification of human images on the basis of age range and gender can be used in audiovisual content adaptation for Smart TVs or marquee advertising. Knowledge about users is used by publishing agencies and departments regulating TV content; on the basis of this information (age, gender), they are able to provide content that suits the interests of users. To this end, a highly precise image pattern recognition system is necessary; building one may be one of the greatest challenges faced by computer technology in recent decades. Such recognition systems must apply different pattern recognition techniques in order to distinguish gender and age in the images. In this work, we propose a multi-agent system that integrates different techniques for the acquisition, preprocessing, and processing of images for the classification of age and gender. The system has been tested in an office building. Thanks to the use of a multi-agent system, which makes it possible to apply different workflows simultaneously, the performance of different methods could be compared (each flow with a different configuration). Experimental results have confirmed that a good preprocessing stage is necessary if we want the classification methods (Fisherfaces, Eigenfaces, Local Binary Patterns, multilayer perceptron) to perform well. The Fisherfaces method proved more effective than the MLP, and its training time was shorter. For age classification, Fisherfaces offers the best results compared to the rest of the system's classifiers. The use of filters made it possible to reduce dimensionality and, with it, the workload, a great advantage in a system that performs classification in real time.
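    For illustration, a minimal Fisherfaces gender classifier can be put together with OpenCV's contrib module (opencv-contrib-python). The preprocessing choices below (grayscale, fixed resize, histogram equalization) merely stand in for the paper's preprocessing stage and are assumptions, not the authors' pipeline:

        import cv2
        import numpy as np

        SIZE = (100, 100)  # Fisherfaces requires equally sized inputs

        def preprocess(img: np.ndarray) -> np.ndarray:
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            gray = cv2.resize(gray, SIZE)
            return cv2.equalizeHist(gray)  # simple lighting normalization

        # train_imgs: list of BGR face crops; train_labels: 0 = female, 1 = male
        def train_gender_model(train_imgs, train_labels):
            model = cv2.face.FisherFaceRecognizer_create()
            model.train([preprocess(im) for im in train_imgs],
                        np.asarray(train_labels, dtype=np.int32))
            return model

        # Prediction returns (label, distance):
        # label, distance = model.predict(preprocess(test_img))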

    CATFace: Cross-Attribute-Guided Transformer with Self-Attention Distillation for Low-Quality Face Recognition

    Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to obscured facial details. Nevertheless, it is often feasible to predict specific soft biometric (SB) attributes, such as gender and baldness, even when dealing with low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that effectively highlights crucial facial regions, such as landmarks, by aligning low-quality images with their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified quality-invariant feature representation in unconstrained environments. We conduct extensive experiments on various FR benchmarks varying in quality. Experimental results demonstrate the superiority of our FR method compared to state-of-the-art FR studies.
    Comment: Accepted in IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM), 202
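    A hedged PyTorch sketch of the dual cross-attention pattern underlying such a fusion module: FR features attend to SB features and vice versa, and the two attended streams are merged. The dimensions, head counts, and fusion by concatenation are assumptions for illustration, not the paper's exact CATF module:

        import torch
        import torch.nn as nn

        class DualCrossAttentionFusion(nn.Module):
            def __init__(self, dim: int = 256, heads: int = 4):
                super().__init__()
                self.fr_to_sb = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.sb_to_fr = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.proj = nn.Linear(2 * dim, dim)  # merge the two streams

            def forward(self, fr: torch.Tensor, sb: torch.Tensor) -> torch.Tensor:
                # fr: (batch, n_fr_tokens, dim), sb: (batch, n_sb_tokens, dim)
                fr_att, _ = self.fr_to_sb(query=fr, key=sb, value=sb)
                sb_att, _ = self.sb_to_fr(query=sb, key=fr, value=fr)
                # pool the SB-guided stream and fuse with the attended FR stream
                pooled = sb_att.mean(dim=1, keepdim=True).expand_as(fr_att)
                return self.proj(torch.cat([fr_att, pooled], dim=-1))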