Facial Beauty Prediction and Analysis based on Deep Convolutional Neural Network: A Review
Abstract: Facial attractiveness or facial beauty prediction (FBP) is a current research topic with several potential applications. It remains a challenging area in computer vision because few public FBP databases exist and experiments are typically run on small-scale databases. Moreover, the evaluation of facial beauty is inherently personal, as people have individual preferences about beauty. Deep learning techniques have shown strong capability in analysis and feature representation. Previous studies focused on isolated aspects of facial beauty, with few comparisons between different techniques. This article therefore reviews recent research on computer-based prediction and analysis of facial beauty using deep convolutional neural networks (DCNNs). The possible research directions and challenges identified in this article can help researchers advance the state of the art in future work.
Modeling and Mapping Location-Dependent Human Appearance
Human appearance is highly variable and depends on individual preferences, such as fashion, facial expression, and makeup. These preferences depend on many factors, including a person's sense of style, what they are doing, and the weather. These factors, in turn, depend on geographic location and time. In our work, we build computational models to learn the relationship between human appearance, geographic location, and time. The primary contributions are a framework for collecting and processing geotagged imagery of people, a large dataset collected by our framework, and several generative and discriminative models that use our dataset to learn the relationship between human appearance, location, and time. Additionally, we build interactive maps that allow for inspection and demonstration of what our models have learned.
Semi-supervised auto-encoder for facial attributes recognition
The distinctiveness of human faces has encouraged many researchers to exploit facial features in domains such as user identification, behaviour analysis, computer technology, security, and psychology. In this paper, we present a method for facial attribute analysis. The work analyses facial images and extracts features in order to recognize demographic attributes: age, gender, and ethnicity (AGE). We exploit the robustness of deep learning (DL) using an updated version of autoencoders called the deep sparse autoencoder (DSAE). We use a new DSAE architecture that adds supervision to the classic model, and we control overfitting by regularizing the model. Moving from DSAE to the semi-supervised autoencoder (DSSAE) facilitates the supervision process and achieves excellent feature-extraction performance. We focus on estimating AGE jointly. The experimental results show that DSSAE recognizes facial features with high precision. The whole system achieves good performance and high recognition rates for AGE on the MORPH II database.
Human Age and Gender Classification using Convolutional Neural Networks
In a world relying ever more on human classification, this paper aims to improve age and gender image classification through the use of Convolutional Neural Networks (CNNs). Age and gender classification has become a popular area of study in recent years; however, there is still room for improvement, particularly in age classification. This paper tests the widely accepted view that CNN models are the superior model type for image classification by comparing CNN performance against Support Vector Machine (SVM) performance on the same dataset. Using the Adience image classification dataset, this research also focuses on implementing data augmentation techniques, some more novel than others, as a means of improving CNN performance. Among standard, popular augmentation methods, image mirroring and image rotation were applied. In addition, a more novel augmentation approach was applied to age classification, using FaceApp, an AI image editor in the form of a mobile application. This application allows "filters" to be placed on images of people in order to alter their appearance. The data-augmented models outperformed the standard CNN models, with gender classification improving by 2.6% and age classification by 7.1%. These results indicate the potential for further improvement through the inclusion of more augmentation techniques or the use of more of the filter types provided in the FaceApp application.
Representations of racial minorities in popular movies: A content-analytic synergy of computer vision and network science
In the Hollywood film industry, racial minorities remain underrepresented. Characters from racially underrepresented groups receive less screen time, fewer central story positions, and frequently inherit plotlines, motivations, and actions that are primarily driven by White characters. Currently, there are no clearly defined, standardized, and scalable metrics for taking stock of racial minorities' cinematographic representation. In this paper, we combine methodological tools from computer vision and network science to develop a content-analytic framework for identifying visual and structural racial biases in film productions. We apply our approach to a set of 89 popular, full-length movies, demonstrating that this method provides a scalable examination of racial inclusion in film production and predicts movie performance. We integrate our method into larger theoretical discussions on audiences' perception of racial minorities and illuminate future research trajectories towards the computational assessment of racial biases in audiovisual narratives.
Multimodal Adversarial Learning
Deep Convolutional Neural Networks (DCNNs) have proven to be exceptional tools for object recognition, generative modelling, and multimodal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can easily be deceived by inserting slight, imperceptible perturbations into key pixels of the input. A good target detection system can accurately identify targets by localizing their coordinates in the input image of interest, ideally by labeling each pixel as either background or a potential target pixel. However, prior research confirms that state-of-the-art target detection models are still susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, mostly used by law enforcement agencies, depend on the artist's ability to clearly replicate all the key facial features that help capture the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches is an even more challenging task, especially for sensitive applications such as suspect identification. Nevertheless, hybrid discriminators that classify multiple target attributes, combined with a quality-guided encoder that minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers of the network, have been shown to be powerful tools for better multimodal learning. Overall, our approach aims to improve target detection systems and the visual appeal of synthesized images while assigning multiple attributes to the generator without compromising the identity of the synthesized image.
We synthesized sketches using the XDOG filter for the CelebA, Multi-modal and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Overall, our results across the different model applications compare favorably with the current state of the art.
Towards End-to-end Video-based Eye-Tracking
Estimating eye-gaze from images alone is a challenging task, in large part
due to unobservable person-specific factors. Achieving high accuracy typically
requires labeled data from test users, which may not be attainable in real
applications. We observe that there exists a strong relationship between what
users are looking at and the appearance of the user's eyes. Based on this
observation, we propose a novel dataset and accompanying method which aims to
explicitly learn these semantic and temporal relationships. Our video dataset
consists of time-synchronized screen recordings, user-facing camera views, and
eye gaze data, which allows for new benchmarks in temporal gaze tracking as
well as label-free refinement of gaze. Importantly, we demonstrate that fusing
information from visual stimuli and eye images can achieve performance
similar to literature-reported figures acquired through supervised
personalization. Our final method yields significant
performance improvements on our proposed EVE dataset, with up to a 28 percent
improvement in Point-of-Gaze estimates (resulting in 2.49 degrees in angular
error), paving the path towards high-accuracy screen-based eye tracking purely
from webcam sensors. The dataset and reference source code are available at
https://ait.ethz.ch/projects/2020/EVE
Comment: Accepted at ECCV 2020
An Investigation into the Performance of Ethnicity Verification Between Humans and Machine Learning Algorithms
There has been a significant increase in interest in the task of classifying
demographic profiles, i.e. race and ethnicity. Ethnicity is a significant human
characteristic, and using facial image data to discriminate ethnicity is
integral to face-related biometric systems. Given the diverse applications
of ethnicity-specific information, such as face recognition and iris recognition,
and the availability of image datasets for more widely represented human
populations, i.e. Caucasian, African-American, Asian, and South-Asian Indian,
a gap has been identified for the development of a system that analyses the
full face and its individual feature components (eyes, nose and mouth) for the
Pakistani ethnic group. An efficient system that incorporates a two-tier
(computer vs. human) approach is proposed for verifying Pakistani ethnicity.
Firstly, hand-crafted features were used to ascertain the descriptive nature of
frontal images and facial profiles for the Pakistani ethnicity. A total of 26 facial
landmarks were selected (16 frontal and 10 for the profile), and 2 models were
incorporated for redundant-information removal, together with a linear classifier
for the binary task. The experimental results indicated that the facial profile
of a Pakistani face is distinct from those of other ethnicities. However, the
methodology had limitations, for example low accuracy, the laborious nature of
manual data annotation (i.e. facial landmarks), and the small facial image
dataset. To make the system more accurate and robust, Deep Learning models
are employed for ethnicity classification. Various state-of-the-art Deep models
are trained on a range of facial image conditions, i.e. full face and partial-face
images, plus standalone feature components such as the nose and mouth. Since
ethnicity is pertinent to the research, a novel facial image database, the
Pakistani Face Database (PFDB), was created using a criterion-specific selection
process to ensure confidence in each assigned class membership, i.e.
Pakistani and Non-Pakistani. A comparative analysis of 6 Deep Learning
models was carried out on augmented image datasets, and the analysis
demonstrates that Deep Learning yields higher accuracy than
low-level features. The human phase of the ethnicity classification framework
tested the discrimination ability of novice Pakistani and Non-Pakistani
participants using a computerised ethnicity task. The results suggest that
humans are better at discriminating between Pakistani and Non-Pakistani full-face
images than between individual face-feature components (eyes, nose, mouth),
and struggle most with the nose when making judgements of ethnicity. To
understand the effects of display conditions on ethnicity discrimination accuracy,
two conditions were tested: (i) Two-Alternative Forced Choice (2-AFC) and (ii) a
single-image procedure. The results showed that participants performed
significantly better in trials where the target (Pakistani) image was shown alongside
a distractor (Non-Pakistani) image. To conclude, directions for future study
are suggested to advance the current understanding of image-based ethnicity
verification.
Acumé Forensi
Survey of Social Bias in Vision-Language Models
In recent years, the rapid advancement of machine learning (ML) models,
particularly transformer-based pre-trained models, has revolutionized Natural
Language Processing (NLP) and Computer Vision (CV) fields. However, researchers
have discovered that these models can inadvertently capture and reinforce
social biases present in their training datasets, leading to potential social
harms, such as uneven resource allocation and unfair representation of specific
social groups. Addressing these biases and ensuring fairness in artificial
intelligence (AI) systems has become a critical concern in the ML community.
The recent introduction of pre-trained vision-and-language (VL) models in the
emerging multimodal field demands attention to the potential social biases
present in these models as well. Although VL models are susceptible to social
bias, understanding of it remains limited compared to the extensive discussions of
bias in NLP and CV. This survey aims to provide researchers with a high-level
insight into the similarities and differences of social bias studies in
pre-trained models across NLP, CV, and VL. By examining these perspectives, the
survey aims to offer valuable guidelines on how to approach and mitigate social
bias in both unimodal and multimodal settings. The findings and recommendations
presented here can benefit the ML community, fostering the development of
fairer and less biased AI models in various applications and research endeavors.