    Facial Beauty Prediction and Analysis based on Deep Convolutional Neural Network: A Review

    Facial attractiveness or facial beauty prediction (FBP) is an active research topic with several potential applications. It is a challenging problem in the computer vision domain because few public databases related to FBP exist and most experiments are conducted on small-scale databases. Moreover, the evaluation of facial beauty is subjective in nature, with each person having their own preferences of beauty. Deep learning techniques have displayed a significant ability in terms of analysis and feature representation. Previous studies focused on scattered portions of facial beauty, with few comparisons between diverse techniques. Thus, this article reviews recent research on computer prediction and analysis of facial beauty based on deep convolutional neural networks (DCNN). Furthermore, the possible lines of research and challenges provided in this article can help researchers advance the state of the art in future work.
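
    The transfer-learning setup that recurs throughout this literature is straightforward to sketch. Below is a minimal, illustrative example (not taken from any surveyed paper) of a pretrained CNN backbone with a scalar regression head for attractiveness ratings; the backbone choice, layer sizes, and 1-5 rating scale are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class BeautyRegressor(nn.Module):
    """Pretrained backbone + scalar regression head, a common FBP baseline."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # scalar beauty score
        self.net = backbone

    def forward(self, x):
        return self.net(x).squeeze(1)

model = BeautyRegressor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One hypothetical training step on stand-in data.
images = torch.randn(8, 3, 224, 224)   # placeholder for preprocessed face crops
ratings = torch.rand(8) * 4 + 1        # placeholder for 1-5 human ratings
optimizer.zero_grad()
loss = criterion(model(images), ratings)
loss.backward()
optimizer.step()
```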

    Modeling and Mapping Location-Dependent Human Appearance

    Human appearance is highly variable and depends on individual preferences, such as fashion, facial expression, and makeup. These preferences depend on many factors including a person's sense of style, what they are doing, and the weather. These factors, in turn, are dependent upon geographic location and time. In our work, we build computational models to learn the relationship between human appearance, geographic location, and time. The primary contributions are a framework for collecting and processing geotagged imagery of people, a large dataset collected by our framework, and several generative and discriminative models that use our dataset to learn the relationship between human appearance, location, and time. Additionally, we build interactive maps that allow for inspection and demonstration of what our models have learned.
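
    As a purely hypothetical illustration of the discriminative side of such models (the abstract does not specify an architecture), one could classify a coarse geographic cell from an appearance embedding joined with a cyclic encoding of capture time:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: predict a coarse location cell from an appearance
# feature vector plus a (sin, cos) encoding of the capture month.
class LocationFromAppearance(nn.Module):
    def __init__(self, feat_dim=512, n_cells=100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, 256),
            nn.ReLU(),
            nn.Linear(256, n_cells),
        )

    def forward(self, appearance, month):
        t = 2 * torch.pi * month.float() / 12.0
        time_enc = torch.stack([torch.sin(t), torch.cos(t)], dim=-1)
        return self.mlp(torch.cat([appearance, time_enc], dim=-1))
```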

    Semi-supervised auto-encoder for facial attributes recognition

    The distinctiveness of human faces encourages many researchers to exploit facial features in different domains such as user identification, behaviour analysis, computer technology, security, and psychology. In this paper, we present a method for facial attribute analysis. The work analyses facial images and extracts features in order to recognize demographic attributes: age, gender, and ethnicity (AGE). We exploit the robustness of deep learning (DL) using an updated version of the autoencoder called the deep sparse autoencoder (DSAE). We propose a new DSAE architecture that adds supervision to the classic model, and we control the overfitting problem by regularizing the model. The transition from the DSAE to the deep semi-supervised autoencoder (DSSAE) facilitates the supervision process and achieves excellent feature-extraction performance. In this work we focus on estimating AGE jointly. The experimental results show that the DSSAE recognizes facial attributes with high precision. The whole system achieves good performance and strong rates on AGE estimation using the MORPH II database.
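
    As a rough illustration of the idea (layer sizes, loss weighting, and head designs are hypothetical, not taken from the paper), a semi-supervised sparse autoencoder can combine a reconstruction loss, an L1 sparsity penalty on the code, and supervised heads for the three AGE attributes:

```python
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedSparseAE(nn.Module):
    """Autoencoder with a shared sparse code feeding joint AGE heads."""
    def __init__(self, in_dim=2048, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, code_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(),
            nn.Linear(512, in_dim),
        )
        self.age_head = nn.Linear(code_dim, 1)        # age regression
        self.gender_head = nn.Linear(code_dim, 2)     # gender classification
        self.ethnicity_head = nn.Linear(code_dim, 3)  # ethnicity classification

    def forward(self, x):
        z = self.encoder(x)
        return (self.decoder(z), self.age_head(z).squeeze(1),
                self.gender_head(z), self.ethnicity_head(z), z)

def dssae_loss(model, x, age, gender, ethnicity, sparsity=1e-3):
    recon, age_hat, g_logits, e_logits, z = model(x)
    return (F.mse_loss(recon, x)             # unsupervised reconstruction term
            + F.mse_loss(age_hat, age)       # supervised AGE terms
            + F.cross_entropy(g_logits, gender)
            + F.cross_entropy(e_logits, ethnicity)
            + sparsity * z.abs().mean())     # L1 sparsity on the code
```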

    Human Age and Gender Classification using Convolutional Neural Networks

    In a world relying ever more on human classification, this paper aims to improve age and gender image classification through the use of Convolutional Neural Networks (CNN). Age and gender classification has become a popular area of study in recent years; however, there are still improvements to be made, particularly in age classification. This paper tests the currently accepted view that CNN models are the superior model type for image classification by comparing CNN performance against Support Vector Machine performance on the same dataset. Using the Adience image classification dataset, this research also focuses on the implementation of data augmentation techniques, some more novel than others, as a means of improving CNN performance. In terms of standard popular augmentation methods, image mirroring and image rotation were applied. In addition, a more novel approach to augmentation was applied to age classification. This technique used FaceApp, an AI image editor in the form of a mobile application, which allows "filters" to be placed on images of human beings in order to alter their appearance. The results of the data-augmented models were superior to those of the standard CNN models, with gender classification improving by 2.6% and age classification improving by 7.1%. These results establish the potential for further improvements through the inclusion of more augmentation techniques or the use of more of the filter types provided in the FaceApp application.
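
    The two standard augmentations named above are one-liners in most deep learning frameworks. A minimal sketch using torchvision (an assumed tool; the paper does not state its implementation) is shown below; the FaceApp-based filter augmentation was performed through the app itself and has no programmatic analog here.

```python
from torchvision import transforms

# Mirroring and rotation, the standard augmentations described above.
# The +/-15 degree range is an illustrative choice, not the paper's setting.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
])

# Applied per sample during training, e.g.: tensor = augment(pil_face_image)
```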

    Representations of racial minorities in popular movies: A content-analytic synergy of computer vision and network science

    In the Hollywood film industry, racial minorities remain underrepresented. Characters from racially underrepresented groups receive less screen time, fewer central story positions, and frequently inherit plotlines, motivations, and actions that are primarily driven by White characters. Currently, there are no clearly defined, standardized, and scalable metrics for taking stock of racial minorities’ cinematographic representation. In this paper, we combine methodological tools from computer vision and network science to develop a content-analytic framework for identifying visual and structural racial biases in film productions. We apply our approach to a set of 89 popular, full-length movies, demonstrating that this method provides a scalable examination of racial inclusion in film production and predicts movie performance. We integrate our method into larger theoretical discussions on audiences’ perception of racial minorities and illuminate future research trajectories towards the computational assessment of racial biases in audiovisual narratives.
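
    One way to make the network-science half of such a framework concrete is to treat characters detected in the same scene as co-occurring nodes and read structural importance off the resulting graph. The sketch below is a generic illustration using networkx, not the authors' pipeline; the scene rosters stand in for the output of face detection and identity clustering.

```python
import networkx as nx

# Hypothetical per-scene character rosters, e.g. from face detection,
# recognition, and shot segmentation.
scenes = [["A", "B"], ["A", "C"], ["B", "C", "D"], ["A", "B"]]

G = nx.Graph()
for cast in scenes:
    for i, u in enumerate(cast):
        for v in cast[i + 1:]:
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1   # repeated co-occurrence
            else:
                G.add_edge(u, v, weight=1)

# Centrality as a proxy for how central a character's story position is.
centrality = nx.degree_centrality(G)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```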

    Multimodal Adversarial Learning

    Deep Convolutional Neural Networks (DCNN) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations at key pixels in the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest, ideally by labeling each pixel in the image as background or a potential target pixel. However, prior research confirms that such state-of-the-art target detection models are also susceptible to adversarial attacks. In the case of generative models, the facial sketches drawn by artists and used by law enforcement agencies depend on the artist's ability to clearly replicate all the key facial features that capture the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators, which perform attribute classification over multiple target attributes, together with a quality-guided encoder, which minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers in the network, has proven to be a powerful tool for better multi-modal learning. Overall, our approach aims to improve target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignments into the generator without compromising the identity of the synthesized image. We synthesized sketches using the XDoG filter for the CelebA, Multi-Modal, and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D, and FERET datasets. Our overall results across different model applications are impressive compared to the current state of the art.
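
    The XDoG filter mentioned above for sketch synthesis is a published operator that is easy to reproduce: an extended difference of Gaussians followed by a soft threshold. The sketch below follows the standard Winnemöller-style formulation; the parameter defaults are illustrative, not the values used in this work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def xdog(img, sigma=0.8, k=1.6, p=20.0, eps=0.01, phi=10.0):
    """Extended difference-of-Gaussians (XDoG) sketch filter.

    img: grayscale float array scaled to [0, 1]. Parameter values are
    illustrative defaults, not the settings used in this work.
    """
    g1 = gaussian_filter(img, sigma)        # fine-scale blur
    g2 = gaussian_filter(img, sigma * k)    # coarse-scale blur
    s = (1 + p) * g1 - p * g2               # sharpened DoG response
    # Soft threshold: white above eps, smooth tanh falloff below.
    out = np.where(s >= eps, 1.0, 1.0 + np.tanh(phi * (s - eps)))
    return np.clip(out, 0.0, 1.0)
```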

    Towards End-to-end Video-based Eye-Tracking

    Estimating eye gaze from images alone is a challenging task, in large part due to unobservable person-specific factors. Achieving high accuracy typically requires labeled data from test users, which may not be attainable in real applications. We observe that there exists a strong relationship between what users are looking at and the appearance of the users' eyes. In response to this understanding, we propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships. Our video dataset consists of time-synchronized screen recordings, user-facing camera views, and eye-gaze data, which allows for new benchmarks in temporal gaze tracking as well as label-free refinement of gaze. Importantly, we demonstrate that fusing information from the visual stimuli as well as eye images can achieve performance similar to literature-reported figures acquired through supervised personalization. Our final method yields significant performance improvements on our proposed EVE dataset, with up to a 28 percent improvement in point-of-gaze estimates (resulting in 2.49 degrees of angular error), paving the path towards high-accuracy screen-based eye tracking purely from webcam sensors. The dataset and reference source code are available at https://ait.ethz.ch/projects/2020/EVE. Comment: Accepted at ECCV 2020.
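
    The 2.49 degrees reported above is the usual angular-error metric for gaze: the angle between predicted and ground-truth 3D gaze direction vectors. A minimal sketch of that computation (generic, not the EVE evaluation code):

```python
import numpy as np

def angular_error_deg(pred, true):
    """Angle in degrees between predicted and ground-truth 3D gaze vectors."""
    p = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    t = true / np.linalg.norm(true, axis=-1, keepdims=True)
    cos = np.clip(np.sum(p * t, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Example: a prediction slightly off from the true direction (~5.7 degrees).
print(angular_error_deg(np.array([0.0, 0.1, 1.0]), np.array([0.0, 0.0, 1.0])))
```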

    Survey of Social Bias in Vision-Language Models

    In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized the Natural Language Processing (NLP) and Computer Vision (CV) fields. However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms, such as uneven resource allocation and unfair representation of specific social groups. Addressing these biases and ensuring fairness in artificial intelligence (AI) systems has become a critical concern in the ML community. The recent introduction of pre-trained vision-and-language (VL) models in the emerging multimodal field demands attention to the potential social biases present in these models as well. Although VL models are susceptible to social bias, understanding of these biases remains limited compared to the extensive discussion of bias in NLP and CV. This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL. By examining these perspectives, the survey aims to offer valuable guidelines on how to approach and mitigate social bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and less biased AI models in various applications and research endeavors.
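
    Many embedding-based bias measurements in this literature reduce to association tests in a shared embedding space. The sketch below is a generic WEAT-style probe over precomputed image and text embeddings (the function, inputs, and encoder are all hypothetical, not a method proposed in the survey):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association_gap(group_x, group_y, attribute):
    """Mean cosine-similarity gap between two groups' image embeddings and a
    single attribute text embedding; a nonzero gap suggests the encoder
    associates the attribute more strongly with one group."""
    sx = np.mean([cosine(v, attribute) for v in group_x])
    sy = np.mean([cosine(v, attribute) for v in group_y])
    return sx - sy

# Stand-in embeddings; in practice these would come from a VL encoder's
# image and text towers (e.g. CLIP-style models).
rng = np.random.default_rng(0)
group_x = rng.normal(size=(50, 512))
group_y = rng.normal(size=(50, 512))
attribute = rng.normal(size=512)
print(association_gap(group_x, group_y, attribute))
```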