
    Asian female facial beauty prediction using deep neural networks via transfer learning and multi-channel feature fusion

    Facial beauty plays an important role in many fields today, such as digital entertainment and facial beautification surgery. However, the facial beauty prediction task faces the challenges of insufficient training data and the low performance of traditional methods, which rarely take advantage of the feature-learning capability of Convolutional Neural Networks (CNNs). In this paper, a transfer-learning-based CNN method that integrates multiple channel features is utilized for the Asian female facial beauty prediction task. Firstly, a Large-Scale Asian Female Beauty Dataset (LSAFBD) with a more reasonable distribution has been established. Secondly, to improve the CNN's self-learning ability on the facial beauty prediction task, an effective CNN using a novel Softmax-MSE loss function and a double activation layer has been proposed. Then, a data augmentation method and a transfer learning strategy were utilized to mitigate the impact of insufficient data on the proposed CNN's performance. Finally, a multi-channel feature fusion method was explored to further optimize the proposed CNN model. Experimental results show that the proposed method is superior to traditional learning methods on the Asian female FBP task. Compared with other state-of-the-art CNN models, the proposed CNN model improves the rank-1 recognition rate from 60.40% to 64.85% and the Pearson correlation coefficient from 0.8594 to 0.8829 on the LSAFBD, and obtains a regression prediction result of 0.9200 on the SCUT dataset.
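    The abstract does not spell out how the Softmax-MSE loss is defined; one common reading is that the network's class scores over beauty-score bins are passed through a softmax and the mean squared error is taken against a one-hot target. The PyTorch sketch below illustrates that reading only; the bin count and all names are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def softmax_mse_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Assumed Softmax-MSE loss: MSE between softmax probabilities and
    a one-hot encoding of the target beauty-score bin (an illustrative
    reading, not the paper's confirmed definition)."""
    probs = F.softmax(logits, dim=1)                               # (batch, bins)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    return F.mse_loss(probs, one_hot)

# Toy usage: 5 hypothetical beauty-score bins, batch of 2
logits = torch.randn(2, 5, requires_grad=True)
targets = torch.tensor([1, 4])
loss = softmax_mse_loss(logits, targets)
loss.backward()
```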

    Facial Beauty Prediction and Analysis based on Deep Convolutional Neural Network: A Review

    Abstract: Facial attractiveness or facial beauty prediction (FBP) is an active area of study with several potential applications. It remains a key difficulty in the computer vision domain because few public databases related to FBP exist and experimental trials are typically conducted on small-scale databases. Moreover, the evaluation of facial beauty is personalized in nature, with people having individual preferences of beauty. Deep learning techniques have displayed significant ability in terms of analysis and feature representation. Previous studies focused on scattered aspects of facial beauty, with few comparisons between diverse techniques. Thus, this article reviews the recent research on computer prediction and analysis of face beauty based on deep convolutional neural networks (DCNNs). Furthermore, the possible lines of research and challenges provided in this article can help researchers advance the state of the art in future work.

    CNN based facial aesthetics analysis through dynamic robust losses and ensemble regression

    In recent years, estimating the beauty of faces has attracted growing interest in the fields of computer vision and machine learning, owing to the emergence of face beauty datasets (such as SCUT-FBP, SCUT-FBP5500, and KDEF-PT) and the prevalence of deep learning methods in many tasks. The goal of this work is to leverage advances in deep learning architectures to provide stable and accurate face beauty estimation from static face images. To this end, our proposed approach makes three main contributions. First, to deal with the complicated high-level features associated with the FBP problem, we propose an architecture with two pre-trained Convolutional Neural Network (CNN) backbones (2B-IncRex). Second, we introduce a parabolic dynamic law to control the behavior of the robust loss parameters (ParamSmoothL1, Huber, and Tukey) during training. Third, we propose an ensemble regression based on five regressors, namely ResNeXt-50, Inception-v3, and three regressors based on our proposed 2B-IncRex architecture; these models are trained with the dynamic loss functions Dynamic ParamSmoothL1, Dynamic Tukey, Dynamic ParamSmoothL1, Dynamic Huber, and Dynamic Tukey, respectively. To evaluate the performance of our approach, we used two datasets: SCUT-FBP5500 and KDEF-PT. SCUT-FBP5500 provides two evaluation scenarios defined by the database developers: a 60-40% split and five-fold cross-validation. Our approach outperforms state-of-the-art methods on several metrics in both evaluation scenarios of SCUT-FBP5500. Moreover, experiments on the KDEF-PT dataset demonstrate the efficiency of our approach for estimating facial beauty using transfer learning, despite the presence of facial expressions and limited data. These comparisons highlight the effectiveness of the proposed solutions for FBP and show that the proposed dynamic robust losses lead to more flexible and accurate estimators. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
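    The abstract names a parabolic dynamic law that adjusts the robust-loss parameters during training but does not give its exact form. The sketch below shows one plausible instance: a Huber loss whose delta parameter decays quadratically over the epochs, so the loss's quadratic region tightens as training converges. The schedule, its endpoints, and all names are assumptions for illustration, not the paper's law.

```python
import torch
import torch.nn.functional as F

def parabolic_delta(epoch: int, total_epochs: int,
                    delta_start: float = 5.0, delta_end: float = 0.5) -> float:
    """Hypothetical parabolic schedule: delta falls from delta_start to
    delta_end along a quadratic curve in the normalized epoch index."""
    t = epoch / max(total_epochs - 1, 1)
    return delta_end + (delta_start - delta_end) * (1.0 - t) ** 2

def dynamic_huber(pred: torch.Tensor, target: torch.Tensor, delta: float) -> torch.Tensor:
    """Standard Huber loss with an externally scheduled delta."""
    return F.huber_loss(pred, target, delta=delta)

# Inside a training loop (model and data loading omitted):
# for epoch in range(total_epochs):
#     delta = parabolic_delta(epoch, total_epochs)
#     loss = dynamic_huber(model(images), scores, delta)
```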

    A Study on Learning Complex and Continuous Emotional Representations in Facial Expressions

    Doctor of Engineering dissertation, Kobe University

    Highly Accurate, But Still Discriminatory

    The study aims to identify whether algorithmic decision making leads to unfair (i.e., unequal) treatment of certain protected groups in the recruitment context. Firms increasingly implement algorithmic decision making to save costs and increase efficiency. Moreover, algorithmic decision making is often considered fairer than human decisions, which are subject to social prejudices. Recent publications, however, imply that the fairness of algorithmic decision making is not guaranteed. To investigate this further, highly accurate algorithms were used to analyze a pre-existing data set of 10,000 video clips of individuals in self-presentation settings. The analysis shows that the under-representation of certain genders and ethnicities in the training data set leads to an unpredictable overestimation and/or underestimation of the likelihood of inviting representatives of these groups to a job interview. Furthermore, algorithms replicate the existing inequalities in the data set. Firms therefore have to be careful when implementing algorithmic video analysis during recruitment, as biases occur if the underlying training data set is unbalanced.

    Modeling Visual Rhetoric and Semantics in Multimedia

    Recent advances in machine learning have enabled computer vision algorithms to model complicated visual phenomena with accuracies unthinkable a mere decade ago. Their high performance on a plethora of vision-related tasks has enabled computer vision researchers to move beyond traditional visual recognition problems to tasks requiring higher-level image understanding. However, most computer vision research still focuses on describing what images, text, or other media literally portray. In contrast, in this dissertation we focus on learning how and why such content is portrayed. Rather than viewing media for its content, we recast the problem as understanding visual communication and visual rhetoric. For example, the same content may be portrayed in different ways in order to present the story the author wishes to convey. We thus seek to model not only the content of the media, but also its authorial intent and latent messaging. Understanding how and why visual content is portrayed a certain way requires understanding higher-level abstract semantic concepts which are themselves latent within visual media. By latent, we mean that the concept is not readily visually accessible within a single image (e.g., right vs. left political bias), in contrast to explicit visual semantic concepts such as objects. Specifically, we study the problems of modeling photographic style (how professional photographers portray their subjects), understanding visual persuasion in image advertisements, modeling political bias in multimedia (image and text) news articles, and learning cross-modal semantic representations. While most past research in vision and natural language processing studies the case where visual content and paired text are highly aligned (as in the case of image captions), we target the case where each modality conveys complementary information to tell a larger story. We particularly focus on the problem of learning cross-modal representations from multimedia exhibiting weak alignment between the image and text modalities. A variety of techniques are presented which improve the modeling of multimedia rhetoric in real-world data and enable more robust artificially intelligent systems.

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multipath effects in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework covering 12 activities in three different spatial environments, using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches accuracies of 98.54%, 94.25%, and 95.09% across those same environments.
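    As one plausible reading of the ABiLSTM model, the PyTorch sketch below runs a bidirectional LSTM over CSI time series and pools its hidden states with a learned attention weighting before classification. The subcarrier count, window length, and hidden size are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class ABiLSTM(nn.Module):
    """Minimal attention-based BiLSTM for CSI activity classification
    (a sketch of the general technique, not the paper's exact model)."""
    def __init__(self, n_subcarriers: int = 90, hidden: int = 128, n_classes: int = 12):
        super().__init__()
        self.lstm = nn.LSTM(n_subcarriers, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # scores each time step
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                         # x: (batch, time, n_subcarriers)
        h, _ = self.lstm(x)                       # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)    # attention weights over time
        ctx = (w * h).sum(dim=1)                  # attention-weighted context vector
        return self.fc(ctx)                       # (batch, n_classes) logits

# Toy usage: batch of 8 CSI windows, 200 time steps, 90 subcarrier amplitudes
model = ABiLSTM()
logits = model(torch.randn(8, 200, 90))           # -> shape (8, 12)
```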