    Modeling of Facial Aging and Kinship: A Survey

    Computational facial models that capture properties of facial cues related to aging and kinship increasingly attract the attention of the research community, enabling the development of reliable methods for age progression, age estimation, age-invariant facial characterization, and kinship verification from visual data. In this paper, we review recent advances in modeling of facial aging and kinship. In particular, we provide an up-to-date, complete list of available annotated datasets and an in-depth analysis of geometric, hand-crafted, and learned facial representations that are used for facial aging and kinship characterization. Moreover, evaluation protocols and metrics are reviewed and notable experimental results for each surveyed task are analyzed. This survey allows us to identify challenges and discuss future research directions for the development of robust facial models in real-world conditions.

    Attended End-to-end Architecture for Age Estimation from Facial Expression Videos

    The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional approaches to this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately. This relies on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain for each single image and the temporal domain for the whole video. We design a specific spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between the spatial facial regions and temporal frames on the one hand and the task of age estimation on the other. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database with 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model exhibits significant superiority over the state-of-the-art methods given sufficient training data. Comment: Accepted by Transactions on Image Processing (TIP)
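
The temporal attention layer mentioned above assigns a weight to each frame. A minimal sketch of that general idea follows, assuming softmax-normalised per-frame scores and a weighted sum of frame features. The function names, the toy features, and the scores are hypothetical, and this is not the SIAM implementation, whose details the abstract does not give.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors using attention weights.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores: one raw relevance score per frame (assumed to come
    from some learned scoring layer, which is not modelled here).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled


# Toy input: three frames with two-dimensional features and equal scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
weights, pooled = temporal_attention_pool(features, scores)
```

With equal scores every frame receives the same weight, so the pooled vector is the plain average of the frame features; raising one frame's score shifts the pooled vector toward that frame.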

    Inferring Dynamic Representations of Facial Actions from a Still Image

    Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e. from still images. We present a novel approach to capture multiple scales of such temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular, 1) we propose a framework that infers a dynamic representation (DR) from a still image, which captures the bi-directional flow of time within a short time-window centered at the input image; 2) we show that we can train our method without explicitly generating target representations, allowing the network to represent dynamics more broadly; and 3) we propose to apply a multiple temporal scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show how our proposed MDR attains state-of-the-art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time. Comment: 10 pages, 5 figures

    Ordinal Distribution Regression for Gait-based Age Estimation

    Computer vision researchers prefer to estimate age from face images because facial features provide useful information. However, estimating age from face images becomes challenging when people are distant from the camera or occluded. A person's gait is a unique biometric feature that can be perceived efficiently even at a distance. Thus, gait can be used to predict age when face images are not available. However, existing gait-based classification or regression methods ignore the ordinal relationship of different ages, which is an important clue for age estimation. This paper proposes an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation. Specifically, we decompose gait-based age regression into a series of binary classifications to incorporate the ordinal age information. Then, an ordinal distribution loss is proposed to consider the inner relationships among these classifications by penalizing the distribution discrepancy between the estimated value and the ground truth. In addition, our neural network comprises one global and three local sub-networks, and is thus capable of learning the global structure and local details from the head, body, and feet. Experimental results indicate that the proposed approach outperforms state-of-the-art gait-based age estimation methods on the OULP-Age dataset. Comment: Accepted by the journal "SCIENCE CHINA Information Sciences"
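
The decomposition of age regression into a series of binary classifications can be illustrated with the common "is the age greater than k?" encoding. The sketch below assumes that encoding and a simple threshold-and-count decoder; the function names and the age range are hypothetical, and the ordinal distribution loss proposed in the paper is not reproduced here.

```python
def encode_ordinal(age, min_age, max_age):
    """Turn an integer age into binary targets, one per threshold k.

    Target k answers the question "is age > k?" for
    k = min_age, ..., max_age - 1.
    """
    return [1 if age > k else 0 for k in range(min_age, max_age)]


def decode_ordinal(probabilities, min_age, threshold=0.5):
    """Recover an age by counting the thresholds the classifiers say are exceeded."""
    return min_age + sum(1 for p in probabilities if p > threshold)


# Toy example on an assumed age range of 0 to 10.
targets = encode_ordinal(5, 0, 10)
decoded = decode_ordinal([float(t) for t in targets], 0)
```

Because each target depends on a threshold, neighbouring ages differ in exactly one binary label, which is how the encoding carries the ordinal relationship that plain multi-class classification ignores.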

    BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation

    Age estimation is an important yet very challenging problem in computer vision. Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process. However, the facial aging process is also a continuous process, and the continuity relationship between different components has not been effectively exploited. In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. The proposed BridgeNet consists of local regressors and gating networks. Local regressors partition the data space into multiple overlapping subspaces to tackle heterogeneous data, and gating networks learn continuity-aware weights for the results of local regressors by employing the proposed bridge-tree structure, which introduces bridge connections into tree models to enforce the similarity between neighboring nodes. Moreover, these two components of BridgeNet can be jointly learned in an end-to-end way. We show experimental results on the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet outperforms the state-of-the-art methods. Comment: CVPR 201

    RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation

    We address the challenging problem of RGB image-based head pose estimation. We first reformulate head pose representation learning to constrain it to a bounded space. Representing head pose as vector projections or vector angles proves helpful for improving performance. Further, a ranking loss combined with an MSE regression loss is proposed. The ranking loss supervises a neural network with paired samples of the same person and penalises incorrect ordering of pose predictions. Analysis of this new loss function suggests it contributes to a better local feature extractor, where features are generalised to Abstract Landmarks, which are pose-related features instead of pose-irrelevant information such as identity, age, and lighting. Extensive experiments show that our method significantly outperforms the current state-of-the-art schemes on public datasets: AFLW2000 and BIWI. Our model achieves significant improvements over previous SOTA MAE on AFLW2000 and BIWI from 4.50 to 3.66 and from 4.0 to 3.71 respectively. Source code will be made available at: https://github.com/seathiefwang/RankHeadPose
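
A ranking loss combined with an MSE regression loss, applied to a pair of samples, can be sketched as follows. This is a generic hinge-style pairwise ranking term on a single scalar angle, not the loss defined in the paper; the function names, the `margin` and `rank_weight` parameters, and the example angles are assumptions made for illustration.

```python
def mse_loss(predictions, targets):
    """Mean squared error over a list of scalar predictions."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def pairwise_ranking_loss(pred_a, pred_b, target_a, target_b, margin=0.0):
    """Hinge penalty when the predicted order disagrees with the true order."""
    if target_a == target_b:
        return 0.0  # no order to enforce for tied targets
    sign = 1.0 if target_a > target_b else -1.0
    return max(0.0, margin - sign * (pred_a - pred_b))


def combined_loss(pred_a, pred_b, target_a, target_b, rank_weight=1.0):
    """MSE on both samples plus a weighted ranking penalty on the pair."""
    regression = mse_loss([pred_a, pred_b], [target_a, target_b])
    ranking = pairwise_ranking_loss(pred_a, pred_b, target_a, target_b)
    return regression + rank_weight * ranking


# Two images of the same person with true yaw angles 12 and 18 degrees.
correct_order = pairwise_ranking_loss(10.0, 20.0, 12.0, 18.0)
wrong_order = pairwise_ranking_loss(20.0, 10.0, 12.0, 18.0)
total_wrong = combined_loss(20.0, 10.0, 12.0, 18.0)
```

The ranking term is zero whenever the two predictions are ordered like the ground truth, so it only adds gradient signal for pairs whose relative pose is predicted the wrong way round.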

    C3AE: Exploring the Limits of Compact Model for Age Estimation

    Age estimation is a classic learning problem in computer vision. Many larger and deeper CNNs have been proposed with promising performance, such as AlexNet, VggNet, GoogLeNet and ResNet. However, these models are not practical for embedded/mobile devices. Recently, MobileNets and ShuffleNets have been proposed to reduce the number of parameters, yielding lightweight models. However, their representation has been weakened because of the adoption of depth-wise separable convolution. In this work, we investigate the limits of compact models for small-scale images and propose an extremely Compact yet efficient Cascade Context-based Age Estimation model (C3AE). This model has only 1/9 and 1/2000 of the parameters of MobileNets/ShuffleNets and VggNet, respectively, while achieving competitive performance. In particular, we redefine the age estimation problem using a two-points representation, which is implemented by a cascade model. Moreover, to fully utilize the facial context information, a multi-branch CNN is proposed to aggregate multi-scale context. Experiments are carried out on three age estimation datasets. State-of-the-art performance among compact models is achieved by a relatively large margin. Comment: accepted by CVPR 201
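
The abstract does not define the two-points representation. One plausible reading, assumed here purely for illustration, is that an age is written as a weighted combination of the two nearest bin centres. The bin width, the number of bins, and the function names below are hypothetical.

```python
def two_point_encode(age, bin_width, num_bins):
    """Encode an age as weights on its two neighbouring bin centres.

    Bin i has centre i * bin_width; all other entries stay zero.
    """
    lower = min(int(age // bin_width), num_bins - 2)  # keep lower + 1 in range
    fraction = (age - lower * bin_width) / bin_width
    vector = [0.0] * num_bins
    vector[lower] = 1.0 - fraction
    vector[lower + 1] = fraction
    return vector


def two_point_decode(vector, bin_width):
    """Recover the age as the weighted sum of bin centres."""
    return sum(weight * index * bin_width for index, weight in enumerate(vector))


# Age 34 with bins at 0, 10, ..., 100 lies between the centres 30 and 40.
encoded = two_point_encode(34, 10, 11)
recovered = two_point_decode(encoded, 10)
```

Under this reading the encoding is sparse (at most two non-zero entries) yet still recovers the exact age, unlike a one-hot bin label.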

    A Coupled Evolutionary Network for Age Estimation

    Age estimation of unknown persons is a challenging pattern analysis task due to the lack of training data and the various aging mechanisms of different people. Label distribution learning-based methods usually make distribution assumptions to simplify age estimation. However, age label distributions are often complex and difficult to model parametrically. Inspired by the biological evolutionary mechanism, we propose a Coupled Evolutionary Network (CEN) with two concurrent evolutionary processes: evolutionary label distribution learning and evolutionary slack regression. The evolutionary network learns and refines age label distributions iteratively. Evolutionary label distribution learning adaptively learns and constantly refines the age label distributions without making strong assumptions on the distribution patterns. To further utilize the ordered and continuous information of age labels, we accordingly propose an evolutionary slack regression to convert the discrete age label regression into the continuous age interval regression. Experimental results on Morph, ChaLearn15 and MegaAge-Asian datasets show the superiority of our method.

    TRk-CNN: Transferable Ranking-CNN for image classification of glaucoma, glaucoma suspect, and normal eyes

    In this paper, we propose the Transferable Ranking Convolutional Neural Network (TRk-CNN), which can be effectively applied when the classes of images to be classified show a high correlation with each other. The commonly used multi-class classification method based on the softmax function is not effective in this case because the inter-class relationship is ignored. Although there is a Ranking-CNN that takes into account the ordinal classes, it cannot reflect the inter-class relationship in the final prediction. TRk-CNN, on the other hand, combines the weights of the primitive classification model to carry the inter-class information into the final classification phase. We evaluated TRk-CNN on a glaucoma image dataset that was labeled into three classes: normal, glaucoma suspect, and glaucoma eyes. Based on the literature we surveyed, this study is the first to classify the three glaucoma statuses in a fundus image dataset into three different classes. We compared the evaluation results of TRk-CNN with Ranking-CNN (Rk-CNN) and multi-class CNN (MC-CNN) using DenseNet as the backbone CNN model. As a result, TRk-CNN achieved an average accuracy of 92.96%, specificity of 93.33%, sensitivity for glaucoma suspect of 95.12% and sensitivity for glaucoma of 93.98%. Based on average accuracy, TRk-CNN is 8.04% and 9.54% higher than Rk-CNN and MC-CNN, respectively, and surprisingly 26.83% higher in sensitivity for glaucoma suspect than MC-CNN. Our TRk-CNN is expected to be effectively applied to medical image classification problems where the disease state is continuous and increases in the positive class direction. Comment: 49 pages, 12 figures
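
The accuracy, sensitivity, and specificity figures quoted above are standard confusion-matrix quantities. The sketch below shows how they can be computed for a three-class problem in a one-vs-rest fashion. The confusion matrix is a made-up toy example, not data from the paper, and the abstract does not state which class its specificity refers to.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


def sensitivity_specificity(confusion, cls):
    """One-vs-rest sensitivity and specificity for class index cls.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    true_pos = confusion[cls][cls]
    false_neg = sum(confusion[cls]) - true_pos
    false_pos = sum(row[cls] for row in confusion) - true_pos
    true_neg = total - true_pos - false_neg - false_pos
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


# Illustrative counts only; rows and columns are normal, suspect, glaucoma.
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
overall = accuracy(toy_confusion)
glaucoma_sens, glaucoma_spec = sensitivity_specificity(toy_confusion, 2)
```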

    UVA: A Universal Variational Framework for Continuous Age Analysis

    Conventional methods for facial age analysis tend to utilize accurate age labels in a supervised way. However, existing age datasets cover a limited range of ages, leading to a long-tailed distribution. To alleviate the problem, this paper proposes a Universal Variational Aging (UVA) framework to formulate facial age priors in a disentangled manner. Benefiting from the variational evidence lower bound, the facial images are encoded and disentangled into an age-irrelevant distribution and an age-related distribution in the latent space. A conditional introspective adversarial learning mechanism is introduced to boost the image quality. In this way, when manipulating the age-related distribution, UVA can achieve age translation with arbitrary ages. Further, by sampling noise from the age-irrelevant distribution, we can generate photorealistic facial images with a specific age. Moreover, given an input face image, the mean value of the age-related distribution can be treated as an age estimator. These results indicate that UVA can efficiently and accurately estimate the age-related distribution in a disentangled manner, even if the training dataset exhibits a long-tailed age distribution. UVA is the first attempt to achieve facial age analysis tasks, including age translation, age generation and age estimation, in a universal framework. The qualitative and quantitative experiments demonstrate the superiority of UVA on five popular datasets, including CACD2000, Morph, UTKFace, MegaAge-Asian and FG-NET.