    Ordinal Distribution Regression for Gait-based Age Estimation

    Computer vision researchers prefer to estimate age from face images because facial features provide useful information. However, estimating age from face images becomes challenging when people are distant from the camera or occluded. A person's gait is a unique biometric feature that can be perceived efficiently even at a distance. Thus, gait can be used to predict age when face images are not available. However, existing gait-based classification or regression methods ignore the ordinal relationship between different ages, which is an important clue for age estimation. This paper proposes an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation. Specifically, we decompose gait-based age regression into a series of binary classifications to incorporate the ordinal age information. Then, an ordinal distribution loss is proposed to consider the inner relationships among these classifications by penalizing the distribution discrepancy between the estimated value and the ground truth. In addition, our neural network comprises one global and three local sub-networks, and is thus capable of learning the global structure and local details from the head, body, and feet. Experimental results indicate that the proposed approach outperforms state-of-the-art gait-based age estimation methods on the OULP-Age dataset. Comment: Accepted by the journal "SCIENCE CHINA Information Sciences".

    Rank-consistent Ordinal Regression for Neural Networks

    In many real-world prediction tasks, class labels include information about the relative ordering between labels, which is not captured by commonly used loss functions such as multi-category cross-entropy. Recently, ordinal regression frameworks have been adopted by the deep learning community to take such ordering information into account. Using a framework that transforms ordinal targets into binary classification subtasks, neural networks were equipped with ordinal regression capabilities. However, this method suffers from inconsistencies among the different binary classifiers. We hypothesize that addressing the inconsistency issue in these binary classification task-based neural networks improves predictive performance. To test this hypothesis, we propose the COnsistent RAnk Logits (CORAL) framework with strong theoretical guarantees for rank-monotonicity and consistent confidence scores. Moreover, the proposed method is architecture-agnostic and can extend arbitrary state-of-the-art deep neural network classifiers for ordinal regression tasks. The empirical evaluation of the proposed rank-consistent method on a range of face-image datasets for age prediction shows a substantial reduction of the prediction error compared to the reference ordinal regression network. Comment: In the previous manuscript version, an issue with the figures caused certain versions of Adobe Acrobat Reader to crash. This version fixes this issue.
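
The transformation of ordinal targets into binary subtasks, and the rank-consistency property the abstract refers to, can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that all binary subtasks share one scalar score and differ only in a per-threshold bias, so that non-increasing biases yield non-increasing probabilities.

```python
import math


def encode_ordinal(label, num_ranks):
    """Turn a rank label in {0, ..., K-1} into K-1 binary targets [label > k]."""
    return [1 if label > k else 0 for k in range(num_ranks - 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_rank(shared_score, biases):
    """Predict a rank from one shared score and K-1 per-threshold biases.

    Assumption for this sketch: every binary subtask uses the same score,
    so only the bias differs between thresholds.
    """
    probs = [sigmoid(shared_score + b) for b in biases]
    # The predicted rank is the number of thresholds the sample exceeds.
    rank = sum(1 for p in probs if p > 0.5)
    return rank, probs


def is_rank_consistent(probs):
    """Check that P(label > k) never increases as k grows."""
    return all(a >= b for a, b in zip(probs, probs[1:]))


if __name__ == "__main__":
    print(encode_ordinal(2, 5))
    rank, probs = predict_rank(0.3, [1.0, 0.0, -1.0, -2.0])
    print(rank, is_rank_consistent(probs))
```

With independent binary classifiers the probabilities need not be ordered, which is the inconsistency the abstract describes; `is_rank_consistent` makes that condition explicit.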

    2sRanking-CNN: A 2-stage ranking-CNN for diagnosis of glaucoma from fundus images using CAM-extracted ROI as an intermediate input

    Glaucoma is a disease in which the optic nerve is chronically damaged by elevated intra-ocular pressure, resulting in visual field defects. Therefore, it is important to monitor and treat suspected patients before they are confirmed to have glaucoma. In this paper, we propose a 2-stage ranking-CNN that classifies fundus images as normal, suspicious, or glaucoma. Furthermore, we propose a method of using the class activation map as a mask filter and combining it with the original fundus image as an intermediate input. Our method improves the average accuracy by about 10% over the existing 3-class CNN and ranking-CNN, and improves the sensitivity for the suspicious class by more than 20% over the 3-class CNN. In addition, the extracted ROI was found to overlap with the diagnostic criteria used by physicians. The proposed method is expected to apply efficiently to any medical data in which a suspicious condition lies between normal and disease. Comment: Accepted at BMVC 201

    TRk-CNN: Transferable Ranking-CNN for image classification of glaucoma, glaucoma suspect, and normal eyes

    In this paper, we propose the Transferable Ranking Convolutional Neural Network (TRk-CNN), which can be effectively applied when the classes of images to be classified are highly correlated with each other. The commonly used multi-class classification method based on the softmax function is not effective in this case because it ignores the inter-class relationship. Although Ranking-CNN takes the ordinal classes into account, it cannot reflect the inter-class relationship in the final prediction. TRk-CNN, on the other hand, combines the weights of the primitive classification model to carry the inter-class information into the final classification phase. We evaluated TRk-CNN on a glaucoma image dataset labeled into three classes: normal, glaucoma suspect, and glaucoma eyes. Based on the literature we surveyed, this study is the first to classify a glaucoma fundus image dataset into these three classes. We compared the evaluation results of TRk-CNN with Ranking-CNN (Rk-CNN) and multi-class CNN (MC-CNN) using DenseNet as the backbone CNN model. TRk-CNN achieved an average accuracy of 92.96%, specificity of 93.33%, sensitivity for glaucoma suspect of 95.12%, and sensitivity for glaucoma of 93.98%. In average accuracy, TRk-CNN is 8.04% and 9.54% higher than Rk-CNN and MC-CNN, respectively, and, surprisingly, 26.83% higher than MC-CNN in sensitivity for glaucoma suspect. TRk-CNN is expected to be effectively applied to medical image classification problems where the disease state is continuous and increases in the positive class direction. Comment: 49 pages, 12 figures

    BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation

    Age estimation is an important yet very challenging problem in computer vision. Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process. However, the facial aging process is also a continuous process, and the continuity relationship between different components has not been effectively exploited. In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. The proposed BridgeNet consists of local regressors and gating networks. Local regressors partition the data space into multiple overlapping subspaces to tackle heterogeneous data, and gating networks learn continuity-aware weights for the results of local regressors by employing the proposed bridge-tree structure, which introduces bridge connections into tree models to enforce the similarity between neighbor nodes. Moreover, these two components of BridgeNet can be jointly learned in an end-to-end way. We show experimental results on the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet outperforms the state-of-the-art methods. Comment: CVPR 201

    Attended End-to-end Architecture for Age Estimation from Facial Expression Videos

    The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional techniques for this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately. This relies on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain for each single image and the temporal domain for the whole video. We design a specific spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves the performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between the spatial facial regions and temporal frames on the one hand and the task of age estimation on the other. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database of 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model exhibits significant superiority over the state-of-the-art methods given sufficient training data. Comment: Accepted by Transactions on Image Processing (TIP)
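
The idea of a temporal attention layer that assigns a weight to each frame can be sketched as a softmax over per-frame scores followed by a weighted sum of per-frame features. This is only an illustration under stated assumptions: the function names are hypothetical, the scores are taken as given rather than learned, and the spatially-indexed attention part of SIAM is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors into one vector using attention weights.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one unnormalized relevance score per frame (assumed to
                    come from some learned scoring layer, not shown here).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, frame_features):
        for d in range(dim):
            pooled[d] += w * feat[d]
    return pooled, weights


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    pooled, weights = temporal_attention_pool(feats, [0.0, 0.0])
    print(pooled, weights)
```

The returned weights are what makes such a layer interpretable: a frame with a larger weight contributed more to the pooled representation.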

    Geometric Image Correspondence Verification by Dense Pixel Matching

    This paper addresses the problem of determining dense pixel correspondences between two images and its application to geometric correspondence verification in image retrieval. The main contribution is a geometric correspondence verification approach for re-ranking a shortlist of retrieved database images based on their dense pair-wise matching with the query image at the pixel level. We determine a set of cyclically consistent dense pixel matches between the pair of images and evaluate the local similarity of matched pixels using neural-network-based image descriptors. Final re-ranking is based on a novel similarity function, which fuses the local similarity metric with a global similarity metric and a geometric consistency measure computed for the matched pixels. For dense matching, our approach utilizes a modified version of a recently proposed dense geometric correspondence network (DGC-Net), which we also improve by optimizing the architecture. The proposed model and similarity metric compare favourably to the state-of-the-art image retrieval methods. In addition, we apply our method to the problem of long-term visual localization, demonstrating promising results and generalization across datasets. Comment: The appendix has been updated by adding some clarification
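
One common way to obtain cyclically consistent matches is a forward-backward check: a pixel in the first image is kept only if mapping it to the second image and back lands close to where it started. The sketch below illustrates that check; it is an assumption about what "cyclically consistent" means here, not the paper's procedure, and the dictionary-based match representation and the `tol` parameter are hypothetical.

```python
import math


def cycle_consistent_matches(forward, backward, tol=1.0):
    """Keep matches p -> q whose round trip p -> q -> p' ends within tol pixels of p.

    forward:  dict mapping (x, y) pixels in image A to (x, y) pixels in image B.
    backward: dict mapping (x, y) pixels in image B to (x, y) pixels in image A.
    """
    kept = {}
    for p, q in forward.items():
        back = backward.get(q)
        if back is None:
            continue  # no return match, so the cycle cannot be verified
        if math.hypot(back[0] - p[0], back[1] - p[1]) <= tol:
            kept[p] = q
    return kept


if __name__ == "__main__":
    forward = {(0, 0): (5, 5), (1, 0): (6, 5), (2, 0): (9, 9)}
    backward = {(5, 5): (0, 0), (6, 5): (1, 1), (9, 9): (7, 7)}
    print(cycle_consistent_matches(forward, backward))
```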

    A Coupled Evolutionary Network for Age Estimation

    Age estimation of unknown persons is a challenging pattern analysis task due to the lack of training data and the varied aging mechanisms of different people. Label distribution learning-based methods usually make distribution assumptions to simplify age estimation. However, age label distributions are often complex and difficult to model parametrically. Inspired by the biological evolutionary mechanism, we propose a Coupled Evolutionary Network (CEN) with two concurrent evolutionary processes: evolutionary label distribution learning and evolutionary slack regression. The evolutionary network learns and refines age label distributions iteratively. Evolutionary label distribution learning adaptively learns and constantly refines the age label distributions without making strong assumptions about the distribution patterns. To further utilize the ordered and continuous information of age labels, we propose an evolutionary slack regression that converts the discrete age label regression into a continuous age interval regression. Experimental results on the Morph, ChaLearn15 and MegaAge-Asian datasets show the superiority of our method.

    Modeling of Facial Aging and Kinship: A Survey

    Computational facial models that capture properties of facial cues related to aging and kinship increasingly attract the attention of the research community, enabling the development of reliable methods for age progression, age estimation, age-invariant facial characterization, and kinship verification from visual data. In this paper, we review recent advances in modeling of facial aging and kinship. In particular, we provide an up-to-date, complete list of available annotated datasets and an in-depth analysis of geometric, hand-crafted, and learned facial representations that are used for facial aging and kinship characterization. Moreover, evaluation protocols and metrics are reviewed, and notable experimental results for each surveyed task are analyzed. This survey allows us to identify challenges and discuss future research directions for the development of robust facial models in real-world conditions.

    RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation

    We address the challenging problem of RGB image-based head pose estimation. We first reformulate head pose representation learning to constrain it to a bounded space. Representing head pose as a vector projection or as vector angles proves helpful for improving performance. Further, a ranking loss combined with an MSE regression loss is proposed. The ranking loss supervises a neural network with paired samples of the same person and penalises incorrect ordering of pose predictions. Analysis of this new loss function suggests it contributes to a better local feature extractor, where features are generalised to Abstract Landmarks, which are pose-related features rather than pose-irrelevant information such as identity, age, and lighting. Extensive experiments show that our method significantly outperforms the current state-of-the-art schemes on the public datasets AFLW2000 and BIWI. Our model improves on the previous SOTA MAE from 4.50 to 3.66 on AFLW2000 and from 4.0 to 3.71 on BIWI. Source code will be made available at: https://github.com/seathiefwang/RankHeadPose
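
A ranking loss that penalises incorrectly ordered pose predictions, combined with an MSE regression loss, can be sketched as follows. The abstract does not give the exact formulation, so this uses a standard hinge-style pairwise ranking term as a stand-in; the function names, the `margin` and `rank_weight` parameters, and the scalar pose values are all assumptions made for illustration.

```python
def mse_loss(preds, targets):
    """Mean squared error between predicted and true pose values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def pairwise_ranking_loss(pred_a, pred_b, target_a, target_b, margin=0.0):
    """Hinge penalty when the predicted order of a pair contradicts the true order."""
    if target_a == target_b:
        return 0.0  # no ordering to enforce for tied targets
    sign = 1.0 if target_a > target_b else -1.0
    return max(0.0, margin - sign * (pred_a - pred_b))


def combined_loss(preds, targets, pairs, rank_weight=1.0, margin=0.0):
    """MSE plus the average ranking penalty over the given index pairs.

    pairs: list of (i, j) indices, assumed to refer to two images of the
    same person, as described in the abstract.
    """
    rank = 0.0
    if pairs:
        rank = sum(
            pairwise_ranking_loss(preds[i], preds[j], targets[i], targets[j], margin)
            for i, j in pairs
        ) / len(pairs)
    return mse_loss(preds, targets) + rank_weight * rank


if __name__ == "__main__":
    # Predicted yaw angles are ordered the wrong way round for this pair.
    print(combined_loss([10.0, 20.0], [30.0, 5.0], [(0, 1)]))
```

The ranking term is zero whenever a pair is ordered correctly by at least the margin, so it only adds gradient signal for pairs the regressor gets in the wrong order.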