Ordinal Distribution Regression for Gait-based Age Estimation
Computer vision researchers prefer to estimate age from face images because
facial features provide useful information. However, estimating age from face
images becomes challenging when people are distant from the camera or occluded.
A person's gait is a unique biometric feature that can be perceived efficiently
even at a distance. Thus, gait can be used to predict age when face images are
not available. However, existing gait-based classification or regression
methods ignore the ordinal relationship of different ages, which is an
important clue for age estimation. This paper proposes an ordinal distribution
regression with a global and local convolutional neural network for gait-based
age estimation. Specifically, we decompose gait-based age regression into a
series of binary classifications to incorporate the ordinal age information.
Then, an ordinal distribution loss is proposed to consider the inner
relationships among these classifications by penalizing the distribution
discrepancy between the estimated value and the ground truth. In addition, our
neural network comprises a global and three local sub-networks, and thus, is
capable of learning the global structure and local details from the head, body,
and feet. Experimental results indicate that the proposed approach outperforms
state-of-the-art gait-based age estimation methods on the OULP-Age dataset.Comment: Accepted by the journal SCIENCE CHINA Information Sciences
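The decomposition described above can be sketched in a small toy example: an age label is encoded as a series of "is the subject older than k?" binary targets, and a prediction is decoded by counting the positive answers. This is an illustrative sketch with our own function names, not the paper's code:

```python
# Toy sketch of ordinal regression as K-1 binary classification subtasks
# over age labels 0..num_classes-1 (names are ours, not the paper's).

def encode_ordinal(age, num_classes):
    """Binary targets t_k = 1 if age > k, for k = 0..num_classes-2."""
    return [1 if age > k else 0 for k in range(num_classes - 1)]

def decode_ordinal(probs):
    """Predicted age = number of subtasks answering 'older than k'."""
    return sum(1 for p in probs if p > 0.5)

targets = encode_ordinal(3, num_classes=6)            # [1, 1, 1, 0, 0]
age = decode_ordinal([0.9, 0.8, 0.7, 0.2, 0.1])       # 3
```

The ordinal distribution loss then compares the vector of estimated subtask probabilities against such binary target vectors rather than treating each subtask in isolation.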
Rank-consistent Ordinal Regression for Neural Networks
In many real-world prediction tasks, class labels include information about
the relative ordering between labels, which is not captured by commonly-used
loss functions such as multi-category cross-entropy. Recently, ordinal
regression frameworks have been adopted by the deep learning community to take
such ordering information into account. Using a framework that transforms
ordinal targets into binary classification subtasks, neural networks were
equipped with ordinal regression capabilities. However, this method suffers
from inconsistencies among the different binary classifiers. We hypothesize
that addressing the inconsistency issue in these binary classification
task-based neural networks improves predictive performance. To test this
hypothesis, we propose the COnsistent RAnk Logits (CORAL) framework with strong
theoretical guarantees for rank-monotonicity and consistent confidence scores.
Moreover, the proposed method is architecture-agnostic and can extend arbitrary
state-of-the-art deep neural network classifiers for ordinal regression tasks.
The empirical evaluation of the proposed rank-consistent method on a range of
face-image datasets for age prediction shows a substantial reduction of the
prediction error compared to the reference ordinal regression network.Comment: In the previous manuscript version, an issue with the figures caused
certain versions of Adobe Acrobat Reader to crash. This version fixes this
issue.
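The rank-consistency idea can be illustrated with a toy sketch: all K-1 binary tasks share one weight vector and differ only in a bias term, so when the biases are sorted in decreasing order the logits (and hence the sigmoid probabilities) are automatically non-increasing in k. The numbers and names below are illustrative, not CORAL's released code:

```python
# Sketch of the CORAL construction: shared weights, per-task biases.
# Decreasing biases guarantee rank-monotone logits.

def coral_logits(features, weights, biases):
    """One shared linear score plus a task-specific bias per binary task."""
    shared = sum(f * w for f, w in zip(features, weights))
    return [shared + b for b in biases]

logits = coral_logits([0.5, -1.0], [2.0, 1.0], [3.0, 1.0, -1.0])
# logits are non-increasing, so predicted probabilities cannot "cross"
assert all(a >= b for a, b in zip(logits, logits[1:]))
```

This is why the inconsistency problem of independently trained binary classifiers disappears: monotonicity is built into the parameterization rather than hoped for during training.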
2sRanking-CNN: A 2-stage ranking-CNN for diagnosis of glaucoma from fundus images using CAM-extracted ROI as an intermediate input
Glaucoma is a disease in which the optic nerve is chronically damaged by the
elevation of the intra-ocular pressure, resulting in visual field defects.
Therefore, it is important to monitor and treat suspected patients before they
are confirmed with glaucoma. In this paper, we propose a 2-stage ranking-CNN
that classifies fundus images as normal, suspicious, and glaucoma. Furthermore,
we propose a method of using the class activation map as a mask filter and
combining it with the original fundus image as an intermediate input. Our
results improve the average accuracy by about 10% over the existing 3-class
CNN and ranking-CNN, and in particular improve the sensitivity for the
suspicious class by more than 20% over the 3-class CNN. In addition, the
extracted ROI was found to overlap with the physician's diagnostic criteria.
The method we propose is expected to be efficiently applied to any medical data
where there is a suspicious condition between normal and disease.Comment: Accepted at BMVC 201
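A hypothetical sketch of the mask-filter step: normalise the class activation map (CAM), threshold it, and zero out image pixels that fall outside the suspected ROI before feeding the masked image to the second stage. The threshold value and function names are our assumptions, not taken from the paper:

```python
# Toy sketch (not the paper's code): use a CAM as a binary mask filter.

def cam_mask(image, cam, threshold=0.5):
    """Keep pixels where the peak-normalised CAM is at least `threshold`."""
    peak = max(max(row) for row in cam) or 1.0
    return [[px if c / peak >= threshold else 0.0
             for px, c in zip(img_row, cam_row)]
            for img_row, cam_row in zip(image, cam)]

image = [[1.0, 2.0], [3.0, 4.0]]
cam = [[0.9, 0.1], [0.2, 0.8]]
masked = cam_mask(image, cam)   # [[1.0, 0.0], [0.0, 4.0]]
```

The masked image then acts as the intermediate input described above, so the second-stage network sees only the region the first stage found discriminative.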
TRk-CNN: Transferable Ranking-CNN for image classification of glaucoma, glaucoma suspect, and normal eyes
In this paper, we propose the Transferable Ranking Convolutional Neural Network
(TRk-CNN) that can be effectively applied when the classes of images to be
classified show a high correlation with each other. The multi-class
classification method based on the softmax function, which is generally used,
is not effective in this case because the inter-class relationship is ignored.
Although there is a Ranking-CNN that takes into account the ordinal classes, it
cannot reflect the inter-class relationship in the final prediction. TRk-CNN,
on the other hand, combines the weights of the primitive classification model
to reflect the inter-class information in the final classification phase. We
evaluated TRk-CNN on a glaucoma image dataset labeled into three classes:
normal, glaucoma suspect, and glaucoma eyes. Based on the literature we
surveyed, this study is the first to classify a glaucoma fundus image dataset
into these three classes. We compared the evaluation
results of TRk-CNN with Ranking-CNN (Rk-CNN) and multi-class CNN (MC-CNN) using
the DenseNet as the backbone CNN model. As a result, TRk-CNN achieved an
average accuracy of 92.96%, specificity of 93.33%, sensitivity for glaucoma
suspect of 95.12% and sensitivity for glaucoma of 93.98%. Based on average
accuracy, TRk-CNN is 8.04% and 9.54% higher than Rk-CNN and MC-CNN,
respectively, and notably 26.83% higher than MC-CNN in sensitivity for the
glaucoma suspect class.
Our TRk-CNN is expected to be effectively applied to the medical image
classification problem where the disease state is continuous and increases in
the positive class direction.Comment: 49 pages, 12 figures
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Age estimation is an important yet very challenging problem in computer
vision. Existing methods for age estimation usually apply a divide-and-conquer
strategy to deal with heterogeneous data caused by the non-stationary aging
process. However, the facial aging process is also a continuous process, and
the continuity relationship between different components has not been
effectively exploited. In this paper, we propose BridgeNet for age estimation,
which aims to mine the continuous relation between age labels effectively. The
proposed BridgeNet consists of local regressors and gating networks. Local
regressors partition the data space into multiple overlapping subspaces to
tackle heterogeneous data and gating networks learn continuity aware weights
for the results of local regressors by employing the proposed bridge-tree
structure, which introduces bridge connections into tree models to enforce the
similarity between neighbor nodes. Moreover, these two components of BridgeNet
can be jointly learned in an end-to-end way. We show experimental results on
the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet
outperforms the state-of-the-art methods.Comment: CVPR 201
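The divide-and-conquer mixing can be sketched as a gated combination of local regressor outputs, where the gating weights are normalised to sum to one. The weights below are hand-picked for illustration; BridgeNet's actual gating networks and bridge-tree structure are not reproduced here:

```python
# Sketch (our toy, not BridgeNet's code): local regressors each cover an
# overlapping age subspace; a gate mixes their outputs continuously.

def gated_age(local_predictions, gate_weights):
    """Normalised, continuity-aware weighted sum of local regressor outputs."""
    total = sum(gate_weights)
    return sum(p * w / total for p, w in zip(local_predictions, gate_weights))

# three hypothetical local regressors specialised on young / middle / old
age = gated_age([18.0, 40.0, 65.0], [0.7, 0.3, 0.0])   # 24.6
```

Because neighboring gates share bridge connections, nearby local regressors receive similar weights, which is how the continuity of aging enters the final estimate.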
Attended End-to-end Architecture for Age Estimation from Facial Expression Videos
The main challenges of age estimation from facial expression videos lie not
only in the modeling of the static facial appearance, but also in the capturing
of the temporal facial dynamics. Traditional approaches to this problem focus
on constructing handcrafted features to explore the discriminative information
contained in facial appearance and dynamics separately, which relies on
sophisticated feature refinement and framework design. In this paper, we
present an end-to-end architecture for age estimation, called Spatially-Indexed
Attention Model (SIAM), which is able to simultaneously learn both the
appearance and dynamics of age from raw videos of facial expressions.
Specifically, we employ convolutional neural networks to extract effective
latent appearance representations and feed them into recurrent networks to
model the temporal dynamics. More importantly, we propose to leverage attention
models for salience detection in both the spatial domain for each single image
and the temporal domain for the whole video as well. We design a specific
spatially-indexed attention mechanism among the convolutional layers to extract
the salient facial regions in each individual image, and a temporal attention
layer to assign attention weights to each frame. This two-pronged approach not
only improves the performance by allowing the model to focus on informative
frames and facial areas, but it also offers an interpretable correspondence
between the spatial facial regions as well as temporal frames, and the task of
age estimation. We demonstrate the strong performance of our model in
experiments on a large, gender-balanced database with 400 subjects with ages
spanning from 8 to 76 years. Experiments reveal that our model exhibits
significant superiority over the state-of-the-art methods given sufficient
training data.Comment: Accepted by Transactions on Image Processing (TIP)
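The temporal-attention pooling can be sketched as a softmax over per-frame scores followed by a weighted pooling of frame features. This simplification treats frame features as scalars and omits the paper's spatially-indexed mechanism; all names are ours:

```python
import math

# Toy sketch of temporal attention: score frames, softmax the scores,
# pool frame features by the resulting weights.

def temporal_attention(frame_scores, frame_features):
    """Numerically stable softmax over scores, then weighted pooling."""
    m = max(frame_scores)
    exps = [math.exp(s - m) for s in frame_scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * f for w, f in zip(weights, frame_features))

pooled = temporal_attention([2.0, 2.0], [10.0, 20.0])   # equal weights -> 15.0
```

Frames with higher attention scores dominate the pooled representation, which is what lets the model focus on informative frames.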
Geometric Image Correspondence Verification by Dense Pixel Matching
This paper addresses the problem of determining dense pixel correspondences
between two images and its application to geometric correspondence verification
in image retrieval. The main contribution is a geometric correspondence
verification approach for re-ranking a shortlist of retrieved database images
based on their dense pair-wise matching with the query image at a pixel level.
We determine a set of cyclically consistent dense pixel matches between the
pair of images and evaluate local similarity of matched pixels using neural
network based image descriptors. Final re-ranking is based on a novel
similarity function, which fuses the local similarity metric with a global
similarity metric and a geometric consistency measure computed for the matched
pixels. For dense matching our approach utilizes a modified version of a
recently proposed dense geometric correspondence network (DGC-Net), which we
also improve by optimizing the architecture. The proposed model and similarity
metric compare favourably to the state-of-the-art image retrieval methods. In
addition, we apply our method to the problem of long-term visual localization
demonstrating promising results and generalization across datasets.Comment: The appendix has been updated by adding some clarifications
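Cyclic consistency for dense matches can be illustrated in one dimension: a forward match survives only if the backward flow maps its target back near the source pixel. This is a toy sketch, not DGC-Net's implementation:

```python
# Toy 1-D sketch of cyclically consistent matching (names are ours).

def cyclically_consistent(forward, backward, tol=1):
    """Keep source pixels whose forward match maps back within `tol`."""
    keep = set()
    for p, q in forward.items():
        back = backward.get(q)
        if back is not None and abs(back - p) <= tol:
            keep.add(p)
    return keep

fwd = {0: 5, 1: 6, 2: 9}   # source pixel -> matched pixel
bwd = {5: 0, 6: 3, 9: 2}   # matched pixel -> source pixel
consistent = cyclically_consistent(fwd, bwd)   # {0, 2}
```

Only the surviving matches contribute to the local similarity and geometric consistency terms of the re-ranking score.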
A Coupled Evolutionary Network for Age Estimation
Age estimation of unknown persons is a challenging pattern analysis task due
to the lack of training data and the varied aging mechanisms of different
people. Label distribution learning-based methods usually make distribution
assumptions to simplify age estimation. However, age label distributions are
often complex and difficult to model parametrically. Inspired by the
biological evolutionary mechanism, we propose a Coupled Evolutionary Network
(CEN) with two concurrent evolutionary processes: evolutionary label
distribution learning and evolutionary slack regression. Evolutionary label
distribution learning adaptively learns and iteratively refines the age label
distributions without making strong assumptions on the distribution
patterns. To further utilize the ordered and continuous
information of age labels, we accordingly propose an evolutionary slack
regression to convert the discrete age label regression into the continuous age
interval regression. Experimental results on Morph, ChaLearn15 and
MegaAge-Asian datasets show the superiority of our method.
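One common way to read an age out of a learned label distribution is the expectation over labels, which requires no parametric (e.g. Gaussian) assumption on the distribution's shape. The sketch below illustrates that general idea and is not CEN's code:

```python
# Toy sketch: age prediction as the expectation of a label distribution.

def expected_age(ages, probabilities):
    """Normalise the distribution and take its expectation over age labels."""
    z = sum(probabilities)
    return sum(a * p / z for a, p in zip(ages, probabilities))

age = expected_age([20, 21, 22], [0.2, 0.5, 0.3])   # 21.1
```

A non-parametric distribution like this can be arbitrarily skewed or multi-modal, which is exactly the flexibility the abstract argues parametric assumptions give up.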
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation
We address the challenging problem of RGB image-based head pose estimation.
We first reformulate head pose representation learning to constrain it to a
bounded space. Representing head pose as a vector projection or as vector
angles proves helpful for improving performance. Further, a ranking loss
combined with
MSE regression loss is proposed. The ranking loss supervises a neural network
with paired samples of the same person and penalises incorrect ordering of pose
prediction. Analysis of this new loss function suggests that it contributes to a
better local feature extractor, where features are generalised to Abstract
Landmarks which are pose-related features instead of pose-irrelevant
information such as identity, age, and lighting. Extensive experiments show
that our method significantly outperforms the current state-of-the-art schemes
on public datasets: AFLW2000 and BIWI. Our model improves on the previous
state-of-the-art MAE from 4.50 to 3.66 on AFLW2000 and from 4.0 to 3.71 on
BIWI. Source code will be made available at:
https://github.com/seathiefwang/RankHeadPose
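The pairwise ranking penalty can be sketched as a margin hinge on paired predictions of the same person: when the true poses satisfy y_a < y_b, a prediction with the inverted ordering is penalised. The margin and function name are our choices, not necessarily the paper's exact formulation:

```python
# Toy sketch of a pairwise ranking loss for paired samples of one person.

def ranking_loss(pred_a, pred_b, margin=0.0):
    """Assumes ground-truth ordering y_a < y_b; hinge on the inversion."""
    return max(0.0, pred_a - pred_b + margin)

assert ranking_loss(0.2, 0.8) == 0.0   # correct ordering: no penalty
assert ranking_loss(0.9, 0.3) > 0.0    # inverted ordering: penalised
```

Because the pair shares identity, age, and lighting, gradients from this term can only come from pose differences, which is the intuition behind the "Abstract Landmarks" interpretation above.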