Improving Landmark Localization with Semi-Supervised Learning
We present two techniques to improve landmark localization in images from
partially annotated datasets. Our primary goal is to leverage the common
situation where precise landmark locations are only provided for a small data
subset, but where class labels for classification or regression tasks related
to the landmarks are more abundantly available. First, we propose the framework
of sequential multitasking and explore it here through an architecture for
landmark localization where training with class labels acts as an auxiliary
signal to guide the landmark localization on unlabeled data. A key aspect of
our approach is that errors can be backpropagated through a complete landmark
localization model. Second, we propose and explore an unsupervised learning
technique for landmark localization based on having a model predict equivariant
landmarks with respect to transformations applied to the image. We show that
these techniques improve landmark prediction considerably and can learn
effective detectors even when only a small fraction of the dataset has landmark
labels. We present results on two toy datasets and four real datasets, with
hands and faces, and report new state-of-the-art results on two datasets in the
wild; for example, with only 5% of images labeled we outperform the previous
state of the art trained on the AFLW dataset.
Comment: Published as a conference paper in CVPR 201
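The equivariance idea in the abstract above (landmarks predicted on a transformed image should match the transformed landmarks of the original image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `apply_affine` and `equivariance_loss`, the restriction to affine transformations, and the squared-distance penalty are all assumptions made for illustration.

```python
import numpy as np


def apply_affine(points, matrix, offset):
    """Apply x -> A x + b to an (N, 2) array of landmark coordinates."""
    return points @ matrix.T + offset


def equivariance_loss(pred_original, pred_transformed, matrix, offset):
    """Mean squared distance between T(landmarks(image)) and landmarks(T(image)).

    pred_original:    landmarks predicted on the original image, shape (N, 2)
    pred_transformed: landmarks predicted on the transformed image, shape (N, 2)
    matrix, offset:   the affine transformation T applied to the image (assumed known)
    """
    expected = apply_affine(pred_original, matrix, offset)
    return float(np.mean(np.sum((expected - pred_transformed) ** 2, axis=1)))


# Hypothetical example: a 90-degree rotation followed by a shift along x.
theta = np.pi / 2
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
b = np.array([1.0, 0.0])

landmarks = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])

# A perfectly equivariant predictor would output exactly T(landmarks).
perfect = apply_affine(landmarks, A, b)
# A predictor that is off by 0.1 in each coordinate incurs a positive loss.
noisy = perfect + 0.1

loss_perfect = equivariance_loss(landmarks, perfect, A, b)
loss_noisy = equivariance_loss(landmarks, noisy, A, b)
```

Because the loss needs no landmark annotations, only the known transformation, a penalty of this kind could in principle be computed on unlabeled images.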
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition: one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study the performances with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performances of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 table
Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions
A person's face discloses important information about their affective state.
Although there has been extensive research on recognition of facial
expressions, the performance of existing approaches is challenged by facial
occlusions. Facial occlusions are often treated as noise and discarded in
recognition of affective states. However, hand over face occlusions can provide
additional information for recognition of some affective states such as
curiosity, frustration and boredom. One of the reasons that this problem has
not gained attention is the lack of naturalistic occluded faces that contain
hand over face occlusions as well as other types of occlusions. Traditional
approaches for obtaining affective data are time-consuming and expensive, which
restricts researchers in affective computing to small datasets. This
limitation affects the generalizability of models and prevents researchers from
taking advantage of recent advances in deep learning that have shown great
success in many fields but require large volumes of data. In this paper, we
first introduce a novel framework for synthesizing naturalistic facial
occlusions from an initial dataset of non-occluded faces and separate images of
hands, reducing the costly process of data collection and annotation. We then
propose a model for facial occlusion type recognition to differentiate between
hand over face occlusions and other types of occlusions such as scarves, hair,
glasses and objects. Finally, we present a model to localize hand over face
occlusions and identify the occluded regions of the face.
Comment: Accepted to International Conference on Affective Computing and
Intelligent Interaction (ACII), 201