228 research outputs found
Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings
Conventional feature-based and model-based gaze estimation methods have
proven to perform well in settings with controlled illumination and specialized
cameras. In unconstrained real-world settings, however, such methods are
surpassed by recent appearance-based methods due to difficulties in modeling
factors such as illumination changes and other visual artifacts. We present a
novel learning-based method for eye region landmark localization that enables
conventional methods to be competitive with the latest appearance-based methods.
Despite having been trained exclusively on synthetic data, our method exceeds
the state of the art for iris localization and eye shape registration on
real-world imagery. We then use the detected landmarks as input to iterative
model-fitting and lightweight learning-based gaze estimation methods. Our
approach outperforms existing model-fitting and appearance-based methods in the
context of person-independent and personalized gaze estimation.
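As a toy illustration of landmark-driven model fitting (not the paper's iterative fit), a gaze direction can be read off two detected landmarks under a simple spherical-eyeball assumption. The function name and the fixed `eyeball_radius` below are illustrative assumptions, not part of the method described above:

```python
import numpy as np

def gaze_from_landmarks(eyeball_center, iris_center, eyeball_radius):
    """Estimate a 3D gaze direction from two detected eye landmarks.

    Simplification: the optical axis is assumed to pass through the
    eyeball center and the iris center on a sphere of fixed radius.
    """
    # 2D offset of the iris from the eyeball center in the image plane
    dx, dy = np.asarray(iris_center, dtype=float) - np.asarray(eyeball_center, dtype=float)
    # Recover yaw/pitch by treating the offset as a point on the sphere
    yaw = np.arcsin(np.clip(dx / eyeball_radius, -1.0, 1.0))
    pitch = np.arcsin(np.clip(-dy / eyeball_radius, -1.0, 1.0))
    # Unit gaze vector (camera looks along -z)
    g = np.array([np.cos(pitch) * np.sin(yaw),
                  np.sin(pitch),
                  -np.cos(pitch) * np.cos(yaw)])
    return g / np.linalg.norm(g)

# Iris centered on the eyeball: gaze points straight at the camera
print(gaze_from_landmarks((100, 80), (100, 80), 12.0))
```

An iris offset to the right of the eyeball center yields a positive yaw component, as expected from the geometry.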
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based with model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real
world data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 pages
Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians
Convolutional neural nets (CNNs) have demonstrated remarkable performance in
recent history. Such approaches tend to work in a unidirectional bottom-up
feed-forward fashion. However, practical experience and biological evidence
tell us that feedback plays a crucial role, particularly for detailed spatial
understanding tasks. This work explores bidirectional architectures that also
reason with top-down feedback: neural units are influenced by both lower and
higher-level units.
We do so by treating units as rectified latent variables in a quadratic
energy function, which can be seen as a hierarchical Rectified Gaussian (RG)
model. We show that RGs can be optimized with a quadratic program (QP), which
can in turn be optimized with a recurrent neural network (with rectified linear
units). This allows RGs to be trained with GPU-optimized gradient descent. From
a theoretical perspective, RGs help establish a connection between CNNs and
hierarchical probabilistic models. From a practical perspective, RGs are well
suited for detailed spatial tasks that can benefit from top-down reasoning. We
illustrate them on the challenging task of keypoint localization under
occlusions, where local bottom-up evidence may be misleading. We demonstrate
state-of-the-art results on challenging benchmarks.
Comment: To appear in CVPR 201
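The QP-to-recurrent-network observation can be sketched directly: projected gradient descent on a quadratic energy with nonnegative latent variables is a recurrent update whose nonlinearity is exactly a ReLU. The matrix `A` and vector `b` below are arbitrary toy values, not taken from the paper:

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

# Quadratic energy E(z) = 0.5 z'Az - b'z with z >= 0 (rectified latents).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # positive definite
b = np.array([1.0, -2.0])

z = np.zeros(2)
for _ in range(500):
    # One "recurrent" step: a gradient step followed by a ReLU projection
    z = relu(z - 0.1 * (A @ z - b))

print(z)  # converges to the QP solution [0.5, 0]
```

The second coordinate is clamped at zero by the nonnegativity constraint, which is the behavior the hierarchical RG model exploits for units influenced by both bottom-up and top-down evidence.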
Facial Landmark Point Localization using Coarse-to-Fine Deep Recurrent Neural Network
The accurate localization of facial landmarks is at the core of face analysis
tasks, such as face recognition and facial expression analysis, to name a few.
In this work we propose a novel localization approach based on a Deep Learning
architecture that utilizes dual cascaded CNN subnetworks of the same length,
where each subnetwork in a cascade refines the accuracy of its predecessor. The
first set of cascaded subnetworks estimates heatmaps that encode the landmarks'
locations, while the second set of cascaded subnetworks refines the
heatmap-based localization using regression, and also receives as input the
output of the corresponding heatmap estimation subnetwork. The proposed scheme
is experimentally shown to compare favorably with contemporary state-of-the-art
schemes.
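The coarse-to-fine idea can be illustrated without any learning: a heatmap argmax gives a coarse integer estimate, and a second stage refines it to sub-pixel accuracy. The center-of-mass refinement below is only a stand-in for the paper's learned regression subnetwork:

```python
import numpy as np

def heatmap_peak(hm):
    """Coarse landmark estimate: integer argmax of the heatmap."""
    y, x = np.unravel_index(np.argmax(hm), hm.shape)
    return np.array([x, y], dtype=float)

def refine(hm, peak):
    """Toy refinement stage: center of mass of a 3x3 window at the peak.

    The paper's refinement is a learned regression subnetwork that also
    sees the heatmap-estimation output; this is just a simple analogue.
    """
    x, y = peak.astype(int)
    h, w = hm.shape
    ys, xs = np.mgrid[max(y - 1, 0):min(y + 2, h), max(x - 1, 0):min(x + 2, w)]
    wts = hm[ys, xs]
    return np.array([np.sum(wts * xs), np.sum(wts * ys)]) / np.sum(wts)

# Gaussian blob centered at (10.3, 6.7) on a 16x16 grid
yy, xx = np.mgrid[0:16, 0:16]
hm = np.exp(-((xx - 10.3) ** 2 + (yy - 6.7) ** 2) / 2.0)

coarse = heatmap_peak(hm)   # snaps to the integer grid point (10, 7)
fine = refine(hm, coarse)   # sub-pixel estimate closer to (10.3, 6.7)
print(coarse, fine)
```

The refined estimate lands strictly closer to the true sub-pixel location than the argmax, mirroring how each cascade stage improves on its predecessor.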
Structured Landmark Detection via Topology-Adapting Deep Graph Learning
Image landmark detection aims to automatically identify the locations of
predefined fiducial points. Despite recent success in this field,
higher-order structural modeling to capture implicit or explicit
relationships among anatomical landmarks has not been adequately exploited. In
this work, we present a new topology-adapting deep graph learning approach for
accurate anatomical facial and medical (e.g., hand, pelvis) landmark detection.
The proposed method constructs graph signals leveraging both local image
features and global shape features. The adaptive graph topology naturally
explores and lands on task-specific structures which are learned end-to-end
with two Graph Convolutional Networks (GCNs). Extensive experiments are
conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as
well as three real-world X-ray medical datasets (Cephalometric (public), Hand
and Pelvis). Quantitative comparisons with previous state-of-the-art approaches
across all studied datasets indicate superior performance in both robustness
and accuracy. Qualitative visualizations of the learned graph topologies
demonstrate a physically plausible connectivity lying behind the landmarks.
Comment: Accepted to ECCV-20. Camera-ready with supplementary material
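A single graph-convolution step over landmark nodes shows how a GCN propagates information between related landmarks. The hand-fixed adjacency below stands in for the learned, task-adaptive topology, and the identity weight matrix is chosen only to make the neighbor mixing easy to trace:

```python
import numpy as np

# One graph-convolution step over landmark nodes: X' = relu(D^-1 A X W).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)   # 3 landmarks, self-loops included
D_inv = np.diag(1.0 / A.sum(axis=1))     # row-normalization

X = np.array([[0.0, 0.0],                # per-landmark features
              [1.0, 0.0],                # (e.g. coordinates or descriptors)
              [0.0, 1.0]])
W = np.eye(2)                            # identity weights for clarity

# Each node's new feature is the mean of its neighborhood's features
X_next = np.maximum(D_inv @ A @ X @ W, 0.0)
print(X_next)
```

Node 1, connected to both other landmarks, averages all three feature rows, while nodes 0 and 2 only see their immediate neighbors; stacking such layers lets structural constraints flow across the whole landmark graph.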