Structured Landmark Detection via Topology-Adapting Deep Graph Learning
Image landmark detection aims to automatically identify the locations of
predefined fiducial points. Despite recent success in this field,
higher-order structural modeling to capture implicit or explicit
relationships among anatomical landmarks has not been adequately exploited. In
this work, we present a new topology-adapting deep graph learning approach for
accurate anatomical facial and medical (e.g., hand, pelvis) landmark detection.
The proposed method constructs graph signals leveraging both local image
features and global shape features. The adaptive graph topology naturally
explores and lands on task-specific structures which are learned end-to-end
with two Graph Convolutional Networks (GCNs). Extensive experiments are
conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as
well as three real-world X-ray medical datasets (Cephalometric (public), Hand
and Pelvis). Quantitative comparisons with previous state-of-the-art
approaches across all studied datasets indicate superior performance in
both robustness and accuracy. Qualitative visualizations of the learned graph
topologies demonstrate a physically plausible connectivity lying behind the
landmarks.
Comment: Accepted to ECCV-20. Camera-ready with supplementary materia
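As a rough illustration of the idea in this abstract (graph signals built from local image features plus global shape features, propagated over a learned topology), here is a minimal NumPy sketch of a single graph-convolution step. Everything concrete is an assumption made for illustration: the dense softmax-normalized adjacency, the centroid-offset shape cue, the feature sizes, and the names `row_softmax` and `graph_conv` are hypothetical and are not taken from the paper.

```python
import numpy as np

def row_softmax(logits):
    """Normalize each row of a score matrix into weights that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def graph_conv(signals, adj_logits, weight):
    """One graph-convolution step: mix node signals along weighted edges,
    then apply a shared linear map followed by a ReLU."""
    adjacency = row_softmax(adj_logits)      # (N, N) dense topology, assumed learnable
    mixed = adjacency @ signals              # aggregate signals from connected landmarks
    return np.maximum(mixed @ weight, 0.0)   # (N, out_dim)

rng = np.random.default_rng(0)
num_landmarks, feat_dim, out_dim = 5, 8, 4

# Hypothetical inputs: per-landmark image features and current landmark estimates.
local_feats = rng.normal(size=(num_landmarks, feat_dim))
coords = rng.uniform(size=(num_landmarks, 2))
shape_feats = coords - coords.mean(axis=0)   # assumed shape cue: offsets from the centroid

# Graph signal per landmark: local appearance concatenated with the shape cue.
signals = np.concatenate([local_feats, shape_feats], axis=1)

# In a trained model these two arrays would be optimized end-to-end.
adj_logits = rng.normal(size=(num_landmarks, num_landmarks))
weight = rng.normal(size=(feat_dim + 2, out_dim))

hidden = graph_conv(signals, adj_logits, weight)
print(hidden.shape)
```

Because the adjacency is a free parameter rather than a fixed skeleton, training can strengthen or weaken any landmark-to-landmark edge, which is one simple way to read "topology-adapting".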
Precise Facial Landmark Detection by Reference Heatmap Transformer
Most facial landmark detection methods predict landmarks by mapping the input
facial appearance features to landmark heatmaps and have achieved promising
results. However, when the face image suffers from large poses, heavy
occlusions, or complicated illumination, these methods cannot learn discriminative
feature representations and effective facial shape constraints, nor can they
accurately predict the value of each element in the landmark heatmap, limiting
their detection accuracy. To address this problem, we propose a novel Reference
Heatmap Transformer (RHT) by introducing reference heatmap information for more
precise facial landmark detection. The proposed RHT consists of a Soft
Transformation Module (STM) and a Hard Transformation Module (HTM), which can
cooperate with each other to encourage the accurate transformation of the
reference heatmap information and facial shape constraints. Then, a Multi-Scale
Feature Fusion Module (MSFFM) is proposed to fuse the transformed heatmap
features and the semantic features learned from the original face images to
enhance feature representations for producing more accurate target heatmaps. To
the best of our knowledge, this is the first study to explore how to enhance
facial landmark detection by transforming the reference heatmap information.
Experimental results on challenging benchmark datasets demonstrate that
our proposed method outperforms state-of-the-art methods in the literature.
Comment: Accepted by IEEE Transactions on Image Processing, March 202
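For readers unfamiliar with the heatmap formulation that this abstract builds on, the following minimal sketch shows how a landmark can be encoded as a Gaussian heatmap and decoded again by taking the peak. It illustrates only the generic heatmap representation, not the RHT, STM, HTM, or MSFFM modules; the function names, the 16x16 grid, and the value of `sigma` are assumptions chosen for the example.

```python
import numpy as np

def gaussian_heatmap(center_xy, size, sigma=1.5):
    """Render a 2D Gaussian bump centered on a landmark at (x, y)."""
    height, width = size
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_argmax(heatmap):
    """Recover the landmark as the (x, y) position of the largest heatmap value."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)

target = gaussian_heatmap((10, 6), size=(16, 16))
print(decode_argmax(target))  # (10, 6)
```

Decoding by peak position means that an error in any single heatmap element near the peak can shift the predicted landmark, which is why the accuracy of individual heatmap values matters.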
KPNet: Towards Minimal Face Detector
The small receptive field and capacity of minimal neural networks limit their
performance when they are used as detector backbones. In this work, we
find that the appearance features of a generic face are discriminative enough for
a tiny, shallow neural network to distinguish it from the background. The
essential barriers are 1) the vague definition of the face bounding
box and 2) the tricky design of anchor boxes or receptive fields. Unlike most
top-down methods for joint face detection and alignment, the proposed KPNet
detects small facial keypoints instead of the whole face in a bottom-up
manner. It first predicts the facial landmarks from a low-resolution image via
the well-designed fine-grained scale approximation and scale adaptive
soft-argmax operator. Finally, precise face bounding boxes, however they
are defined, can be inferred from the keypoints. Without any complex head
architecture or meticulous network design, KPNet achieves
state-of-the-art accuracy on generic face detection and alignment benchmarks
with only parameters, which runs at 1000 fps on GPU and can easily
run in real time on most modern front-end chips.
Comment: AAAI 202
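The two steps named in this abstract, reading keypoints out with a soft-argmax and then inferring a box from them, can be sketched as follows. This is a plain soft-argmax with a fixed temperature, not the scale adaptive operator from the paper, whose details the abstract does not give. The function names, the `temperature` and `margin` parameters, and the tight-enclosure box rule are all assumptions for illustration.

```python
import numpy as np

def soft_argmax_2d(heatmap, temperature=1.0):
    """Return the expected (x, y) position under a softmax over the heatmap."""
    height, width = heatmap.shape
    logits = heatmap.reshape(-1) / temperature
    logits = logits - logits.max()           # numerical stability
    probs = np.exp(logits)
    probs = (probs / probs.sum()).reshape(height, width)
    ys, xs = np.mgrid[0:height, 0:width]
    return float((probs * xs).sum()), float((probs * ys).sum())

def box_from_keypoints(points, margin=0.0):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the keypoints,
    optionally padded by a fraction of the box width and height."""
    pts = np.asarray(points, dtype=float)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)

# A score map with a single strong response at column 5, row 3.
scores = np.zeros((8, 8))
scores[3, 5] = 1.0
x, y = soft_argmax_2d(scores, temperature=0.01)

# Hypothetical keypoints for one face, given as (x, y) pairs.
box = box_from_keypoints([(1, 2), (3, 6), (2, 4)])
```

Unlike a hard argmax, the soft-argmax is differentiable and returns sub-pixel coordinates, which is useful when keypoints are predicted from a low-resolution map. Because the box is derived from keypoints after the fact, a different box definition only changes `box_from_keypoints`, not the detector.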