3D-CNN for Facial Micro- and Macro-expression Spotting on Long Video Sequences using Temporal Oriented Reference Frame
Facial expression spotting is the preliminary step for micro- and
macro-expression analysis. Reliably spotting such expressions in video
sequences remains an unsolved task. The current best systems depend upon
optical flow methods to extract regional motion features before categorising
that motion into a specific class of facial movement. Optical flow is
susceptible to drift error, which is a serious problem for motions with
long-term dependencies, such as macro-expressions in high frame-rate video. We
propose a purely deep learning solution which, rather than tracking
frame-differential motion, uses a convolutional model to compare each frame
with two temporally local reference frames. Reference frames are sampled
according to calculated micro- and macro-expression durations. We show that our
solution achieves state-of-the-art performance (F1-score of 0.126) on a dataset
of high frame-rate (200 fps) long video sequences (SAMM-LV) and is competitive
on a low frame-rate (30 fps) dataset (CAS(ME)²). In this paper, we document our
deep learning model and parameters, including our use of local contrast
normalisation, which we show is critical for optimal results. We overcome a
limitation of existing methods, namely their reliance on drift-prone optical
flow, and advance the state of deep learning in the domain of facial expression
spotting.
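The frame-pairing and preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the placement of one reference frame before and one after the current frame, the offset of half an expression duration, the example duration of 0.5 s, and the box-window form of local contrast normalisation are all hypothetical choices. The 3D-CNN itself is not shown; the sketch only builds the three-frame input such a model would compare.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_contrast_normalise(img, size=9, eps=1e-6):
    """Subtract the local mean and divide by the local standard deviation.

    Assumption: a uniform size x size window (size odd); Gaussian-weighted
    windows are a common alternative.
    """
    img = np.asarray(img, dtype=np.float64)
    pad = size // 2
    # Local mean over each pixel's neighbourhood (reflect-padded borders).
    win = sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))
    centred = img - win.mean(axis=(-2, -1))
    # Local standard deviation of the mean-subtracted image.
    win_c = sliding_window_view(np.pad(centred, pad, mode="reflect"), (size, size))
    std = np.sqrt((win_c ** 2).mean(axis=(-2, -1)))
    # Floor the divisor so flat regions do not blow up.
    return centred / np.maximum(std, eps)


def reference_indices(t, n_frames, fps, duration_s):
    """Hypothetical sampling rule: one reference frame before and one after
    frame t, each offset by half of an assumed expression duration."""
    k = max(1, int(round(fps * duration_s / 2)))
    return max(0, t - k), min(n_frames - 1, t + k)


def build_input(frames, t, fps, duration_s, size=9):
    """Stack (reference before, current, reference after) as a (3, H, W) clip."""
    before, after = reference_indices(t, len(frames), fps, duration_s)
    return np.stack(
        [local_contrast_normalise(frames[i], size) for i in (before, t, after)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((400, 32, 32))  # synthetic stand-in for a face-cropped clip
    # duration_s=0.5 is an illustrative (hypothetical) expression duration.
    print(reference_indices(100, 400, fps=200, duration_s=0.5))  # (50, 150)
    print(reference_indices(100, 400, fps=30, duration_s=0.5))   # (92, 108)
    print(build_input(video, t=10, fps=200, duration_s=0.5).shape)  # (3, 32, 32)
```

Expressing the duration in seconds rather than frames lets the same setting adapt to both frame rates in this sketch: at 200 fps a 0.5 s window spans 100 frames (offset 50), while at 30 fps it spans 15 frames (offset 8 after rounding).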
AnchorFace: An Anchor-based Facial Landmark Detector Across Large Poses
Facial landmark localization aims to detect predefined points on human faces,
and the field has improved rapidly with the recent development of
neural-network-based methods. However, it remains a challenging task for faces
in unconstrained scenarios, especially those with large pose variations. In
this paper, we target the problem of facial landmark localization across large
poses and address it with a split-and-aggregate strategy. To split the search
space, we propose a set of anchor templates as references for regression,
which handle large variations in face pose well. Based on the prediction from
each anchor template, we then aggregate the results, which can reduce the
landmark uncertainty caused by large poses. Overall, our proposed approach,
named AnchorFace, obtains state-of-the-art results with extremely efficient
inference speed on four challenging benchmarks, namely the AFLW, 300W, Menpo,
and WFLW datasets. Code will be available at
https://github.com/nothingelse92/AnchorFace.
Comment: To appear in AAAI 202
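The split-and-aggregate idea can be illustrated with a small NumPy sketch. It assumes that each anchor template is a full set of landmark coordinates, that the network outputs an offset and a confidence score per anchor, and that aggregation is a softmax-weighted average of the per-anchor estimates. The array layouts, the function name `aggregate_anchor_predictions`, and the softmax rule are hypothetical; AnchorFace's actual aggregation may differ.

```python
import numpy as np


def aggregate_anchor_predictions(anchors, offsets, logits):
    """Fuse per-anchor landmark estimates into one prediction.

    anchors: (K, L, 2) landmark templates, one per anchor (hypothetical layout)
    offsets: (K, L, 2) regressed offsets relative to each anchor
    logits:  (K,) per-anchor confidence scores
    Returns an (L, 2) array: the softmax-weighted mean of anchor + offset.
    The softmax weighting is an assumed aggregation rule.
    """
    preds = anchors + offsets                   # one landmark set per anchor
    w = np.exp(logits - np.max(logits))         # numerically stable softmax
    w /= w.sum()
    return np.tensordot(w, preds, axes=(0, 0))  # weighted sum over anchors


if __name__ == "__main__":
    K, L = 3, 5
    anchors = np.stack([np.full((L, 2), float(k)) for k in range(K)])  # toy templates
    offsets = np.zeros((K, L, 2))
    # Equal confidence: result is approximately the mean template (about 1.0).
    print(aggregate_anchor_predictions(anchors, offsets, np.zeros(K))[0])
    # One dominant anchor: result is essentially that anchor's estimate (about 2.0).
    print(aggregate_anchor_predictions(anchors, offsets, np.array([0.0, 0.0, 20.0]))[0])
```

Weighting by confidence lets anchors whose pose matches the input dominate, while averaging smooths residual disagreement between neighbouring anchors; hard top-1 selection is the limiting case in which one weight equals 1.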