Micro-expressions are spontaneous, rapid and subtle facial movements that can
neither be forged nor suppressed. They are very important nonverbal
communication clues, but are transient and of low intensity thus difficult to
recognize. Recently deep learning based methods have been developed for
micro-expression (ME) recognition using feature extraction and fusion
techniques, however, targeted feature learning and efficient feature fusion
still lack further study according to the ME characteristics. To address these
issues, we propose a novel framework Feature Representation Learning with
adaptive Displacement Generation and Transformer fusion (FRL-DGT), in which a
convolutional Displacement Generation Module (DGM) with self-supervised
learning is used to extract dynamic features from onset/apex frames targeted to
the subsequent ME recognition task, and a well-designed Transformer Fusion
mechanism composed of three Transformer-based fusion modules (local, global
fusions based on AU regions and full-face fusion) is applied to extract the
multi-level informative features after DGM for the final ME prediction. The
extensive experiments with solid leave-one-subject-out (LOSO) evaluation
results have demonstrated the superiority of our proposed FRL-DGT to
state-of-the-art methods