Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition
Facial expression data is characterized by a significant imbalance, with most
collected data showing happy or neutral expressions and fewer instances of fear
or disgust. This imbalance poses challenges to facial expression recognition
(FER) models, hindering their ability to fully understand various human
emotional states. Existing FER methods typically report overall accuracy on
highly imbalanced test sets, yet achieve low mean accuracy across the
expression classes. In this paper, we aim to address
the imbalanced FER problem. Existing methods primarily focus on learning
knowledge of minor classes solely from minor-class samples. However, we propose
a novel approach to extract extra knowledge related to the minor classes from
both major and minor class samples. Our motivation stems from the belief that
FER resembles a distribution learning task, wherein a sample may contain
information about multiple classes. For instance, a sample from the major class
surprise might also contain useful features of the minor class fear. Inspired
by this observation, we propose a novel method that leverages re-balanced attention maps to
regularize the model, enabling it to extract transformation invariant
information about the minor classes from all training samples. Additionally, we
introduce re-balanced smooth labels to regulate the cross-entropy loss, guiding
the model to pay more attention to the minor classes by utilizing the extra
information regarding the label distribution of the imbalanced training data.
Extensive experiments on different datasets and backbones show that the two
proposed modules work together to regularize the model and achieve
state-of-the-art performance under the imbalanced FER task. Code is available
at https://github.com/zyh-uaiaaaa.
Comment: Accepted by NeurIPS202
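The re-balanced smooth labels described above can be sketched as follows. The abstract only states that the smoothing uses the label distribution of the imbalanced training data, so the inverse-frequency weighting and all function names here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rebalanced_smooth_labels(targets, class_counts, eps=0.1):
    """Hypothetical sketch: distribute the smoothing mass eps across
    classes in proportion to inverse class frequency, so minor classes
    receive more of it than uniform label smoothing would give them."""
    n_classes = len(class_counts)
    inv_freq = 1.0 / np.asarray(class_counts, dtype=float)
    weights = inv_freq / inv_freq.sum()            # sums to 1
    labels = np.zeros((len(targets), n_classes))
    labels += eps * weights                         # re-balanced smoothing mass
    labels[np.arange(len(targets)), targets] += 1.0 - eps
    return labels                                   # each row sums to 1

def cross_entropy(probs, soft_labels):
    """Soft-label cross-entropy: -sum_k q_k * log p_k, averaged over samples."""
    return -(soft_labels * np.log(probs + 1e-12)).sum(axis=1).mean()
```

With this weighting, a sample from a major class still carries a small but non-zero target mass on the minor classes, which is one concrete way the loss could be guided to pay more attention to them.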
Open-Set Facial Expression Recognition
Facial expression recognition (FER) models are typically trained on datasets
with a fixed set of seven basic classes. However, recent research points
out that there are far more expressions than the basic ones. Thus, when
these models are deployed in the real world, they may encounter unknown
classes, such as compound expressions that cannot be classified into existing
basic classes. To address this issue, we propose the open-set FER task for the
first time. Though there are many existing open-set recognition methods, we
argue that they do not work well for open-set FER because FER data are all
human faces with very small inter-class distances, which makes the open-set
samples very similar to closed-set samples. In this paper, we are the first to
transform the disadvantage of small inter-class distance into an advantage by
proposing a new way for open-set FER. Specifically, we find that small
inter-class distance allows for sparsely distributed pseudo labels of open-set
samples, which can be viewed as symmetric noisy labels. Based on this novel
observation, we convert open-set FER into a noisy-label detection problem. We
further propose a novel method that incorporates attention map consistency and
cycle training to detect the open-set samples. Extensive experiments on various
FER datasets demonstrate that our method clearly outperforms state-of-the-art
open-set recognition methods by large margins. Code is available at
https://github.com/zyh-uaiaaaa.
Comment: Accepted by AAAI202
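The reduction from open-set detection to noisy-label detection can be illustrated with a simplified stand-in. The paper itself uses attention-map consistency and cycle training; this sketch instead applies a generic small-loss-style criterion, flagging samples whose cross-entropy against their pseudo label stays high, on the assumption that sparse symmetric pseudo labels are hard to fit. The function name and quantile threshold are assumptions:

```python
import numpy as np

def flag_open_set(probs, pseudo_labels, quantile=0.8):
    """Sketch: closed-set samples fit their pseudo labels well (low loss),
    while open-set samples with sparse, roughly symmetric pseudo labels
    keep a high loss. Samples in the top (1 - quantile) loss tail are
    flagged as open-set candidates."""
    idx = np.arange(len(pseudo_labels))
    losses = -np.log(probs[idx, pseudo_labels] + 1e-12)  # per-sample CE
    threshold = np.quantile(losses, quantile)
    return losses > threshold
```

This is only meant to make the "noisy labels as a detection signal" framing concrete; the actual detection mechanism in the paper is different.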
Faceptor: A Generalist Model for Face Perception
With comprehensive research conducted on various face analysis tasks,
researchers are increasingly interested in developing a unified approach to
face perception. Existing methods mainly address unified representation and
training, but lack task extensibility and application efficiency. To tackle
this issue, we focus on the unified model structure, exploring a face
generalist model. As an intuitive design, Naive Faceptor enables tasks with the
same output shape and granularity to share the structural design of the
standardized output head, achieving improved task extensibility. Furthermore,
Faceptor is proposed to adopt a well-designed single-encoder dual-decoder
architecture, allowing task-specific queries to represent newly introduced semantics.
This design enhances the unification of model structure while improving
application efficiency in terms of storage overhead. Additionally, we introduce
Layer-Attention into Faceptor, enabling the model to adaptively select features
from optimal layers to perform the desired tasks. Through joint training on 13
face perception datasets, Faceptor achieves exceptional performance in facial
landmark localization, face parsing, age estimation, expression recognition,
binary attribute classification, and face recognition, achieving or surpassing
specialized methods in most tasks. Our training framework can also be applied
to auxiliary supervised learning, significantly improving performance in
data-sparse tasks such as age estimation and expression recognition. The code
and models will be made publicly available at
https://github.com/lxq1000/Faceptor
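The Layer-Attention idea of adaptively selecting features from optimal encoder layers can be sketched as a learned softmax over per-layer weights. The abstract does not specify the mechanism, so the shapes, names, and the scalar-weight formulation below are assumptions:

```python
import numpy as np

def layer_attention(layer_features, layer_logits):
    """Minimal sketch: each task holds one learnable logit per encoder
    layer; a softmax over those logits gives attention weights, and the
    fused feature is the weighted sum of the per-layer features."""
    logits = np.asarray(layer_logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                      # softmax over layers
    feats = np.stack(layer_features)              # (num_layers, ..., dim)
    return np.tensordot(weights, feats, axes=1)   # weighted fusion
```

Under this formulation, a task whose logit for one layer dominates effectively reads that single layer, while near-equal logits average across layers, which is one plausible way different tasks could draw on different depths of the shared encoder.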