When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework
Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) mainly focuses on basic expressions, and thus fails to address the diversity of human emotions in practical scenarios. Meanwhile, existing work on compound FER relies heavily on abundant labeled compound expression training data, which are often laboriously collected under the professional instruction of psychology. In this paper, we study compound FER in the cross-domain few-shot learning setting, where only a few images of novel classes from the target domain are required as a reference. In particular, we aim to identify unseen compound expressions with the model trained on easily accessible basic expression datasets. To alleviate the problem of limited base classes in our FER task, we propose a novel Emotion Guided Similarity Network (EGS-Net), consisting of an emotion branch and a similarity branch, based on a two-stage learning framework. Specifically, in the first stage, the similarity branch is jointly trained with the emotion branch in a multi-task fashion. With the regularization of the emotion branch, we prevent the similarity branch from overfitting to sampled base classes that are highly overlapped across different episodes. In the second stage, the emotion branch and the similarity branch play a "two-student game" to alternately learn from each other, thereby further improving the inference ability of the similarity branch on unseen compound expressions. Experimental results on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method against several state-of-the-art methods.
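The few-shot inference described above — classifying a query image of a novel compound expression by similarity to a handful of labeled reference images — can be sketched generically. This is a minimal prototype-based similarity classifier for illustration only, not the authors' EGS-Net; all variable names and the toy episode are hypothetical:

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    """Mean feature vector per class, computed from the few support images."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feats, protos):
    """Assign each query to the class whose prototype is most similar (cosine)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (q @ p.T).argmax(axis=1)

# Toy episode: 2 novel classes, 3 support images each, synthetic 8-d features.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 8))
support = np.concatenate([centers[c] + 0.1 * rng.normal(size=(3, 8))
                          for c in range(2)])
labels = np.repeat([0, 1], 3)
query = centers[1] + 0.1 * rng.normal(size=(1, 8))
pred = classify(query, prototypes(support, labels, 2))
```

In the paper's setting, the feature extractor behind such a similarity comparison would be the similarity branch, trained on base (basic-expression) classes and regularized by the emotion branch; the sketch above only shows the episode-level inference step.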
Fine-grained Few-shot Recognition by Deep Object Parsing
In our framework, an object is made up of K distinct parts or units, and we
parse a test instance by inferring the K parts, where each part occupies a
distinct location in the feature space, and the instance features at this
location manifest as an active subset of part templates shared across all
instances. We recognize test instances by comparing their active templates and
the relative geometry of their part locations against those of the presented
few-shot instances. We propose an end-to-end training method to learn part
templates on top of a convolutional backbone. To combat visual distortions such
as orientation, pose, and size, we learn multi-scale templates, and at test time
parse and match instances across these scales. We show that our method is
competitive with the state-of-the-art, and by virtue of parsing enjoys
interpretability as well.
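The parsing step — inferring, for each of the K parts, its location and its active template from a shared template bank — can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes (a spatial feature map and one template bank per part), not the paper's actual architecture:

```python
import numpy as np

def parse(feature_map, part_templates):
    """For each part k, pick the spatial location whose feature responds most
    strongly to one of that part's templates; return the active template id
    and (row, col) location per part."""
    H, W, D = feature_map.shape
    feats = feature_map.reshape(-1, D)
    parsed = []
    for templates in part_templates:      # templates: (T, D) bank for one part
        scores = feats @ templates.T      # (H*W, T) template responses
        loc = int(scores.max(axis=1).argmax())      # best location for this part
        parsed.append((int(scores[loc].argmax()),   # active template there
                       divmod(loc, W)))
    return parsed

# Toy check: one-hot templates planted at known locations are recovered.
part_templates = [np.eye(4)[:2], np.eye(4)[2:]]   # 2 parts, 2 templates each
fmap = np.zeros((2, 2, 4))
fmap[0, 0] = [1, 0, 0, 0]                         # part 0, template 0
fmap[1, 1] = [0, 0, 1, 0]                         # part 1, template 0
parsed = parse(fmap, part_templates)
```

Recognition would then compare the parsed (template id, location) pairs — active templates and the relative geometry of part locations — against the parses of the few-shot reference instances.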
Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations
Training deep generative models usually requires a large amount of data. To
alleviate the data collection cost, the task of zero-shot GAN adaptation aims
to reuse well-trained generators to synthesize images of an unseen target
domain without any further training samples. Due to the data absence, the
textual description of the target domain and the vision-language models, e.g.,
CLIP, are utilized to effectively guide the generator. However, with only a
single representative text feature instead of real images, the synthesized
images gradually lose diversity as the model is optimized, which is also known
as mode collapse. To tackle the problem, we propose a novel method to find
semantic variations of the target text in the CLIP space. Specifically, we
explore diverse semantic variations based on the informative text feature of
the target domain while regularizing the uncontrolled deviation of the semantic
information. With the obtained variations, we design a novel directional moment
loss that matches the first and second moments of image and text direction
distributions. Moreover, we introduce elastic weight consolidation and a
relation consistency loss to effectively preserve valuable content information
from the source domain, e.g., appearances. Through extensive experiments, we
demonstrate the efficacy of the proposed methods in ensuring sample diversity
in various scenarios of zero-shot GAN adaptation. We also conduct ablation
studies to validate the effect of each proposed component. Notably, our model
achieves a new state-of-the-art on zero-shot GAN adaptation in terms of both
diversity and quality.
Comment: Accepted to ICCV 2023 (poster).
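The directional moment loss described above — matching the first and second moments of the image-direction and text-direction distributions in CLIP space — admits a simple sketch. The abstract does not give the exact formulation, so the following is one plausible reading for illustration, with the sample mean as the first moment and the sample covariance (`np.cov`) standing in for the second:

```python
import numpy as np

def directions(feats, src_feat):
    """Unit-norm directions from a source feature to each sample feature."""
    d = feats - src_feat
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def directional_moment_loss(img_dirs, txt_dirs):
    """Penalize mismatch between the mean (first moment) and covariance
    (second moment) of the image- and text-direction distributions."""
    first = np.sum((img_dirs.mean(axis=0) - txt_dirs.mean(axis=0)) ** 2)
    second = np.sum((np.cov(img_dirs, rowvar=False)
                     - np.cov(txt_dirs, rowvar=False)) ** 2)
    return first + second

# Identical direction sets incur zero loss; differing ones do not.
rng = np.random.default_rng(1)
src = rng.normal(size=4)
txt_dirs = directions(src + rng.normal(size=(6, 4)), src)
img_dirs = directions(src + rng.normal(size=(6, 4)), src)
loss_same = directional_moment_loss(txt_dirs, txt_dirs)
loss_diff = directional_moment_loss(img_dirs, txt_dirs)
```

In the paper's setting, the text directions would come from the semantic variations of the target text found in CLIP space, and minimizing this loss would pull the distribution of synthesized-image directions toward them.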
Revisiting the Kuleshov Effect with first-time viewers
Researchers have recently suggested that historically mixed findings in studies of the Kuleshov effect (a classic film editing–related phenomenon whereby meaning is extracted from the interaction of sequential camera shots) might reflect differences in the relative sophistication of early versus modern cinema audiences. Relative to experienced audiences, first-time film viewers might be less predisposed and/or able to forge the required conceptual and perceptual links between the edited shots in order to demonstrate the effect. This article recreates the conditions that traditionally elicit this effect (whereby a neutral face comes to be perceived as expressive after being juxtaposed with independent images: a bowl of soup, a gravestone, a child playing) to directly compare "continuity" perception in first-time and more experienced film viewers. Results confirm the presence of the Kuleshov effect for experienced viewers (explicitly only in the sadness condition) but not the first-time viewers, who failed to perceive continuity between the shots.