When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework
Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) mainly focuses on basic expressions, and thus fails to address the diversity of human emotions in practical scenarios. Meanwhile, existing work on compound FER relies heavily on abundant labeled compound expression training data, which are often laboriously collected under the professional instruction of psychology. In this paper, we study compound FER in the cross-domain few-shot learning setting, where only a few images of novel classes from the target domain are required as a reference. In particular, we aim to identify unseen compound expressions with the model trained on easily accessible basic expression datasets. To alleviate the problem of limited base classes in our FER task, we propose a novel Emotion Guided Similarity Network (EGS-Net), consisting of an emotion branch and a similarity branch, based on a two-stage learning framework. Specifically, in the first stage, the similarity branch is jointly trained with the emotion branch in a multi-task fashion. With the regularization of the emotion branch, we prevent the similarity branch from overfitting to sampled base classes that are highly overlapped across different episodes. In the second stage, the emotion branch and the similarity branch play a "two-student game" to alternately learn from each other, thereby further improving the inference ability of the similarity branch on unseen compound expressions. Experimental results on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method against several state-of-the-art methods.
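The few-shot inference described above — classifying a query image of a novel compound expression by similarity to a handful of labeled reference images — can be sketched generically. This is a minimal prototype-based similarity classifier for illustration only, not the authors' EGS-Net; all variable names and the toy episode are hypothetical:

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    """Mean feature vector per class, computed from the few support images."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feats, protos):
    """Assign each query to the class whose prototype is most similar (cosine)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (q @ p.T).argmax(axis=1)

# Toy episode: 2 novel classes, 3 support images each, synthetic 8-d features.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 8))
support = np.concatenate([centers[c] + 0.1 * rng.normal(size=(3, 8))
                          for c in range(2)])
labels = np.repeat([0, 1], 3)
query = centers[1] + 0.1 * rng.normal(size=(1, 8))
pred = classify(query, prototypes(support, labels, 2))
```

In the paper's setting, the feature extractor behind such a similarity comparison would be the similarity branch, trained on base (basic-expression) classes and regularized by the emotion branch; the sketch above only shows the episode-level inference step.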
Fine-grained Few-shot Recognition by Deep Object Parsing
In our framework, an object is made up of K distinct parts or units, and we
parse a test instance by inferring the K parts, where each part occupies a
distinct location in the feature space, and the instance features at this
location manifest as an active subset of part templates shared across all
instances. We recognize test instances by comparing their active templates and
the relative geometry of their part locations against those of the presented
few-shot instances. We propose an end-to-end training method to learn part
templates on top of a convolutional backbone. To combat visual distortions such
as orientation, pose, and size, we learn multi-scale templates, and at test time
parse and match instances across these scales. We show that our method is
competitive with the state-of-the-art, and by virtue of parsing enjoys
interpretability as well.
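The parsing step — inferring, for each of the K parts, its location and its active template from a shared template bank — can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes (a spatial feature map and one template bank per part), not the paper's actual architecture:

```python
import numpy as np

def parse(feature_map, part_templates):
    """For each part k, pick the spatial location whose feature responds most
    strongly to one of that part's templates; return the active template id
    and (row, col) location per part."""
    H, W, D = feature_map.shape
    feats = feature_map.reshape(-1, D)
    parsed = []
    for templates in part_templates:      # templates: (T, D) bank for one part
        scores = feats @ templates.T      # (H*W, T) template responses
        loc = int(scores.max(axis=1).argmax())      # best location for this part
        parsed.append((int(scores[loc].argmax()),   # active template there
                       divmod(loc, W)))
    return parsed

# Toy check: one-hot templates planted at known locations are recovered.
part_templates = [np.eye(4)[:2], np.eye(4)[2:]]   # 2 parts, 2 templates each
fmap = np.zeros((2, 2, 4))
fmap[0, 0] = [1, 0, 0, 0]                         # part 0, template 0
fmap[1, 1] = [0, 0, 1, 0]                         # part 1, template 0
parsed = parse(fmap, part_templates)
```

Recognition would then compare the parsed (template id, location) pairs — active templates and the relative geometry of part locations — against the parses of the few-shot reference instances.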
Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations
Training deep generative models usually requires a large amount of data. To
alleviate the data collection cost, the task of zero-shot GAN adaptation aims
to reuse well-trained generators to synthesize images of an unseen target
domain without any further training samples. Due to the data absence, the
textual description of the target domain and the vision-language models, e.g.,
CLIP, are utilized to effectively guide the generator. However, with only a
single representative text feature instead of real images, the synthesized
images gradually lose diversity as the model is optimized, which is also known
as mode collapse. To tackle the problem, we propose a novel method to find
semantic variations of the target text in the CLIP space. Specifically, we
explore diverse semantic variations based on the informative text feature of
the target domain while regularizing the uncontrolled deviation of the semantic
information. With the obtained variations, we design a novel directional moment
loss that matches the first and second moments of image and text direction
distributions. Moreover, we introduce elastic weight consolidation and a
relation consistency loss to effectively preserve valuable content information
from the source domain, e.g., appearances. Through extensive experiments, we
demonstrate the efficacy of the proposed methods in ensuring sample diversity
in various scenarios of zero-shot GAN adaptation. We also conduct ablation
studies to validate the effect of each proposed component. Notably, our model
achieves a new state-of-the-art on zero-shot GAN adaptation in terms of both
diversity and quality.
Comment: Accepted to ICCV 2023 (poster).
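The directional moment loss described above — matching the first and second moments of the image-direction and text-direction distributions in CLIP space — admits a simple sketch. The abstract does not give the exact formulation, so the following is one plausible reading for illustration, with the sample mean as the first moment and the sample covariance (`np.cov`) standing in for the second:

```python
import numpy as np

def directions(feats, src_feat):
    """Unit-norm directions from a source feature to each sample feature."""
    d = feats - src_feat
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def directional_moment_loss(img_dirs, txt_dirs):
    """Penalize mismatch between the mean (first moment) and covariance
    (second moment) of the image- and text-direction distributions."""
    first = np.sum((img_dirs.mean(axis=0) - txt_dirs.mean(axis=0)) ** 2)
    second = np.sum((np.cov(img_dirs, rowvar=False)
                     - np.cov(txt_dirs, rowvar=False)) ** 2)
    return first + second

# Identical direction sets incur zero loss; differing ones do not.
rng = np.random.default_rng(1)
src = rng.normal(size=4)
txt_dirs = directions(src + rng.normal(size=(6, 4)), src)
img_dirs = directions(src + rng.normal(size=(6, 4)), src)
loss_same = directional_moment_loss(txt_dirs, txt_dirs)
loss_diff = directional_moment_loss(img_dirs, txt_dirs)
```

In the paper's setting, the text directions would come from the semantic variations of the target text found in CLIP space, and minimizing this loss would pull the distribution of synthesized-image directions toward them.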
Revisiting the Kuleshov Effect with first-time viewers
Researchers have recently suggested that historically mixed findings in studies of the Kuleshov effect (a classic film editing–related phenomenon whereby meaning is extracted from the interaction of sequential camera shots) might reflect differences in the relative sophistication of early versus modern cinema audiences. Relative to experienced audiences, first-time film viewers might be less predisposed and/or able to forge the required conceptual and perceptual links between the edited shots in order to demonstrate the effect. This article recreates the conditions that traditionally elicit this effect (whereby a neutral face comes to be perceived as expressive after being juxtaposed with independent images: a bowl of soup, a gravestone, a child playing) to directly compare "continuity" perception in first-time and more experienced film viewers. Results confirm the presence of the Kuleshov effect for experienced viewers (explicitly only in the sadness condition) but not the first-time viewers, who failed to perceive continuity between the shots.