435 research outputs found
Semi-supervised Emotion Recognition using Inconsistently Annotated Data
Expression recognition remains challenging, predominantly due to (a) lack of sufficient data, (b) subtle emotion intensity, (c) subjective and inconsistent annotation, and (d) in-the-wild data containing variations in pose, intensity, and occlusion. To address these challenges in a unified framework, we propose a self-training-based semi-supervised convolutional neural network (CNN) framework, which directly addresses (a) limited data by leveraging information from unannotated samples. Our method uses 'successive label smoothing' to adapt to subtle expressions and improve model performance on (b) low-intensity expression samples. Further, we address (c) inconsistent annotations by assigning sample weights during loss computation, thereby ignoring the effect of incorrect ground truth. We observe significant performance improvement on in-the-wild datasets by leveraging information from in-the-lab datasets, which relates to challenge (d). Experiments on four publicly available datasets demonstrate large gains in cross-database performance, and show that the proposed method learns different expression intensities, even when trained with categorical samples.
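The abstract does not reproduce the exact loss, but the two ingredients it names can be sketched minimally: label smoothing softens the annotated class, and a per-sample weight down-weights samples whose ground truth is suspect. Function names and the uniform smoothing scheme below are illustrative assumptions, not the paper's formulation.

```python
import math

def smoothed_targets(num_classes, label, eps):
    # label smoothing: the annotated class keeps probability 1 - eps,
    # the remaining eps is spread uniformly over the other classes
    return [1.0 - eps if c == label else eps / (num_classes - 1)
            for c in range(num_classes)]

def weighted_cross_entropy(probs, targets, sample_weight):
    # a per-sample weight scales the loss; a weight near zero effectively
    # ignores a sample suspected of carrying an incorrect ground-truth label
    return -sample_weight * sum(t * math.log(p) for t, p in zip(targets, probs))
```

With a weight of 0, a suspect sample contributes nothing to the gradient, which is the "ignoring incorrect ground truth" effect described above.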
LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise
Facial expression recognition (FER) remains a challenging task due to the
ambiguity of expressions. The resulting noisy labels significantly harm
performance in real-world scenarios. To address this issue, we present a new
FER model named Landmark-Aware Net~(LA-Net), which leverages facial landmarks
to mitigate the impact of label noise from two perspectives. Firstly, LA-Net
uses landmark information to suppress the uncertainty in expression space and
constructs the label distribution of each sample by neighborhood aggregation,
which in turn improves the quality of training supervision. Secondly, the model
incorporates landmark information into expression representations using the
devised expression-landmark contrastive loss. The enhanced expression feature
extractor can be less susceptible to label noise. Our method can be integrated
with any deep neural network for better training supervision without
introducing extra inference costs. We conduct extensive experiments on both
in-the-wild datasets and synthetic noisy datasets and demonstrate that LA-Net
achieves state-of-the-art performance.
Comment: accepted by ICCV 2023
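The neighborhood-aggregation step named above can be illustrated with a small sketch: rank the other samples by distance in some landmark-derived feature space and average the k nearest one-hot labels into a soft label distribution. The function name, plain squared-distance metric, and uniform neighbor weighting are assumptions for illustration, not LA-Net's exact construction.

```python
def neighbor_label_distribution(idx, features, labels, num_classes, k=3):
    # rank the other samples by squared distance in a (landmark-derived)
    # feature space
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(features[idx], features[j])), j)
        for j in range(len(features)) if j != idx
    )
    # average the k nearest one-hot labels into a soft distribution,
    # which serves as improved training supervision for sample idx
    dist = [0.0] * num_classes
    for _, j in dists[:k]:
        dist[labels[j]] += 1.0 / k
    return dist
```

A sample whose annotated label disagrees with all of its neighbors thus receives a target distribution that discounts the (possibly noisy) annotation.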
Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
Ensemble methods, traditionally built with independently trained
de-correlated models, have proven to be efficient methods for reducing the
remaining residual generalization error, which results in robust and accurate
methods for real-world applications. In the context of deep learning, however,
training an ensemble of deep networks is costly and generates high redundancy
which is inefficient. In this paper, we present experiments on Ensembles with
Shared Representations (ESRs) based on convolutional networks to demonstrate,
quantitatively and qualitatively, their data processing efficiency and
scalability to large-scale datasets of facial expressions. We show that
redundancy and computational load can be dramatically reduced by varying the
branching level of the ESR without loss of diversity and generalization power,
which are both important for ensemble performance. Experiments on large-scale
datasets suggest that ESRs reduce the remaining residual generalization error
on the AffectNet and FER+ datasets, reach human-level performance, and
outperform state-of-the-art methods on facial expression recognition in the
wild using emotion and affect concepts.
Comment: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, USA
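The structural idea of an Ensemble with Shared Representations can be sketched in a few lines: a shared trunk is computed once per input, and only the lightweight branches past the branching level are replicated, so ensemble diversity is kept without duplicating the full network. The class and its callables below are a schematic assumption, not the paper's architecture.

```python
class SharedEnsemble:
    # ensemble with a shared representation: the trunk is evaluated once per
    # input; only the lightweight branches are replicated per ensemble member
    def __init__(self, trunk, branches):
        self.trunk = trunk
        self.branches = branches

    def predict(self, x):
        shared = self.trunk(x)                  # cost paid once, not per member
        outs = [branch(shared) for branch in self.branches]
        n = len(outs)
        # average the branch outputs for the ensemble prediction
        return [sum(o[i] for o in outs) / n for i in range(len(outs[0]))]
```

Raising the branching level moves more layers into the shared trunk, trading redundancy (and compute) against branch diversity.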
Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition and Age and Gender Estimation
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Information Engineering, 2019. 8. Cho, Nam Ik.
The convolutional neural network (CNN) works very well in many computer vision tasks, including face-related problems. However, in the case of age estimation and facial expression recognition (FER), the accuracy provided by the CNN is still not good enough for real-world problems. It seems that the CNN does not easily capture the subtle differences in the thickness and amount of wrinkles on the face,
which are the essential features for age estimation and FER. Also, face images in the real world show many variations due to face rotation and illumination, and the CNN is not robust at finding rotated objects when not every possible variation is present in the training data.
Moreover, Multi-Task Learning (MTL) based methods can be very helpful for achieving real-time visual understanding of a dynamic scene, as they can perform several different perceptual tasks simultaneously and efficiently. In typical MTL methods, we need to consider constructing a dataset that contains all the labels for the different tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels is sometimes required. Hence, the cost of generating the desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning.
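The labeling difficulty described above is commonly handled by letting each training example contribute only to the tasks it is annotated for. A minimal sketch of such a multi-task loss (function name, squared-error stand-in loss, and weight scheme are illustrative assumptions, not the thesis's exact objective):

```python
def multitask_loss(predictions, labels, task_weights):
    # sum weighted per-task losses over a shared backbone's task heads;
    # tasks without a label for this sample are simply skipped, so one
    # example need not carry annotations for every task
    total = 0.0
    for task, pred in predictions.items():
        if task not in labels:
            continue
        err = (pred - labels[task]) ** 2       # stand-in per-task loss
        total += task_weights.get(task, 1.0) * err
    return total
```

This avoids requiring a single dataset that is fully labeled for every task at once, which is exactly the cost the paragraph above identifies.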
Therefore, to alleviate these problems, we first propose a few methods to improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on Multi-Task Learning (MTL) and data distillation.
1 INTRODUCTION
1.1 Motivation
1.2 Background
1.2.1 Age and Gender Estimation
1.2.2 Facial Expression Recognition (FER)
1.2.3 Capsule Networks (CapsNet)
1.2.4 Semi-Supervised Learning
1.2.5 Multi-Task Learning
1.2.6 Knowledge and Data Distillation
1.2.7 Domain Adaptation
1.3 Datasets
2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
2.1 Feeding CNN with Hand-Crafted Features
2.1.1 Preparation of Input
2.1.2 Age and Gender Estimation using the Gabor Responses
2.2 GF-CapsNet
2.2.1 Modification of CapsNet
3. Distill-2MD-MTL: Data Distillation Based on a Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
3.1 MTL Learning
3.2 Data Distillation
4. Experiments and Results
4.1 Experiments on GF-CNN and GF-CapsNet
4.2 GF-CNN Results
4.2.1 GF-CapsNet Results
4.3 Experiments on Distill-2MD-MTL
4.3.1 Semi-Supervised MTL
4.3.2 Cross-Dataset Cross-Domain Evaluation
5. Conclusion
Abstract (In Korean)
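The data-distillation step mentioned in this record is not spelled out here; in its common generic form, a model's predictions are averaged over several transformed copies of an unlabeled input, and only confident averages are kept as pseudo-labels for retraining. The sketch below (hypothetical names, a simple max-confidence threshold) follows that generic recipe, not necessarily the thesis's exact procedure.

```python
def distill_pseudo_label(model, transforms, x, threshold=0.6):
    # data distillation: average the model's class probabilities over
    # transformed copies of an unlabeled input ...
    preds = [model(t(x)) for t in transforms]
    n = len(preds)
    avg = [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]
    # ... then keep only confident predictions as pseudo-labels
    best = max(avg)
    return avg.index(best) if best >= threshold else None
```

Samples that return None stay unlabeled, so only reliable pseudo-labels enter the next training round.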
- β¦