435 research outputs found
Semi-supervised Emotion Recognition using Inconsistently Annotated Data
Expression recognition remains challenging, predominantly due to (a) lack of sufficient data, (b) subtle emotion intensity, (c) subjective and inconsistent annotation, and (d) in-the-wild data containing variations in pose, intensity, and occlusion. To address these challenges in a unified framework, we propose a self-training-based semi-supervised convolutional neural network (CNN) framework, which directly addresses (a) limited data by leveraging information from unannotated samples. Our method uses 'successive label smoothing' to adapt to subtle expressions and improve model performance on (b) low-intensity expression samples. Further, we address (c) inconsistent annotations by assigning sample weights during loss computation, thereby ignoring the effect of incorrect ground truth. We observe significant performance improvement on in-the-wild datasets by leveraging information from in-the-lab datasets, which relates to challenge (d). Experiments on four publicly available datasets demonstrate large gains in cross-database performance, and show that the proposed method learns different expression intensities, even when trained with categorical samples.
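The abstract does not reproduce the exact loss, but the two ingredients it names can be sketched minimally: label smoothing softens the annotated class, and a per-sample weight down-weights samples whose ground truth is suspect. Function names and the uniform smoothing scheme below are illustrative assumptions, not the paper's formulation.

```python
import math

def smoothed_targets(num_classes, label, eps):
    # label smoothing: the annotated class keeps probability 1 - eps,
    # the remaining eps is spread uniformly over the other classes
    return [1.0 - eps if c == label else eps / (num_classes - 1)
            for c in range(num_classes)]

def weighted_cross_entropy(probs, targets, sample_weight):
    # a per-sample weight scales the loss; a weight near zero effectively
    # ignores a sample suspected of carrying an incorrect ground-truth label
    return -sample_weight * sum(t * math.log(p) for t, p in zip(targets, probs))
```

With a weight of 0, a suspect sample contributes nothing to the gradient, which is the "ignoring incorrect ground truth" effect described above.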
LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise
Facial expression recognition (FER) remains a challenging task due to the
ambiguity of expressions. The resulting noisy labels significantly harm
performance in real-world scenarios. To address this issue, we present a new
FER model named Landmark-Aware Net~(LA-Net), which leverages facial landmarks
to mitigate the impact of label noise from two perspectives. Firstly, LA-Net
uses landmark information to suppress the uncertainty in expression space and
constructs the label distribution of each sample by neighborhood aggregation,
which in turn improves the quality of training supervision. Secondly, the model
incorporates landmark information into expression representations using the
devised expression-landmark contrastive loss. The enhanced expression feature
extractor can be less susceptible to label noise. Our method can be integrated
with any deep neural network for better training supervision without
introducing extra inference costs. We conduct extensive experiments on both
in-the-wild datasets and synthetic noisy datasets and demonstrate that LA-Net
achieves state-of-the-art performance.
Comment: accepted by ICCV 2023
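The neighborhood-aggregation step named above can be illustrated with a small sketch: rank the other samples by distance in some landmark-derived feature space and average the k nearest one-hot labels into a soft label distribution. The function name, plain squared-distance metric, and uniform neighbor weighting are assumptions for illustration, not LA-Net's exact construction.

```python
def neighbor_label_distribution(idx, features, labels, num_classes, k=3):
    # rank the other samples by squared distance in a (landmark-derived)
    # feature space
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(features[idx], features[j])), j)
        for j in range(len(features)) if j != idx
    )
    # average the k nearest one-hot labels into a soft distribution,
    # which serves as improved training supervision for sample idx
    dist = [0.0] * num_classes
    for _, j in dists[:k]:
        dist[labels[j]] += 1.0 / k
    return dist
```

A sample whose annotated label disagrees with all of its neighbors thus receives a target distribution that discounts the (possibly noisy) annotation.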
Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
Ensemble methods, traditionally built with independently trained
de-correlated models, have proven to be efficient methods for reducing the
remaining residual generalization error, which results in robust and accurate
methods for real-world applications. In the context of deep learning, however,
training an ensemble of deep networks is costly and generates high redundancy
which is inefficient. In this paper, we present experiments on Ensembles with
Shared Representations (ESRs) based on convolutional networks to demonstrate,
quantitatively and qualitatively, their data processing efficiency and
scalability to large-scale datasets of facial expressions. We show that
redundancy and computational load can be dramatically reduced by varying the
branching level of the ESR without loss of diversity and generalization power,
which are both important for ensemble performance. Experiments on large-scale
datasets suggest that ESRs reduce the remaining residual generalization error
on the AffectNet and FER+ datasets, reach human-level performance, and
outperform state-of-the-art methods on facial expression recognition in the
wild using emotion and affect concepts.
Comment: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, USA
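The structural idea of an Ensemble with Shared Representations can be sketched in a few lines: a shared trunk is computed once per input, and only the lightweight branches past the branching level are replicated, so ensemble diversity is kept without duplicating the full network. The class and its callables below are a schematic assumption, not the paper's architecture.

```python
class SharedEnsemble:
    # ensemble with a shared representation: the trunk is evaluated once per
    # input; only the lightweight branches are replicated per ensemble member
    def __init__(self, trunk, branches):
        self.trunk = trunk
        self.branches = branches

    def predict(self, x):
        shared = self.trunk(x)                  # cost paid once, not per member
        outs = [branch(shared) for branch in self.branches]
        n = len(outs)
        # average the branch outputs for the ensemble prediction
        return [sum(o[i] for o in outs) / n for i in range(len(outs[0]))]
```

Raising the branching level moves more layers into the shared trunk, trading redundancy (and compute) against branch diversity.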
Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition and Age and Gender Estimation
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Information Engineering, 2019. 8. Cho, Nam Ik.
The convolutional neural network (CNN) works very well in many computer vision tasks, including face-related problems. However, in the case of age estimation and facial expression recognition (FER), the accuracy provided by the CNN is still not good enough for real-world problems. It seems that the CNN does not easily capture the subtle differences in the thickness and amount of wrinkles on the face,
which are the essential features for age estimation and FER. Also, face images in the real world show many variations due to face rotation and illumination, and the CNN is not robust at finding rotated objects when not every possible variation is present in the training data.
Moreover, Multi-Task Learning (MTL) based methods can be very helpful for achieving real-time visual understanding of a dynamic scene, as they can perform several different perceptual tasks simultaneously and efficiently. In typical MTL methods, we need to consider constructing a dataset that contains all the labels for the different tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels is sometimes required. Hence, the cost of generating the desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning.
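The labeling difficulty described above is commonly handled by letting each training example contribute only to the tasks it is annotated for. A minimal sketch of such a multi-task loss (function name, squared-error stand-in loss, and weight scheme are illustrative assumptions, not the thesis's exact objective):

```python
def multitask_loss(predictions, labels, task_weights):
    # sum weighted per-task losses over a shared backbone's task heads;
    # tasks without a label for this sample are simply skipped, so one
    # example need not carry annotations for every task
    total = 0.0
    for task, pred in predictions.items():
        if task not in labels:
            continue
        err = (pred - labels[task]) ** 2       # stand-in per-task loss
        total += task_weights.get(task, 1.0) * err
    return total
```

This avoids requiring a single dataset that is fully labeled for every task at once, which is exactly the cost the paragraph above identifies.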
Therefore, to alleviate these problems, we first propose a few methods to improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on Multi-Task Learning (MTL) and data distillation.
1 INTRODUCTION
1.1 Motivation
1.2 Background
1.2.1 Age and Gender Estimation
1.2.2 Facial Expression Recognition (FER)
1.2.3 Capsule Networks (CapsNet)
1.2.4 Semi-Supervised Learning
1.2.5 Multi-Task Learning
1.2.6 Knowledge and Data Distillation
1.2.7 Domain Adaptation
1.3 Datasets
2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
2.1 Feeding CNN with Hand-Crafted Features
2.1.1 Preparation of Input
2.1.2 Age and Gender Estimation using the Gabor Responses
2.2 GF-CapsNet
2.2.1 Modification of CapsNet
3. Distill-2MD-MTL: Data Distillation Based on a Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
3.1 MTL Learning
3.2 Data Distillation
4. Experiments and Results
4.1 Experiments on GF-CNN and GF-CapsNet
4.2 GF-CNN Results
4.2.1 GF-CapsNet Results
4.3 Experiments on Distill-2MD-MTL
4.3.1 Semi-Supervised MTL
4.3.2 Cross-Dataset Cross-Domain Evaluation
5. Conclusion
Abstract (In Korean)
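The data-distillation step mentioned in this record is not spelled out here; in its common generic form, a model's predictions are averaged over several transformed copies of an unlabeled input, and only confident averages are kept as pseudo-labels for retraining. The sketch below (hypothetical names, a simple max-confidence threshold) follows that generic recipe, not necessarily the thesis's exact procedure.

```python
def distill_pseudo_label(model, transforms, x, threshold=0.6):
    # data distillation: average the model's class probabilities over
    # transformed copies of an unlabeled input ...
    preds = [model(t(x)) for t in transforms]
    n = len(preds)
    avg = [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]
    # ... then keep only confident predictions as pseudo-labels
    best = max(avg)
    return avg.index(best) if best >= threshold else None
```

Samples that return None stay unlabeled, so only reliable pseudo-labels enter the next training round.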
- β¦