435 research outputs found

    Semi-supervised Emotion Recognition using Inconsistently Annotated Data

    Expression recognition remains challenging, predominantly due to (a) a lack of sufficient data, (b) subtle emotion intensity, (c) subjective and inconsistent annotation, and (d) in-the-wild data containing variations in pose, intensity, and occlusion. To address these challenges in a unified framework, we propose a self-training-based semi-supervised convolutional neural network (CNN) framework, which directly addresses (a) limited data by leveraging information from unannotated samples. Our method uses 'successive label smoothing' to adapt to subtle expressions and improve model performance on (b) low-intensity expression samples. Further, we address (c) inconsistent annotations by assigning sample weights during loss computation, thereby ignoring the effect of incorrect ground truth. We observe significant performance improvement on in-the-wild datasets by leveraging information from in-the-lab datasets, which relates to challenge (d). In addition, experiments on four publicly available datasets demonstrate large gains in cross-database performance and show that the proposed method learns different expression intensities, even when trained with categorical samples.
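The two loss-side ideas in the abstract above (smoothed targets and per-sample weights) can be illustrated with a minimal sketch. It assumes plain uniform label smoothing rather than the paper's 'successive label smoothing', whose schedule the abstract does not describe, and the function names `smooth_label` and `weighted_smoothed_loss` are hypothetical.

```python
import math

def smooth_label(true_class, num_classes, eps):
    """Uniform label smoothing (an assumption, not the paper's scheme):
    1 - eps on the annotated class, eps spread evenly over the others."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if k == true_class else off for k in range(num_classes)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_smoothed_loss(batch_logits, batch_labels, sample_weights, eps):
    """Weighted mean of cross-entropies against smoothed targets.
    A sample with weight 0 contributes nothing to the loss."""
    num_classes = len(batch_logits[0])
    total, weight_sum = 0.0, 0.0
    for logits, label, w in zip(batch_logits, batch_labels, sample_weights):
        probs = softmax(logits)
        target = smooth_label(label, num_classes, eps)
        ce = -sum(t * math.log(p) for t, p in zip(target, probs))
        total += w * ce
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0
```

Setting a sample's weight to zero removes a suspected mislabel from the gradient entirely, while intermediate weights only discount it. How the weights are chosen is left open here.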

    LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise

    Facial expression recognition (FER) remains a challenging task due to the ambiguity of expressions. The resulting noisy labels significantly harm performance in real-world scenarios. To address this issue, we present a new FER model named Landmark-Aware Net (LA-Net), which leverages facial landmarks to mitigate the impact of label noise from two perspectives. First, LA-Net uses landmark information to suppress the uncertainty in expression space and constructs the label distribution of each sample by neighborhood aggregation, which in turn improves the quality of training supervision. Second, the model incorporates landmark information into expression representations using the devised expression-landmark contrastive loss, making the enhanced expression feature extractor less susceptible to label noise. Our method can be integrated with any deep neural network for better training supervision without introducing extra inference costs. We conduct extensive experiments on both in-the-wild datasets and synthetic noisy datasets and demonstrate that LA-Net achieves state-of-the-art performance. Comment: accepted by ICCV 202
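As a rough illustration of building a label distribution by neighborhood aggregation, the sketch below averages the one-hot labels of each sample's k nearest neighbors in a landmark feature space. This is a simplified stand-in, not LA-Net's actual formulation: the Euclidean metric, the unweighted average, and the name `label_distributions` are all assumptions.

```python
import math

def label_distributions(landmarks, labels, num_classes, k):
    """For each sample, average the one-hot labels of its k nearest
    neighbors (itself included) measured in landmark space."""
    result = []
    for i, point in enumerate(landmarks):
        # Sort all sample indices by distance to sample i; the sample
        # itself comes first because its distance is 0.
        order = sorted(range(len(landmarks)),
                       key=lambda j: math.dist(point, landmarks[j]))
        neighbors = order[:k]
        counts = [0.0] * num_classes
        for j in neighbors:
            counts[labels[j]] += 1.0
        result.append([c / len(neighbors) for c in counts])
    return result
```

With this scheme, a sample whose annotated label disagrees with its landmark neighbors receives a soft target that leans toward the neighbors' majority class instead of a hard, possibly wrong, one-hot label.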

    Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks

    Ensemble methods, traditionally built with independently trained de-correlated models, have proven to be efficient methods for reducing the residual generalization error, which results in robust and accurate methods for real-world applications. In the context of deep learning, however, training an ensemble of deep networks is costly and generates high redundancy, which is inefficient. In this paper, we present experiments on Ensembles with Shared Representations (ESRs) based on convolutional networks to demonstrate, quantitatively and qualitatively, their data processing efficiency and scalability to large-scale datasets of facial expressions. We show that redundancy and computational load can be dramatically reduced by varying the branching level of the ESR without loss of diversity and generalization power, which are both important for ensemble performance. Experiments on large-scale datasets suggest that ESRs reduce the residual generalization error on the AffectNet and FER+ datasets, reach human-level performance, and outperform state-of-the-art methods on facial expression recognition in the wild using emotion and affect concepts. Comment: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 1-1, New York, US
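The effect of the branching level on redundancy can be seen with simple parameter-count arithmetic. The sketch below is illustrative only: the layer sizes are made up, and `ensemble_param_count` is a hypothetical helper, not part of the ESR code.

```python
def ensemble_param_count(layer_params, num_branches, branch_level):
    """Total parameters when the first `branch_level` layers are shared
    by all branches and the remaining layers are duplicated per branch."""
    shared = sum(layer_params[:branch_level])
    per_branch = sum(layer_params[branch_level:])
    return shared + num_branches * per_branch

# Hypothetical per-layer parameter counts for a four-layer network.
layers = [100, 200, 300, 400]
independent = ensemble_param_count(layers, num_branches=4, branch_level=0)
late_branch = ensemble_param_count(layers, num_branches=4, branch_level=3)
```

A branching level of 0 corresponds to a conventional ensemble of fully independent networks. Moving the branch point deeper shares more layers and shrinks the total, at the cost of less room for the branches to diverge.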

    Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition and Age and Gender Estimation

    Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2019. Cho, Nam Ik.
    The convolutional neural network (CNN) works very well in many computer vision tasks, including face-related problems. However, for age estimation and facial expression recognition (FER), the accuracy provided by the CNN is still not good enough for real-world problems. It seems that the CNN does not readily capture the subtle differences in the thickness and amount of wrinkles on the face, which are essential features for age estimation and FER. Also, face images in the real world vary widely due to face rotation and illumination, and the CNN is not robust in finding rotated objects when not every possible variation is in the training data. Moreover, Multi-Task Learning (MTL) based methods can be very helpful for real-time visual understanding of a dynamic scene, as they can perform several different perceptual tasks simultaneously and efficiently. In typical MTL methods, we need to construct a dataset that contains all the labels for the different tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels is sometimes required. Hence, the cost of generating the desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning. Therefore, to alleviate these problems, we first propose a few methods to improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on Multi-Task Learning (MTL) and data distillation.
    Contents:
    1 Introduction
    1.1 Motivation
    1.2 Background
    1.2.1 Age and Gender Estimation
    1.2.2 Facial Expression Recognition (FER)
    1.2.3 Capsule networks (CapsNet)
    1.2.4 Semi-Supervised Learning
    1.2.5 Multi-Task Learning
    1.2.6 Knowledge and data distillation
    1.2.7 Domain Adaptation
    1.3 Datasets
    2 GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
    2.1 Feeding CNN with Hand-Crafted Features
    2.1.1 Preparation of Input
    2.1.2 Age and Gender Estimation using the Gabor Responses
    2.2 GF-CapsNet
    2.2.1 Modification of CapsNet
    3 Distill-2MD-MTL: Data Distillation based on Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
    3.1 MTL learning
    3.2 Data Distillation
    4 Experiments and Results
    4.1 Experiments on GF-CNN and GF-CapsNet
    4.2 GF-CNN Result
    4.2.1 GF-CapsNet Results
    4.3 Experiment on Distill-2MD-MTL
    4.3.1 Semi-Supervised MTL
    4.3.2 Cross Datasets Cross-Domain Evaluation
    5 Conclusion
    Abstract (In Korean)
    Maste
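The thesis above feeds Gabor filter responses to the network as hand-crafted features. As a reminder of what such a filter looks like, here is a minimal sketch of the real part of a standard 2D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier). The parameter names and default values are assumptions chosen for illustration, not the settings used in the thesis.

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter on a size x size grid (size odd).
    sigma: envelope width, theta: orientation in radians,
    lambd: carrier wavelength, gamma: aspect ratio, psi: phase offset."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Convolving a face image with a bank of such kernels at several orientations and wavelengths yields responses that emphasize oriented edges and fine texture such as wrinkles, which is the kind of detail the abstract says a plain CNN tends to miss.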