34 research outputs found

    CAKE: Compact and Accurate K-dimensional representation of Emotion

    Numerous models describing human emotional states have been built by the psychology community. Meanwhile, Deep Neural Networks (DNN) are reaching excellent performance and are becoming useful feature extraction tools in many computer vision tasks. Inspired by works from the psychology community, we first study the link between the compact two-dimensional representation of emotion known as arousal-valence and the discrete emotion classes (e.g. anger, happiness, sadness) used in the computer vision community. This makes it possible to assess the benefits, in terms of discrete emotion inference, of adding an extra dimension to arousal-valence (usually named dominance). Building on these observations, we propose CAKE, a 3-dimensional representation of emotion learned in a multi-domain fashion, achieving accurate emotion recognition on several public datasets. Moreover, we visualize how emotion boundaries are organized inside DNN representations and show that DNNs implicitly learn arousal-valence-like descriptions of emotions. Finally, we use the CAKE representation to compare the quality of the annotations of different public datasets.

    Towards a General Model of Knowledge for Facial Analysis by Multi-Source Transfer Learning

    This paper proposes a step toward obtaining general models of knowledge for facial analysis by addressing the question of multi-source transfer learning. More precisely, the proposed approach consists of two successive training steps. The first applies a combination operator to define a common embedding for the multiple sources, which are materialized by different existing trained models. The proposed operator relies on an auto-encoder, trained on a large dataset, that is efficient both in terms of compression ratio and transfer learning performance. In the second step, we exploit a distillation approach to obtain a lightweight student model mimicking the collection of fused existing models. This model outperforms its teacher on novel tasks, achieving results on par with state-of-the-art methods on 15 facial analysis tasks (and domains), at an affordable training cost. Moreover, this student has 75 times fewer parameters than the original teacher and can be applied to a variety of novel face-related tasks.

    Linear kernel combination using boosting

    In this paper, we propose a novel algorithm to design multi-class kernels based on an iterative combination of weak kernels, in a scheme inspired by the boosting framework. Our solution has a complexity linear in the training set size. We evaluate our method for classification on a toy example by integrating our multi-class kernel into a kNN classifier and comparing our results with a reference iterative kernel design method. We also evaluate our method for image categorization by considering a classic image database and comparing our boosted linear kernel combination with the direct linear combination of all features in a linear SVM.

    Structure-Preserving Transformers for Sequences of SPD Matrices

    In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance. Comment: Submitted to the ICASSP 2024 Conference. v2: error correction relative to v1 - Section 1, changed "less anisotropic" to "less isotropic". v3: updated citation 15 (has since been published).

    An Occam’s Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

    This paper presents a lightweight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding reduce the dimensionality of the representations; ii) the visual temporal information is handled by a simple score-per-frame selection process, averaged across time; iii) a simple frame selection mechanism is also proposed to weight the images of a sequence; iv) the fusion of the different modalities is performed at prediction level (late fusion). We also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64% on the test set of AFEW, and ranked 4th at the Emotion in the Wild 2018 challenge.
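
The score averaging, frame weighting, and late fusion steps listed in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the weighted-mean form of the frame weighting, and the single `visual_weight` blending parameter are all hypothetical.

```python
def average_frame_scores(frame_scores, frame_weights=None):
    """Collapse per-frame class scores into one score vector for a clip.

    frame_scores: list of per-frame score lists (one score per class).
    frame_weights: optional per-frame weights (hypothetical stand-in for
    the frame selection mechanism); uniform weights give a plain mean.
    """
    if frame_weights is None:
        frame_weights = [1.0] * len(frame_scores)
    total = sum(frame_weights)
    n_classes = len(frame_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(frame_weights, frame_scores)) / total
        for c in range(n_classes)
    ]


def late_fusion(visual_scores, audio_scores, visual_weight=0.5):
    """Blend two modalities at prediction level (assumed convex combination)."""
    return [
        visual_weight * v + (1.0 - visual_weight) * a
        for v, a in zip(visual_scores, audio_scores)
    ]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two frames, two classes: the clip-level visual score is the temporal mean.
visual = average_frame_scores([[0.2, 0.8], [0.6, 0.4]])
fused = late_fusion(visual, [0.9, 0.1])
label = predict(fused)
```

Fusing at prediction level keeps each modality's model independent, so no additional fusion parameters need to be learned from the small target dataset beyond the blending weight.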

    MLBoost Revisited: A Faster Metric Learning Algorithm for Identity-Based Face Retrieval

    This paper addresses the question of metric learning, i.e. the learning of a dissimilarity function from a set of similar/dissimilar example pairs. This domain plays an important role in many machine learning applications such as those related to face recognition or face retrieval. More specifically, this paper builds on the recent MLBoost method proposed by Negrel et al. [25]. MLBoost has been shown to perform very well for face retrieval tasks, but the algorithm relies on the computation of a weak metric, which is very time consuming. This paper demonstrates how, by introducing sparsity into the weak projectors, the convergence time can be reduced by up to a factor of 10× compared to MLBoost, without any performance loss. The paper also introduces an explicit way to control the rank of the resulting metrics, making it possible to fix in advance the dimension of the (projected) feature space. The proposed ideas are experimentally validated on a face retrieval task with three different signatures.
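
To make the role of sparse projectors and rank control concrete, here is a minimal sketch of a projection-based dissimilarity. It assumes the common form d(x, y) = ||L(x - y)||², where each row of L is one projector; the abstract does not state this form, and the boosting procedure that selects and weights the projectors is omitted entirely. The dictionary-based sparse row format and all function names are hypothetical.

```python
def project(sparse_rows, x):
    """Apply a projection matrix stored as sparse rows.

    sparse_rows: list of dicts {feature_index: weight}, one dict per
    projector. The number of rows bounds the rank of the metric and fixes
    the dimension of the projected feature space.
    """
    return [sum(w * x[j] for j, w in row.items()) for row in sparse_rows]


def squared_distance(sparse_rows, x, y):
    """Squared Euclidean distance between x and y after projection."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(p * p for p in project(sparse_rows, diff))


# Two projectors with one non-zero weight each: a rank-2 metric on 3-D inputs.
L = [{0: 1.0}, {2: 2.0}]
d = squared_distance(L, [1.0, 5.0, 3.0], [0.0, 7.0, 1.0])
```

With sparse rows, the cost of each projection grows with the number of non-zero weights rather than with the full input dimension, which is one plausible reading of why sparsity speeds up the computation.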

    Active Boosting for Interactive Object Retrieval

    This paper presents a new algorithm based on boosting for interactive object retrieval in images. Recent works propose "online boosting" algorithms where weak classifier sets are iteratively trained from data. These algorithms are proposed for visual tracking in videos, and are not well adapted to "online boosting" for interactive retrieval. We propose in this paper to iteratively build weak classifiers from images labeled as positive by the user during a retrieval session. A novel active learning strategy for the selection of images for user annotation is also proposed. This strategy is used to enhance the strong classifier resulting from the "boosting" process, but also to build new weak classifiers. Experiments have been carried out on a generalist database in order to compare the proposed method to an SVM-based reference approach.

    Interactive and multi-class Learning to detect semantic concepts in the multimedia data

    Machine learning techniques have recently shown their ability to identify image categories from descriptors extracted from the visual characteristics of images. Given the growth in the number of images and in the number of categories to process, several techniques have been proposed to reduce both the computational cost of the methods and the human investment in supervision. In this thesis we propose two methods that aim to handle a large number of images and categories. We first propose a solution based on the concept of interactive retrieval. The interactive retrieval protocol establishes a "dialogue" between the learning system and the user in order to minimize the annotation effort. In this work we sought an interactive retrieval solution suited to boosting methods, which combine weak classifiers to produce a stronger classifier. We proposed an interactive boosting method for image retrieval, which was the subject of two papers (RFIA 2010, ICPR 2010). These methods notably propose a new way of building the set of weak classifiers that boosting can select from. Second, we focused more specifically on kernel methods in a more classical learning setting. These methods have shown very good results, but the choice of the kernel function and its tuning remain an important issue. In this work, we developed a new method for learning a multi-class kernel function for the classification of large image databases. We chose to use a framework inspired by boosting methods to create a strong kernel from a combination of weaker kernels. We use the duality between the kernel function and its induced space to build a new representation space for the data that is better suited to categorization. The idea of our method is to build this new representation space optimally, so that it allows a new classifier to be learned faster and with better quality. Each multimedia item is then represented in this semantic space in place of its visual representation. To reproduce an approach similar to a boosting method, we use an incremental construction in which weak kernels are trained in a direction determined by the errors of the previous iteration. These kernels are combined up to a weighting factor, computed by analytically solving an optimization problem. This work rests on mathematical foundations and is supported by experiments showing its practical value in comparison with the most recent methods in the literature. It is presented in two papers at ESANN 2012 and ICIP 2012 and in a submission to MTAP.
    Recent machine learning techniques have demonstrated their capability for identifying image categories using image features. Among these techniques, Support Vector Machines (SVM) present the best results, particularly when they are associated with a kernel function. However, image categorization is now very challenging owing to the size of benchmark datasets and the number of categories to be classified. In such a context, a lot of effort has to be put into the design of the kernel functions and the underlying high-level features. In this talk, we propose a new method to learn a kernel function for image categorization in large image databases. Our learning method consists of two steps: first, a kernel is built and semantic features are deduced; then each class is learned with a standard SVM. We adopt a Boosting framework to design and combine weak kernel functions targeting an ideal kernel. We propose a new iterative algorithm, inspired by Boosting, to create a strong kernel. The weak kernels are learned thanks to the duality between the kernel space and the semantic feature space. We show that our method actually builds mapping functions that turn the initial input space into a new feature space where categories are better classified. Furthermore, our algorithm benefits from the Boosting process to learn this kernel with a complexity linear in the size of the training set. Experiments are carried out on popular benchmarks and databases to show the properties and behavior of the proposed method. On the PASCAL VOC2006 database, we compare our method to simple early fusion, and on the Oxford Flowers databases we show that our method outperforms the best MKL techniques of the literature.
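
As a rough illustration of what "targeting an ideal kernel" can mean, the sketch below builds a label-based ideal kernel and scores candidate kernels against it with kernel-target alignment. Alignment is a standard measure that is swapped in here as an assumption; the abstract does not say which criterion the thesis optimizes. The combination weights are fixed by hand rather than obtained from the analytic solution the thesis describes, and all function names are hypothetical.

```python
import math


def linear_kernel(X):
    """Gram matrix of dot products between all pairs of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def ideal_kernel(labels):
    """Assumed ideal kernel: 1 for same-class pairs, 0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def frobenius_inner(A, B):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def alignment(K, K_star):
    """Kernel-target alignment: cosine similarity between K and K_star."""
    norm = math.sqrt(frobenius_inner(K, K) * frobenius_inner(K_star, K_star))
    return frobenius_inner(K, K_star) / norm


def combine(kernels, weights):
    """Weighted linear combination of Gram matrices."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


labels = [0, 0, 1]
K_good = linear_kernel([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # separates classes
K_flat = linear_kernel([[1.0], [1.0], [1.0]])                  # uninformative
K_star = ideal_kernel(labels)
scores = [alignment(K, K_star) for K in (K_good, K_flat)]
K_mix = combine([K_good, K_flat], [0.8, 0.2])
```

In a boosting-style loop, each new weak kernel would be chosen to correct what the current combination still gets wrong relative to the ideal kernel; the fixed weights above only show the final linear combination step.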