6 research outputs found

    Deep representation learning for automatic depression detection from facial expressions

    No full text
    Abstract Depression is a prevalent mental disorder that severely affects an individual’s quality of life. Traditional diagnostic methods rely either on a clinician’s evaluation of symptoms reported by an individual or on self-report instruments. These subjective assessments make depression difficult to recognize, which has motivated the development of automatic diagnostic systems that provide objective and reliable information about depressive states. Recently, growing interest has emerged in developing such systems based on facial information, since there is evidence that facial expressions convey valuable information about depression. This thesis proposes computational models to explore the correlations between facial expressions and depressive states. Such exploration is challenging because 1) the differences in facial expressions across depression levels may be small and 2) facial analysis involves inherent complexities. From this perspective, we investigate different deep learning techniques to effectively model facial expressions for automatic depression detection. Specifically, we design architectures that model the appearance and dynamics of facial videos. For that, we analyze structures that explore either a fixed or multiple spatiotemporal ranges. Our findings suggest that a structure with multiscale feature extraction ability contributes to learning depression representations. We also demonstrate that depression distributions increase the robustness of depression estimations. Another key challenge in this application is the scarcity of labelled data, which leads to the need for efficient representation learning methods. To this end, we first develop a pooling method to encode facial dynamics into an image map, which can be explored by less complex deep models. In addition, we design an architecture that captures different facial expression variations by using a basic structure based on functions that explore features at multiple ranges without trainable parameters. Finally, we develop an architecture to explore facial expressions related to depression and pain, since depressed individuals may experience pain. To build this architecture, we use different strategies to efficiently extract multiscale features. Our experiments indicate that the proposed methods have the potential to generate discriminative representations.
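    The abstract does not specify how facial dynamics are encoded into an image map; below is a minimal sketch, assuming a rank-pooling-style weighted sum of frames (as in approximate rank pooling), of how a video clip might be collapsed into a single map that a lighter 2D model could then process. Names and shapes are illustrative, not the thesis' method.

```python
# Minimal sketch (not the thesis' exact method): encode the temporal
# dynamics of a face video as a single image map by weighting frames
# according to their temporal order (approximate rank pooling).
import torch

def dynamics_image(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, C, H, W) video clip -> (C, H, W) dynamic image map."""
    T = frames.shape[0]
    t = torch.arange(1, T + 1, dtype=frames.dtype)
    # Approximate rank-pooling coefficients: alpha_t = 2t - T - 1
    # weights later frames differently from earlier ones.
    alpha = 2.0 * t - T - 1.0
    return (alpha.view(T, 1, 1, 1) * frames).sum(dim=0)

clip = torch.rand(16, 3, 112, 112)   # 16 RGB frames (illustrative sizes)
image_map = dynamics_image(clip)     # single map summarizing the motion
```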

    A deep multiscale spatiotemporal network for assessing depression from facial dynamics

    No full text
    Abstract Recently, deep learning models have been successfully employed in video-based affective computing applications. One key application is automatic depression recognition from facial expressions. State-of-the-art approaches to recognizing depression typically explore spatial and temporal information individually, using convolutional neural networks (CNNs) to analyze appearance information and then either mapping feature variations or averaging the depression level over video frames. This approach has limited ability to represent the dynamic information that helps discriminate between depression levels. In contrast, 3D CNN-based models can directly encode spatio-temporal relationships, although they rely on fixed-range temporal information and a single receptive field, which limits their ability to capture facial expression variations over diverse ranges and to exploit diverse facial areas. In this paper, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and receptive field sizes, which allows the MSN to explore a wide range of spatio-temporal variations in facial expressions. Experimental results on two benchmark datasets show that the MSN is effective, outperforming state-of-the-art methods in automatic depression recognition.
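    As a rough illustration of the core idea, the following PyTorch sketch builds a block of parallel 3D convolutions with different temporal depths and spatial receptive fields, concatenated along the channel axis. The branch configuration is an assumption for illustration, not the MSN's published design.

```python
# Hedged sketch of a multiscale spatiotemporal block: parallel 3D
# convolutions with varying temporal depths and spatial kernel sizes.
import torch
import torch.nn as nn

class MultiscaleBlock(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        # (temporal depth, spatial height, spatial width) varies per branch;
        # these kernels are illustrative choices.
        kernels = [(1, 3, 3), (3, 3, 3), (5, 3, 3), (3, 5, 5)]
        self.branches = nn.ModuleList(
            nn.Conv3d(in_ch, branch_ch, k, padding=tuple(s // 2 for s in k))
            for k in kernels
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W); each branch sees a different spatio-temporal range.
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

block = MultiscaleBlock(in_ch=3, branch_ch=8)
out = block(torch.rand(2, 3, 16, 112, 112))   # -> (2, 32, 16, 112, 112)
```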

    Depression detection based on deep distribution learning

    No full text
    Abstract Major depressive disorder is among the most common and harmful mental health problems. Several deep learning architectures have been proposed for video-based detection of depression from the facial expressions of subjects. To predict the depression level, these architectures are often modeled for regression with a Euclidean loss. Consequently, they neither leverage the data distribution nor explore the ordinal relationship between facial images and depression levels, and they have limited robustness to noisy and uncertain labeling. This paper introduces a deep learning architecture for accurately predicting depression levels through distribution learning. It relies on a new expectation loss function that estimates the underlying data distribution over depression levels, where the expected values of the distribution are optimized to approach the ground-truth levels. The proposed approach can produce accurate predictions of depression levels even under label uncertainty. Extensive experiments on the AVEC2013 and AVEC2014 datasets indicate that the proposed architecture is an effective approach that can outperform state-of-the-art techniques.
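    A minimal sketch of such an expectation loss, assuming depression levels are discretized and the network outputs a softmax distribution over them; the paper's exact formulation may differ.

```python
# Hedged sketch: the network predicts a distribution over K discrete
# depression levels, and the distribution's expected value is pushed
# toward the ground-truth score.
import torch
import torch.nn.functional as F

def expectation_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """logits: (N, K) scores over K depression levels 0..K-1; target: (N,)."""
    probs = F.softmax(logits, dim=1)
    levels = torch.arange(logits.shape[1], dtype=probs.dtype)
    expected = (probs * levels).sum(dim=1)   # E[level] per sample
    return F.l1_loss(expected, target)       # match expectation to label

logits = torch.randn(4, 64, requires_grad=True)  # e.g., BDI-II range 0..63
loss = expectation_loss(logits, torch.tensor([12., 30., 5., 44.]))
loss.backward()
```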

    Combining global and local convolutional 3D networks for detecting depression from facial expressions

    No full text
    Abstract Deep learning architectures have been successfully applied in video-based health monitoring to recognize distinctive variations in the facial appearance of subjects. To detect patterns of variation linked to depressive behavior, deep neural networks (NNs) typically exploit spatial and temporal information separately, e.g., by cascading a 2D convolutional NN (CNN) with a recurrent NN (RNN), although the intrinsic spatio-temporal relationships can deteriorate. With the recent advent of 3D CNNs like the convolutional 3D (C3D) network, these spatio-temporal relationships can be modeled directly to improve performance. However, the accuracy of C3D networks remains an issue when applied to depression detection. In this paper, the fusion of diverse C3D predictions is proposed to improve accuracy, where spatio-temporal features are extracted from global (full-face) and local (eye) regions of the subject. This allows the model to focus increasingly on a local facial region that is highly relevant for analyzing depression. Additionally, the proposed network integrates 3D Global Average Pooling to efficiently summarize spatio-temporal features without fully-connected layers, thereby reducing the number of model parameters and the potential for over-fitting. Experimental results on the Audio Visual Emotion Challenge (AVEC 2013 and AVEC 2014) depression datasets indicate that combining the responses of global and local C3D networks achieves higher accuracy than state-of-the-art systems.
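    The sketch below illustrates the fusion idea with two deliberately tiny 3D CNNs, one per region, each ending in 3D global average pooling instead of fully-connected layers. The architectures, crop sizes, and averaging fusion are assumptions for illustration, not the paper's networks.

```python
# Illustrative sketch: global (full-face) and local (eye-region) 3D CNNs
# whose scalar depression predictions are fused by averaging.
import torch
import torch.nn as nn

def tiny_c3d() -> nn.Module:
    return nn.Sequential(
        nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool3d(1),   # 3D global average pooling, no FC stack
        nn.Flatten(),
        nn.Linear(32, 1),          # scalar depression score
    )

global_net, local_net = tiny_c3d(), tiny_c3d()
face = torch.rand(2, 3, 16, 112, 112)   # full-face clip
eyes = torch.rand(2, 3, 16, 32, 96)     # eye-region crop
score = 0.5 * (global_net(face) + local_net(eyes))   # fused prediction
```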

    MDN: a deep maximization-differentiation network for spatio-temporal depression detection

    No full text
    Abstract Deep learning (DL) models have been successfully applied in video-based affective computing to recognize emotions and mood, or to estimate the intensity of pain or stress, based on facial expressions. Despite the advances of state-of-the-art DL models for spatio-temporal recognition of facial expressions associated with depression, some challenges remain in the cost-effective application of 3D CNNs: (1) 3D convolutions employ structures with a fixed temporal depth, which reduces the potential to extract discriminative representations because the spatio-temporal variations across depression levels are usually small; and (2) these models are computationally complex and consequently susceptible to overfitting. To address these challenges, we propose a novel DL architecture called the Maximization and Differentiation Network (MDN) to effectively represent facial expression variations that are relevant for depression assessment. The MDN operates without 3D convolutions, exploring multiscale temporal information through a maximization block that captures smooth facial variations and a difference block that encodes sudden facial variations. Extensive experiments with the proposed MDN yield improved performance while reducing the number of parameters by more than a factor of 3 compared with 3D-ResNet models. Our model also outperforms other 3D models and achieves state-of-the-art results for depression detection. Code available at: https://github.com/wheidima/MDN
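    Below is a hedged sketch of the two blocks as described, implemented without 3D convolutions: temporal max pooling at several depths for smooth variations and temporal differencing for sudden ones. Window sizes and shapes are assumptions; see the linked repository for the actual design.

```python
# Hedged sketch of the MDN's two ideas (not the authors' implementation):
# a maximization block over multiple temporal windows and a difference
# block over shifted frames.
import torch
import torch.nn.functional as F

def maximization_block(x: torch.Tensor, depths=(3, 5, 7)) -> torch.Tensor:
    # x: (N, C, T, H, W); temporal max pooling at multiple scales
    # captures smooth facial variations.
    outs = [F.max_pool3d(x, kernel_size=(d, 1, 1), stride=1,
                         padding=(d // 2, 0, 0)) for d in depths]
    return torch.cat(outs, dim=1)

def difference_block(x: torch.Tensor, step: int = 1) -> torch.Tensor:
    # Encode sudden changes as differences between temporally shifted features.
    return x[:, :, step:] - x[:, :, :-step]

x = torch.rand(2, 8, 16, 28, 28)
smooth = maximization_block(x)   # -> (2, 24, 16, 28, 28)
sudden = difference_block(x)     # -> (2, 8, 15, 28, 28)
```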

    Face anti-spoofing via sample learning based recurrent neural network (RNN)

    No full text
    Abstract Face biometric systems are vulnerable to spoofing attacks, as attackers develop techniques such as print attacks, replay attacks, and 3D mask attacks to fool face recognition systems. To improve the security of biometric systems, we propose a simple and effective architecture called the sample learning based recurrent neural network (SLRNN). The proposed sample learning is based on sparse filtering, which augments the features extracted by Residual Networks (ResNet). The augmented features form a sequence that is fed into a Long Short-Term Memory (LSTM) network to construct the final representation. We show that, for the face anti-spoofing task, incorporating sample learning into recurrent structures yields more meaningful representations than an LSTM alone, with far fewer model parameters. Experimental studies on the MSU and CASIA datasets demonstrate that the proposed SLRNN outperforms current state-of-the-art methods.
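    A simplified sketch of this pipeline follows, with a soft-absolute-plus-normalization step standing in for the sparse-filtering-based sample learning (the actual method differs). The backbone choice, feature sizes, and two-class head are assumptions for illustration.

```python
# Simplified SLRNN-style pipeline: per-frame ResNet features, a
# sparse-filtering-style normalization, then an LSTM over the sequence.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)
backbone.fc = nn.Identity()              # 512-d feature per frame
lstm = nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
classifier = nn.Linear(128, 2)           # live vs. spoof

frames = torch.rand(4, 8, 3, 224, 224)   # (batch, time, C, H, W)
feats = backbone(frames.flatten(0, 1)).view(4, 8, 512)
# Stand-in for sample learning: soft absolute value, then L2 normalization.
feats = torch.sqrt(feats ** 2 + 1e-8)
feats = feats / feats.norm(dim=-1, keepdim=True)
_, (h, _) = lstm(feats)
logits = classifier(h[-1])               # (4, 2) anti-spoofing scores
```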