
    The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection

    This work shows that depression changes the correlations between features extracted from speech. Furthermore, it shows that exploiting this insight can improve the training speed and performance of depression detectors based on SVMs and LSTMs. The experiments were performed over the Androids Corpus, a publicly available dataset involving 112 speakers, including 58 people diagnosed with depression by professional psychiatrists. The results show that the models used in the experiments improve in terms of training speed and performance when fed with feature correlation matrices rather than with feature vectors. The relative reduction of the error rate ranges between 23.1% and 26.6%, depending on the model. The probable explanation is that feature correlation matrices appear to be more variable in the case of depressed speakers; correspondingly, this phenomenon can be thought of as a depression marker.
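    A minimal sketch of the representation change described above, assuming toy data rather than the actual Androids Corpus pipeline: frame-level speech features are collapsed into a feature-by-feature correlation matrix, whose upper triangle becomes a fixed-size input for an SVM.

    # Sketch: feature correlation matrices instead of feature vectors
    # as the SVM input. Data, shapes and labels are toy placeholders.
    import numpy as np
    from sklearn.svm import SVC

    def correlation_features(frames):
        """frames: (n_frames, n_features) frame-level speech features.
        Returns the upper triangle of their correlation matrix."""
        corr = np.corrcoef(frames, rowvar=False)  # (n_features, n_features)
        iu = np.triu_indices_from(corr, k=1)      # skip the diagonal of 1s
        return corr[iu]

    # Toy data: 40 recordings, 500 frames of 20 features each.
    rng = np.random.default_rng(0)
    recordings = rng.standard_normal((40, 500, 20))
    labels = rng.integers(0, 2, size=40)          # 1 = depressed (toy)

    X = np.stack([correlation_features(r) for r in recordings])
    clf = SVC(kernel="rbf").fit(X, labels)
    print(clf.score(X, labels))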

    MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection

    The Facial Action Coding System (FACS) encodes action units (AUs) in facial images and has attracted extensive research attention due to its wide use in facial expression analysis. Many methods that perform well on automatic facial action unit (AU) detection primarily focus on modeling various types of AU relations between corresponding local muscle areas, or on simply mining global attention-aware facial features; however, they neglect the dynamic interactions between local and global features. We argue that encoding AU features from just one perspective may not capture the rich contextual information between regional and global face features, nor the detailed variability across AUs, because of the diversity in expression and individual characteristics. In this paper, we propose a novel Multi-level Graph Relational Reasoning Network (termed MGRR-Net) for facial AU detection. Each layer of MGRR-Net performs multi-level (i.e., region-level, pixel-wise, and channel-wise) feature learning. While region-level feature learning from local face patch features via a graph neural network can encode the correlation across different AUs, pixel-wise and channel-wise feature learning via a graph attention network can enhance the discrimination ability of AU features derived from global face features. The fused features from the three levels lead to improved AU discriminative ability. Extensive experiments on the DISFA and BP4D AU datasets show that the proposed approach achieves superior performance over the state-of-the-art methods.
    Comment: 10 pages, 4 figures, 8 tables; submitted to IEEE TMM for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
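    As a rough illustration of the region-level reasoning described above, the sketch below applies a single self-attention layer over per-AU region features so that correlated AUs can exchange information. All dimensions are assumptions; the real MGRR-Net combines region-, pixel- and channel-level graphs in each layer.

    # Toy region-level graph attention: each AU region feature attends
    # over all others, refining the features with cross-AU context.
    import torch
    import torch.nn as nn

    class RegionGraphAttention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)

        def forward(self, regions):
            # regions: (batch, n_aus, dim) local per-AU features
            q, k, v = self.q(regions), self.k(regions), self.v(regions)
            attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5,
                                 dim=-1)
            return regions + attn @ v  # residual: refined AU features

    feats = torch.randn(8, 12, 64)     # 8 faces, 12 AU regions, 64-d each
    print(RegionGraphAttention(64)(feats).shape)  # torch.Size([8, 12, 64])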

    Continuous Interaction With a Smart Speaker via Low-Dimensional Embeddings of Dynamic Hand Pose

    This paper presents a new continuous interaction strategy, with visual feedback of hand pose, and mid-air gesture recognition and control for a smart music speaker, which uses only two video frames to recognize gestures. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a two-dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles into a compatible two-dimensional embedding. A PointNet-based model is then applied to classify gestures, which are used to control the device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we learn a more useful embedding space for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
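    The joint optimisation of the autoencoder with the classifier can be sketched as a single shared loss over the 2-D embedding. The example below is an approximation under stated assumptions: a plain linear head stands in for the PointNet-based classifier, and the layer sizes and loss weight are invented for illustration.

    # Joint autoencoder + classifier training step on toy hand poses.
    # 21 MediaPipe landmarks x (x, y, z) are squeezed to a 2-D embedding
    # that must stay both reconstructable and discriminative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    landmark_dim, n_gestures = 21 * 3, 5
    enc = nn.Sequential(nn.Linear(landmark_dim, 32), nn.ReLU(),
                        nn.Linear(32, 2))
    dec = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                        nn.Linear(32, landmark_dim))
    clf = nn.Linear(2, n_gestures)  # stand-in for the PointNet classifier

    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(),
                            *clf.parameters()])
    x = torch.randn(64, landmark_dim)         # toy batch of hand poses
    y = torch.randint(0, n_gestures, (64,))   # toy gesture labels

    z = enc(x)                                # 2-D pose embedding
    loss = F.mse_loss(dec(z), x) + 0.5 * F.cross_entropy(clf(z), y)
    opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))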

    Multi-Local Attention for Speech-Based Depression Detection

    This article shows that an attention mechanism, Multi-Local Attention, can improve a depression detection approach based on Long Short-Term Memory networks. Besides leading to higher performance metrics (e.g., Accuracy and F1 Score), Multi-Local Attention improves two other aspects of the approach, both important from an application point of view. The first is the effectiveness of a confidence score, associated with the detection outcome, at identifying the speakers more likely to be classified correctly. The second is the amount of speaking time needed to classify a speaker as depressed or non-depressed. The experiments were performed over read speech and involved 109 participants (including 55 diagnosed with depression by professional psychiatrists). The results show accuracies up to 88.0% (F1 Score 88.0%).
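    A hedged sketch of the windowed-attention idea: LSTM outputs are split into local windows, each window is attention-pooled, and the window summaries are combined before classification. Window size, pooling and dimensions are assumptions, not the exact Multi-Local Attention formulation.

    # Attention pooling inside local windows of LSTM outputs.
    import torch
    import torch.nn as nn

    class LocalAttentionPool(nn.Module):
        def __init__(self, hidden, window):
            super().__init__()
            self.window = window
            self.score = nn.Linear(hidden, 1)

        def forward(self, h):
            # h: (batch, time, hidden) LSTM outputs; time % window == 0
            b, t, d = h.shape
            h = h.view(b, t // self.window, self.window, d)
            w = torch.softmax(self.score(h), dim=2)  # per-window attention
            return (w * h).sum(dim=2).mean(dim=1)    # (batch, hidden)

    lstm = nn.LSTM(input_size=40, hidden_size=64, batch_first=True)
    pool = LocalAttentionPool(64, window=25)
    head = nn.Linear(64, 2)                  # depressed vs. non-depressed

    x = torch.randn(4, 100, 40)              # 4 utterances, 100 frames
    out, _ = lstm(x)
    print(head(pool(out)).shape)             # torch.Size([4, 2])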

    ALGRNet: Multi-relational adaptive facial action unit modelling for face representation and relevant recognitions

    Facial action units (AUs) represent the fundamental activities of a group of muscles, exhibiting subtle changes that are useful for various face analysis tasks. One practical application in real-life situations is the automatic estimation of facial paralysis, which involves analyzing the delicate changes in facial muscle regions and skin textures. It seems logical to assess the severity of facial paralysis by combining well-defined muscle regions (similar to AUs) symmetrically, thus creating a comprehensive facial representation. To this end, we have developed a new model that automatically estimates the severity of facial paralysis, inspired by facial action unit (FAU) recognition, which deals with rich, detailed facial appearance information, such as texture and muscle status. Specifically, a novel Adaptive Local-Global Relational Network (ALGRNet) is designed to adaptively mine the context of well-defined facial muscles and enhance the visual details of facial appearance and texture, and it can be flexibly adapted to facial-based tasks, e.g., FAU recognition and facial paralysis estimation. ALGRNet consists of three key structures: (i) an adaptive region learning module that identifies high-potential muscle response regions, (ii) a skip-BiLSTM that models the latent relationships among local regions, enabling better correlation between multiple regional lesion muscles and texture changes, and (iii) a feature fusion-and-refining module that explores the complementarity between the local and global aspects of the face. We have extensively evaluated ALGRNet on two widely recognized AU benchmarks, BP4D and DISFA, to demonstrate its effectiveness. Furthermore, to assess the efficacy of FAUs in subsequent applications, we have investigated their use in the identification of facial paralysis. Experimental findings obtained from a facial paralysis benchmark, meticulously gathered and annotated by medical experts, underscore the potential of using the identified AU attributes to estimate the severity of facial paralysis.
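    As an illustration of structure (iii) above, the sketch below gates each regional muscle feature with the global face feature so that local detail and global context complement each other. The gating form and all dimensions are assumptions, not ALGRNet's actual fusion-and-refining module.

    # Toy local-global fusion: a learned gate mixes each regional
    # feature with the whole-face feature.
    import torch
    import torch.nn as nn

    class LocalGlobalFusion(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

        def forward(self, local, glob):
            # local: (batch, n_regions, dim), glob: (batch, dim)
            g = glob.unsqueeze(1).expand_as(local)
            gate = self.gate(torch.cat([local, g], dim=-1))
            return gate * local + (1 - gate) * g  # refined regional features

    local = torch.randn(2, 10, 128)   # 10 muscle regions per face
    glob = torch.randn(2, 128)        # whole-face feature
    print(LocalGlobalFusion(128)(local, glob).shape)  # (2, 10, 128)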

    Local Global Relational Network for Facial Action Units Recognition

    Many existing facial action unit (AU) recognition approaches enhance the AU representation by combining local features from multiple independent branches, each corresponding to a different AU. However, such multi-branch combination-based methods usually neglect the potential mutual-assistance and exclusion relationships between AU branches, or simply employ a pre-defined and fixed knowledge graph as a prior. In addition, extracting features from pre-defined AU regions of regular shapes limits the representation ability. In this paper, we propose a novel Local Global Relational Network (LGRNet) for facial AU recognition. LGRNet mainly consists of two novel structures: a skip-BiLSTM module, which models the latent mutual-assistance and exclusion relationships among local AU features from multiple branches to enhance feature robustness, and a feature fusion-and-refining module, which explores the complementarity between local AUs and the whole face in order to refine the local AU features and improve their discriminability. Experiments on the BP4D and DISFA AU datasets show that the proposed approach outperforms the state-of-the-art methods by a large margin.
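    The skip-BiLSTM idea lends itself to a short sketch: per-AU branch features are treated as a sequence, a BiLSTM propagates assistance and exclusion cues across branches, and a skip connection preserves each branch's own evidence. Dimensions are illustrative, not the exact LGRNet module.

    # Toy skip-BiLSTM over per-AU branch features.
    import torch
    import torch.nn as nn

    class SkipBiLSTM(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True,
                                  batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, au_feats):
            # au_feats: (batch, n_aus, dim), one feature per AU branch
            related, _ = self.bilstm(au_feats)    # cross-AU context
            return self.norm(au_feats + related)  # skip connection

    feats = torch.randn(8, 12, 64)      # 12 AU branches, 64-d each
    print(SkipBiLSTM(64)(feats).shape)  # torch.Size([8, 12, 64])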