6 research outputs found
Improving Depression estimation from facial videos with face alignment, training optimization and scheduling
Deep learning models have shown promising results in recognizing depressive
states using video-based facial expressions. While successful models typically
leverage 3D-CNNs or video distillation techniques, the varied use of
pretraining, data augmentation, preprocessing, and optimization techniques
across experiments makes fair architectural comparisons difficult.
We propose instead to enhance two simple models based on ResNet-50 that use
only static spatial information by using two specific face alignment methods
and improved data augmentation, optimization, and scheduling techniques. Our
extensive experiments on benchmark datasets obtain similar results to
sophisticated spatio-temporal models for single streams, while the score-level
fusion of two different streams outperforms state-of-the-art methods. Our
findings suggest that specific modifications to the preprocessing and training
process result in noticeable differences in the performance of the models and
could hide the actual gains originally attributed to the use of different
neural network architectures.
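As a rough illustration of the face alignment step, a similarity transform can map detected eye landmarks to canonical positions before cropping. This is a minimal sketch, assuming eye coordinates are already available from a landmark detector; the canonical positions and crop size are illustrative choices, not the paper's exact alignment methods.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye, out_size=224, eye_y=0.4, eye_dist=0.5):
    """Similarity transform (rotation + scale + translation) that maps the
    detected eye centres to canonical positions in an out_size x out_size
    crop. Returns a 2x3 affine matrix usable with e.g. cv2.warpAffine."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    # Desired canonical eye locations in the output crop (illustrative values)
    dst_left = np.array([out_size * (1 - eye_dist) / 2, out_size * eye_y])
    dst_right = np.array([out_size * (1 + eye_dist) / 2, out_size * eye_y])
    # Rotation and scale that align the inter-ocular vector with the canonical one
    src_vec = right_eye - left_eye
    dst_vec = dst_right - dst_left
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    t = dst_left - R @ left_eye
    return np.hstack([R, t[:, None]])
```

Applying the returned matrix to every frame warps the face into a fixed pose, so the downstream ResNet sees spatially normalized inputs.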
Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support
In global healthcare, respiratory diseases are a leading cause of mortality,
underscoring the need for rapid and accurate diagnostics. To advance rapid
screening techniques via auscultation, our research focuses on employing one of
the largest publicly available medical databases of respiratory sounds to train
multiple machine learning models able to classify different health conditions.
Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to
extract physiologically relevant biosignals from acoustic data, closely tied to
cardiovascular and respiratory patterns, setting our approach apart from
conventional audio feature extraction practices. We use Power
Spectral Density analysis and filtering techniques to select Intrinsic Mode
Functions (IMFs) strongly correlated with underlying physiological phenomena.
These biosignals undergo a comprehensive feature extraction process for
predictive modeling. Initially, we deploy a binary classification model that
demonstrates a balanced accuracy of 87% in distinguishing between healthy and
diseased individuals. Subsequently, we employ a six-class classification model
that achieves a balanced accuracy of 72% in diagnosing specific respiratory
conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For
the first time, we also introduce regression models that estimate age and body
mass index (BMI) based solely on acoustic data, as well as a model for gender
classification. Our findings underscore the potential of this approach to
significantly enhance assistive and remote diagnostic capabilities.
Depression recognition from facial videos: Preprocessing and scheduling choices hide the architectural contributions
Abstract Deep learning models have been widely applied in video-based depression detection. The diversity of preprocessing, data augmentation, and optimization techniques makes it difficult to fairly compare model architectures. In this study, the typical ResNet-50 model is enhanced with specific face alignment methods and improved data augmentation, optimization, and scheduling techniques. Extensive experiments on two popular benchmark datasets (AVEC2013 and AVEC2014) obtained competitive results compared to sophisticated spatio-temporal models for single streams. Moreover, the score-level fusion approach based on two texture streams outperformed state-of-the-art methods, achieving mean square errors of 5.82 and 5.50 on AVEC2013 and AVEC2014, respectively. These findings suggest that the preprocessing and training configurations result in noticeable improvements that had originally been attributed to the network architectures.
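The score-level fusion described above can be sketched as a weighted average of the two streams' per-video predictions. This is a minimal sketch: the fusion weight and the MSE helper are illustrative, not the paper's exact configuration.

```python
import numpy as np

def fuse_scores(scores_a, scores_b, w=0.5):
    """Score-level fusion: weighted average of two streams' per-video
    depression score predictions. w weights the first stream; w=0.5 is a
    plain average (illustrative, not the paper's tuned weight)."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return w * scores_a + (1 - w) * scores_b

def mse(pred, target):
    """Mean square error, the evaluation metric reported on AVEC2013/2014."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))
```

Fusing at the score level keeps the two streams fully independent at training time; only their final predictions are combined.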
Depression recognition using remote photoplethysmography from facial videos
Abstract
Depression is a mental illness that may be harmful to an individual’s health. The detection of mental health disorders in the early stages and a precise diagnosis are critical to avoid social, physiological, or psychological side effects. This work analyzes physiological signals to observe whether different depressive states have a noticeable impact on the blood volume pulse (BVP) and the heart rate variability (HRV) response. Although HRV features are typically calculated from biosignals obtained with contact-based sensors such as wearables, we propose instead a novel scheme that directly extracts them from facial videos, based only on visual information, removing the need for any contact-based device. Our solution is based on a pipeline that is able to extract complete remote photoplethysmography (rPPG) signals in a fully unsupervised manner. We use these rPPG signals to calculate over 60 statistical, geometrical, and physiological features that are further used to train several machine learning regressors to recognize different levels of depression. Experiments on two benchmark datasets indicate that this approach offers comparable results to other audiovisual modalities based on voice or facial expression, potentially complementing them. In addition, the results achieved for the proposed method show promising and solid performance that outperforms hand-engineered methods and is comparable to deep learning-based approaches.