Emotion and Stress Recognition Related Sensors and Machine Learning Technologies
This book includes impactful chapters which present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant in tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality assurance and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts; and (x) emotion and stress estimation and forecasting from a nonlinear dynamical system perspective.
InMyFace: Inertial and Mechanomyography-Based Sensor Fusion for Wearable Facial Activity Recognition
Recognizing facial activity is a well-understood (but non-trivial) computer
vision problem. However, reliable solutions require a camera with a good view
of the face, which is often unavailable in wearable settings. Furthermore, in
wearable applications, where systems accompany users throughout their daily
activities, a permanently running camera can be problematic for privacy (and
legal) reasons. This work presents an alternative solution based on the fusion
of wearable inertial sensors, planar pressure sensors, and acoustic
mechanomyography (muscle sounds). The sensors were placed unobtrusively in a
sports cap to monitor facial muscle activities related to facial expressions.
We present our integrated wearable sensor system, describe data fusion and
analysis methods, and evaluate the system in an experiment with thirteen
subjects from different cultural backgrounds (eight countries) and both sexes
(six women and seven men). In a one-model-per-user scheme and using a late
fusion approach, the system yielded an average F1 score of 85.00% for the case
where all sensing modalities are combined. With a cross-user validation and a
one-model-for-all-user scheme, an F1 score of 79.00% was obtained for thirteen
participants (six females and seven males). Moreover, in a hybrid fusion
(cross-user) approach and six classes, an average F1 score of 82.00% was
obtained for eight users. The results are competitive with state-of-the-art
non-camera-based solutions for a cross-user study. In addition, our unique set
of participants demonstrates the inclusiveness and generalizability of the
approach. Comment: Submitted to Information Fusion, Elsevier
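The late fusion idea mentioned in the abstract above can be illustrated with a minimal sketch: each sensing modality has its own classifier, and only their class probabilities are combined. The averaging rule, the function name `late_fusion`, and the probability values below are assumptions made for illustration; the abstract does not state which fusion rule the authors used.

```python
from typing import Dict, List


def late_fusion(probs_by_modality: Dict[str, List[float]]) -> int:
    """Average per-modality class probabilities and return the winning class index."""
    modalities = list(probs_by_modality.values())
    n_classes = len(modalities[0])
    # Mean probability per class across all modalities (assumed fusion rule).
    fused = [
        sum(p[c] for p in modalities) / len(modalities)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__)


# Hypothetical outputs of three independently trained per-modality classifiers
# for a three-class problem (values invented for illustration).
example = {
    "inertial": [0.6, 0.3, 0.1],
    "pressure": [0.2, 0.5, 0.3],
    "mechanomyography": [0.5, 0.4, 0.1],
}
predicted_class = late_fusion(example)
```

Here the pressure classifier alone would pick class 1, but the averaged evidence of all three modalities favours class 0, which is the kind of disagreement late fusion is meant to resolve.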
Estimation of the QoE for video streaming services based on facial expressions and gaze direction
As multimedia technologies evolve, controlling their quality becomes ever more important, making Quality of Experience (QoE) measurement a key priority. Machine Learning (ML) can support this task by providing models that analyse the information extracted from the multimedia. ML model applications can be divided into the following categories:
1) QoE modelling: ML is used to define QoE models which provide an output (e.g., perceived QoE score) for any given input (e.g., QoE influence factor).
2) QoE monitoring in case of encrypted traffic: ML is used to analyze passive traffic monitored data to obtain insight into degradations perceived by end-users.
3) Big data analytics: ML is used for the extraction of meaningful and useful information from the collected data, which can further be converted to actionable knowledge and utilized in managing QoE.
QoE estimation can be carried out using two approaches: the objective approach and the subjective one. As the names suggest, they differ in the kind of information the model analyses. The objective approach analyses objective features extracted from the network connection and from the media in use. The state of the art also includes approaches that use features extracted from human behaviour as objective parameters. The subjective approach, instead, relies on ratings: participants are asked to rate the perceived quality using different scales. This approach is time-consuming, and for this reason not all users agree to complete the questionnaire. The direct evolution of this approach is therefore the adoption of ML models: a model can substitute for the questionnaire and evaluate the QoE from the data it analyses. In modelling the human response to perceived multimedia quality, QoE researchers have found that different parameters can be extracted from users, such as the Electroencephalogram (EEG), the Electrocardiogram (ECG), and brain waves. The main problem with these techniques is the hardware: the user must wear electrodes for ECG and EEG, so even if the results obtained with these methods are relevant, their use in a real context may not be feasible. For this reason, my studies have focused on developing a completely unobtrusive Machine Learning framework based on facial reactions.
Facial expression recognition and intensity estimation.
Doctoral Degree. University of KwaZulu-Natal, Durban. Facial expression is one of the profound non-verbal channels through which the human emotional state is inferred from the deformation or movement of face components when facial muscles are activated. Facial Expression Recognition (FER) is one of the relevant research fields in Computer Vision (CV) and Human-Computer Interaction (HCI). Its applications include, but are not limited to, robotics, gaming, medicine, education, security and marketing. FER involves a wealth of information. Categorising the information into primary emotion states only limits its performance. This thesis investigates an approach that simultaneously predicts the emotional state of facial expression images and the corresponding degree of intensity. The task also extends to resolving the ambiguous nature of FER and its annotation inconsistencies with a label distribution learning method that considers correlation among data. We first propose a multi-label approach for FER and its intensity estimation using advanced machine learning techniques. According to our findings, this approach has not previously been considered for emotion and intensity estimation in the field. The approach uses problem transformation to present FER as a multilabel task, such that every facial expression image has unique emotion information alongside the corresponding degree of intensity at which the emotion is displayed. A Convolutional Neural Network (CNN) with a sigmoid function at the final layer is the classifier for the model. The model, termed ML-CNN (Multilabel Convolutional Neural Network), successfully achieves concurrent prediction of emotion and intensity. ML-CNN prediction is challenged by overfitting and by intraclass and interclass variations.
We employ a pretrained Visual Geometry Group-16 (VGG-16) network to resolve the overfitting challenge, and the aggregation of island loss and binary cross-entropy loss to minimise the effect of intraclass and interclass variations. The enhanced ML-CNN model shows promising results and outperforms other standard multilabel algorithms. Finally, we address data annotation inconsistency and ambiguity in FER data using Isomap manifold learning with Graph Convolutional Networks (GCN). The GCN uses the distance along the Isomap manifold as the edge weight, which appropriately models the similarity between adjacent nodes for emotion predictions. The proposed method produces promising results in comparison with state-of-the-art methods. Author's List of Publications is on page xi of this thesis.
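The multilabel formulation described in this abstract (a sigmoid at the final layer, trained with binary cross-entropy) can be sketched in a few lines. The label layout (three emotion slots followed by two intensity slots), the logit values, and the 0.5 threshold are assumptions for illustration only; the island loss term mentioned in the abstract is not included here.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Map a raw network output (logit) to an independent probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(logits: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy over all labels, each treated independently."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)


def predict_labels(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Switch on every label whose sigmoid probability reaches the threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


# Hypothetical label layout: [happy, sad, surprise, low intensity, high intensity]
logits = [2.0, -3.0, -1.0, -2.0, 1.5]
targets = [1, 0, 0, 0, 1]
predicted = predict_labels(logits)
loss = binary_cross_entropy(logits, targets)
```

Because each output has its own sigmoid rather than sharing a softmax, an emotion label and an intensity label can be active at the same time, which is what allows a single forward pass to predict both.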
A novel NMF-based DWI CAD framework for prostate cancer.
In this thesis, a computer-aided diagnostic (CAD) framework for detecting prostate cancer in DWI data is proposed. The proposed CAD method consists of two frameworks that use nonnegative matrix factorization (NMF) to learn meaningful features from sets of high-dimensional data. The first technique is a three-dimensional (3D) level-set DWI prostate segmentation algorithm guided by a novel probabilistic speed function. This speed function is driven by the features learned by NMF from 3D appearance, shape, and spatial data. The second technique is a probabilistic classifier that seeks to label a prostate segmented from DWI data as either malignant, containing cancer, or benign, containing no cancer. This approach uses an NMF-based feature fusion to create a feature space where data classes are clustered. In addition, using DWI data acquired at a wide range of b-values (i.e., diffusion-weighting strengths) is investigated. Experimental analysis indicates that, for both of these frameworks, using NMF produces more accurate segmentation and classification results, respectively, and that combining the information from DWI data at several b-values can assist in detecting prostate cancer.
Role of machine learning in early diagnosis of kidney diseases.
Machine learning (ML) and deep learning (DL) approaches have been used as indispensable tools in modern artificial intelligence-based computer-aided diagnostic (AI-based CAD) systems that can provide non-invasive, early, and accurate diagnosis of a given medical condition. These AI-based CAD systems have proven themselves to be reproducible and have the generalization ability to diagnose new unseen cases with several diseases and medical conditions in different organs (e.g., kidneys, prostate, brain, liver, lung, breast, and bladder). In this dissertation, we will focus on the role of such AI-based CAD systems in early diagnosis of two kidney diseases, namely: acute rejection (AR) post kidney transplantation and renal cancer (RC). A new renal computer-assisted diagnostic (Renal-CAD) system was developed to precisely diagnose AR post kidney transplantation at an early stage. The developed Renal-CAD system performs the following main steps: (1) auto-segmentation of the renal allograft from surrounding tissues from diffusion weighted magnetic resonance imaging (DW-MRI) and blood oxygen level-dependent MRI (BOLD-MRI), (2) extraction of image markers, namely: voxel-wise apparent diffusion coefficients (ADCs) are calculated from DW-MRI scans at 11 different low and high b-values and then represented as cumulative distribution functions (CDFs) and extraction of the transverse relaxation rate (R2*) values from the segmented kidneys using BOLD-MRI scans at different echo times, (3) integration of multimodal image markers with the associated clinical biomarkers, serum creatinine (SCr) and creatinine clearance (CrCl), and (4) diagnosing renal allograft status as nonrejection (NR) or AR by utilizing these integrated biomarkers and the developed deep learning classification model built on stacked auto-encoders (SAEs).
Using a leave-one-subject-out cross-validation approach along with SAEs on a total of 30 patients with transplanted kidney (AR = 10 and NR = 20), the Renal-CAD system demonstrated 93.3% accuracy, 90.0% sensitivity, and 95.0% specificity in differentiating AR from NR. Robustness of the Renal-CAD system was also confirmed by the area under the curve value of 0.92. Using a stratified 10-fold cross-validation approach, the Renal-CAD system demonstrated its reproducibility and robustness with a diagnostic accuracy of 86.7%, sensitivity of 80.0%, specificity of 90.0%, and AUC of 0.88. In addition, a new renal cancer CAD (RC-CAD) system for precise diagnosis of RC at an early stage was developed, which incorporates the following main steps: (1) estimating the morphological features by applying a new parametric spherical harmonic technique, (2) extracting appearance-based features, namely: first order textural features are calculated and second order textural features are extracted after constructing the gray-level co-occurrence matrix (GLCM), (3) estimating the functional features by constructing wash-in/wash-out slopes to quantify the enhancement variations across different contrast enhanced computed tomography (CE-CT) phases, (4) integrating all the aforementioned features and modeling a two-stage multilayer perceptron artificial neural network (MLP-ANN) classifier to classify the renal tumor as benign or malignant and identify the malignancy subtype. On a total of 140 RC patients (malignant = 70 patients (ccRCC = 40 and nccRCC = 30) and benign angiomyolipoma tumors = 70), the developed RC-CAD system was validated using a leave-one-subject-out cross-validation approach. The developed RC-CAD system achieved a sensitivity of 95.3% ± 2.0%, a specificity of 99.9% ± 0.4%, and Dice similarity coefficient of 0.98 ± 0.01 in differentiating malignant from benign renal tumors, as well as an overall accuracy of 89.6% ± 5.0% in the sub-typing of RCC.
The diagnostic abilities of the developed RC-CAD system were further validated using a randomly stratified 10-fold cross-validation approach. The results obtained using the proposed MLP-ANN classification model outperformed other machine learning classifiers (e.g., support vector machine, random forests, and relational functional gradient boosting) as well as other approaches from the literature. In summary, machine and deep learning approaches have shown the potential to be utilized to build AI-based CAD systems. This is evidenced by the promising diagnostic performance obtained by both the Renal-CAD and RC-CAD systems. For the Renal-CAD, the integration of functional markers extracted from multimodal MRIs with clinical biomarkers using an SAE classification model potentially improved the final diagnostic results, as evidenced by high accuracy, sensitivity, and specificity. The developed Renal-CAD demonstrated high feasibility and efficacy for early, accurate, and non-invasive identification of AR. For the RC-CAD, integrating morphological, textural, and functional features extracted from CE-CT images using an MLP-ANN classification model enhanced the final results in terms of accuracy, sensitivity, and specificity, making the proposed RC-CAD a reliable noninvasive diagnostic tool for RC. The early and accurate diagnosis of AR or RC will help physicians to provide early intervention with the appropriate treatment plan to prolong the life span of the diseased kidney, increase the survival chance of the patient, and thus improve the healthcare outcome in the U.S. and worldwide.
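The Renal-CAD percentages reported above can be sanity-checked with a short sketch. The abstract gives only the cohort sizes (AR = 10, NR = 20) and the resulting percentages; the confusion-matrix counts below are inferred from those figures and are an assumption, not values reported by the author. AR is treated as the positive class.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (AR correctly found)
        "specificity": tn / (tn + fp),  # true negative rate (NR correctly found)
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Inferred counts (assumption): 10 AR and 20 NR patients in both settings.
loso = diagnostic_metrics(tp=9, fn=1, tn=19, fp=1)      # leave-one-subject-out
ten_fold = diagnostic_metrics(tp=8, fn=2, tn=18, fp=2)  # stratified 10-fold

for name, m in (("LOSO", loso), ("10-fold", ten_fold)):
    print(name, {k: round(100 * v, 1) for k, v in m.items()})
```

With only 30 patients, a single misclassified case shifts accuracy by about 3.3 percentage points, which is worth keeping in mind when comparing the two validation settings.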
Handling Class Imbalance Using Swarm Intelligence Techniques, Hybrid Data and Algorithmic Level Solutions
This research focuses mainly on the binary class imbalance problem in data mining. It investigates the use of combined approaches of data and algorithmic level solutions. Moreover, it examines the use of swarm intelligence and population-based techniques to combat the class imbalance problem at all levels, including at the data, algorithmic, and feature level. It also introduces various solutions to the class imbalance problem, in which swarm intelligence techniques like Stochastic Diffusion Search (SDS) and Dispersive Flies Optimisation (DFO) are used. The algorithms were evaluated using experiments on imbalanced datasets, in which the Support Vector Machine (SVM) was used as a classifier. SDS was used to perform informed undersampling of the majority class to balance the dataset. The results indicate that this algorithm improves the classifier performance and can be used on imbalanced datasets. Moreover, SDS was extended further to perform feature selection on high dimensional datasets. Experimental results show that SDS can be used to perform feature selection and improve the classifier performance on imbalanced datasets. Further experiments evaluated DFO as an algorithmic level solution to optimise the SVM kernel parameters when learning from imbalanced datasets. Based on the promising results of DFO in these experiments, the novel approach was extended further to provide a hybrid algorithm that simultaneously optimises the kernel parameters and performs feature selection
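As a point of reference for the data-level solutions discussed above, the sketch below balances a binary dataset by plain random undersampling of the majority class. This is a deliberately simpler stand-in: the thesis uses Stochastic Diffusion Search to choose which majority samples to keep, and that informed selection is not reproduced here. The function name, the seed, and the toy data are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    majority_label: int,
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep all minority samples and an equally sized random subset of the majority."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    kept_majority = rng.sample(majority, k=min(len(minority), len(majority)))
    keep = sorted(kept_majority + minority)  # preserve original ordering
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy imbalanced dataset: eight majority (0) and two minority (1) samples.
X = [[float(i)] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
X_balanced, y_balanced = random_undersample(X, y, majority_label=0)
```

The balanced set would then be passed to the classifier (an SVM in the thesis). Random selection can discard informative majority samples, which is the weakness that an informed undersampling method aims to avoid.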
CAD system for early diagnosis of diabetic retinopathy based on 3D extracted imaging markers.
This dissertation makes significant contributions to the field of ophthalmology, addressing the segmentation of retinal layers and the diagnosis of diabetic retinopathy (DR). The first contribution is a novel 3D segmentation approach that leverages the patient-specific anatomy of retinal layers. This approach demonstrates superior accuracy in segmenting all retinal layers from a 3D retinal image compared to current state-of-the-art methods. It also offers enhanced speed, enabling potential clinical applications. The proposed segmentation approach holds great potential for supporting surgical planning and guidance in retinal procedures such as retinal detachment repair or macular hole closure. Surgeons can benefit from the accurate delineation of retinal layers, enabling better understanding of the anatomical structure and more effective surgical interventions. Moreover, real-time guidance systems can be developed to assist surgeons during procedures, improving overall patient outcomes. The second contribution of this dissertation is the introduction of a novel computer-aided diagnosis (CAD) system for precise identification of diabetic retinopathy. The CAD system utilizes 3D-OCT imaging and employs an innovative approach that extracts two distinct features: first-order reflectivity and 3D thickness. These features are then fused and used to train and test a neural network classifier. The proposed CAD system exhibits promising results, surpassing other machine learning and deep learning algorithms commonly employed in DR detection. This demonstrates the effectiveness of the comprehensive analysis approach employed by the CAD system, which considers both low-level and high-level data from the 3D retinal layers. The CAD system presents a groundbreaking contribution to the field, as it goes beyond conventional methods, optimizing backpropagated neural networks to integrate multiple levels of information effectively.
By achieving superior performance, the proposed CAD system showcases its potential in accurately diagnosing DR and aiding in the prevention of vision loss. In conclusion, this dissertation presents novel approaches for the segmentation of retinal layers and the diagnosis of diabetic retinopathy. The proposed methods exhibit significant improvements in accuracy, speed, and performance compared to existing techniques, opening new avenues for clinical applications and advancements in the field of ophthalmology. By addressing future research directions, such as testing on larger datasets, exploring alternative algorithms, and incorporating user feedback, the proposed methods can be further refined and developed into robust, accurate, and clinically valuable tools for diagnosing and monitoring retinal diseases
Fear Classification using Affective Computing with Physiological Information and Smart-Wearables
International Mention in the doctoral degree. Among the 17 Sustainable Development Goals proposed within the 2030 Agenda
and adopted by all of the United Nations member states, the fifth SDG is a call
for action to effectively turn gender equality into a fundamental human right and
an essential foundation for a better world. It includes the eradication of all types
of violence against women. Focusing on the technological perspective, the range of
available solutions intended to prevent this social problem is very limited. Moreover,
most of the solutions are based on a panic button approach, leaving aside
the usage and integration of current state-of-the-art technologies, such as the Internet
of Things (IoT), affective computing, cyber-physical systems, and smart-sensors.
Thus, the main purpose of this research is to provide new insight into the design and
development of tools to prevent and combat Gender-based Violence risky situations
and, even, aggressions, from a technological perspective, but without leaving aside
the different sociological considerations directly related to the problem. To achieve
such an objective, we rely on the application of affective computing from a realist
point of view, i.e. targeting the generation of systems and tools capable of being implemented
and used nowadays or within an achievable time-frame. This pragmatic
vision is channelled through: 1) an exhaustive study of the existing technological
tools and mechanisms oriented to the fight against Gender-based Violence, 2) the proposal
of a new smart-wearable system intended to deal with some of the technological
limitations currently encountered, 3) a novel fear-related emotion classification approach
to disentangle the relation between emotions and physiology, and 4) the definition
and release of a new multi-modal dataset for emotion recognition in women.
Firstly, different fear classification systems using a reduced set of physiological signals are explored and designed. This is done by employing open datasets together
with the combination of time, frequency and non-linear domain techniques. This
design process is encompassed by trade-offs between both physiological considerations
and embedded capabilities. The latter is of paramount importance due to
the edge-computing focus of this research. Two results are highlighted in this first
task, the designed fear classification system that employed the DEAP dataset data
and achieved an AUC of 81.60% and a Gmean of 81.55% on average for a subject-independent
approach, and only two physiological signals; and the designed fear
classification system that employed the MAHNOB dataset data achieving an AUC
of 86.00% and a Gmean of 73.78% on average for a subject-independent approach,
only three physiological signals, and a Leave-One-Subject-Out configuration. A detailed
comparison with other emotion recognition systems proposed in the literature
is presented, which proves that the obtained metrics are in line with the state-of-the-art.
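The Gmean figures quoted above are not defined in the abstract. A minimal sketch, assuming the common definition used for imbalanced binary classification (the geometric mean of sensitivity and specificity), is given below; the input values are invented and are not results from the thesis.

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of the true positive rate and the true negative rate."""
    return math.sqrt(sensitivity * specificity)


# Hypothetical operating point of a fear / no-fear classifier.
balanced_case = g_mean(0.80, 0.70)

# A classifier that always predicts "fear" has perfect sensitivity but zero
# specificity, so its G-mean collapses to zero.
degenerate_case = g_mean(1.0, 0.0)
```

Unlike accuracy, this score cannot be inflated by favouring one class, which makes it a natural companion to AUC when the two classes are unevenly represented.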
Secondly, Bindi is presented. This is an end-to-end autonomous multimodal system
leveraging affective IoT throughout auditory and physiological commercial off-the-shelf
smart-sensors, hierarchical multisensorial fusion, and secured server architecture
to combat Gender-based Violence by automatically detecting risky situations
based on a multimodal intelligence engine and then triggering a protection protocol.
Specifically, this research is focused on the hardware and software design of one of
the two edge-computing devices within Bindi. This is a bracelet integrating three
physiological sensors, actuators, power monitoring integrated chips, and a
System-on-Chip with wireless capabilities. Within this context, different embedded design
space explorations are presented: embedded filtering evaluation, online physiological
signal quality assessment, feature extraction, and power consumption analysis.
The reported results in all these processes are successfully validated and, for some
of them, even compared against physiological standard measurement equipment.
Amongst the different obtained results regarding the embedded design and implementation
within the bracelet of Bindi, it should be highlighted that its low power
consumption provides a battery life of approximately 40 hours when using a 500
mAh battery.
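The battery-life figure above implies an average current draw that can be estimated with simple arithmetic. The sketch assumes an idealised battery whose full rated capacity is usable and ignores converter losses, self-discharge, and temperature effects; the function name is hypothetical.

```python
def average_current_ma(capacity_mah: float, runtime_h: float) -> float:
    """Average current (mA) that drains a battery of the given capacity in runtime_h hours."""
    return capacity_mah / runtime_h


# Figures taken from the abstract: 500 mAh battery, roughly 40 hours of operation.
bracelet_current_ma = average_current_ma(capacity_mah=500, runtime_h=40)
print(f"Implied average draw: {bracelet_current_ma} mA")
```

Under these assumptions the bracelet would draw about 12.5 mA on average; the real figure is likely somewhat lower, since not all of the rated capacity is normally usable.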
Finally, the particularities of our use case and the scarcity of open multimodal datasets dealing with emotional immersive technology, labelling methodology considering
the gender perspective, balanced stimuli distribution regarding the target
emotions, and recovery processes based on the physiological signals of the volunteers
to quantify and isolate the emotional activation between stimuli, led us to the definition
and elaboration of Women and Emotion Multi-modal Affective Computing
(WEMAC) dataset. This is a multimodal dataset in which 104 women who had never
experienced Gender-based Violence performed different emotion-related stimuli
visualisations in a laboratory environment. The previous fear binary classification
systems were improved and applied to this novel multimodal dataset. For instance,
the proposed multimodal fear recognition system using this dataset reports up to
60.20% and 67.59% for ACC and F1-score, respectively. These values represent a
competitive result in comparison with state-of-the-art systems that deal with similar
multi-modal use cases.
In general, this PhD thesis has opened a new research line within the research group
under which it has been developed. Moreover, this work has established a solid base
from which to expand knowledge and continue research targeting the generation of
both mechanisms to help vulnerable groups and socially oriented technology. Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Chair: David Atienza Alonso. Secretary: Susana Patón Álvarez. Member: Eduardo de la Torre Arnan