112 research outputs found

    Artificial Intelligence Tools for Facial Expression Analysis.

    Inner emotions show visibly on the human face, which is understood as a basic guide to an individual’s inner world. It is therefore possible to determine a person’s attitudes, and the effects of others’ behaviour on their deeper feelings, by examining facial expressions. In real-world applications, machines that interact with people need strong facial expression recognition, which holds advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU activation is the contraction of a local, individual group of facial muscles; AUs occurring in unison constitute a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting both hand-crafted and deep-learning features (static and dynamic) from each frame of a video. This confirmed the superiority of pretrained models, which produced a clear leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases in dynamic sequences using supervised and unsupervised methods. During these processes, stacking dynamic features on top of static ones proved important when encoding deep features to learn temporal information, combining the spatial and temporal schemes simultaneously. This study also found that fusing spatial and temporal features yields more long-term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the learning of invariant information from dynamic textures. Recently, approaches based on Generative Adversarial Networks (GANs) have produced fresh cutting-edge developments. In the second section of this thesis, we propose a model based on an unsupervised DCGAN for facial feature extraction and classification, to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the static seven classes of emotion in the wild. Thorough cross-database experimentation demonstrates that this approach can improve generalization. Additionally, we showed that the features learnt by the DCGAN are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
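The stacking of dynamic features on top of static per-frame features described above can be illustrated with a minimal sketch. The ResNet-18 backbone from torchvision, the frame-to-frame feature deltas used as the dynamic stream, and the logistic-regression AU classifier are all illustrative assumptions, not the thesis's actual configuration.

```python
# Minimal sketch: static per-frame CNN features stacked with temporal deltas,
# then a linear classifier for AU occurrence. The ResNet-18 backbone, the
# delta-based dynamic stream and the classifier are illustrative assumptions.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.linear_model import LogisticRegression

# Static encoder: pretrained ResNet-18 with the classification head removed.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def static_features(frames):
    """frames: list of HxWx3 uint8 arrays -> (T, 512) per-frame features."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch).numpy()

def stack_static_dynamic(feats):
    """Append frame-to-frame deltas (a simple dynamic cue) to static features."""
    deltas = np.diff(feats, axis=0, prepend=feats[:1])
    return np.hstack([feats, deltas])  # (T, 1024)

# Toy usage with random frames and random per-frame AU occurrence labels.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, (256, 256, 3), dtype=np.uint8) for _ in range(16)]
X = stack_static_dynamic(static_features(frames))
y = rng.integers(0, 2, len(X))  # AU active / inactive per frame
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))
```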

    Emotion Recognition with Deep Neural Networks

    Automatic recognition of human emotion has been studied for decades. It is one of the key components in human-computer interaction, with applications in health care, education, entertainment and advertisement. Emotion recognition is a challenging task, as it involves predicting abstract emotional states from multi-modal input data. These modalities include video, audio and physiological signals. The visual modality is one of the most informative channels; especially facial expressions, which have been shown to be strong cues for the emotional state of a subject. A common automated emotion recognition system includes several processing steps, each of which has to be tuned and integrated into a pipeline. Such pipelines are often hand-engineered, which can introduce strong assumptions about the properties of the task and data. Limiting assumptions and learning the processing pipeline from data often yields more general solutions. In recent years, deep learning methods have been shown to be able to learn good representations for various modalities. For many computer vision benchmarks, the gap between state-of-the-art algorithms based on deep neural networks and human performance is shrinking rapidly. These networks learn hierarchies of features. With increasing depth, these hierarchies can describe increasingly abstract concepts. This development suggests exploring the applications of such learning methods to facial analysis and emotion recognition. This thesis is based on a preliminary study and three articles, which contribute to the field of emotion recognition. The preliminary study introduces a new variant of Local Binary Patterns (LBPs), which is used as a high-dimensional binary representation of facial images. It is common to create histograms of LBP features within regions of input images. However, in this work, they are used as high-dimensional binary vectors that are extracted at multiple scales around detected facial keypoints. We examine a pipeline consisting of unsupervised and supervised dimensionality reduction, using Principal Component Analysis (PCA) and Local Fisher Discriminant Analysis (LFDA), followed by a Support Vector Machine (SVM) classifier for prediction of facial expressions. The experiments show that the dimensionality reduction steps provide robustness in the presence of noisy keypoints. This approach achieved state-of-the-art performance at the time in facial expression recognition on the Extended Cohn-Kanade (CK+) data set (Lucey et al., 2010) and smile detection on the GENKI data set (GENKI-4K, 2008). For the smile detection task, a deep Convolutional Neural Network (CNN) was used as a strong baseline. Emotion recognition in close-to-real-world videos, such as the Hollywood film clips in the Emotion Recognition in the Wild (EmotiW) challenge (Dhall et al., 2013), is much harder than in controlled lab environments.
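The preliminary study's pipeline (multi-scale LBP features, unsupervised then supervised dimensionality reduction, SVM) can be sketched as follows. This is a minimal illustration, assuming scikit-image's local_binary_pattern; scikit-learn's LinearDiscriminantAnalysis stands in for LFDA, which has no scikit-learn implementation, and the keypoint-centred multi-scale extraction is simplified to whole-image LBP maps at several radii.

```python
# Sketch of the preliminary-study pipeline: multi-scale LBP features ->
# unsupervised (PCA) then supervised (discriminant) dimensionality reduction ->
# SVM. LinearDiscriminantAnalysis stands in for LFDA, and whole-image LBP maps
# stand in for keypoint-centred patches.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def multiscale_lbp(gray, radii=(1, 2, 3)):
    """Concatenate LBP codes at several radii into one high-dimensional vector."""
    return np.concatenate([
        local_binary_pattern(gray, P=8 * r, R=r, method="uniform").ravel()
        for r in radii
    ])

# Toy data: 40 random 48x48 grayscale "faces" with 4 expression classes.
rng = np.random.default_rng(0)
X = np.stack([multiscale_lbp((rng.random((48, 48)) * 255).astype(np.uint8))
              for _ in range(40)])
y = rng.integers(0, 4, 40)

pipeline = make_pipeline(
    PCA(n_components=30),                        # unsupervised reduction
    LinearDiscriminantAnalysis(n_components=3),  # supervised reduction (LFDA stand-in)
    SVC(kernel="linear"),
)
pipeline.fit(X, y)
print("train accuracy:", pipeline.score(X, y))
```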
The first article is an in-depth analysis of the EmotiW 2013 challenge winning entry (Kahou et al., 2013), with additional experiments on the data set of the 2014 challenge. The pipeline consists of a combination of deep learning models, each specializing in one modality: a novel aggregation of per-frame features transfers powerful CNN features, learned on a large pooled data set of facial expression images, to the video domain; a Deep Belief Network (DBN) learns audio features; an activity recognition pipeline captures spatio-temporal motion features; and a k-means-based bag-of-mouths model extracts features around the mouth region. Several approaches for fusing the predictions of modality-specific models are compared. The performance after re-training on the 2014 data set with a few adaptations is still competitive with the new state of the art. One drawback of the method described in the first article is the aggregation approach of the visual modality, which involves pooling per-frame features into a fixed-length vector. This ignores the temporal order inside the pooled segments. Recurrent Neural Networks (RNNs) are neural networks built for sequential processing of data; they can address this issue by summarizing frames in a real-valued state vector that is updated at each time-step. In general, RNNs provide a way of learning an aggregation approach in a data-driven manner. The second article analyzes the application of an RNN on CNN features for emotion recognition in video, sketched below. A comparison of the RNN with the pooling-based approach shows a significant improvement in classification performance. It also includes feature-level and decision-level fusion of models for different modalities. In addition to the RNN, the same activity pipeline as in the previous work, an SVM-based audio model and the old aggregation model are fused to boost performance on the EmotiW 2015 challenge data set. This approach was the second runner-up in the challenge, within a small margin of 1% in classification accuracy of the challenge winner. The last article focuses on a more general computer vision problem, namely visual tracking. An RNN is augmented with a neural attention mechanism that allows it to focus on task-related information, ignoring potential distractors in input frames. The approach is formulated in a modular neural framework consisting of three components: a recurrent attention module controlling where to look, a feature-extraction module providing a representation of what is seen, and an objective module which indicates why an attentional behaviour is learned. Each module is fully differentiable, allowing simple gradient-based optimization. Such a framework could be used to design an end-to-end solution for emotion recognition in vision, potentially not requiring initial steps of face detection or keypoint localization. The approach is tested on three tracking data sets, including one real-world data set. In summary, this thesis explores and develops a multitude of deep learning techniques, making significant steps towards the long-term goal of building an end-to-end trainable system for emotion recognition.
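The RNN aggregation contrasted with fixed-length pooling in the second article can be sketched as follows. The GRU cell, layer sizes and seven-class output are illustrative assumptions, not the article's exact configuration.

```python
# Sketch: a GRU summarises a variable-length sequence of per-frame CNN
# features into a single state vector, replacing fixed-length pooling.
# Cell type and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RNNAggregator(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=7):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) CNN features; any sequence length.
        _, last_state = self.gru(frame_feats)     # (1, batch, hidden)
        return self.head(last_state.squeeze(0))  # per-video emotion logits

model = RNNAggregator()
clips = torch.randn(2, 40, 512)  # 2 videos, 40 frames of CNN features each
print(model(clips).shape)        # torch.Size([2, 7])
```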

    FACIAL EXPRESSION RECOGNITION DEFICITS IN AUTISM SPECTRUM DISORDER

    Autism Spectrum Disorder (ASD) is an umbrella term for lifelong neurobehavioral disorders characterized by social (verbal and nonverbal) communication challenges and restricted, repetitive behaviors. Emotions serve many functions, but primarily they help with the appraisal of stimuli and the driving of responses. Emotional processing and facial recognition are integral abilities that influence the acquisition of social skills. For individuals with ASD, it is hypothesized that facial recognition deficits contribute to social communication traits. The bulk of previously conducted research has utilized static images of facial expressions; this study utilized videos of spontaneous expressions. Participants were tasked with labeling facial expression valence. Neither a participant’s level of ASD severity nor their age was a significant predictor of facial expression valence labeling, and neither independent variable had a significant impact on overall labeling accuracy. On average, videos of happy facial expressions were labeled most accurately, while videos of sad faces were labeled least accurately.

    State of the art of audio- and video-based solutions for AAL

    Working Group 3: Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive than the hindrance other wearable sensors may cause to one’s activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large range of sensing, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals’ activities and health status can derive from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature.
A multidisciplinary debate among experts and stakeholders is paving the way towards AAL ensuring ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake in real-world settings of AAL technologies. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potentials coming from the silver economy are overviewed.

    Face Liveness Detection under Processed Image Attacks

    Face recognition is a mature and reliable technology for identifying people. Thanks to high-definition cameras and supporting devices, it is considered the fastest and least intrusive biometric recognition modality. Nevertheless, effective spoofing attempts on face recognition systems have been shown to be possible. As a result, various anti-spoofing algorithms were developed to counteract these attacks; they are commonly referred to in the literature as liveness detection tests. In this research we highlight the effectiveness of some simple, direct spoofing attacks, and test one of the current robust liveness detection algorithms, namely the logistic-regression-based face liveness detection from a single image proposed by Tan et al. in 2010, against malicious attacks using processed imposter images. In particular, we study experimentally the effect of common image processing operations, such as sharpening and smoothing, as well as corruption with salt-and-pepper noise, on the face liveness detection algorithm, and we find that it is especially vulnerable to spoofing attempts using processed imposter images. We design and present a new facial database, the Durham Face Database, which is the first, to the best of our knowledge, to contain client, imposter, and processed imposter images. Finally, we evaluate our claim on the effectiveness of the proposed imposter image attacks using transfer learning on Convolutional Neural Networks, and verify that such attacks are more difficult to detect even when using high-end, expensive machine learning techniques.
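The processed-imposter manipulations studied (sharpening, smoothing, salt-and-pepper corruption) can be sketched with standard OpenCV and NumPy operations. The kernel, blur size and noise density below are illustrative assumptions, not the settings used in the thesis.

```python
# Sketch of the three "processed imposter" manipulations studied: sharpening,
# smoothing, and salt-and-pepper corruption. Parameter values are illustrative.
import numpy as np
import cv2

def sharpen(img):
    # 3x3 sharpening kernel strengthening the centre pixel against its neighbours.
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def smooth(img, ksize=5):
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def salt_and_pepper(img, density=0.02, seed=0):
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < density / 2] = 0        # pepper
    noisy[mask > 1 - density / 2] = 255  # salt
    return noisy

# Usage: generate processed variants of an imposter photograph (stand-in image here).
imposter = np.full((128, 128, 3), 128, dtype=np.uint8)
variants = {"sharpened": sharpen(imposter),
            "smoothed": smooth(imposter),
            "salt_pepper": salt_and_pepper(imposter)}
print({k: v.shape for k, v in variants.items()})
```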

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.

    KEER2022

    Pre-title: KEER2022. Diversities. Resource description: 25 July 202

    Infrared Thermography for the Assessment of Lumbar Sympathetic Blocks in Patients with Complex Regional Pain Syndrome

    Complex regional pain syndrome (CRPS) is a debilitating chronic pain condition that usually affects one limb and is characterized by its complex and poorly understood underlying pathophysiology, resulting in both challenging diagnosis and treatment. To avoid impairment of patients' quality of life, achieving both an early diagnosis and treatment marks a turning point. Among the different treatment approaches, lumbar sympathetic blocks (LSBs) aim to alleviate the pain and reduce some sympathetic signs of the condition. This interventional procedure is performed by injecting local anaesthetic around the sympathetic ganglia and, until now, has been performed under the guidance of different imaging techniques, including ultrasound and fluoroscopy. Since infrared thermography (IRT) has proven to be a powerful tool for evaluating skin temperatures, and taking into account the vasodilatory effects of the local anaesthetics injected in the LSB, the use of IRT has been considered for LSB assessment. The purpose of this thesis is therefore to evaluate the capability of IRT as a complementary technique for assessing the performance of LSB procedures. To fulfil this aim, three studies have been conducted implementing IRT in patients diagnosed with lower-limb CRPS undergoing LSBs. The first study focuses on the feasibility of IRT as a complementary assessment tool for LSB performance, that is, for the confirmation of the proper needle position. When LSBs are performed, correct needle placement is critical to carry out the procedure correctly and, consequently, to achieve the desired clinical outcomes. To verify the needle position, imaging techniques have traditionally been used; however, LSBs under radioscopic guidance do not always ensure an exact performance. For this reason, the thermal alterations induced by the local anaesthetics have been exploited and assessed by means of IRT. Thus, the LSB procedure was considered successfully performed when thermal changes within the affected plantar foot were observed in the infrared images after the lidocaine injection. The second study deals with the quantitative analysis of the thermal data collected in the clinical setting, through the evaluation of different temperature-based parameters extracted from both feet. According to the results, successful LSBs can be predicted within the first four minutes after the block by evaluating the skin temperatures of the feet.
Therefore, the implementation of IRT in the clinical setting might be of great help in assessing LSB performance by evaluating plantar foot temperatures in real time. Finally, the third study addresses the quantitative analysis by implementing machine learning (ML) tools to assess their capability to automatically classify LSBs. In this study, a set of thermal features retrieved from the infrared images has been used to evaluate four ML algorithms at three different moments after the baseline time (lidocaine injection). The results indicate that all four models evaluated present good performance metrics for automatically classifying LSBs as successful or failed. Therefore, combining infrared features with ML classification models proves effective for the automatic classification of LSB procedures. In conclusion, the use of IRT as a complementary technique in daily clinical practice for LSB assessment has proven entirely effective. Since IRT is an objective method and is not very demanding to perform, it can help pain physicians identify failed procedures and, consequently, reverse this situation. Cañada Soriano, M. (2022). Infrared Thermography for the Assessment of Lumbar Sympathetic Blocks in Patients with Complex Regional Pain Syndrome [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181699
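The third study's classification step can be sketched as follows. Since the abstract does not name the four algorithms, the thermal features or the evaluation protocol, the choices below (four common scikit-learn classifiers on tabular features with cross-validation) are assumptions for illustration.

```python
# Sketch: classify LSB procedures as successful vs failed from tabular thermal
# features with four ML models. Algorithms, features and data are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))    # e.g. foot-temperature statistics per procedure
y = rng.integers(0, 2, 60)      # 1 = successful block, 0 = failed

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```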