35 research outputs found

    Learning Biosignals with Deep Learning

    Get PDF
    The healthcare system, ubiquitously recognized as one of the most influential systems in society, has been facing new challenges since the start of the decade. The myriad of physiological data generated by individuals, namely within the healthcare system, places a burden on physicians and reduces the effectiveness of patient data collection. Information systems and, in particular, novel deep learning (DL) algorithms offer a way to tackle this problem. This thesis aims to have an impact on biosignal research and industry by presenting DL solutions that could empower this field. For this purpose, an extensive study of how to incorporate and implement Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Fully Connected Networks in biosignal studies is discussed. Different architecture configurations were explored for signal processing and decision making and were implemented in three different scenarios: (1) biosignal learning and synthesis; (2) electrocardiogram (ECG) biometric systems; and (3) ECG anomaly detection systems. In (1), an RNN-based architecture was able to autonomously replicate three types of biosignals with a high degree of confidence. In (2), three CNN-based architectures and an RNN-based architecture (the same used in (1)) were applied both to biometric identification, reaching values above 90% for electrode-based datasets (Fantasia, ECG-ID and MIT-BIH) and 75% for an off-the-person dataset (CYBHi), and to biometric authentication, achieving Equal Error Rates (EER) of near 0% for Fantasia and MIT-BIH and below 4% for CYBHi. In (3), the abstraction of a healthy, clean ECG signal and the detection of deviations from it were tested in two different scenarios: the presence of noise, using an autoencoder and a fully connected network (reaching 99% accuracy for binary classification and 71% for multi-class), and arrhythmia events, by adding an RNN to the previous architecture (57% accuracy and 61% sensitivity). In sum, these systems are shown to be capable of producing novel results. The incorporation of several AI systems into one could prove to be the next generation of preventive medicine: as the machines have access to different physiological and anatomical states, they could produce better-informed solutions for the issues one may face in the future, increasing the performance of autonomous preventive systems that could be used in everyday life and in remote places where access to medicine is limited. These systems will also help the study of signal behaviour and how signals arise in real-life contexts, as explainable AI could capture this perception and link the inner states of a network with biological traits.
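
    The ECG anomaly detection scenario above pairs an autoencoder, which learns an abstraction of the clean signal, with a fully connected classification head. The sketch below, in PyTorch, illustrates that kind of pipeline under assumed window lengths, layer sizes and class counts; none of these hyperparameters or layer choices are taken from the thesis.

        import torch
        import torch.nn as nn

        class ECGAutoencoder(nn.Module):
            """Compresses a fixed-length ECG window and reconstructs it.

            A clean-signal abstraction: a large reconstruction error on a new
            window suggests noise or an anomalous beat.
            """
            def __init__(self, window: int = 256, latent: int = 32):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(window, 128), nn.ReLU(),
                    nn.Linear(128, latent), nn.ReLU(),
                )
                self.decoder = nn.Sequential(
                    nn.Linear(latent, 128), nn.ReLU(),
                    nn.Linear(128, window),
                )

            def forward(self, x):
                z = self.encoder(x)
                return self.decoder(z), z

        class NoiseClassifier(nn.Module):
            """Fully connected head that labels a window from its latent code."""
            def __init__(self, latent: int = 32, n_classes: int = 2):
                super().__init__()
                self.head = nn.Sequential(
                    nn.Linear(latent, 16), nn.ReLU(),
                    nn.Linear(16, n_classes),
                )

            def forward(self, z):
                return self.head(z)

        # Usage: train the autoencoder on clean ECG windows with MSE loss,
        # then train the classifier on latent codes of labelled windows.
        ae, clf = ECGAutoencoder(), NoiseClassifier()
        x = torch.randn(8, 256)            # batch of hypothetical ECG windows
        recon, z = ae(x)
        loss = nn.functional.mse_loss(recon, x)
        logits = clf(z.detach())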

    Modelling person-specific and multi-scale facial dynamics for automatic personality and depression analysis

    Get PDF
    ‘To know oneself is true progress’. While one's identity is difficult to fully describe, a key part of it is one’s personality. Accurately understanding personality can benefit various aspects of human life. There is convergent evidence suggesting that personality traits are marked by non-verbal facial expressions of emotions, which in theory means that automatic personality assessment is possible from facial behaviours. Thus, this thesis aims to develop video-based automatic personality analysis approaches. Specifically, two video-level dynamic facial behaviour representations are proposed for automatic personality trait estimation, namely a person-specific representation and a spectral representation, which address three issues that frequently occur in existing automatic personality analysis approaches: 1. attempting to use very short video segments or even a single frame to infer personality traits; 2. the lack of a proper way to retain multi-scale long-term temporal information; 3. the lack of methods to encode person-specific facial dynamics that are relatively stable over time but differ across individuals. This thesis starts by extending the dynamic image algorithm to model the preceding and succeeding short-term face dynamics of each frame in a video, which achieved good performance in estimating valence/arousal intensities, showing the good dynamic encoding ability of such a representation. The thesis then proposes a novel Rank Loss, aiming to train a network that produces a similar per-frame dynamic representation but from a still image only. This way, the network can learn generic facial dynamics from unlabelled face videos in a self-supervised manner. Based on this approach, the person-specific representation encoding approach is proposed. It first freezes the well-trained generic network and incorporates a set of intermediate filters, which are trained again, but only with person-specific videos, using the same self-supervised learning approach. As a result, the learned filters' weights are person-specific and can be concatenated into a 1-D video-level person-specific representation. Meanwhile, this thesis also proposes a spectral analysis approach to retain multi-scale video-level facial dynamics. This approach uses automatically detected human behaviour primitives as the low-dimensional descriptor for each frame, and converts long, variable-length time-series behaviour signals into small, length-independent spectral representations that capture video-level multi-scale temporal dynamics of expressive behaviours. Consequently, the combination of the two representations, which contains not only multi-scale but also person-specific video-level facial dynamics, can be applied to automatic personality estimation. This thesis conducts a series of experiments to validate the proposed approaches: 1. arousal/valence intensity estimation is conducted on both a controlled face video dataset (SEMAINE) and an in-the-wild face video dataset (Affwild-2) to evaluate the dynamic encoding capability of the proposed Rank Loss; 2. the proposed automatic personality trait recognition systems (spectral representation and person-specific representation) are evaluated on face video datasets labelled with either 'Big-Five' apparent personality traits (ChaLearn) or self-reported personality traits (VHQ); 3. depression studies are also evaluated on the VHQ dataset, which is labelled with PHQ-9 depression scores.
    The experimental results on automatic personality trait and depression severity estimation tasks show the person-specific representation's good performance on the personality task and the spectral vector's superior performance on the depression task. In particular, the proposed person-specific approach achieved performance similar to the state-of-the-art method on the apparent personality trait recognition task and achieved at least 15% PCC improvement over other approaches on the self-reported personality trait recognition task. Meanwhile, the proposed spectral representation shows better performance than the person-specific approach on the depression severity estimation task. In addition, this thesis also found that adding personality trait labels/predictions to the behaviour descriptors improved depression severity estimation results.
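
    The spectral representation described above can be illustrated with a short sketch: per-frame behaviour-primitive signals of arbitrary length are mapped to a fixed number of low-frequency Fourier coefficients, giving a video-level vector whose size does not depend on the video length. The number of primitives, the frequency cut-off and the amplitude/phase layout are assumptions for illustration, not the thesis' settings.

        import numpy as np

        def spectral_representation(signals: np.ndarray, n_freq: int = 16) -> np.ndarray:
            """Convert variable-length per-frame descriptors to a fixed-length vector.

            signals: array of shape (n_frames, n_primitives), e.g. per-frame
                     behaviour-primitive intensities detected in a face video.
            Returns a vector of length n_primitives * 2 * n_freq holding the
            amplitude and phase of the first n_freq Fourier coefficients of each
            primitive, independent of the video length (assumes the video has at
            least 2 * n_freq frames).
            """
            spec = np.fft.rfft(signals, axis=0)   # frequency domain, per primitive
            spec = spec[:n_freq]                  # keep low frequencies (long-term trends)
            feats = np.concatenate([np.abs(spec), np.angle(spec)], axis=0)
            return feats.ravel()

        # Two videos of different lengths yield representations of the same size.
        video_a = np.random.rand(450, 12)   # 450 frames, 12 hypothetical primitives
        video_b = np.random.rand(1200, 12)
        assert spectral_representation(video_a).shape == spectral_representation(video_b).shape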

    Healthcare data heterogeneity and its contribution to machine learning performance

    Full text link
    The assessment of data quality has many dimensions, from those as obvious as data completeness and consistency to others that are less evident, such as correctness or the ability to represent the target population. In general, it is possible to classify these dimensions as those produced by an external effect and those that are inherent in the data itself. This work focuses on those inherent to the data, namely the temporal and multi-source variability of healthcare data repositories. Every process is usually improved over time, and that has a direct impact on the data distribution. Similarly, how a process is executed at different sources may vary due to many factors, such as diverse interpretations of standard protocols by human beings or the differing previous experience of experts. Artificial intelligence has become one of the most widely extended technological paradigms in almost all scientific and industrial fields. Advances not only in models but also in hardware have led to its use in almost all areas of science. However, problems solved using this technology often have the drawback of not being interpretable, or at least not as interpretable as with other classical mathematical or statistical techniques. This motivated the emergence of the "explainable artificial intelligence" concept, which studies methods to quantify and visualize the training process of models based on machine learning. On the other hand, real systems may often be represented by large networks (graphs), and one of the most relevant features of such networks is their community or clustering structure. Since sociology, biology, and clinical situations can usually be modelled using graphs, community detection algorithms are becoming more and more widespread in the biomedical field. In the present doctoral thesis, contributions have been made in the three above-mentioned areas. On the one hand, temporal and multi-source variability assessment methods based on information geometry were used to detect variability in data distributions that may hinder data reuse and, hence, the conclusions that can be extracted from the data. The usability of this methodology was proved by a temporal variability analysis that detected data anomalies in the electronic health records of a hospital over 7 years. Moreover, it showed that this methodology could have a positive impact if applied prior to any study. To this end, firstly, we extracted the variables that most influenced the intensity of headache in migraine patients using machine learning techniques. One of the principal characteristics of machine learning algorithms is their capability to fit the training set; in datasets with a small number of observations, the model can be biased by the training sample. The variability observed after applying the mentioned methodology, considering as sources the registries of migraine patients with different headache intensities, served as evidence for the truthfulness of the extracted features. Secondly, the same approach was applied to measure the variability among the gray-level histograms of digital mammograms. We demonstrated that the acquisition device produced the observed variability and, after defining an image preprocessing step, the performance of a deep learning model, which estimated a marker of breast cancer risk, increased.
    Given a dataset containing the answers to a survey formed of psychometric scales, that is, questionnaires that measure psychological factors such as depression, coping, etc., two deep learning architectures that exploited the data structure were defined. Firstly, we designed a deep learning architecture using the conceptual structure of these psychometric scales. This architecture, trained to model the happiness degree of the participants, improved performance compared to classical statistical approaches. A second architecture, designed automatically using community detection in graphs, was not only a contribution in itself through the automation of the design process, but also obtained results comparable to its predecessor.
    Pérez Benito, FJ. (2020). Healthcare data heterogeneity and its contribution to machine learning performance [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/154414
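
    As a toy illustration of the variability assessment idea described above, the sketch below compares the per-month histograms of a single variable using a plain Jensen-Shannon distance; the thesis' actual information-geometry metrics and any subsequent embedding of the resulting distance matrix are not reproduced here, and all data below are synthetic.

        import numpy as np

        def js_distance(p: np.ndarray, q: np.ndarray) -> float:
            """Jensen-Shannon distance between two discrete distributions."""
            p = p / p.sum()
            q = q / q.sum()
            m = 0.5 * (p + q)
            def kl(a, b):
                mask = a > 0
                return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
            return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

        def monthly_variability(values: np.ndarray, months: np.ndarray, bins: int = 20):
            """Pairwise distances between the per-month histograms of one variable.

            Large off-diagonal values flag months whose data distribution drifted,
            e.g. after a change in the recording protocol of an EHR system.
            """
            edges = np.histogram_bin_edges(values, bins=bins)
            hists = {m: np.histogram(values[months == m], bins=edges)[0] + 1e-9
                     for m in np.unique(months)}
            keys = sorted(hists)
            return keys, np.array([[js_distance(hists[a], hists[b]) for b in keys]
                                   for a in keys])

        # Hypothetical lab values drifting upward in the second half of the year.
        rng = np.random.default_rng(0)
        months = rng.integers(1, 13, size=5000)
        values = rng.normal(loc=np.where(months > 6, 1.0, 0.0), scale=1.0)
        keys, dist = monthly_variability(values, months)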

    Bag-of-words representations for computer audition

    Get PDF
    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural-network-based end-to-end learning that takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. Nevertheless, one major issue is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, it can be beneficial to compress the information of the LLDs over a temporal block by summarising them. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data independence. Furthermore, it is shown that both representations are complementary and that their fusion improves the performance of a machine listening system.
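
    The BoAW representation described above can be sketched compactly: frame-level LLDs are quantised against a learned codebook of 'audio words', and each recording becomes a fixed-length, duration-independent histogram of codeword counts. The codebook size and the 13-dimensional LLDs below are placeholders, and the scikit-learn k-means merely stands in for the openXBOW toolkit actually used in the thesis.

        import numpy as np
        from sklearn.cluster import KMeans

        def learn_codebook(lld_frames: np.ndarray, n_words: int = 64) -> KMeans:
            """Cluster frame-level low-level descriptors into 'audio words'."""
            return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(lld_frames)

        def boaw_histogram(recording_llds: np.ndarray, codebook: KMeans) -> np.ndarray:
            """Fixed-length BoAW vector for one recording of any duration."""
            words = codebook.predict(recording_llds)     # nearest codeword per frame
            hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
            return hist / max(hist.sum(), 1.0)           # normalise for length invariance

        # Hypothetical 13-dimensional LLDs (e.g. MFCC-like features).
        train_frames = np.random.rand(10000, 13)
        codebook = learn_codebook(train_frames)
        clip = np.random.rand(480, 13)                   # one 480-frame recording
        x = boaw_histogram(clip, codebook)               # input to a static classifier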

    Deep Active Learning Explored Across Diverse Label Spaces

    Get PDF
    Deep learning architectures have been widely explored in computer vision and have shown commendable performance in a variety of applications. A fundamental challenge in training deep networks is the requirement of large amounts of labeled training data. While gathering large quantities of unlabeled data is cheap and easy, annotating the data is an expensive process in terms of time, labor and human expertise. Thus, developing algorithms that minimize the human effort in training deep models is of immense practical importance. Active learning algorithms automatically identify salient and exemplar samples from large amounts of unlabeled data and can contribute maximal information to supervised learning models, thereby reducing the human annotation effort in training machine learning models. The goal of this dissertation is to fuse ideas from deep learning and active learning and design novel deep active learning algorithms. The proposed learning methodologies explore diverse label spaces to solve different computer vision applications. Three major contributions have emerged from this work: (i) a deep active framework for multi-class image classification, (ii) a deep active model with and without label correlation for multi-label image classification, and (iii) a deep active paradigm for regression. Extensive empirical studies on a variety of multi-class, multi-label and regression vision datasets corroborate the potential of the proposed methods for real-world applications. Additional contributions include: (i) a multimodal emotion database consisting of recordings of facial expressions, body gestures, vocal expressions and physiological signals of actors enacting various emotions, (ii) four multimodal deep belief network models and (iii) an in-depth analysis of the effect of transferring multimodal emotion features between source and target networks on classification accuracy and training time. These related contributions help comprehend the challenges involved in training deep learning models and motivate the main goal of this dissertation.
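
    As an illustration of the deep active learning idea summarised above, the sketch below runs one entropy-based query round: train on the labelled pool, score the unlabelled pool by predictive uncertainty, and ask an oracle to label the most uncertain samples. The acquisition criterion, batch size and the generic fit/predict_proba model interface are placeholder choices, not the dissertation's exact formulation.

        import numpy as np

        def predictive_entropy(probs: np.ndarray) -> np.ndarray:
            """Entropy of the model's class posterior for each unlabelled sample."""
            return -np.sum(probs * np.log(probs + 1e-12), axis=1)

        def active_learning_round(model, labelled, unlabelled, oracle, budget: int = 32):
            """One query round: train, score the pool, ask the oracle for labels.

            `model` is any classifier exposing fit(X, y) and predict_proba(X);
            `oracle` maps indices of the unlabelled pool to ground-truth labels.
            """
            X_l, y_l = labelled
            X_u = unlabelled
            model.fit(X_l, y_l)
            scores = predictive_entropy(model.predict_proba(X_u))
            query = np.argsort(scores)[-budget:]      # most uncertain samples
            new_y = oracle(query)
            X_l = np.vstack([X_l, X_u[query]])
            y_l = np.concatenate([y_l, new_y])
            X_u = np.delete(X_u, query, axis=0)
            return model, (X_l, y_l), X_u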

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    Get PDF
    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest where objects may be found, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of a small time overhead (120 ms) and a slight loss in precision (0.92).
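
    The two ingredients named in the abstract can be sketched in a few lines of NumPy: a planar homography warps a detection's corners into the next frame to propose a region of interest, and a recursive Bayesian update fuses per-frame class scores for a tracked object. The homography, box coordinates and class likelihoods below are made-up values for illustration only.

        import numpy as np

        def propagate_box(box: np.ndarray, H: np.ndarray) -> np.ndarray:
            """Warp an axis-aligned box [x1, y1, x2, y2] with homography H (3x3)
            and return the bounding box of the warped corners in the next frame."""
            corners = np.array([[box[0], box[1]], [box[2], box[1]],
                                [box[2], box[3]], [box[0], box[3]]], dtype=float)
            homog = np.hstack([corners, np.ones((4, 1))]) @ H.T
            warped = homog[:, :2] / homog[:, 2:3]
            return np.array([*warped.min(axis=0), *warped.max(axis=0)])

        def bayes_update(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
            """Recursive Bayesian fusion of class probabilities for a tracked object."""
            post = prior * likelihood
            return post / post.sum()

        # Hypothetical example: a detection propagated under a small camera motion.
        H = np.array([[1.0, 0.0, 12.0],       # mostly a horizontal shift of 12 px
                      [0.0, 1.0, -3.0],
                      [0.0, 0.0, 1.0]])
        roi = propagate_box(np.array([100, 80, 180, 200]), H)
        belief = np.full(9, 1 / 9)            # uniform prior over nine object classes
        belief = bayes_update(belief, np.array([.02, .02, .70, .05, .05, .05, .05, .03, .03]))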

    3D Morphable Face Models – Past, Present and Future

    No full text
    In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state of the art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research, and highlighting the broad range of current and future applications.

    Sense and Respond

    Get PDF
    Over the past century, the manufacturing industry has undergone a number of paradigm shifts: from the Ford assembly line (1900s) and its focus on efficiency to the Toyota production system (1960s) and its focus on effectiveness and JIDOKA; from flexible manufacturing (1980s) to reconfigurable manufacturing (1990s) (both following the trend of mass customization); and from agent-based manufacturing (2000s) to cloud manufacturing (2010s) (both deploying the value-stream complexity into the material and information flow, respectively). The next natural evolutionary step is to provide value by creating industrial cyber-physical assets with human-like intelligence. This will only be possible by further integrating strategic smart sensor technology into the manufacturing cyber-physical value-creating processes, in which industrial equipment is monitored and controlled to analyze compression, temperature, moisture, vibrations, and performance. For instance, in the new wave of the ‘Industrial Internet of Things’ (IIoT), smart sensors will enable the development of new applications by interconnecting software, machines, and humans throughout the manufacturing process, thus enabling suppliers and manufacturers to rapidly respond to changing standards. This reprint of “Sense and Respond” aims to cover recent developments in the field of industrial applications, especially smart sensor technologies that increase the productivity, quality, reliability, and safety of industrial cyber-physical value-creating processes.

    Artificial Intelligence Applied to Facial Image Analysis and Feature Measurement

    Get PDF
    Beauty has always played an essential part in society, influencing both everyday human interactions and more significant aspects such as mate selection. The continued and expanding use of beauty products by women and, increasingly, men worldwide has prompted and motivated several companies to develop platforms that integrate effectively into the beauty and cosmetics sector. They attempt to improve the customer experience by combining data with personalisation. Global cosmetics spending is worth billions of dollars, and most of it is wasted on unsuitable or incompatible products. This enables artificial intelligence to change the rules, using computer vision and deep learning approaches to allow customers to be fully satisfied. With the advanced feature extraction offered by deep learning, especially convolutional neural networks, automatic facial feature analysis from images for the sake of beauty and beautification has become an emerging subject of study. Scholars studying facial aesthetics have recently made breakthroughs in the areas of facial shape beautification and beauty prediction. In the cosmetics sector, a new line of recommendation system research has arisen. Users benefit from recommendation systems since these systems help them narrow down their options. This thesis lays the groundwork for a beautification-oriented recommendation system for hairstyles and eyelashes, leveraging artificial intelligence techniques. Facial attributes are among the most potent descriptors for the attribution of personality. Various types of facial attributes are extracted in this thesis, including geometrical, automatic and hand-crafted features. The extracted attributes provide rich information for the recommendation system to produce the final outcome. The coexistence of external effects on faces, such as makeup or retouching, can disguise facial features. This might degrade the performance of facial feature extraction and, subsequently, of the recommendation system. Thus, three methods are further developed to detect faces wearing makeup before the images are passed to the recommendation system. This helps to provide more reliable and accurate feature extraction and to suggest more suitable recommendation results. This thesis also presents a method for segmenting the facial region, with the goal of extending the developed recommendation system by virtually incorporating a synthesised hairstyle on the facial region, thereby harnessing the hairstyle recommended by our developed system. Hence, the work presented in this thesis shows the benefits of implementing computational intelligence methods in the beauty and cosmetics sector. It also demonstrates that computational intelligence techniques have redefined the notion of beauty and how the consumer communicates with these emerging intelligent facilities that bring solutions to our fingertips.
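
    A schematic sketch of the attribute-driven recommendation idea discussed above: a query face is described by a feature vector (geometric ratios, CNN embeddings, and similar attributes) and hairstyles are ranked by similarity to catalogue faces. Every name, dimension and catalogue entry below is a placeholder for illustration, not the pipeline built in the thesis.

        import numpy as np

        def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        def recommend_hairstyles(face_features: np.ndarray,
                                 reference_faces: np.ndarray,
                                 reference_styles: list,
                                 top_k: int = 3) -> list:
            """Rank hairstyles worn by the most similar reference faces.

            face_features:     attribute vector of the query face (e.g. face-shape
                               ratios plus a CNN embedding), assumed pre-computed.
            reference_faces:   one attribute vector per catalogue face.
            reference_styles:  hairstyle label attached to each catalogue face.
            """
            sims = [cosine_similarity(face_features, ref) for ref in reference_faces]
            order = np.argsort(sims)[::-1]
            ranked = []
            for idx in order:                 # keep the first occurrence of each style
                if reference_styles[idx] not in ranked:
                    ranked.append(reference_styles[idx])
                if len(ranked) == top_k:
                    break
            return ranked

        # Toy catalogue with made-up 8-dimensional attribute vectors.
        catalogue = np.random.rand(50, 8)
        styles = [f"style_{i % 7}" for i in range(50)]
        query = np.random.rand(8)
        print(recommend_hairstyles(query, catalogue, styles))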