13 research outputs found

    Automated screening methods for mental and neuro-developmental disorders

    Get PDF
    Mental and neuro-developmental disorders such as depression, bipolar disorder, and autism spectrum disorder (ASD) are critical healthcare issues which affect a large number of people. Depression, according to the World Health Organisation, is the largest cause of disability worldwide and affects more than 300 million people. Bipolar disorder affects more than 60 million individuals worldwide. ASD, meanwhile, affects more than 1 in 100 people in the UK. Not only do these disorders adversely affect the quality of life of affected individuals, they also have a significant economic impact. While brute-force approaches are potentially useful for learning new features which could be representative of these disorders, such approaches may not be best suited for developing robust screening methods. This is due to a myriad of confounding factors, such as the age, gender, cultural background, and socio-economic status, which can affect social signals of individuals in a similar way as the symptoms of these disorders. Brute-force approaches may learn to exploit effects of these confounding factors on social signals in place of effects due to mental and neuro-developmental disorders. The main objective of this thesis is to develop, investigate, and propose computational methods to screen for mental and neuro-developmental disorders in accordance with descriptions given in the Diagnostic and Statistical Manual (DSM). The DSM manual is a guidebook published by the American Psychiatric Association which offers common language on mental disorders. Our motivation is to alleviate, to an extent, the possibility of machine learning algorithms picking up one of the confounding factors to optimise performance for the dataset – something which we do not find uncommon in research literature. To this end, we introduce three new methods for automated screening for depression from audio/visual recordings, namely: turbulence features, craniofacial movement features, and Fisher Vector based representation of speech spectra. We surmise that psychomotor changes due to depression lead to uniqueness in an individual's speech pattern which manifest as sudden and erratic changes in speech feature contours. The efficacy of these features is demonstrated as part of our solution to Audio/Visual Emotion Challenge 2017 (AVEC 2017) on Depression severity prediction. We also detail a methodology to quantify specific craniofacial movements, which we hypothesised could be indicative of psychomotor retardation, and hence depression. The efficacy of craniofacial movement features is demonstrated using datasets from the 2014 and 2017 editions of AVEC Depression severity prediction challenges. Finally, using the dataset provided as part of AVEC 2016 Depression classification challenge, we demonstrate that differences between speech of individuals with and without depression can be quantified effectively using the Fisher Vector representation of speech spectra. For our work on automated screening of bipolar disorder, we propose methods to classify individuals with bipolar disorder into states of remission, hypo-mania, and mania. Here, we surmise that like depression, individuals with different levels of mania have certain uniqueness to their social signals. Based on this understanding, we propose the use of turbulence features for audio/visual social signals (i.e. speech and facial expressions). We also propose the use of Fisher Vectors to create a unified representation of speech in terms of prosody, voice quality, and speech spectra. These methods have been proposed as part of our solution to the AVEC 2018 Bipolar disorder challenge. In addition, we find that the task of automated screening for ASD is much more complicated. Here, confounding factors can easily overwhelm socials signals which are affected by ASD. We discuss, in the light of research literature and our experimental analysis, that significant collaborative work is required between computer scientists and clinicians to discern social signals which are robust to common confounding factors

    Affective Speech Recognition

    Get PDF
    Speech, as a medium of interaction, carries two different streams of information. Whereas one stream carries explicit messages, the other one contains implicit information about speakers themselves. Affective speech recognition is a set of theories and tools that intend to automate unfolding the part of the implicit stream that has to do with humans emotion. Application of affective speech recognition is to human computer interaction; a machine that is able to recognize humans emotion could engage the user in a more effective interaction. This thesis proposes a set of analyses and methodologies that advance automatic recognition of affect from speech. The proposed solution spans two dimensions of the problem: speech signal processing, and statistical learning. At the speech signal processing dimension, extraction of speech low-level descriptors is dis- cussed, and a set of descriptors that exploit the spectrum of the signal are proposed, which have shown to be particularly practical for capturing affective qualities of speech. Moreover, consider- ing the non-stationary property of the speech signal, further proposed is a measure of dynamicity that captures that property of speech by quantifying changes of the signal over time. Furthermore, based on the proposed set of low-level descriptors, it is shown that individual human beings are different in conveying emotions, and that parts of the spectrum that hold the affective information are different from one person to another. Therefore, the concept of emotion profile is proposed that formalizes those differences by taking into account different factors such as cultural and gender-specific differences, as well as those distinctions that have to do with individual human beings. At the statistical learning dimension, variable selection is performed to identify speech features that are most imperative to extracting affective information. In doing so, low-level descriptors are distinguished from statistical functionals, therefore, effectiveness of each of the two are studied dependently and independently. The major importance of variable selection as a standalone component of a solution is to real-time application of affective speech recognition. Although thousands of speech features are commonly used to tackle this problem in theory, extracting that many features in a real-time manner is unrealistic, especially for mobile applications. Results of the conducted investigations show that the required number of speech features is far less than the number that is commonly used in the literature of the problem. At the core of an affective speech recognition solution is a statistical model that uses speech features to recognize emotions. Such a model comes with a set of parameters that are estimated through a learning process. Proposed in this thesis is a learning algorithm, developed based on the notion of Hilbert-Schmidt independence criterion and named max-dependence regression, that maximizes the dependence between predicted and actual values of affective qualities. Pearson’s correlation coefficient is commonly used as the measure of goodness of a fit in the literature of affective computing, therefore max-dependence regression is proposed to make the learning and hypothesis testing criteria consistent with one another. Results of this research show that doing so results in higher prediction accuracy. Lastly, sparse representation for affective speech datasets is considered in this thesis. For this purpose, the application of a dictionary learning algorithm based on Hilbert-Schmidt independence criterion is proposed. Dictionary learning is used to identify the most important bases of the data in order to improve the generalization capability of the proposed solution to affective speech recognition. Based on the dictionary learning approach of choice, fusion of feature vectors is proposed. It is shown that sparse representation leads to higher generalization capability for affective speech recognition

    Beyond Flatland : exploring graphs in many dimensions

    Get PDF
    Societies, technologies, economies, ecosystems, organisms, . . . Our world is composed of complex networks—systems with many elements that interact in nontrivial ways. Graphs are natural models of these systems, and scientists have made tremendous progress in developing tools for their analysis. However, research has long focused on relatively simple graph representations and problem specifications, often discarding valuable real-world information in the process. In recent years, the limitations of this approach have become increasingly apparent, but we are just starting to comprehend how more intricate data representations and problem formulations might benefit our understanding of relational phenomena. Against this background, our thesis sets out to explore graphs in five dimensions: descriptivity, multiplicity, complexity, expressivity, and responsibility. Leveraging tools from graph theory, information theory, probability theory, geometry, and topology, we develop methods to (1) descriptively compare individual graphs, (2) characterize similarities and differences between groups of multiple graphs, (3) critically assess the complexity of relational data representations and their associated scientific culture, (4) extract expressive features from and for hypergraphs, and (5) responsibly mitigate the risks induced by graph-structured content recommendations. Thus, our thesis is naturally situated at the intersection of graph mining, graph learning, and network analysis.Gesellschaften, Technologien, Volkswirtschaften, Ökosysteme, Organismen, . . . Unsere Welt besteht aus komplexen Netzwerken—Systemen mit vielen Elementen, die auf nichttriviale Weise interagieren. Graphen sind natürliche Modelle dieser Systeme, und die Wissenschaft hat bei der Entwicklung von Methoden zu ihrer Analyse große Fortschritte gemacht. Allerdings hat sich die Forschung lange auf relativ einfache Graphrepräsentationen und Problemspezifikationen beschränkt, oft unter Vernachlässigung wertvoller Informationen aus der realen Welt. In den vergangenen Jahren sind die Grenzen dieser Herangehensweise zunehmend deutlich geworden, aber wir beginnen gerade erst zu erfassen, wie unser Verständnis relationaler Phänomene von intrikateren Datenrepräsentationen und Problemstellungen profitieren kann. Vor diesem Hintergrund erkundet unsere Dissertation Graphen in fünf Dimensionen: Deskriptivität, Multiplizität, Komplexität, Expressivität, und Verantwortung. Mithilfe von Graphentheorie, Informationstheorie, Wahrscheinlichkeitstheorie, Geometrie und Topologie entwickeln wir Methoden, welche (1) einzelne Graphen deskriptiv vergleichen, (2) Gemeinsamkeiten und Unterschiede zwischen Gruppen multipler Graphen charakterisieren, (3) die Komplexität relationaler Datenrepräsentationen und der mit ihnen verbundenen Wissenschaftskultur kritisch beleuchten, (4) expressive Merkmale von und für Hypergraphen extrahieren, und (5) verantwortungsvoll den Risiken begegnen, welche die Graphstruktur von Inhaltsempfehlungen mit sich bringt. Damit liegt unsere Dissertation naturgemäß an der Schnittstelle zwischen Graph Mining, Graph Learning und Netzwerkanalyse

    Active Perception by Interaction with Other Agents in a Predictive Coding Framework: Application to Internet of Things Environment

    Get PDF
    Predicting the state of an agent\u27s partially-observable environment is a problem of interest in many domains. Typically in the real world, the environment consists of multiple agents, not necessarily working towards a common goal. Though the goal and sensory observation for each agent is unique, one agent might have acquired some knowledge that may benefit the other. In essence, the knowledge base regarding the environment is distributed among the agents. An agent can sample this distributed knowledge base by communicating with other agents. Since an agent is not storing the entire knowledge base, its model can be small and its inference can be efficient and fault-tolerant. However, the agent needs to learn -- when, with whom and what -- to communicate (in general interact) under different situations.This dissertation presents an agent model that actively and selectively communicates with other agents to predict the state of its environment efficiently. Communication is a challenge when the internal models of other agents is unknown and unobservable. The proposed agent learns communication policies as mappings from its belief state to when, with whom and what to communicate. The policies are learned using predictive coding in an online manner, without any reinforcement. The proposed agent model is evaluated on widely-studied applications, such as human activity recognition from multimodal, multisource and heterogeneous sensor data, and transferring knowledge across sensor networks. In the applications, either each sensor or each sensor network is assumed to be monitored by an agent. The recognition accuracy on benchmark datasets is comparable to the state-of-the-art, even though our model has significantly fewer parameters and infers the state in a localized manner. The learned policy reduces number of communications. The agent is tolerant to communication failures and can recognize the reliability of each agent from its communication messages. To the best of our knowledge, this is the first work on learning communication policies by an agent for predicting the state of its environment

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    Get PDF
    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning

    Get PDF
    Contains fulltext : 228326pre.pdf (preprint version ) (Open Access) Contains fulltext : 228326pub.pdf (publisher's version ) (Open Access)BNAIC/BeneLearn 202
    corecore