62 research outputs found

    A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

    Full text link
    Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and its power of expression, from the classical to modern-day state-of-the-art word representation language models (LMS). We describe a variety of text representation methods, and model designs have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations capturing the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP related tasks. In the end, this survey briefly discusses the commonly used ML and DL based classifiers, evaluation metrics and the applications of these word embeddings in different NLP tasks

    Alzheimer’s Dementia Recognition Through Spontaneous Speech

    Get PDF

    Language modelling for clinical natural language understanding and generation

    Get PDF
    One of the long-standing objectives of Artificial Intelligence (AI) is to design and develop algorithms for social good including tackling public health challenges. In the era of digitisation, with an unprecedented amount of healthcare data being captured in digital form, the analysis of the healthcare data at scale can lead to better research of diseases, better monitoring patient conditions and more importantly improving patient outcomes. However, many AI-based analytic algorithms rely solely on structured healthcare data such as bedside measurements and test results which only account for 20% of all healthcare data, whereas the remaining 80% of healthcare data is unstructured including textual data such as clinical notes and discharge summaries which is still underexplored. Conventional Natural Language Processing (NLP) algorithms that are designed for clinical applications rely on the shallow matching, templates and non-contextualised word embeddings which lead to limited understanding of contextual semantics. Though recent advances in NLP algorithms have demonstrated promising performance on a variety of NLP tasks in the general domain with contextualised language models, most of these generic NLP algorithms struggle at specific clinical NLP tasks which require biomedical knowledge and reasoning. Besides, there is limited research to study generative NLP algorithms to generate clinical reports and summaries automatically by considering salient clinical information. This thesis aims to design and develop novel NLP algorithms especially clinical-driven contextualised language models to understand textual healthcare data and generate clinical narratives which can potentially support clinicians, medical scientists and patients. The first contribution of this thesis focuses on capturing phenotypic information of patients from clinical notes which is important to profile patient situation and improve patient outcomes. The thesis proposes a novel self-supervised language model, named Phenotypic Intelligence Extraction (PIE), to annotate phenotypes from clinical notes with the detection of contextual synonyms and the enhancement to reason with numerical values. The second contribution is to demonstrate the utility and benefits of using phenotypic features of patients in clinical use cases by predicting patient outcomes in Intensive Care Units (ICU) and identifying patients at risk of specific diseases with better accuracy and model interpretability. The third contribution is to propose generative models to generate clinical narratives to automate and accelerate the process of report writing and summarisation by clinicians. This thesis first proposes a novel summarisation language model named PEGASUS which surpasses or is on par with the state-of-the-art performance on 12 downstream datasets including biomedical literature from PubMed. PEGASUS is further extended to generate medical scientific documents from input tabular data.Open Acces

    Automatic Detection of Dementia and related Affective Disorders through Processing of Speech and Language

    Get PDF
    In 2019, dementia is has become a trillion dollar disorder. Alzheimer’s disease (AD) is a type of dementia in which the main observable symptom is a decline in cognitive functions, notably memory, as well as language and problem-solving. Experts agree that early detection is crucial to effectively develop and apply interventions and treatments, underlining the need for effective and pervasive assessment and screening tools. The goal of this thesis is to explores how computational techniques can be used to process speech and language samples produced by patients suffering from dementia or related affective disorders, to the end of automatically detecting them in large populations us- ing machine learning models. A strong focus is laid on the detection of early stage dementia (MCI), as most clinical trials today focus on intervention at this level. To this end, novel automatic and semi-automatic analysis schemes for a speech-based cogni- tive task, i.e., verbal fluency, are explored and evaluated to be an appropriate screening task. Due to a lack of available patient data in most languages, world-first multilingual approaches to detecting dementia are introduced in this thesis. Results are encouraging and clear benefits on a small French dataset become visible. Lastly, the task of detecting these people with dementia who also suffer from an affective disorder called apathy is explored. Since they are more likely to convert into later stage of dementia faster, it is crucial to identify them. These are the fist experiments that consider this task us- ing solely speech and language as inputs. Results are again encouraging, both using only speech or language data elicited using emotional questions. Overall, strong results encourage further research in establishing speech-based biomarkers for early detection and monitoring of these disorders to better patients’ lives.Im Jahr 2019 ist Demenz zu einer Billionen-Dollar-Krankheit geworden. Die Alzheimer- Krankheit (AD) ist eine Form der Demenz, bei der das Hauptsymptom eine Abnahme der kognitiven Funktionen ist, insbesondere des Gedächtnisses sowie der Sprache und des Problemlösungsvermögens. Experten sind sich einig, dass eine frühzeitige Erkennung entscheidend für die effektive Entwicklung und Anwendung von Interventionen und Behandlungen ist, was den Bedarf an effektiven und durchgängigen Bewertungsund Screening-Tools unterstreicht. Das Ziel dieser Arbeit ist es zu erforschen, wie computergest ützte Techniken eingesetzt werden können, um Sprach- und Sprechproben von Patienten, die an Demenz oder verwandten affektiven Störungen leiden, zu verarbeiten, mit dem Ziel, diese in großen Populationen mit Hilfe von maschinellen Lernmodellen automatisch zu erkennen. Ein starker Fokus liegt auf der Erkennung von Demenz im Frühstadium (MCI), da sich die meisten klinischen Studien heute auf eine Intervention auf dieser Ebene konzentrieren. Zu diesem Zweck werden neuartige automatische und halbautomatische Analyseschemata für eine sprachbasierte kognitive Aufgabe, d.h. die verbale Geläufigkeit, erforscht und als geeignete Screening-Aufgabe bewertet. Aufgrund des Mangels an verfügbaren Patientendaten in den meisten Sprachen werden in dieser Arbeit weltweit erstmalig mehrsprachige Ansätze zur Erkennung von Demenz vorgestellt. Die Ergebnisse sind ermutigend und es werden deutliche Vorteile an einem kleinen französischen Datensatz sichtbar. Schließlich wird die Aufgabe untersucht, jene Menschen mit Demenz zu erkennen, die auch an einer affektiven Störung namens Apathie leiden. Da sie mit größerer Wahrscheinlichkeit schneller in ein späteres Stadium der Demenz übergehen, ist es entscheidend, sie zu identifizieren. Dies sind die ersten Experimente, die diese Aufgabe unter ausschließlicher Verwendung von Sprache und Sprache als Input betrachten. Die Ergebnisse sind wieder ermutigend, sowohl bei der Verwendung von reiner Sprache als auch bei der Verwendung von Sprachdaten, die durch emotionale Fragen ausgelöst werden. Insgesamt sind die Ergebnisse sehr ermutigend und ermutigen zu weiterer Forschung, um sprachbasierte Biomarker für die Früherkennung und Überwachung dieser Erkrankungen zu etablieren und so das Leben der Patienten zu verbessern

    Linguistic Threat Assessment: Understanding Targeted Violence through Computational Linguistics

    Get PDF
    Language alluding to possible violence is widespread online, and security professionals are increasingly faced with the issue of understanding and mitigating this phenomenon. The volume of extremist and violent online data presents a workload that is unmanageable for traditional, manual threat assessment. Computational linguistics may be of particular relevance to understanding threats of grievance-fuelled targeted violence on a large scale. This thesis seeks to advance knowledge on the possibilities and pitfalls of threat assessment through automated linguistic analysis. Based on in-depth interviews with expert threat assessment practitioners, three areas of language are identified which can be leveraged for automation of threat assessment, namely, linguistic content, style, and trajectories. Implementations of each area are demonstrated in three subsequent quantitative chapters. First, linguistic content is utilised to develop the Grievance Dictionary, a psycholinguistic dictionary aimed at measuring concepts related to grievance-fuelled violence in text. Thereafter, linguistic content is supplemented with measures of linguistic style in order to examine the feasibility of author profiling (determining gender, age, and personality) in abusive texts. Lastly, linguistic trajectories are measured over time in order to assess the effect of an external event on an extremist movement. Collectively, the chapters in this thesis demonstrate that linguistic automation of threat assessment is indeed possible. The concluding chapter describes the limitations of the proposed approaches and illustrates where future potential lies to improve automated linguistic threat assessment. Ideally, developers of computational implementations for threat assessment strive for explainability and transparency. Furthermore, it is argued that computational linguistics holds particular promise for large-scale measurement of grievance-fuelled language, but is perhaps less suited to prediction of actual violent behaviour. Lastly, researchers and practitioners involved in threat assessment are urged to collaboratively and critically evaluate novel computational tools which may emerge in the future

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    Get PDF
    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov
    corecore