658 research outputs found

    Automatic Detection of Dementia and related Affective Disorders through Processing of Speech and Language

    Get PDF
    In 2019, dementia has become a trillion-dollar disorder. Alzheimer’s disease (AD) is a type of dementia whose main observable symptom is a decline in cognitive functions, notably memory, as well as language and problem-solving. Experts agree that early detection is crucial to effectively develop and apply interventions and treatments, underlining the need for effective and pervasive assessment and screening tools. The goal of this thesis is to explore how computational techniques can be used to process speech and language samples produced by patients suffering from dementia or related affective disorders, with the aim of automatically detecting these conditions in large populations using machine learning models. A strong focus is laid on the detection of early-stage dementia (MCI), as most clinical trials today focus on intervention at this level. To this end, novel automatic and semi-automatic analysis schemes for a speech-based cognitive task, verbal fluency, are explored and evaluated as an appropriate screening task. Due to a lack of available patient data in most languages, world-first multilingual approaches to detecting dementia are introduced in this thesis. Results are encouraging, and clear benefits become visible on a small French dataset. Lastly, the task of detecting those people with dementia who also suffer from an affective disorder called apathy is explored. Since they are more likely to progress to later stages of dementia faster, it is crucial to identify them. These are the first experiments to consider this task using solely speech and language as inputs. Results are again encouraging, both when using speech alone and when using language data elicited by emotional questions. Overall, strong results encourage further research into establishing speech-based biomarkers for early detection and monitoring of these disorders to improve patients’ lives.
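
    To make the screening idea concrete, below is a minimal sketch, in Python with scikit-learn, of a verbal-fluency-based classifier. The features (unique word count, inter-word timing, late-task output) and the linear model are illustrative assumptions, not the thesis’ actual feature set or model.

```python
from dataclasses import dataclass
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

@dataclass
class FluencyTrial:
    words: list[str]        # words produced during the one-minute task
    onsets: list[float]     # word onset times in seconds

def fluency_features(trial: FluencyTrial) -> list[float]:
    """Simple productivity and timing features from a single trial (illustrative)."""
    n_unique = len(set(trial.words))                    # unique valid words
    gaps = [b - a for a, b in zip(trial.onsets, trial.onsets[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 60.0  # mean inter-word interval
    second_half = sum(t >= 30.0 for t in trial.onsets)  # output in second half
    return [n_unique, mean_gap, second_half]

def train_screener(trials: list[FluencyTrial], labels: list[int]):
    """Fit a linear classifier (labels: 1 = MCI, 0 = healthy control)."""
    X = [fluency_features(t) for t in trials]
    model = make_pipeline(StandardScaler(), LinearSVC())
    model.fit(X, labels)
    return model
```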

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and applications able to operate in real-world environments, such as mobile communication services and smart homes.
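
    As a concrete taste of the speech-feature extraction covered in the first part, here is a minimal MFCC front-end sketch using librosa; the file name and parameter values are placeholders, not taken from the book.

```python
import librosa
import numpy as np

# Load a mono utterance at a typical ASR sampling rate (placeholder file).
y, sr = librosa.load("utterance.wav", sr=16000)

# 13 MFCCs per frame plus first-order deltas, a common ASR front-end.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
delta = librosa.feature.delta(mfcc)                  # frame-to-frame dynamics
features = np.vstack([mfcc, delta])                  # (26, n_frames) observations
```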

    Identification of voice pathologies in an elderly population

    Get PDF
    Ageing is associated with an increased risk of developing diseases, including a greater predisposition to conditions such as sepsis. With ageing, human voices also undergo a natural degradation, gauged by alterations in hoarseness, breathiness, articulatory ability, and speaking rate. Nowadays, perceptual evaluation is widely used to assess speech and voice impairments despite its high subjectivity. This dissertation proposes a new method for detecting and identifying voice pathologies by exploring acoustic parameters of continuous speech signals in the elderly population. Additionally, a study of the influence of gender and age on the performance of voice pathology detection systems is conducted. The study included 44 subjects older than 60 years, with the pathologies Dysphonia, Functional Dysphonia, and Spasmodic Dysphonia. From the resulting dataset, two gender-dependent subsets were created, one with only female samples and the other with only male samples. The system developed used three feature selection methods and five machine learning algorithms to classify the voice signal according to the presence of pathology. The binary classification, which consisted of voice pathology detection, reached an accuracy of 85.1% ± 5.1% for the dataset without gender division, 83.7% ± 7.0% for the male dataset, and 87.4% ± 4.2% for the female dataset. The multiclass classification, which distinguished between the different pathologies, reached an accuracy of 69.0% ± 5.1% for the dataset without gender division, 63.7% ± 5.4% for the male dataset, and 80.6% ± 8.1% for the female dataset. The results revealed that features describing fluency are important and discriminating in these types of systems. Random Forest also proved to be the most effective machine learning algorithm for both binary and multiclass classification. The proposed model is promising for detecting pathological voices and identifying the underlying pathology in an elderly population, with an increase in performance when a gender division is applied.
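
    A minimal sketch of the reported protocol shape, assuming scikit-learn: univariate feature selection feeding a Random Forest, evaluated with cross-validation on the full set and on gender-dependent subsets. The synthetic arrays, feature count, and hyper-parameters are placeholders, not the dissertation’s configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(44, 60))             # placeholder: 44 speakers x 60 acoustic features
y = rng.integers(0, 2, size=44)           # placeholder labels: 1 = pathological voice
gender = rng.choice(["F", "M"], size=44)  # placeholder speaker gender

def evaluate(X_sub: np.ndarray, y_sub: np.ndarray) -> tuple[float, float]:
    """Cross-validated accuracy (mean, std) for one data subset."""
    clf = make_pipeline(
        SelectKBest(f_classif, k=20),                       # univariate selection
        RandomForestClassifier(n_estimators=200, random_state=0),
    )
    scores = cross_val_score(clf, X_sub, y_sub, cv=5, scoring="accuracy")
    return scores.mean(), scores.std()

# Full dataset and gender-dependent subsets, mirroring the reported protocol.
for name, mask in [("all", np.ones(44, dtype=bool)),
                   ("female", gender == "F"),
                   ("male", gender == "M")]:
    acc, std = evaluate(X[mask], y[mask])
    print(f"{name}: {acc:.3f} +/- {std:.3f}")
```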

    Speech- and Language-Based Classification of Alzheimer’s Disease: A Systematic Review

    Get PDF
    Background: Alzheimer’s disease (AD) is of paramount importance due to its rising prevalence, its impact on the patient and society, and the related healthcare costs. However, current diagnostic techniques are not designed for frequent mass screening, delaying therapeutic intervention and worsening prognoses. To detect AD at an early stage, ideally a pre-clinical one, speech analysis emerges as a simple, low-cost, non-invasive procedure. Objectives: Our objective in this work is to conduct a systematic review of speech-based detection and classification of Alzheimer’s disease, with the purpose of identifying the most effective algorithms and best practices. Methods: A systematic literature search was performed from January 2015 up to May 2020 using ScienceDirect, PubMed, and DBLP. Articles were screened by title, abstract, and full text as needed. A manual complementary search among the references of the included papers was also performed. Inclusion criteria and search strategies were defined a priori. Results: We were able to identify the main resources that can support the development of decision support systems for AD, to list speech features that are correlated with the linguistic and acoustic footprint of the disease, to recognize the data models that can provide robust results, and to observe the performance indicators that were reported. Discussion: A computational system with an adequate combination of these elements, based on the identified best practices, can point to a whole new diagnostic approach, leading to better insights into AD symptoms and disease patterns and creating conditions to promote a longer life span as well as an improvement in patient quality of life. The clinically relevant results that were identified can be used to establish a reference system and help define research guidelines for future developments. This work was partially supported by the FCT UIDB/04730/2020 project.

    JDReAM. Journal of InterDisciplinary Research Applied to Medicine - Vol. 4, issue 2 (2020)

    Get PDF

    CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES

    Full text link
    In recent years, online multimedia repositories have become key knowledge assets thanks to the rise of the Internet, especially in the area of education. Educational institutions around the world have devoted considerable effort to exploring different teaching methods, both to improve the transmission of knowledge and to reach a wider audience. As a result, online video lecture repositories are now available and serve as complementary tools that can boost the learning experience and help assimilate new concepts. To guarantee the success of these repositories, the transcription of each lecture plays a very important role, because it constitutes the first step towards the availability of many other features. Transcription allows the searchability of learning materials, enables translation into other languages, provides recommendation functions, makes it possible to generate content summaries, guarantees access for people with hearing disabilities, and more. However, the transcription of these videos is expensive in terms of time and human cost. To this purpose, this thesis aims at providing new tools and techniques that ease the transcription of these repositories. In particular, we address the development of a complete automatic speech recognition toolkit, with a special focus on the deep learning techniques that contribute to providing accurate transcriptions in real-world scenarios. This toolkit is tested against many others in different international competitions, showing comparable transcription quality. Moreover, a new technique to improve recognition accuracy is proposed which makes use of Confidence Measures, and which constitutes the spark that motivated the proposal of new Confidence Measure techniques that helped to further improve transcription quality. To this end, a new speaker-adapted confidence measure approach was proposed for models based on Recurrent Neural Networks. The contributions proposed herein have been tested in real-life scenarios in different educational repositories. In fact, the transLectures-UPV toolkit is part of a set of tools for providing video lecture transcriptions in many different Spanish and European universities and institutions.
Agua Teba, MÁD. (2019). CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130198
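
    To make the confidence-measure idea concrete, here is a hedged sketch in Python with PyTorch, not the thesis implementation: a small bidirectional LSTM that maps per-word decoder features to a probability that each word was recognised correctly. The input features and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConfidenceEstimator(nn.Module):
    """Per-word confidence from a sequence of word-level decoder features."""
    def __init__(self, n_feats: int = 4, hidden: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_words, n_feats), e.g. posterior, duration, LM score (assumed)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # per-word P(correct)

# Training would minimise binary cross-entropy against word-correctness
# labels obtained by aligning ASR hypotheses with reference transcripts.
model = ConfidenceEstimator()
feats = torch.randn(2, 10, 4)      # 2 utterances, 10 words, 4 features each
confidence = model(feats)          # shape (2, 10), values in [0, 1]
```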

    Automated screening methods for mental and neuro-developmental disorders

    Get PDF
    Mental and neuro-developmental disorders such as depression, bipolar disorder, and autism spectrum disorder (ASD) are critical healthcare issues which affect a large number of people. Depression, according to the World Health Organisation, is the largest cause of disability worldwide and affects more than 300 million people. Bipolar disorder affects more than 60 million individuals worldwide. ASD, meanwhile, affects more than 1 in 100 people in the UK. Not only do these disorders adversely affect the quality of life of affected individuals, they also have a significant economic impact. While brute-force approaches are potentially useful for learning new features which could be representative of these disorders, such approaches may not be best suited for developing robust screening methods. This is due to a myriad of confounding factors, such as age, gender, cultural background, and socio-economic status, which can affect the social signals of individuals in much the same way as the symptoms of these disorders. Brute-force approaches may learn to exploit the effects of these confounding factors on social signals in place of effects due to mental and neuro-developmental disorders. The main objective of this thesis is to develop, investigate, and propose computational methods to screen for mental and neuro-developmental disorders in accordance with descriptions given in the Diagnostic and Statistical Manual (DSM), a guidebook published by the American Psychiatric Association which offers a common language on mental disorders. Our motivation is to alleviate, to an extent, the possibility of machine learning algorithms picking up one of the confounding factors to optimise performance for the dataset, something we find to be not uncommon in the research literature. To this end, we introduce three new methods for automated screening for depression from audio/visual recordings, namely: turbulence features, craniofacial movement features, and a Fisher Vector based representation of speech spectra. We surmise that psychomotor changes due to depression lead to a uniqueness in an individual's speech pattern which manifests as sudden and erratic changes in speech feature contours. The efficacy of these features is demonstrated as part of our solution to the Audio/Visual Emotion Challenge 2017 (AVEC 2017) on Depression severity prediction. We also detail a methodology to quantify specific craniofacial movements, which we hypothesised could be indicative of psychomotor retardation, and hence depression. The efficacy of craniofacial movement features is demonstrated using datasets from the 2014 and 2017 editions of the AVEC Depression severity prediction challenges. Finally, using the dataset provided as part of the AVEC 2016 Depression classification challenge, we demonstrate that differences between the speech of individuals with and without depression can be quantified effectively using the Fisher Vector representation of speech spectra. For our work on automated screening of bipolar disorder, we propose methods to classify individuals with bipolar disorder into states of remission, hypomania, and mania. Here, we surmise that, as with depression, individuals with different levels of mania have a certain uniqueness to their social signals. Based on this understanding, we propose the use of turbulence features for audio/visual social signals (i.e. speech and facial expressions).
We also propose the use of Fisher Vectors to create a unified representation of speech in terms of prosody, voice quality, and speech spectra. These methods have been proposed as part of our solution to the AVEC 2018 Bipolar disorder challenge. In addition, we find that the task of automated screening for ASD is much more complicated. Here, confounding factors can easily overwhelm the social signals which are affected by ASD. We discuss, in the light of the research literature and our experimental analysis, that significant collaborative work is required between computer scientists and clinicians to discern social signals which are robust to common confounding factors.
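
    As an illustration of the Fisher Vector representation the abstract refers to, here is a simplified sketch, assuming a diagonal-covariance GMM over frame-level spectral features and encoding only the gradients with respect to the component means; the synthetic data and GMM size are placeholders, not the papers’ configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frames: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """Encode a (n_frames, dim) feature matrix as one fixed-length vector."""
    q = gmm.predict_proba(frames)                  # (n_frames, K) soft assignments
    means, sigma = gmm.means_, np.sqrt(gmm.covariances_)
    parts = []
    for k in range(gmm.n_components):
        # Normalised gradient of the log-likelihood w.r.t. the k-th mean.
        d = (frames - means[k]) / sigma[k]         # (n_frames, dim)
        g = (q[:, k:k + 1] * d).sum(axis=0) / (len(frames) * np.sqrt(gmm.weights_[k]))
        parts.append(g)
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalisation

rng = np.random.default_rng(0)
spectra = rng.normal(size=(500, 20))               # placeholder spectral frames
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(spectra)
utterance_vec = fisher_vector(spectra, gmm)        # one fixed-length embedding
```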