195 research outputs found
Analysing Changes in the Acoustic Features of the Human Voice to Detect Depression amongst Biological Females in Higher Education
Depression significantly affects a large percentage of the population, with young adult females being one of the most at-risk demographics. Concurrently, demand on healthcare is growing, and with sufficient resources often unavailable to diagnose depression, new diagnostic methods are needed that are both cost-effective and accurate. The presence of depression significantly affects certain acoustic features of the human voice: these features exhibit subtle changes, beyond the perception of the human auditory system, when an individual has depression. With advances in speech processing, these subtle changes can be observed by machines. By measuring these changes, the human voice can be analysed to identify acoustic features that correlate with depression. The implementation of voice diagnosis would both reduce the burden on healthcare and ensure those with depression are diagnosed in a timely fashion, allowing them quicker access to treatment. This research project presents an analysis of voice data from 17 biological females, aged 20-26 and in higher education, as a means to detect depression. Eight participants were considered healthy with no history of depression, whilst the other nine currently had depression. Participants performed two vocal tasks: sustaining sounds for a period of time and reading back a passage of speech. Six acoustic features were then measured from the voice data to determine whether they can be utilised as diagnostic indicators of depression. The main finding of this study is that one of the measured acoustic features shows significant differences between depressed and healthy individuals.
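The abstract does not name the six acoustic features that were measured. As a hedged illustration only (not the study's actual pipeline), two perturbation measures that frequently appear in depression-detection work, local jitter (cycle-to-cycle period variability) and local shimmer (cycle-to-cycle amplitude variability), can be computed from per-cycle measurements like this:

```python
import numpy as np

def jitter_local(periods):
    """Mean absolute difference between consecutive glottal cycle
    lengths, relative to the mean cycle length (dimensionless)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Same ratio, computed on the peak amplitudes of consecutive cycles."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
```

Tools such as Praat report these same ratios as percentages; extracting the per-cycle periods and amplitudes from a sustained-vowel recording is a separate pitch-tracking step not shown here.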
Gender Bias in Depression Detection Using Audio Features
Depression is a large-scale mental health problem, and its detection is a
challenging area for machine learning researchers. Datasets such as
Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) have been created
to aid research in this area. However, on top of the challenges inherent in
accurately detecting depression, biases in datasets may result in skewed
classification performance. In this paper we examine gender bias in the
DAIC-WOZ dataset. We show that gender biases in DAIC-WOZ can lead to an
overreporting of performance. By applying concepts from Fair Machine Learning,
such as data re-distribution, and by using raw audio features, we can mitigate
the harmful effects of bias.
Comment: 5 pages, 2 figures, to be published at EUSIPCO 202
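The paper's exact re-distribution procedure is not spelled out in the abstract. One common Fair-ML mitigation it alludes to is rebalancing the training set so that each (gender, label) subgroup is equally represented; a minimal sketch under that assumption (the `"gender"`/`"label"` field names are hypothetical):

```python
import random
from collections import defaultdict

def rebalance(samples, seed=0):
    """Downsample every (gender, label) subgroup to the size of the
    smallest subgroup, so no group dominates training."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s["gender"], s["label"])].append(s)
    n = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n))
    return balanced
```

Downsampling discards data; upsampling the minority groups (or reweighting the loss) are alternative re-distribution strategies with the same goal.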
Vocal aging: an acoustic-articulatory study of speech changes with age
Background: Although the aging process causes specific alterations in the
speech organs, knowledge about age effects on speech production is still
dispersed and incomplete. Objective: To provide a broader view of the age-related
segmental and suprasegmental speech changes in European Portuguese (EP),
considering new aspects besides static acoustic features, such as dynamic and
articulatory data. Method: Two databases, with speech data of Portuguese
adult native speakers obtained through standardized recording and segmentation
procedures, were devised: i) an acoustic database containing all EP oral
vowels produced in similar context (reading speech), and also a sample of semispontaneous
speech (image description) collected from a large sample of adults
between the ages 35 and 97; ii) and another with articulatory data (ultrasound
(US) tongue images synchronized with speech) for all EP oral vowels produced in
similar contexts (pseudowords and isolated) collected from young ([21-35]) and
older ([55-73]) adults. Results: Based on the curated databases, various aspects
of the aging speech were analyzed. Acoustically, the aging speech is characterized
by: 1) longer vowels (in both genders); 2) a tendency for F0 to decrease
in women and slightly increase in men; 3) lower vowel formant frequencies in
females; 4) a significant reduction of the vowel acoustic space in men; 5) vowels
with higher trajectory slope of F1 (in both genders); 6) shorter descriptions with
higher pause time for males; 7) faster speech and articulation rate for females;
and 8) lower HNR for females in semi-spontaneous speech. In addition, the decrease
in total speech duration is associated with non-severe depression symptoms and
age. Older adults tended to present more depressive symptoms, which could impact
the amount of speech produced. Concerning the articulatory data, the tongue
tends to be higher and more advanced with aging for almost all vowels, meaning
that the vowel articulatory space tends to be higher, advanced, and bigger in older
females. Conclusion: This study provides new information on aging speech for
a language other than English. These results corroborate that speech changes
with age and present different patterns between genders, and also suggest that
speakers might develop specific articulatory adjustments with aging.
Programa Doutoral em Gerontologia e Geriatri
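Result 4 above (a significant reduction of the vowel acoustic space in men) is typically quantified as the area of the polygon spanned by the mean (F1, F2) of the corner vowels. A minimal sketch of that computation using the shoelace formula (illustrative, not necessarily the thesis's exact method):

```python
import numpy as np

def vowel_space_area(formants):
    """Shoelace area of the polygon spanned by per-vowel mean (F1, F2)
    points, in Hz^2. Points must be given in polygon (boundary) order;
    a smaller area indicates more centralised vowel production."""
    pts = np.asarray(formants, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
```

With the three corner vowels /i/, /a/, /u/ this gives the classic vowel triangle area; passing more vowel means yields a polygonal vowel space.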
VOCAL BIOMARKERS OF CLINICAL DEPRESSION: WORKING TOWARDS AN INTEGRATED MODEL OF DEPRESSION AND SPEECH
Speech output has long been considered a sensitive marker of a person’s mental state. It has been previously examined as a possible biomarker for diagnosis and treatment response for certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results due to diversity in samples, speech material, investigated parameters, and analytical methods.
Within this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review provided context from the existing body of knowledge on the effects of depression on speech, and provided context for experimental setup within this body of work. Examinations of vowel space, monophthong, and diphthong productions as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, composite measures, etc.) are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area was not different between depressed and healthy speakers, but on closer inspection, this was due to more specific deficits seen in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce centralised vowels along F1, as compared to F2—and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for production. This pattern was seen in both monophthong and diphthong productions. Other articulatory and phonatory measures were inspected in a factor analysis as well, suggesting additional vocal biomarkers for consideration in diagnosis and treatment assessment of depression—including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression.
Findings support the clinical utility of combining Ellgring and Scherer's (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech, which suggest the observed changes are due to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech can be modelled along a continuum of hypo- to hyper-speech, where depressed individuals assess the communicative situation and its speech requirements, and then engage in the minimum amount of motoric output necessary to convey their message. As speakers fluctuate with depressive symptoms throughout the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is impacted accordingly.
Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. Results contribute towards cross-disciplinary research into speech analysis between the fields of psychiatry, computer science, and speech science
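The F1-axis vowel centralisation described above can be summarised with a single index. One published option is Sapir et al.'s (2010) Formant Centralization Ratio, sketched below; this is offered as an illustrative metric, not necessarily the measure this thesis used:

```python
def formant_centralization_ratio(formants):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), computed from the
    (F1, F2) means (Hz) of the corner vowels /i/, /a/, /u/.
    Values above 1 indicate vowel centralisation."""
    f1i, f2i = formants["i"]
    f1a, f2a = formants["a"]
    f1u, f2u = formants["u"]
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)
```

The numerator collects formants that rise under centralisation and the denominator those that fall, so the ratio grows as the corner vowels drift toward the centre of the acoustic space.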
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies
MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis
According to the World Health Organization, the number of mental disorder
patients, especially depression patients, has grown rapidly and become a
leading contributor to the global burden of disease. However, the present
common practice of depression diagnosis relies on interviews and clinical
scales administered by doctors, which is both labor-intensive and
time-consuming. One important reason is the lack of physiological
indicators for mental disorders. With the rise of tools such as data mining
and artificial intelligence, using physiological data to explore possible new
physiological indicators of mental disorders and to create new applications for
mental-disorder diagnosis has become a hot research topic. However, good
quality physiological data for mental disorder patients are hard to acquire. We
present a multi-modal open dataset for mental-disorder analysis. The dataset
includes EEG and audio data from clinically depressed patients and matching
normal controls. All our patients were carefully diagnosed and selected by
professional psychiatrists in hospitals. The EEG dataset includes not only data
collected using a traditional 128-electrode elastic cap, but also data from a
novel wearable 3-electrode EEG collector for pervasive applications. The
128-electrode EEG signals of 53 subjects were recorded both in the resting
state and under stimulation; the 3-electrode EEG signals of 55 subjects were
recorded in the resting state; and the audio data of 52 subjects were recorded
during interviewing, reading, and picture description. We encourage other researchers
in the field to use it for testing their methods of mental-disorder analysis
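The dataset description above does not prescribe an analysis method. A typical first step with resting-state EEG of the kind included here is spectral band power, e.g. the relative power in the alpha band (8-13 Hz), which is often examined in depression research. A minimal FFT-based sketch (illustrative only):

```python
import numpy as np

def band_power(signal, fs, lo, hi):
    """Fraction of total signal power in the [lo, hi) Hz band,
    via a plain FFT periodogram (no windowing or averaging)."""
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    band = psd[(freqs >= lo) & (freqs < hi)].sum()
    return band / psd.sum()
```

A production analysis would use Welch averaging and per-channel artifact rejection; this sketch only shows the shape of the computation.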
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
Studies on the impact of assistive communication devices on the quality of life of patients with amyotrophic lateral sclerosis
Doctoral thesis, Ciências Biomédicas (Neurociências), Universidade de Lisboa, Faculdade de Medicina, 2016
Amyotrophic Lateral Sclerosis (ALS) is a progressive neuromuscular disease with rapid and generalized degeneration of motor neurons. Patients with ALS experience a relentless decline in functions that affect the performance of most activities of daily living (ADL), such as speaking, eating, walking or writing. For this reason, dependence on caregivers grows as the disease progresses. Management of the respiratory system is one of the main concerns of medical support, since respiratory failure is the most common cause of death in ALS. Due to increasing muscle weakness, most patients experience a dramatic decrease in speech intelligibility and difficulties in using the upper limbs (UL) for writing. There is growing evidence that mild cognitive impairment is common in ALS, but most patients are self-conscious of their difficulties in communicating and, in very severe stages, locked-in syndrome can occur. When no resources other than speech and writing are used to assist communication, patients are deprived of expressing needs or feelings, making decisions and keeping social relationships. Further, caregivers feel increased dependence due to difficulties in communicating with others and become frustrated by difficulties in understanding their partners' needs. Support for communication is therefore very important to improve the quality of life of both patients and caregivers; however, this has been poorly investigated in ALS. Assistive communication devices (ACD) can support patients by providing a diversity of tools for communication as they progressively lose speech. ALS, in common with other degenerative conditions, introduces an additional challenge for the field of ACD: as the disease progresses, technologies must adapt to the changing condition of the user.
In early stages, patients may need speech synthesis on a mobile device, if dysarthria is one of the initial symptoms, or keyboard modifications, as weakness in the UL increases. When upper-limb dysfunction is severe, different input technologies may be adapted to capture voluntary control (for example, eye-tracking devices). Despite the enormous advances in the field of Assistive Technologies over the last decade, difficulties in clinical support for the use of ACD persist. Among the main reasons for these difficulties are the lack of assessment tools to evaluate communication needs, determine proper input devices and track changes over disease progression, and the absence of clinical evidence that ACD have a relevant impact on the quality of life of affected patients. For this set of reasons, support with communication tools is delayed to stages where patients are severely disabled. Often in these stages, patients face additional clinical complications and increased dependence on their caregivers' decisions, which increases the difficulty of adapting to new communication tools. This thesis addresses the role of assistive technologies in the quality of life of early-affected patients with ALS. It also includes the study of assessment tools that can improve the longitudinal evaluation of the communication needs of patients with ALS. We longitudinally evaluated a group of 30 patients with bulbar-onset ALS and 17 caregivers, over 2 to 29 months. Patients were assessed during their regular clinical appointments at the Hospital de Santa Maria-Centro Hospitalar Lisboa_Norte. Evaluation of patients was based on validated instruments for assessing the Quality of Life (QoL) of patients and caregivers, and on methodologies for recording communication and measuring its performance (including speech, handwriting and typing).
We tested the impact of early support with ACD on the QoL of patients with ALS, using a randomized, prospective, longitudinal design. Patients were able to learn and improve their skills in using communication tools based on electronic assistive devices. We found a positive impact of ACD on the psychological and wellbeing domains of quality of life in patients, as well as on the support and psychological domains in caregivers. We also studied the performance of communication (words per minute) using the UL. Performance in handwriting may decline faster than performance in typing, supporting the idea that touchscreen-based ACD support communication for longer than handwriting. From longitudinal recordings of speech and typing activity we observed that ACD can provide tools to detect early markers of bulbar and UL dysfunction in ALS. The methodologies used in this research for recording and assessing communicative function can be replicated in the home environment and form part of the original contributions of this research. The implementation of remote monitoring tools in the daily use of ACD, based on these methodologies, is discussed. For patients who receive late support for the use of ACD, lack of time or of daily support to learn how to control complex input devices may hinder their use. We developed a novel device to explore the detection and control of various residual movements, based on accelerometry, electromyography and force sensors, as input signals for communication. The aim of this input device was to provide a tool to explore new communication channels in patients with generalized muscle weakness. This research contributed novel tools from the Engineering field to the study of assistive communication in patients with ALS. The methodologies developed in this work can be further applied to the study of the impact of ACD in other neurodegenerative diseases that affect speech and motor control of the UL
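The communication-performance measure above (words per minute) can be derived from a timestamped typing log. A minimal sketch, assuming a hypothetical log format of (timestamp, character) events rather than the study's actual instrumentation:

```python
def words_per_minute(log):
    """log: chronological list of (timestamp_seconds, character) typing
    events; returns whitespace-delimited words per elapsed minute."""
    if len(log) < 2:
        return 0.0
    text = "".join(ch for _, ch in log)
    minutes = (log[-1][0] - log[0][0]) / 60.0
    return len(text.split()) / minutes if minutes > 0 else 0.0
```

Logged at home over months, a downward trend in this rate could serve as the kind of early marker of UL dysfunction the thesis discusses.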
Mood Modulates Auditory Laterality of Hemodynamic Mismatch Responses during Dichotic Listening
Hemodynamic mismatch responses can be elicited by deviant stimuli in a sequence of standard stimuli, even during cognitively demanding tasks. Emotional context is known to modulate lateralized processing. Right-hemispheric negative emotion processing may bias attention to the right and enhance processing of right-ear stimuli. The present study examined the influence of induced mood on lateralized pre-attentive auditory processing of dichotic stimuli using functional magnetic resonance imaging (fMRI). Faces expressing emotions (sad/happy/neutral) were presented in a blocked design while a dichotic oddball sequence with consonant-vowel (CV) syllables was simultaneously administered in an event-related design. Twenty healthy participants were instructed to feel the emotion depicted in the images and to ignore the syllables. Deviant sounds reliably activated bilateral auditory cortices and confirmed attention effects by modulation of visual activity. Sad mood induction activated visual, limbic and right prefrontal areas. A lateralization effect of the emotion-attention interaction was reflected in a stronger response to right-ear deviants in the right auditory cortex during sad mood. This imbalance of resources may be a neurophysiological correlate of laterality in sad mood and depression. Conceivably, the compensatory right-hemispheric enhancement of resources elicits increased ipsilateral processing