195 research outputs found
Analysing Changes in the Acoustic Features of the Human Voice to Detect Depression amongst Biological Females in Higher Education
Depression significantly affects a large percentage of the population, with young adult females being one of the most at-risk demographics. Concurrently, demand on healthcare is growing, and with sufficient resources often unavailable to diagnose depression, new diagnostic methods are needed that are both cost-effective and accurate. The presence of depression significantly affects certain acoustic features of the human voice: these features exhibit subtle changes, beyond the perception of the human auditory system, when an individual has depression. With advances in speech processing, these subtle changes can be observed by machines. By measuring these changes, the human voice can be analysed to identify acoustic features that correlate with depression. The implementation of voice diagnosis would both reduce the burden on healthcare and ensure those with depression are diagnosed in a timely fashion, allowing them quicker access to treatment. This research project presents an analysis of voice data from 17 biological females, aged 20-26 and in higher education, as a means to detect depression. Eight participants were considered healthy with no history of depression, whilst the other nine currently had depression. Participants performed two vocal tasks: sustaining sounds for a period of time and reading back a passage of speech. Six acoustic features were then measured from the voice data to determine whether they can be utilised as diagnostic indicators of depression. The main finding of this study is that one of the measured acoustic features shows significant differences between depressed and healthy individuals.
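The abstract does not name the six acoustic features that were measured. As a hedged illustration only (not the study's actual pipeline), two perturbation measures that frequently appear in depression-detection work, local jitter (cycle-to-cycle period variability) and local shimmer (cycle-to-cycle amplitude variability), can be computed from per-cycle measurements like this:

```python
import numpy as np

def jitter_local(periods):
    """Mean absolute difference between consecutive glottal cycle
    lengths, relative to the mean cycle length (dimensionless)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Same ratio, computed on the peak amplitudes of consecutive cycles."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
```

Tools such as Praat report these same ratios as percentages; extracting the per-cycle periods and amplitudes from a sustained-vowel recording is a separate pitch-tracking step not shown here.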
Gender Bias in Depression Detection Using Audio Features
Depression is a large-scale mental health problem, and its detection is a
challenging area for machine learning researchers. Datasets such as
Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) have been created
to aid research in this area. However, on top of the challenges inherent in
accurately detecting depression, biases in datasets may result in skewed
classification performance. In this paper we examine gender bias in the
DAIC-WOZ dataset. We show that gender biases in DAIC-WOZ can lead to an
overreporting of performance. By applying concepts from Fair Machine Learning,
such as data re-distribution, and by using raw audio features, we can mitigate
the harmful effects of bias.
Comment: 5 pages, 2 figures, to be published at EUSIPCO 202
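The paper's exact re-distribution procedure is not spelled out in the abstract. One common Fair-ML mitigation it alludes to is rebalancing the training set so that each (gender, label) subgroup is equally represented; a minimal sketch under that assumption (the `"gender"`/`"label"` field names are hypothetical):

```python
import random
from collections import defaultdict

def rebalance(samples, seed=0):
    """Downsample every (gender, label) subgroup to the size of the
    smallest subgroup, so no group dominates training."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s["gender"], s["label"])].append(s)
    n = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n))
    return balanced
```

Downsampling discards data; upsampling the minority groups (or reweighting the loss) are alternative re-distribution strategies with the same goal.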
Vocal aging: an acoustic-articulatory study of speech changes with age
Background: Although the aging process causes specific alterations in the
speech organs, knowledge about age effects on speech production is still
dispersed and incomplete. Objective: To provide a broader view of the age-related
segmental and suprasegmental speech changes in European Portuguese (EP),
considering new aspects besides static acoustic features, such as dynamic and
articulatory data. Method: Two databases, with speech data of Portuguese
adult native speakers obtained through standardized recording and segmentation
procedures, were devised: i) an acoustic database containing all EP oral
vowels produced in similar context (reading speech), and also a sample of semispontaneous
speech (image description) collected from a large sample of adults
between the ages 35 and 97; ii) and another with articulatory data (ultrasound
(US) tongue images synchronized with speech) for all EP oral vowels produced in
similar contexts (pseudowords and isolated) collected from young ([21-35]) and
older ([55-73]) adults. Results: Based on the curated databases, various aspects
of the aging speech were analyzed. Acoustically, the aging speech is characterized
by: 1) longer vowels (in both genders); 2) a tendency for F0 to decrease
in women and slightly increase in men; 3) lower vowel formant frequencies in
females; 4) a significant reduction of the vowel acoustic space in men; 5) vowels
with higher trajectory slope of F1 (in both genders); 6) shorter descriptions with
higher pause time for males; 7) faster speech and articulation rate for females;
and 8) lower HNR for females in semi-spontaneous speech. In addition, the decrease
in total speech duration is associated with non-severe depression symptoms and
age. Older adults tended to present more depressive symptoms, which could impact
the amount of speech produced. Concerning the articulatory data, the tongue
tends to be higher and more advanced with aging for almost all vowels, meaning
that the vowel articulatory space tends to be higher, advanced, and bigger in older
females. Conclusion: This study provides new information on aging speech for
a language other than English. These results corroborate that speech changes
with age and present different patterns between genders, and also suggest that
speakers might develop specific articulatory adjustments with aging.
Programa Doutoral em Gerontologia e Geriatri
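Result 4 above (a significant reduction of the vowel acoustic space in men) is typically quantified as the area of the polygon spanned by the mean (F1, F2) of the corner vowels. A minimal sketch of that computation using the shoelace formula (illustrative, not necessarily the thesis's exact method):

```python
import numpy as np

def vowel_space_area(formants):
    """Shoelace area of the polygon spanned by per-vowel mean (F1, F2)
    points, in Hz^2. Points must be given in polygon (boundary) order;
    a smaller area indicates more centralised vowel production."""
    pts = np.asarray(formants, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
```

With the three corner vowels /i/, /a/, /u/ this gives the classic vowel triangle area; passing more vowel means yields a polygonal vowel space.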
VOCAL BIOMARKERS OF CLINICAL DEPRESSION: WORKING TOWARDS AN INTEGRATED MODEL OF DEPRESSION AND SPEECH
Speech output has long been considered a sensitive marker of a person’s mental state. It has been previously examined as a possible biomarker for diagnosis and treatment response for certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results due to diversity in samples, speech material, investigated parameters, and analytical methods.
Within this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review provided context from the existing body of knowledge on the effects of depression on speech, and provided context for experimental setup within this body of work. Examinations of vowel space, monophthong, and diphthong productions as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, composite measures, etc.) are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area was not different between depressed and healthy speakers, but on closer inspection, this was due to more specific deficits seen in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce centralised vowels along F1, as compared to F2—and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for production. This pattern was seen in both monophthong and diphthong productions. Other articulatory and phonatory measures were inspected in a factor analysis as well, suggesting additional vocal biomarkers for consideration in diagnosis and treatment assessment of depression—including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression.
Findings support the clinical utility of combining Ellgring and Scherer's (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech, which suggest the observed changes are due to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech can be modelled along a continuum of hypo- to hyper-speech, where depressed individuals assess the communicative situation and its speech requirements, and then engage in the minimum amount of motoric output necessary to convey their message. As speakers fluctuate with depressive symptoms throughout the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is impacted accordingly.
Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. Results contribute towards cross-disciplinary research into speech analysis between the fields of psychiatry, computer science, and speech science
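The F1-axis vowel centralisation described above can be summarised with a single index. One published option is Sapir et al.'s (2010) Formant Centralization Ratio, sketched below; this is offered as an illustrative metric, not necessarily the measure this thesis used:

```python
def formant_centralization_ratio(formants):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), computed from the
    (F1, F2) means (Hz) of the corner vowels /i/, /a/, /u/.
    Values above 1 indicate vowel centralisation."""
    f1i, f2i = formants["i"]
    f1a, f2a = formants["a"]
    f1u, f2u = formants["u"]
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)
```

The numerator collects formants that rise under centralisation and the denominator those that fall, so the ratio grows as the corner vowels drift toward the centre of the acoustic space.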
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies
MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis
According to the World Health Organization, the number of mental disorder
patients, especially depression patients, has grown rapidly and become a
leading contributor to the global burden of disease. However, the present
common practice of depression diagnosis relies on interviews and clinical
scales administered by doctors, which is both labor-intensive and
time-consuming. One important reason is the lack of physiological
indicators for mental disorders. With the rise of tools such as data mining
and artificial intelligence, using physiological data to explore possible new
physiological indicators of mental disorders and to create new applications for
mental-disorder diagnosis has become a hot research topic. However, good
quality physiological data for mental disorder patients are hard to acquire. We
present a multi-modal open dataset for mental-disorder analysis. The dataset
includes EEG and audio data from clinically depressed patients and matching
normal controls. All our patients were carefully diagnosed and selected by
professional psychiatrists in hospitals. The EEG dataset includes not only data
collected using a traditional 128-electrode elastic cap, but also data from a
novel wearable 3-electrode EEG collector for pervasive applications. The
128-electrode EEG signals of 53 subjects were recorded both in the resting
state and under stimulation; the 3-electrode EEG signals of 55 subjects were
recorded in the resting state; and the audio data of 52 subjects were recorded
during interviewing, reading, and picture description. We encourage other researchers
in the field to use it for testing their methods of mental-disorder analysis
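The dataset description above does not prescribe an analysis method. A typical first step with resting-state EEG of the kind included here is spectral band power, e.g. the relative power in the alpha band (8-13 Hz), which is often examined in depression research. A minimal FFT-based sketch (illustrative only):

```python
import numpy as np

def band_power(signal, fs, lo, hi):
    """Fraction of total signal power in the [lo, hi) Hz band,
    via a plain FFT periodogram (no windowing or averaging)."""
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    band = psd[(freqs >= lo) & (freqs < hi)].sum()
    return band / psd.sum()
```

A production analysis would use Welch averaging and per-channel artifact rejection; this sketch only shows the shape of the computation.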
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
Studies on the impact of assistive communication devices on the quality of life of patients with amyotrophic lateral sclerosis
Doctoral thesis, Ciências Biomédicas (Neurociências), Universidade de Lisboa, Faculdade de Medicina, 2016
Amyotrophic Lateral Sclerosis (ALS) is a progressive neuromuscular disease with rapid and generalized degeneration of motor neurons. Patients with ALS experience a relentless decline in functions that affect the performance of most activities of daily living (ADL), such as speaking, eating, walking or writing. For this reason, dependence on caregivers grows as the disease progresses. Management of the respiratory system is one of the main concerns of medical support, since respiratory failure is the most common cause of death in ALS. Due to increasing muscle weakness, most patients experience a dramatic decrease in speech intelligibility and difficulties in using the upper limbs (UL) for writing. There is growing evidence that mild cognitive impairment is common in ALS, but most patients are self-conscious of their difficulties in communicating and, in very severe stages, locked-in syndrome can occur. When no resources other than speech and writing are used to assist communication, patients are deprived of expressing needs or feelings, making decisions and keeping social relationships. Further, caregivers feel increased dependence due to difficulties in communicating with others and become frustrated by difficulties in understanding their partners' needs. Support for communication is therefore very important to improve the quality of life of both patients and caregivers; however, this has been poorly investigated in ALS. Assistive communication devices (ACD) can support patients by providing a diversity of tools for communication as they progressively lose speech. ALS, in common with other degenerative conditions, introduces an additional challenge for the field of ACD: as the disease progresses, technologies must adapt to the changing condition of the user.
In early stages, patients may need speech synthesis on a mobile device, if dysarthria is one of the initial symptoms, or keyboard modifications, as weakness in the UL increases. When upper-limb dysfunction is severe, different input technologies may be adapted to capture voluntary control (for example, eye-tracking devices). Despite the enormous advances in the field of Assistive Technologies over the last decade, difficulties in clinical support for the use of ACD persist. Among the main reasons for these difficulties are the lack of assessment tools to evaluate communication needs, determine proper input devices and track changes over disease progression, and the absence of clinical evidence that ACD have a relevant impact on the quality of life of affected patients. For this set of reasons, support with communication tools is delayed to stages where patients are severely disabled. Often in these stages, patients face additional clinical complications and increased dependence on their caregivers' decisions, which increases the difficulty of adapting to new communication tools. This thesis addresses the role of assistive technologies in the quality of life of early-affected patients with ALS. It also includes the study of assessment tools that can improve the longitudinal evaluation of the communication needs of patients with ALS. We longitudinally evaluated a group of 30 patients with bulbar-onset ALS and 17 caregivers, over 2 to 29 months. Patients were assessed during their regular clinical appointments at the Hospital de Santa Maria-Centro Hospitalar Lisboa_Norte. Evaluation of patients was based on validated instruments for assessing the Quality of Life (QoL) of patients and caregivers, and on methodologies for recording communication and measuring its performance (including speech, handwriting and typing).
We tested the impact of early support with ACD on the QoL of patients with ALS, using a randomized, prospective, longitudinal design. Patients were able to learn and improve their skills in using communication tools based on electronic assistive devices. We found a positive impact of ACD on the psychological and wellbeing domains of quality of life in patients, as well as on the support and psychological domains in caregivers. We also studied the performance of communication (words per minute) using the UL. Performance in handwriting may decline faster than performance in typing, supporting the idea that touchscreen-based ACD support communication for longer than handwriting. From longitudinal recordings of speech and typing activity we observed that ACD can provide tools to detect early markers of bulbar and UL dysfunction in ALS. The methodologies used in this research for recording and assessing communicative function can be replicated in the home environment and form part of the original contributions of this research. The implementation of remote monitoring tools in the daily use of ACD, based on these methodologies, is discussed. For patients who receive late support for the use of ACD, lack of time or of daily support to learn how to control complex input devices may hinder their use. We developed a novel device to explore the detection and control of various residual movements, based on accelerometry, electromyography and force sensors, as input signals for communication. The aim of this input device was to provide a tool to explore new communication channels in patients with generalized muscle weakness. This research contributed novel tools from the Engineering field to the study of assistive communication in patients with ALS. The methodologies developed in this work can be further applied to the study of the impact of ACD in other neurodegenerative diseases that affect speech and motor control of the UL
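The communication-performance measure above (words per minute) can be derived from a timestamped typing log. A minimal sketch, assuming a hypothetical log format of (timestamp, character) events rather than the study's actual instrumentation:

```python
def words_per_minute(log):
    """log: chronological list of (timestamp_seconds, character) typing
    events; returns whitespace-delimited words per elapsed minute."""
    if len(log) < 2:
        return 0.0
    text = "".join(ch for _, ch in log)
    minutes = (log[-1][0] - log[0][0]) / 60.0
    return len(text.split()) / minutes if minutes > 0 else 0.0
```

Logged at home over months, a downward trend in this rate could serve as the kind of early marker of UL dysfunction the thesis discusses.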
Mood Modulates Auditory Laterality of Hemodynamic Mismatch Responses during Dichotic Listening
Hemodynamic mismatch responses can be elicited by deviant stimuli in a sequence of standard stimuli, even during cognitively demanding tasks. Emotional context is known to modulate lateralized processing. Right-hemispheric negative emotion processing may bias attention to the right and enhance processing of right-ear stimuli. The present study examined the influence of induced mood on lateralized pre-attentive auditory processing of dichotic stimuli using functional magnetic resonance imaging (fMRI). Faces expressing emotions (sad/happy/neutral) were presented in a blocked design while a dichotic oddball sequence with consonant-vowel (CV) syllables was simultaneously administered in an event-related design. Twenty healthy participants were instructed to feel the emotion depicted in the images and to ignore the syllables. Deviant sounds reliably activated bilateral auditory cortices and confirmed attention effects by modulation of visual activity. Sad mood induction activated visual, limbic and right prefrontal areas. A lateralization effect of the emotion-attention interaction was reflected in a stronger response to right-ear deviants in the right auditory cortex during sad mood. This imbalance of resources may be a neurophysiological correlate of laterality in sad mood and depression. Conceivably, the compensatory right-hemispheric enhancement of resources elicits increased ipsilateral processing