Predicting disordered speech comprehensibility from Goodness of Pronunciation scores
Speech production assessment in disordered speech relies on tests such as intelligibility and/or comprehensibility tests. These tests are subjective and time-consuming for both patients and practitioners. In this paper, we report on the use of automatically derived pronunciation scores to predict comprehensibility ratings, on a pilot development corpus of 120 utterances recorded by 12 speakers with distinct pathologies. We found high correlation values (0.81) between Goodness of Pronunciation (GOP) scores and comprehensibility ratings. We compare a baseline implementation of the GOP algorithm with a variant called forced-GOP, which showed better results. A linear regression model predicted comprehensibility scores with a 20.9% relative error compared to the reference scores given by two expert judges. A correlation value of 0.74 was obtained between the manual and the predicted scores. Most prediction errors concern the speakers with the most extreme ratings (the lowest or highest values), showing that the predicted score range was narrower overall than that of the manual scores, due to the simplicity of the model.
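The scoring pipeline described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the frame-posterior formulation of GOP and the reading of forced-GOP given here are our interpretation, not the paper's exact definitions, and the per-speaker numbers are invented for illustration.

```python
import numpy as np

def gop_baseline(log_post: np.ndarray, target: int) -> float:
    """Baseline GOP: over a phone's aligned frames, average the target
    phone's log-posterior minus the best competitor's (0 = best)."""
    return float(np.mean(log_post[:, target] - log_post.max(axis=1)))

def forced_gop(log_post_forced: np.ndarray, log_post_free: np.ndarray,
               target: int) -> float:
    """Forced-GOP (our reading): the numerator comes from a forced
    alignment on the canonical transcript, the denominator from a free
    phone loop; both are frame-averaged log scores."""
    return float(np.mean(log_post_forced[:, target])
                 - np.mean(log_post_free.max(axis=1)))

# Hypothetical per-speaker data: mean GOP score vs. expert
# comprehensibility rating. A degree-1 least-squares fit plays the
# role of the paper's linear regression model.
gop_scores = np.array([-0.2, -0.8, -1.5, -2.4])
comprehensibility = np.array([9.0, 7.5, 5.0, 3.0])
slope, intercept = np.polyfit(gop_scores, comprehensibility, 1)
predicted = slope * gop_scores + intercept
```

A score of 0 means the aligned phone is also the acoustically most likely one in every frame; more negative scores indicate pronunciations that diverge from the canonical phone, which is what correlates with the comprehensibility ratings.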
Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification
This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes
Uncertainty Quantification (UQ) for automatic speech intelligibility assessment
for dysarthric speech. Current GoP methods rely heavily on overconfident
neural-network predictions, which are unsuitable for assessing dysarthric
speech because of its significant acoustic differences from healthy speech.
To alleviate this problem, UQ techniques were applied to GoP by 1)
normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin)
and 2) modifying the scoring function (scaling, prior normalization). As a
result, prior-normalized maxlogit GoP achieves the best performance, with a
relative increase of 5.66%, 3.91%, and 23.65% compared to the baseline GoP for
English, Korean, and Tamil, respectively. Furthermore, phoneme analysis is
conducted to identify which phoneme scores significantly correlate with
intelligibility scores in each language.
Comment: Accepted to Interspeech 202
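The two best-performing modifications named above can be sketched as follows. This is a hedged illustration, not the paper's exact formulation: the toy logits and the assumed log-prior are invented, and the scoring convention (0 = best, more negative = worse) follows the common frame-averaged GoP definition.

```python
import numpy as np

def maxlogit_gop(logits: np.ndarray, target: int) -> float:
    """Maxlogit GoP: score raw logits instead of softmax posteriors,
    which tempers the network's overconfidence (0 = best)."""
    return float(np.mean(logits[:, target] - logits.max(axis=1)))

def prior_normalized_maxlogit_gop(logits: np.ndarray, target: int,
                                  log_prior: np.ndarray) -> float:
    """Prior-normalized variant: subtract each phone's log prior first,
    so high-frequency phones are not systematically favoured."""
    adj = logits - log_prior  # broadcasts the prior over frames
    return float(np.mean(adj[:, target] - adj.max(axis=1)))

# Toy logits: two aligned frames over a 2-phone inventory, with phone 0
# assumed to be far more frequent in the training data.
logits = np.array([[5.0, 1.0], [4.0, 0.0]])
log_prior = np.array([5.0, 0.0])
```

After prior normalization, a phone that only looks likely because it is frequent stops winning the per-frame max, so the score for it drops; that is the intended correction for the network's bias.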
THE RELATIONSHIP BETWEEN ACOUSTIC FEATURES OF SECOND LANGUAGE SPEECH AND LISTENER EVALUATION OF SPEECH QUALITY
Second language (L2) speech is typically less fluent than native speech, and differs from it phonetically. While the speech of some L2 English speakers seems to be easily understood by native listeners despite the presence of a foreign accent, other L2 speech seems to be more demanding, such that listeners must expend considerable effort in order to understand it. One reason for this increased difficulty may simply be the speaker’s pronunciation accuracy or phonetic intelligibility. If an L2 speaker’s pronunciations of English sounds differ sufficiently from the sounds that native listeners expect, these differences may force native listeners to work much harder to understand the divergent speech patterns. However, L2 speakers also tend to differ from native ones in terms of fluency – the degree to which a speaker is able to produce appropriately structured phrases without unnecessary pauses, self-corrections or restarts. Previous studies have shown that measures of fluency are strongly predictive of listeners’ subjective ratings of the acceptability of L2 speech: less fluent speech is consistently considered less acceptable (Ginther, Dimova, & Yang, 2010). However, since less fluent speakers tend also to have less accurate pronunciations, it is unclear whether or how these factors might interact to influence the amount of effort listeners exert to understand L2 speech, nor is it clear how listening effort might relate to the perceived quality or acceptability of speech. In this dissertation, two experiments were designed to investigate these questions.
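Fluency measures of the kind discussed above are often derived from pausing behaviour. A minimal sketch, assuming an energy-based definition of silence (the 20 ms frame length and the -35 dB threshold are illustrative choices, not values from the dissertation):

```python
import numpy as np

def pause_ratio(signal: np.ndarray, sr: int,
                frame_ms: int = 20, silence_db: float = -35.0) -> float:
    """Fraction of frames whose RMS energy falls below a silence
    threshold (in dB relative to the loudest frame) -- a crude fluency
    proxy: more or longer pauses give a higher ratio."""
    n = int(sr * frame_ms / 1000)
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
    db = 20 * np.log10(rms / rms.max())
    return float((db < silence_db).mean())

# Toy utterance: 0.5 s of a 220 Hz tone followed by 0.5 s of silence.
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.sin(2 * np.pi * 220 * t), np.zeros(sr // 2)])
ratio = pause_ratio(signal, sr)
```

Real fluency measures (pause counts, mean pause duration, speech rate) refine this idea with minimum-duration constraints and syllable counts, but the thresholding step is the common core.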
The Processing of Accented Speech
This thesis examines the processing of accented speech in both infants and adults. Accents provide a natural and reasonably consistent form of inter-speaker variation in the speech signal, but it is not yet clear exactly what processes are used to normalise this form of variation, or when and how those processes develop. Two adult studies use ERP data to examine differences between the online processing of regional- and foreign-accented speech as compared to a baseline consisting of the listeners’ home accent. These studies demonstrate that the two types of accents recruit normalisation processes which are qualitatively, and not just quantitatively, different. This provides support for the hypothesis that foreign and regional accents require different mechanisms to normalise accent-based variation (Adank et al., 2009; Floccia et al., 2009), rather than for the hypothesis that different types of accents are normalised according to their perceptual distance from the listener’s own accent (Clarke & Garrett, 2004). They also provide support for the Abstract entry approach to lexical storage of variant forms, which suggests that variant forms undergo a process of prelexical normalisation, allowing access to a canonical lexical entry (Pallier et al., 2001), rather than for the Exemplar-based approach, which suggests that variant word-forms are individually represented in the lexicon (Johnson, 1997). Two further studies examined how infants segment words from continuous speech when presented with accented speakers. The first of these includes a set of behavioural experiments, which highlight some methodological issues in the existing literature and offer some potential explanations for conflicting evidence about the age at which infants are able to segment speech.
The second uses ERP data to investigate segmentation within and across accents, and provides neurophysiological evidence that 11-month-olds are able to distinguish newly segmented words at the auditory level even within a foreign accent, or across accents, but that they are better able to treat new word-forms as word-like in a familiar accent than in a foreign accent.
Personalising synthetic voices for individuals with severe speech impairment.
Speech technology can help individuals with speech disorders to interact more easily. Many individuals with severe speech impairment, due to conditions such as Parkinson's disease or motor neurone disease, use voice output communication aids (VOCAs), which have synthesised or pre-recorded voice output. This voice output effectively becomes the voice of the individual and should therefore represent the user accurately.
Currently available techniques for personalising speech synthesis require a large amount of input data, which is difficult for individuals with severe speech impairment to produce. These techniques also offer no solution for individuals whose voices have already begun to show the effects of dysarthria.
The thesis shows that Hidden Markov Model (HMM)-based speech synthesis is a promising approach to 'voice banking' for individuals both before their condition causes deterioration of speech and once deterioration has begun. The data input requirements for building personalised voices with this technique are investigated using human listener judgement evaluation. The results show that 100 sentences are the minimum required to build a voice that differs significantly from an average voice model and shows some resemblance to the target speaker; this amount depends on the speaker and the average model used.
A neural network analysis trained on extracted acoustic features revealed that spectral features had the most influence for predicting human listener judgements of similarity of synthesised speech to a target speaker. Accuracy of prediction significantly improves if other acoustic features are introduced and combined non-linearly.
These results were used to inform the reconstruction of personalised synthetic voices for speakers whose voices had begun to show the effects of their conditions. Using HMM-based synthesis, personalised synthetic voices were built from dysarthric speech that showed similarity to the target speakers without recreating the impairment in the synthesised speech output.
ACOUSTIC SPEECH MARKERS FOR TRACKING CHANGES IN HYPOKINETIC DYSARTHRIA ASSOCIATED WITH PARKINSON’S DISEASE
Previous research has identified certain overarching features of hypokinetic dysarthria
associated with Parkinson’s Disease and found it manifests differently between
individuals. Acoustic analysis has often been used to find correlates of perceptual
features for differential diagnosis. However, acoustic parameters that are robust for
differential diagnosis may not be sensitive to tracking speech changes. Previous
longitudinal studies have had limited sample sizes or variable intervals between
data collections. This study focused on using acoustic correlates of perceptual features to
identify acoustic markers able to track speech changes in people with Parkinson’s
Disease (PwPD) over six months. The thesis presents how this study has addressed
limitations of previous studies to make a novel contribution to current knowledge.
Speech data were collected from 63 PwPD and 47 control speakers using online
podcast software at two time points, six months apart (T1 and T2). Recordings of a
standard reading passage, minimal pairs, sustained phonation, and spontaneous speech
were collected. Perceptual severity ratings were given by two speech and language
therapists for T1 and T2, and acoustic parameters of voice, articulation and prosody
were investigated. Two analyses were conducted: a) to identify which acoustic
parameters can track perceptual speech changes over time and b) to identify which
acoustic parameters can track changes in speech intelligibility over time. An additional
analysis examined whether these parameters showed group differences for
differential diagnosis between PwPD and control speakers at T1 and T2.
Results showed that specific acoustic parameters of voice quality, articulation and
prosody could either differentiate between PwPD and controls or detect speech changes
between T1 and T2, but not both. However, specific acoustic parameters within
articulation could detect both significant group differences and speech changes across
T1 and T2. The thesis discusses these results, their implications, and the potential for
future studies.
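Tracking change in an acoustic parameter over the two time points amounts to a paired comparison per speaker. A minimal sketch, with hypothetical articulation-rate values rather than data from the study (a full analysis would add a significance test and correction for multiple parameters):

```python
import numpy as np

def paired_change(t1, t2):
    """Mean T1->T2 change in an acoustic parameter for the same
    speakers, plus a paired Cohen's d as a simple effect size."""
    diff = np.asarray(t2, dtype=float) - np.asarray(t1, dtype=float)
    d = diff.mean() / diff.std(ddof=1)
    return float(diff.mean()), float(d)

# Hypothetical articulation rates (syllables/s) for five PwPD at the
# two time points, six months apart.
t1 = [4.1, 3.8, 4.4, 3.9, 4.2]
t2 = [3.8, 3.6, 4.1, 3.7, 4.0]
change, effect = paired_change(t1, t2)
```

A consistent negative change with a large effect size is the pattern a longitudinal marker of hypokinetic dysarthria would be expected to show; a parameter useful for differential diagnosis would instead need to separate the PwPD and control distributions at a single time point.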
Proceedings of the VIIth GSCP International Conference
The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, together with related technological fields; the relation between prosody and pragmatics; speech pathologies; and further papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second-language studies. The online publication with FUP allows direct access to sound and video linked to the papers (when downloaded).
An inquiry into the typical and atypical language development of young transnational multilingual children in an international school
This PhD thesis investigates some of the unique characteristics of young transnational
multilingual children aged five to eleven from high-socioeconomic status families educated in an
international school in Switzerland. Its purpose is to improve understanding of typical and
atypical language development for this group. It draws on sociolinguistic research on language
variation and exposure, and clinical linguistic research on developmental language disorder
identification and cross-linguistic considerations. The specific aim of the pilot research study
presented in this thesis is to measure and discuss seven multilingual children’s verbal language
abilities in each of their languages, and to measure their combined bilingual verbal abilities and
multilingual verbal abilities. It is, therefore, influenced by discussion on language acquisition
theories that relate to complex and dynamic systems, such as the Dynamic Model of
Multilingualism. In addition, it also identifies any common characteristics, familial language
practices or experiences of the pilot group of children. A methodological design is created that
could be replicated in the future on a much larger scale as a means of confirming, extending or
disputing the findings from the pilot group. This thesis’s pilot research findings suggest that
multilingual children from high-income families who attend international schools have
significantly above average verbal language abilities when their verbal language abilities are
evaluated as one total language system (multilingual ability), a finding that is in stark contrast to
the ‘average’ results they receive when each language is evaluated on its own. The thesis
concludes that research on multilingual children that does not take into account the variables
unique to this group may fail to recognise important factors that can impact their language
development.
Attention Restraint, Working Memory Capacity, and Mind Wandering: Do Emotional Valence or Intentionality Matter?
Attention restraint appears to mediate the relationship between working memory capacity (WMC) and mind wandering (Kane et al., 2016). Prior work has identified two dimensions of mind wandering: emotional valence and intentionality. However, less is known about how WMC and attention restraint correlate with these dimensions. The current study examined the relationship between WMC, attention restraint, and mind wandering by emotional valence and intentionality. A confirmatory factor analysis demonstrated that WMC and attention restraint were strongly correlated, but only attention restraint was related to overall mind wandering, consistent with prior findings. However, when examining the emotional valence of mind wandering, attention restraint and WMC were related to negatively and positively valenced, but not neutral, mind wandering. Attention restraint was also related to intentional but not unintentional mind wandering. These results suggest that WMC and attention restraint predict some, but not all, types of mind wandering.