44 research outputs found

    Predicting disordered speech comprehensibility from Goodness of Pronunciation scores

    Speech production assessment in disordered speech relies on tests such as intelligibility and/or comprehensibility tests. These tests are subjective and time-consuming for both the patients and the practitioners. In this paper, we report on the use of automatically derived pronunciation scores to predict comprehensibility ratings, on a pilot development corpus comprising 120 utterances recorded by 12 speakers with distinct pathologies. We found a high correlation (0.81) between Goodness of Pronunciation (GOP) scores and comprehensibility ratings. We compare a baseline implementation of the GOP algorithm with a variant called forced-GOP, which showed better results. A linear regression model predicted comprehensibility scores with a 20.9% relative error compared to the reference scores given by two expert judges. A correlation of 0.74 was obtained between the manual and the predicted scores. Most of the prediction errors concern the speakers with the most extreme ratings (the lowest or the highest values), showing that the predicted score range was overall narrower than that of the manual scores, due to the simplicity of the model.
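    As a rough, hedged illustration of the pipeline this abstract describes, the sketch below derives an utterance-level GOP score from frame-level phone log-posteriors and fits a linear regression from GOP scores to comprehensibility ratings. The function names, the 0-10 rating scale, and the toy data are assumptions made for illustration; this is not the authors' implementation, and the forced-GOP variant is not included.

```python
# Hedged sketch (not the authors' implementation): an utterance-level
# Goodness of Pronunciation (GOP) score from frame-level phone log-posteriors,
# and a linear regression from GOP scores to comprehensibility ratings.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def phone_gop(frame_log_posteriors, canonical_phone):
    """GOP in log-likelihood-ratio form: mean log-posterior of the expected
    phone minus the mean log-posterior of the best competing phone."""
    canonical = frame_log_posteriors[:, canonical_phone].mean()
    best = frame_log_posteriors.max(axis=1).mean()
    return canonical - best  # <= 0; closer to 0 suggests better pronunciation

def utterance_gop(phone_scores):
    """Average phone-level GOP scores into a single utterance-level score."""
    return float(np.mean(phone_scores))

# Toy data standing in for 120 utterances scored as above, plus expert
# comprehensibility ratings on an assumed 0-10 scale.
rng = np.random.default_rng(0)
gop_scores = rng.uniform(-4.0, 0.0, size=(120, 1))
ratings = 10 + 2.0 * gop_scores.ravel() + rng.normal(0.0, 0.5, 120)

model = LinearRegression().fit(gop_scores, ratings)
predicted = model.predict(gop_scores)
print("MAE:", mean_absolute_error(ratings, predicted))
print("Pearson r:", np.corrcoef(predicted, ratings)[0, 1])
```

    With real data, the correlation and relative-error figures reported above would be computed on held-out utterances rather than on the data used to fit the regression.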

    Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

    This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment of dysarthric speech. Current GoP methods rely heavily on neural networks' overconfident predictions, which are unsuitable for assessing dysarthric speech because of its significant acoustic differences from healthy speech. To alleviate the problem, UQ techniques were applied to GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). As a result, prior-normalized maxlogit GoP achieves the best performance, with relative increases of 5.66%, 3.91%, and 23.65% over the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, a phoneme analysis is conducted to identify which phoneme scores significantly correlate with intelligibility scores in each language. Comment: Accepted to Interspeech 202
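    A minimal sketch of the general idea behind the best-performing variant named above: scoring the expected phone from raw acoustic-model logits ("maxlogit") with a per-phone log-prior subtracted ("prior normalization"), so that overconfident softmax posteriors are avoided. The exact formulation in the paper may differ; the function, variable names, and toy data below are illustrative assumptions.

```python
# Hedged sketch of the general idea behind uncertainty-aware GoP scoring:
# use raw acoustic-model logits ("maxlogit") with a per-phone log-prior
# subtracted ("prior normalisation") instead of softmax posteriors.
# Names and data are illustrative, not the paper's code.
import numpy as np

def prior_normalized_maxlogit_gop(logits, canonical_phone, phone_log_priors):
    """logits: (T, P) raw acoustic-model outputs for T frames and P phones.
    phone_log_priors: (P,) log relative frequency of each phone in training data."""
    normalized = logits - phone_log_priors      # discount phones that are merely frequent
    canonical = normalized[:, canonical_phone]  # score of the phone we expected to hear
    competitor = normalized.max(axis=1)         # strongest competing score in each frame
    return float((canonical - competitor).mean())

# Toy usage: 30 frames, 40 phones, canonical phone index 7 (all values made up).
rng = np.random.default_rng(1)
logits = rng.normal(size=(30, 40))
phone_log_priors = np.log(rng.dirichlet(np.ones(40)))
print(prior_normalized_maxlogit_gop(logits, canonical_phone=7, phone_log_priors=phone_log_priors))
```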

    THE RELATIONSHIP BETWEEN ACOUSTIC FEATURES OF SECOND LANGUAGE SPEECH AND LISTENER EVALUATION OF SPEECH QUALITY

    Second language (L2) speech is typically less fluent than native speech, and differs from it phonetically. While the speech of some L2 English speakers seems to be easily understood by native listeners despite the presence of a foreign accent, other L2 speech seems to be more demanding, such that listeners must expend considerable effort in order to understand it. One reason for this increased difficulty may simply be the speaker's pronunciation accuracy or phonetic intelligibility. If an L2 speaker's pronunciations of English sounds differ sufficiently from the sounds that native listeners expect, these differences may force native listeners to work much harder to understand the divergent speech patterns. However, L2 speakers also tend to differ from native ones in terms of fluency: the degree to which a speaker is able to produce appropriately structured phrases without unnecessary pauses, self-corrections or restarts. Previous studies have shown that measures of fluency are strongly predictive of listeners' subjective ratings of the acceptability of L2 speech: less fluent speech is consistently considered less acceptable (Ginther, Dimova, & Yang, 2010). However, since less fluent speakers also tend to have less accurate pronunciations, it is unclear whether or how these factors interact to influence the amount of effort listeners exert to understand L2 speech, nor is it clear how listening effort might relate to perceived quality or acceptability of speech. In this dissertation, two experiments were designed to investigate these questions.
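    For readers unfamiliar with the fluency measures referred to above, the sketch below computes a few common utterance-level measures (speech rate, pause count, pause-time ratio) from a time-aligned transcript. The data structure, the 250 ms pause threshold, and the toy words are assumptions for illustration, not the dissertation's protocol.

```python
# Hedged sketch of the kind of fluency measures the passage refers to
# (speech rate and pausing), computed from a time-aligned transcript.
# The Word structure and the threshold are assumed, not taken from the dissertation.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds

def fluency_measures(words, min_pause=0.25):
    """Return words per second plus simple pause statistics for one utterance."""
    total_time = words[-1].end - words[0].start
    pauses = [b.start - a.end for a, b in zip(words, words[1:])
              if b.start - a.end >= min_pause]
    return {
        "speech_rate_wps": len(words) / total_time,
        "pause_count": len(pauses),
        "pause_time_ratio": sum(pauses) / total_time,
    }

words = [Word("the", 0.00, 0.18), Word("cat", 0.20, 0.55), Word("sat", 1.10, 1.45)]
print(fluency_measures(words))
```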

    The Processing of Accented Speech

    This thesis examines the processing of accented speech in both infants and adults. Accents provide a natural and reasonably consistent form of inter-speaker variation in the speech signal, but it is not yet clear exactly what processes are used to normalise this form of variation, or when and how those processes develop. Two adult studies use ERP data to examine differences between the online processing of regional- and foreign-accented speech as compared to a baseline consisting of the listeners' home accent. These studies demonstrate that the two types of accents recruit normalisation processes which are qualitatively, and not just quantitatively, different. This provides support for the hypothesis that foreign and regional accents require different mechanisms to normalise accent-based variation (Adank et al., 2009; Floccia et al., 2009), rather than for the hypothesis that different types of accents are normalised according to their perceptual distance from the listener's own accent (Clarke & Garrett, 2004). They also provide support for the Abstract entry approach to lexical storage of variant forms, which suggests that variant forms undergo a process of prelexical normalisation, allowing access to a canonical lexical entry (Pallier et al., 2001), rather than for the Exemplar-based approach, which suggests that variant word-forms are individually represented in the lexicon (Johnson, 1997). Two further studies examined how infants segment words from continuous speech when presented with accented speakers. The first of these includes a set of behavioural experiments, which highlight some methodological issues in the existing literature and offer some potential explanations for conflicting evidence about the age at which infants are able to segment speech. The second uses ERP data to investigate segmentation within and across accents, and provides neurophysiological evidence that 11-month-olds are able to distinguish newly segmented words at the auditory level even within a foreign accent, or across accents, but that they are more able to treat new word-forms as word-like in a familiar accent than in a foreign accent.

    Personalising synthetic voices for individuals with severe speech impairment.

    Speech technology can help individuals with speech disorders to interact more easily. Many individuals with severe speech impairment, due to conditions such as Parkinson's disease or motor neurone disease, use voice output communication aids (VOCAs), which have synthesised or pre-recorded voice output. This voice output effectively becomes the voice of the individual and should therefore represent the user accurately. Currently available personalisation techniques for speech synthesis require a large amount of input data, which is difficult to produce for individuals with severe speech impairment. These techniques also do not provide a solution for those individuals whose voices have begun to show the effects of dysarthria. The thesis shows that Hidden Markov Model (HMM)-based speech synthesis is a promising approach for 'voice banking', both before an individual's condition causes deterioration of the speech and once deterioration has begun. Data input requirements for building personalised voices with this technique are investigated using human listener judgements. The results show that 100 sentences is the minimum required to build a voice that is significantly different from an average voice model and shows some resemblance to the target speaker, although this amount depends on the speaker and the average model used. A neural network trained on extracted acoustic features revealed that spectral features had the most influence in predicting human listener judgements of the similarity of synthesised speech to a target speaker. Accuracy of prediction improves significantly if other acoustic features are introduced and combined non-linearly. These results were used to inform the reconstruction of personalised synthetic voices for speakers whose voices had begun to show the effects of their conditions. Using HMM-based synthesis, personalised synthetic voices were built from dysarthric speech and showed similarity to the target speakers without recreating the impairment in the synthesised speech output.
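    The claim that prediction accuracy improves when acoustic features are combined non-linearly can be illustrated by comparing a linear model with a shallow neural network on the same feature set. The features, target, and model sizes below are placeholder assumptions, not the thesis's analysis.

```python
# Hedged sketch (not the thesis's model): predicting listener similarity
# judgements from acoustic features with a small non-linear network,
# compared against a linear baseline. Data and feature meanings are invented.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 6))   # e.g. spectral, F0 and duration summaries (assumed)
# Synthetic similarity judgement with an interaction term a linear model cannot capture.
y = np.tanh(X[:, 0] + 0.5 * X[:, 1] * X[:, 2]) + rng.normal(0.0, 0.1, n)

linear = Ridge(alpha=1.0)
nonlinear = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

print("linear R^2:", cross_val_score(linear, X, y, cv=5).mean())
print("MLP R^2:  ", cross_val_score(nonlinear, X, y, cv=5).mean())
```

    The cross-validated scores make the comparison on held-out data, which is the usual safeguard when arguing that a non-linear combination genuinely adds predictive value.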

    ACOUSTIC SPEECH MARKERS FOR TRACKING CHANGES IN HYPOKINETIC DYSARTHRIA ASSOCIATED WITH PARKINSON’S DISEASE

    Previous research has identified certain overarching features of hypokinetic dysarthria associated with Parkinson's Disease and found that it manifests differently across individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive enough to track speech changes. Previous longitudinal studies have had limited sample sizes or variable lengths between data collection points. This study used acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson's Disease (PwPD) over six months. The thesis presents how this study has addressed the limitations of previous studies to make a novel contribution to current knowledge. Speech data were collected from 63 PwPD and 47 control speakers using online podcast software at two time points, six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Perceptual severity ratings were given by two speech and language therapists for T1 and T2, and acoustic parameters of voice, articulation and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time, and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional attempt was made to identify whether these parameters showed group differences for differential diagnosis between PwPD and control speakers at T1 and T2. Results showed that specific acoustic parameters in voice quality, articulation and prosody could either differentiate between PwPD and controls or detect speech changes between T1 and T2, but not both. However, specific acoustic parameters within articulation could detect both significant group differences and speech changes across T1 and T2. The thesis discusses these results, their implications, and the potential for future studies.
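    As a hedged illustration of the three feature domains the study mentions (voice, articulation, prosody), the sketch below computes a few crude summary measures from assumed, precomputed contours (an F0 track, glottal periods, formant values). These are illustrative stand-ins, not the study's actual parameter set, and the toy data are random.

```python
# Hedged sketch of simple acoustic summaries in the three domains the study names
# (voice, articulation, prosody), computed from assumed precomputed contours.
# Not the study's feature set; data and thresholds are invented for the example.
import numpy as np

def prosody_features(f0_hz):
    """Monopitch in hypokinetic dysarthria shows up as reduced F0 variability."""
    voiced = f0_hz[f0_hz > 0]
    return {"f0_mean_hz": float(voiced.mean()),
            "f0_sd_semitones": float(np.std(12 * np.log2(voiced / voiced.mean())))}

def voice_features(periods_s):
    """Crude local jitter: mean absolute difference between consecutive glottal
    periods, relative to the mean period."""
    return {"jitter_local": float(np.mean(np.abs(np.diff(periods_s))) / np.mean(periods_s))}

def articulation_features(f1_hz, f2_hz):
    """Rough proxy for vowel-space measures: spread of F1 and F2 values
    (a larger spread loosely suggests clearer vowel articulation)."""
    return {"formant_spread_hz": float(np.std(f1_hz) + np.std(f2_hz))}

rng = np.random.default_rng(3)
print(prosody_features(rng.normal(120, 8, 500).clip(min=0)))       # toy F0 track (Hz)
print(voice_features(np.abs(rng.normal(0.008, 0.0002, 300))))      # toy glottal periods (s)
print(articulation_features(rng.normal(500, 80, 50), rng.normal(1500, 250, 50)))  # toy F1/F2
```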

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpus compilation and annotation, together with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to the sound and video linked to the papers (when downloaded).

    An inquiry into the typical and atypical language development of young transnational multilingual children in an international school

    This PhD thesis investigates some of the unique characteristics of young transnational multilingual children aged five to eleven from high-socioeconomic status families educated in an international school in Switzerland. Its purpose is to improve understanding of typical and atypical language development for this group. It draws on sociolinguistic research on language variation and exposure, and clinical linguistic research on developmental language disorder identification and cross-linguistic considerations. The specific aim of the pilot research study presented in this thesis is to measure and discuss seven multilingual children's verbal language abilities in each of their languages, and to measure their combined bilingual verbal abilities and multilingual verbal abilities. It is, therefore, influenced by discussion of language acquisition theories that relate to complex and dynamic systems, such as the Dynamic Model of Multilingualism. In addition, it also identifies any common characteristics, familial language practices or experiences of the pilot group of children. A methodological design is created that could be replicated in the future on a much larger scale as a means of confirming, extending or disputing the findings from the pilot group. This thesis's pilot research findings suggest that multilingual children from high-income families who attend international schools have significantly above average verbal language abilities when their verbal language abilities are evaluated as one total language system (multilingual ability), a finding that is in stark contrast to the 'average' results they receive when each language is evaluated on its own. The thesis concludes that research on multilingual children that does not take into account the variables unique to this group may fail to recognise important factors that can impact their language development.

    Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum


    Attention Restraint, Working Memory Capacity, and Mind Wandering: Do Emotional Valence or Intentionality Matter?

    Attention restraint appears to mediate the relationship between working memory capacity (WMC) and mind wandering (Kane et al., 2016). Prior work has identified two dimensions of mind wandering: emotional valence and intentionality. However, less is known about how WMC and attention restraint correlate with these dimensions. The current study examined the relationship between WMC, attention restraint, and mind wandering by emotional valence and intentionality. A confirmatory factor analysis demonstrated that WMC and attention restraint were strongly correlated, but only attention restraint was related to overall mind wandering, consistent with prior findings. However, when examining the emotional valence of mind wandering, attention restraint and WMC were related to negatively and positively valenced, but not neutral, mind wandering. Attention restraint was also related to intentional but not unintentional mind wandering. These results suggest that WMC and attention restraint predict some, but not all, types of mind wandering.