676 research outputs found
SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers.
In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation has introduced a modified neural multi-talker TTS by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. In addition, we have extended this work by using a label propagation technique to create more meaningful control variables such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, so we are able to generate more dysarthric speech with a broader range.
To evaluate their effectiveness for synthesis of training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on dysarthric ASR systems.
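The WER figures in this abstract rest on word error rate, which is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the evaluation code used in the dissertation; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenization is a plain whitespace split, which
    is an assumption and not taken from the dissertation.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words. Whether the 12.2% and 6.5% figures are absolute or relative changes is not stated in the abstract.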
A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications
Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
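The TAMA idea described in this abstract, a moving average over sequential frames that share the same time boundaries for every speaker, can be illustrated with a short sketch. This is an illustration under stated assumptions and not the thesis implementation: the function name `tama_series`, the `(time, value)` sample format, and the frame length and step used in the usage lines are all hypothetical, since the abstract gives no parameter values.

```python
from typing import List, Optional, Sequence, Tuple

# One measurement: (time in seconds, feature value such as mean pitch in Hz).
Sample = Tuple[float, float]


def tama_series(
    samples: Sequence[Sample],
    duration: float,
    frame_len: float,
    step: float,
) -> List[Optional[float]]:
    """Average one speaker's feature values inside fixed dialogue frames.

    Frames start at 0, step, 2*step, ... and each spans frame_len seconds.
    Because the boundaries depend only on the dialogue clock, calling this
    once per speaker with the same duration, frame_len and step yields
    series that are time-aligned frame by frame. A frame in which the
    speaker produced no measurement is reported as None.
    """
    if frame_len <= 0 or step <= 0:
        raise ValueError("frame_len and step must be positive")

    series: List[Optional[float]] = []
    start = 0.0
    while start < duration:
        end = start + frame_len
        values = [value for time, value in samples if start <= time < end]
        series.append(sum(values) / len(values) if values else None)
        start += step
    return series


# Hypothetical usage: two speakers, 20 s of dialogue, 10 s frames, 5 s step.
speaker_a = [(1.0, 100.0), (6.0, 110.0), (12.0, 120.0)]
speaker_b = [(2.0, 200.0), (11.0, 220.0)]
series_a = tama_series(speaker_a, duration=20.0, frame_len=10.0, step=5.0)
series_b = tama_series(speaker_b, duration=20.0, frame_len=10.0, step=5.0)
```

Pairing the two series index by index gives the per-frame values on which time series statistics, such as a correlation between speakers, could then be computed.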
Exploring the role of prosody in passage reading of experienced and early readers
Previous research has consistently demonstrated that individual differences in prosodic competence (i.e., an individual’s sensitivity to and awareness of prosodic cues) are positively associated with reading comprehension (e.g., Chung & Bidelman, 2021; Holliman, Williams, et al., 2014; Lochrin et al., 2015; Veenendaal et al., 2014). It is less clear, however, whether this relationship between prosodic competence and reading comprehension is simply due to the role of prosody in the many lower level skills involved in efficient word reading, or, whether well-developed prosodic competence facilitates reading comprehension at a higher level. Accordingly, one of the hypotheses proposed in this thesis is that prosodic competence facilitates reading comprehension at the passage level by allowing for prosodic passage reading (i.e., the ability to read a passage with appropriate prosody).
This thesis describes three empirical studies designed to examine the concurrent relationships between prosodic competence, prosodic passage reading, and reading comprehension in two samples of participants: experienced readers (adults) and early readers (children aged 7 to 11 years). Specifically, analyses were used to investigate (a) whether performance on prosodic competence tasks explained unique variance in passage reading (prosodic reading and comprehension) after accounting for word-level reading skills (e.g., vocabulary, segmental PA, and single word reading), (b) whether prosodic passage reading ability explained unique variance in reading comprehension, after accounting for word-level reading skills, and (c) whether prosodic passage reading ability explained the contribution of prosodic competence to reading comprehension.
Results demonstrated that prosodic competence did not account for additional variance in reading comprehension, after controlling for word-level reading skills, in either sample of readers. Consequently, there was no evidence that prosodic passage reading mediated the relationship between prosodic competence and reading comprehension. However, results did reveal that the role of prosody in relation to passage reading was markedly different between experienced and early readers. To illustrate, prosodic competence accounted for unique variance in prosodic passage reading (after accounting for all word-level reading skills), but exclusively in the sample of experienced readers, suggesting that prosodic competence likely facilitates prosodic passage reading, but only after a certain level of reading efficiency has been achieved. On the other hand, prosodic passage reading accounted for unique variance in reading comprehension (after accounting for all word-level reading skills), exclusively in the sample of early readers, suggesting that prosodic passage reading likely acts as a comprehension tool, but only during reading development. Accordingly, I argue that prosody should be integrated into future frameworks of reading comprehension, but that a developmental approach, which considers the changing role of prosody, is necessary. I also maintain that these results support the incorporation of prosodic passage reading in early literacy curricula.
Effect of Music Integrated Instruction on First Graders' Reading Fluency
The study examined music-integrated (MI) instruction, framed by automatic information processing theory and elements of prosody. A quasi-experimental, pre- and posttest design was utilized to ascertain the effect of MI instruction on reading fluency among first grade students. Subjects were students in two public elementary schools in Georgia. To determine the effect of MI instruction on reading fluency scores, independent samples t-tests were employed to compare students' Dynamic Indicators of Basic Early Literacy Skills (DIBELS) test scores. Analysis revealed to what degree MI instruction in reading had an effect upon two DIBELS indicators, specifically nonsense word fluency (NWF) and phoneme segmentation fluency (PSF) scores. Researching the application of MI instruction to the teaching of reading establishes its potential impact upon academic rigor and pedagogical creativity.
Negative vaccine voices in Swedish social media
Vaccinations are one of the most significant interventions in public health, but vaccine hesitancy creates concerns for a portion of the population in many countries, including Sweden. Since discussions on vaccine hesitancy often take place on social networking sites, data from Swedish social media are used to study and quantify the sentiment among the discussants on the vaccination-or-not topic during phases of the COVID-19 pandemic. Out of all the posts analyzed, a majority showed a stronger negative sentiment, which prevailed throughout the examined period, with some spikes or jumps due to the occurrence of certain vaccine-related events distinguishable in the results. Sentiment analysis can be a valuable tool to track public opinions regarding the use, efficacy, safety, and importance of vaccination.
Does speech prosody matter in health communication? Evidence from native and non-native English speaking medical students in a simulated clinical interaction
The impact of the UK’s multilingual and multicultural society today can be seen in its healthcare services and has contributed towards shaping communication skills training as a core part of the UK undergraduate medical curriculum. NHS complaints statistics involving perceived staff attitudes have remained high, despite extensive communication skills training. Furthermore, foreign doctors have received a higher proportion of complaints than UK doctors. Finally, how linguistic and social factors shape the conveyance and perception of attitudes related to professionalism in medical communication remains poorly understood.
The ultimate aim of this study was to ascertain if speech prosody contributes to the perception of professionalism in medical communication. Research questions on the role of speech prosody in conveying professional attitudes in medical communication, the prosodic differences between native and non-native English speaking medical students in a simulated clinical interaction, and the influence of prosodic features on listeners’ perceptions of professional attitudes were addressed.
A set of acoustic parameters representing the speech prosody of native and non-native medical students in the simulated clinical setting was analysed. A perceptual experiment was then carried out to investigate the factors affecting perceived professionalism in extracts of the analysed simulated clinical interaction.
The examined acoustic parameters were found to be sensitive to the English language background and the task within the simulated consultation. Interestingly, the attitudinal information associated with some of these acoustic parameters was perceived by listeners and was reflected by higher professional scale scores in the perceptual experiment, even after adjusting for the English language background. The factors of training level and consultation task also emerged as affecting professional scale scores.
Initial findings have confirmed that speech prosody plays a role in contributing towards the perception of professionalism in medical communication. Incorporating how messages are delivered to patients into current models of communication skills training may have positive outcomes.
The effect of a music intervention on the temporal organisation of reading skills
This study investigated the reading behaviour of school children following participation in a rhythm-based music intervention. The investigation was inspired by pupils' progress in music lessons after using the rhythm-based music intervention. Little empirical work has been done on metre and learning. This project has focused upon 'temporal regulation' and 'temporal integration' as a possible learning pathway linking the music intervention, as an entrainment activity to reading behaviour. The theoretical framework draws upon multi-disciplinary areas of literature to converge on metre as an organisational feature common to music and language.
The methodology of this small scale research project involved three stages. First, three empirical explorations of the music intervention used a mixed experimental design. The randomly selected participants were school children, 8-10 years of age. Secondly, a small, randomly selected sample of school children with below average capability in reading comprehension or reading fluency, took part in a two-treatment experimental design comparing the music intervention and a phonics intervention. The third stage, a trial in two schools, investigated whether the effects of the rhythm-based music intervention were sustained when the music intervention was directed by school staff.
Although only small samples were involved, a consistent effect was found in gains in reading comprehension for below average capability readers, following participation in the music intervention. In the two-treatment design, positive effects were found for rate of reading, reading comprehension and phonological discrimination but not for reading accuracy. In the trial in two schools, effects were found for reading comprehension, reading accuracy in both schools and rate of reading in one school suggesting that the music intervention may be suitable for use as part of the music or the literacy programme in schools. Overall the data suggested that the rhythm-based music intervention had a positive effect on children's reading behaviour.
- …