Statistical parametric speech synthesis using conversational data and phenomena
Statistical parametric text-to-speech synthesis currently relies on predefined and highly
controlled prompts read in a "neutral" voice. This thesis presents work on utilising
recordings of free conversation for the purpose of filled pause synthesis and as an
inspiration for improved general modelling of speech for text-to-speech synthesis purposes.
A corpus of both standard prompts and free conversation is presented and the
potential usefulness of conversational speech as the basis for text-to-speech voices
is validated. Additionally, through psycholinguistic experimentation it is shown that
filled pauses can have potential subconscious benefits to the listener but that current
text-to-speech voices cannot replicate these effects. A method for pronunciation variant
forced alignment is presented in order to obtain a more accurate automatic speech
segmentation, which is otherwise particularly poor for spontaneously produced speech.
This pronunciation variant alignment is utilised not only to create a more accurate underlying
acoustic model, but also as the driving force behind creating more natural
pronunciation prediction at synthesis time. While this improves both the standard and
spontaneous voices, the naturalness of spontaneous-speech-based voices still lags behind
the quality of voices based on standard read prompts. Thus, the synthesis of filled
pauses is investigated in relation to specific phonetic modelling of filled pauses and
through techniques for the mixing of standard prompts with spontaneous utterances in
order to retain the higher quality of standard speech based voices while still utilising
the spontaneous speech for filled pause modelling. A method for predicting where to
insert filled pauses in the speech stream is also developed and presented, relying on
an analysis of human filled pause usage and a mix of language modelling methods.
The method achieves an insertion accuracy in close agreement with human usage. The
various approaches are evaluated and their improvements documented throughout the
thesis. Finally, the resulting filled pause quality is assessed through a repetition
of the psycholinguistic experiments and an evaluation of the compilation of all
developed methods.
An investigation into interactional patterns for Alzheimer's Disease recognition in Natural dialogues
Alzheimer's disease (AD) is a complex neurodegenerative disorder characterized by memory loss, together with cognitive deficits affecting language, emotional affect, and interactional communication. Diagnosis and assessment of AD is formally based on the judgment of clinicians, commonly using semi-structured interviews in a clinical setting. Manual diagnosis is therefore slow, resource-heavy, and hard to access, so many people go undiagnosed; an automatic method would help. Using recent advances in deep learning, machine learning, and natural language processing, this thesis empirically explores how content-free interaction patterns help in developing models capable of identifying AD from natural conversations, with a focus on particular phenomena found useful in conversational analysis studies. The models presented in this thesis use lexical, disfluency, interactional, acoustic, and pause information to learn the symptoms of Alzheimer's disease from text and audio modalities. This thesis comprises two parts. In the first part, a study of a conversational corpus identifies phenomena that are strongly indicative of differences between AD and Non-AD patients. This analysis shows that interaction patterns differ between an AD patient and a Non-AD patient, including the types of questions asked of patients, their responses, delays in responses in the form of pauses, clarification questions, signals of non-understanding, and repetition of questions. Although the problem is challenging because these dialogue acts are so rare, we show that it is possible to develop models that can automatically detect these classes. The second part then turns to AD diagnosis itself, examining interactional features including pause information, disfluencies within patients' speech, communication breakdowns at speaker changes in certain situations, and n-gram dialogue act sequences.
We found that there are longer pauses within AD patients' utterances and more attributable silences in response to questions than for Non-AD patients. The results also show that fusing the speech and text modalities makes the most of the combined feature sets, indicating that these features and techniques can support accurate and effective AD diagnosis. These interaction patterns may serve as an index of internal cognitive processes that help in differentiating AD patients from Non-AD patients, and may be used as an integral part of language assessment in clinical settings.
IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech
IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, Slovenia). Furthermore, an extension of selected papers will be published as a special issue of the Journal of Applied Sciences, "IberSPEECH 2020: Speech and Language Technologies for Iberian Languages", published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladoli