264 research outputs found

    Fast Speech in Unit Selection Speech Synthesis

    Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of the everyday life of many people with severe visual disabilities. For those who are reliant on assistive speech technology the possibility to choose a fast speaking rate is reported to be essential. But also expressive speech synthesis and other spoken language interfaces may require an integration of fast speech. Architectures like formant or diphone synthesis are able to produce synthetic speech at fast speech rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output. Nevertheless, fast speech has not been adequately implemented into such systems to date. Thus, the goal of the work presented here was to determine an optimal strategy for modeling fast speech in unit selection speech synthesis to provide potential users with a more natural sounding alternative for fast speech output

    Auditory comprehension: from the voice up to the single word level

    Auditory comprehension, the ability to understand spoken language, consists of a number of different auditory processing skills. In the five studies presented in this thesis I investigated both intact and impaired auditory comprehension at different levels: voice versus phoneme perception, as well as single word auditory comprehension in terms of phonemic and semantic content. In the first study, using sounds from different continua of ‘male’-/pæ/ to ‘female’-/tæ/ and ‘male’-/tæ/ to ‘female’-/pæ/, healthy participants (n=18) showed that phonemes are categorised faster than voice, in contradistinction to the common hypothesis that voice information is stripped away (or normalised) to access phonemic content. Furthermore, reverse correlation analysis suggests that gender and phoneme are processed on the basis of different perceptual representations. A follow-up study (same paradigm) in stroke patients (n=25, right or left hemispheric brain lesions, both with and without aphasia) showed that lesions of the right frontal cortex (likely ventral inferior frontal gyrus) lead to systematic voice perception deficits while left hemispheric lesions can elicit both voice and phoneme deficits. Together these results show that phoneme processing is lateralized while voice information processing requires both hemispheres. Furthermore, this suggests that commencing Speech and Language Therapy at a low level of acoustic processing/voice perception may be an appropriate method in the treatment of phoneme perception impairments. A longitudinal case study (CF) of crossed aphasia (rare acquired communication impairment secondary to lesion ipsilateral to the dominant hand) is then presented alongside a mini-review of the literature.
Extensive clinical investigation showed that CF presented with word-finding difficulties related to impaired auditory phonological analysis, while functional Magnetic Resonance Imaging (fMRI) analyses showed right hemispheric lateralization of language functions (reading, repetition and verb generation). These results, together with the co-morbidity analysis from the mini-review, suggest that crossed aphasia can be explained by developmental disorders which cause partial right lateralization shift of language processes. Interestingly, in CF this process did not affect voice lateralization and information processing, suggesting partial segregation of voice and speech processing. In the last two studies, auditory comprehension was examined at the single word level using a word-picture matching task with congruent (correct target) and incongruent (semantic, phonological and unrelated foils) conditions. fMRI in healthy participants (n=16) revealed a key role of the pars triangularis (phonological processing), the left angular gyrus (semantic incongruency) and the left precuneus (semantic relatedness) in this task – regions typically associated via the arcuate fasciculus and often impaired in aphasia. Further investigation of stroke patients on the same task (n=15) suggested that the connections between the angular gyrus and the pars triangularis serve a fundamental role in semantic processing. The quality of a published word-picture matching task was also investigated, with results questioning the clinical relevance of this task as an assessment tool. Finally, a pilot study looking at the effect of a computer-assisted auditory comprehension therapy (React2©) in 6 stroke patients (vs. 6 healthy controls and 6 stroke patients without therapy) is presented. Results show that the more therapy patients carry out the more improvement is seen in the semantic processing of single nouns. 
However, these results need to be reproduced on a larger scale in order to generalise any outcomes. Overall, the findings from these studies offer new insight into, and extend, current cognitive and neuroanatomical models of voice perception, speech perception and single word auditory comprehension. A combinatorial approach to cognitive and neuroanatomical models is proposed in order to further research into impaired auditory comprehension and thus improve clinical care

    Semantic radical consistency and character transparency effects in Chinese: an ERP study

    BACKGROUND: This event-related potential (ERP) study aims to investigate the representation and temporal dynamics of Chinese orthography-to-semantics mappings by simultaneously manipulating character transparency and semantic radical consistency. Character components, referred to as radicals, make up the building blocks used dur...

    The Immediacy Of Linguistic Computation

    This dissertation investigates the wide-ranging implications of a simple fact: language unfolds over time. Whether as cognitive symbols in our minds, or as their physical realization in the world, if linguistic computations are not made over transient and shifting information as it occurs, they cannot be made at all. This dissertation explores the interaction between the computations, mechanisms, and representations of language acquisition and language processing—with a central theme being the unique study of the temporal restrictions inherent to information processing that I term the immediacy of linguistic computation. This program motivates the study of intermediate representations recruited during online processing and acquisition rather than simply an Input/Output mapping. While ultimately extracted from linguistic input, such intermediate representations may differ significantly from the underlying distributional signal. I demonstrate that, due to the immediacy of linguistic computation, such intermediate representations are necessary, discoverable, and offer an explanatory connection between competence (linguistic representation) and performance (psycholinguistic behavior). The dissertation comprises four case studies. First, I present experimental evidence from a perceptual learning paradigm that the intermediate representation of speech consists of probabilistic activation over discrete linguistic categories but includes no direct information about the original acoustic-phonetic signal. Second, I present a computational model of word learning grounded in category formation. Instead of retaining experiential statistics over words and all their potential meanings, my model constructs hypotheses for word meanings as they occur. Uses of the same word are evaluated (and revised) with respect to the learner's intermediate representation rather than to their complete distribution of experience.
In the third case study, I probe predictions about the time-course, content, and structure of these intermediate representations of meaning via a new eye-tracking paradigm. Finally, the fourth case study uses large-scale corpus data to explore syntactic choices during language production. I demonstrate how a mechanistic account of production can give rise to highly efficient outcomes even without explicit optimization. Taken together these case studies represent a rich analysis of the immediacy of linguistic computation and its system-wide impact on the mental representations and cognitive algorithms of language

    The Role of Prosodic Stress and Speech Perturbation on the Temporal Synchronization of Speech and Deictic Gestures

    Gestures and speech converge during spoken language production. Although the temporal relationship of gestures and speech is thought to depend upon factors such as prosodic stress and word onset, the effects of controlled alterations in the speech signal upon the degree of synchrony between manual gestures and speech are uncertain. Thus, the precise nature of the interactive mechanism of speech-gesture production, or lack thereof, is not agreed upon or even frequently postulated. In Experiment 1, syllable position and contrastive stress were manipulated during sentence production to investigate the synchronization of speech and pointing gestures. An additional aim of Experiment 2 was to investigate the temporal relationship of speech and pointing gestures when speech is perturbed with delayed auditory feedback (DAF). Comparisons between the time of gesture apex and vowel midpoint (GA-VM) for each of the conditions were made for both Experiment 1 and Experiment 2. Additional comparisons of the interval between gesture launch midpoint to vowel midpoint (GLM-VM), total gesture time, gesture launch time, and gesture return time were made for Experiment 2. The results for the first experiment indicated that gestures were more synchronized with first position syllables and neutral syllables as measured by GA-VM intervals. The first position syllable effect was also found in the second experiment. However, the results from Experiment 2 supported an effect of contrastive pitch accent. GLM-VM was shorter for first position targets and accented syllables. In addition, gesture launch times and total gesture times were longer for contrastive pitch accented syllables, especially when in the second position of words. Contrary to the predictions, significantly longer GA-VM and GLM-VM intervals were observed when individuals responded under delayed auditory feedback (DAF).
Vowel and sentence durations increased both with DAF and when a contrastive accented syllable was produced. Vowels were longest for accented, second position syllables. These findings provide evidence that the timing of gesture is adjusted based upon manipulations of the speech stream. A potential mechanism of entrainment of the speech and gesture system is offered as an explanation for the observed effects

    From sequences to cognitive structures : neurocomputational mechanisms

    Ph. D. Thesis. Understanding how the brain forms representations of structured information distributed in time is a challenging neuroscientific endeavour, necessitating computationally and neurobiologically informed study. Human neuroimaging evidence demonstrates engagement of a fronto-temporal network, including ventrolateral prefrontal cortex (vlPFC), during language comprehension. Corresponding regions are engaged when processing dependencies between word-like items in Artificial Grammar (AG) paradigms. However, the neurocomputations supporting dependency processing and sequential structure-building are poorly understood. This work aimed to clarify these processes in humans, integrating behavioural, electrophysiological and computational evidence. I devised a novel auditory AG task to assess simultaneous learning of dependencies between adjacent and non-adjacent items, incorporating learning aids including prosody, feedback, delineated sequence boundaries, staged pre-exposure, and variable intervening items. Behavioural data obtained in 50 healthy adults revealed strongly bimodal performance despite these cues. Notably, however, reaction times revealed sensitivity to the grammar even in low performers. Behavioural and intracranial electrode data were subsequently obtained in 12 neurosurgical patients performing this task. Despite chance behavioural performance, time- and time-frequency domain electrophysiological analysis revealed selective responsiveness to sequence grammaticality in regions including vlPFC. I developed a novel neurocomputational model (VS-BIND: “Vector-symbolic Sequencing of Binding INstantiating Dependencies”), triangulating evidence to clarify putative mechanisms in the fronto-temporal language network. I then undertook multivariate analyses on the AG task neural data, revealing responses compatible with the presence of ordinal codes in vlPFC, consistent with VS-BIND.
I also developed a novel method of causal analysis on multivariate patterns, representational Granger causality, capable of detecting flow of distinct representations within the brain. This pointed to top-down transmission of syntactic predictions during the AG task, from vlPFC to auditory cortex, largely in the opposite direction to stimulus encodings, consistent with predictive coding accounts. It finally suggested roles for the temporoparietal junction and frontal operculum during grammaticality processing, congruent with prior literature. This work provides novel insights into the neurocomputational basis of cognitive structure-building, generating hypotheses for future study, and potentially contributing to AI and translational efforts. Wellcome Trust, European Research Counci

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose as its main theme Speech and Corpora. The wide international origin of the 235 authors from 21 countries and 95 institutions led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded)