12 research outputs found

    Speech coding at medium bit rates using analysis by synthesis techniques


    Analysis and correction of the helium speech effect by autoregressive signal processing

    SIGLE LD:D48902/84 / BLDSC - British Library Document Supply Centre, GB, United Kingdom

    Theory, design and application of gradient adaptive lattice filters

    SIGLE LD:D48933/84 / BLDSC - British Library Document Supply Centre, GB, United Kingdom

    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll-quality speech has created renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work: speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained, representing the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model, which simulates the centre in the brain that performs the speech quality assessment. [Continues.]

    Using fMRI and Behavioural Measures to Investigate Rehabilitation in Post-Stroke Aphasic Deficits

    In this thesis I investigated whether an intensive computerised, home-based therapy programme could improve phonological discrimination ability in 19 patients with chronic post-stroke aphasia. One skill specifically targeted by the treatment demonstrated an improvement due to the therapy. However, this improvement did not generalise to untreated items, and was only effective for participants without a lesion involving the frontal lobe, indicating a potentially important role for this region in determining outcome of aphasia therapy. Complementary functional imaging studies investigated activity in domain-general and domain-specific networks in both patients and healthy volunteers during listening and repeating simple sentences. One important consideration when comparing a patient group with a healthy population is the difference in task difficulty encountered by the two groups. Increased cognitive effort can be expected to increase activity in domain-general networks. I minimised the effect of this confound by manipulating task difficulty for the healthy volunteers to reduce their behavioural performance so that it was comparable to that of the patients. By this means I demonstrated that the activation patterns in domain-general regions were very similar in the two groups. Region-of-interest analysis demonstrated that activity within a domain-general network, the salience network, predicted residual language function in the patients with aphasia, even after accounting for lesion volume and their chronological age. I drew two broad conclusions from these studies. First, that computer-based rehabilitation can improve disordered phonological discrimination in chronic aphasia, but that lesion distribution may influence the response to this training. Second, that the ability to activate domain-general cognitive control regions influences outcome in aphasia. 
This leads me to propose that future therapeutic strategies, whether pharmacological or behavioural, that target domain-general brain systems may benefit rehabilitation after aphasic stroke.

    Intelligibility enhancement of synthetic speech in noise

    EC Seventh Framework Programme (FP7/2007-2013)

    Speech technology can facilitate human-machine interaction and create new communication interfaces. Text-To-Speech (TTS) systems provide speech output for dialogue, notification and reading applications, as well as personalized voices for people who have lost the use of their own. TTS systems are built to produce synthetic voices that should sound as natural, expressive and intelligible as possible and, if necessary, be similar to a particular speaker. Although naturalness is an important requirement, providing the correct information in adverse conditions can be crucial to certain applications. Speech that adapts or reacts to different listening conditions can in turn be more expressive and natural. In this work we focus on enhancing the intelligibility of TTS voices in additive noise. To that end we adopt the statistical parametric paradigm for TTS in the form of a hidden Markov model (HMM) based speech synthesis system, which allows for flexible enhancement strategies. Little is known about which human speech production mechanisms actually increase intelligibility in noise, or how the choice of mechanism relates to noise type, so we approached the problem from another perspective: using mathematical models of hearing speech in noise. To find which models are better at predicting the intelligibility of TTS in noise, we performed listening evaluations to collect subjective intelligibility scores, which we then compared to the models’ predictions. In these evaluations we observed that modifications to the spectral envelope of speech can increase intelligibility significantly, particularly if the strength of the modification depends on the noise and its level. We used these findings to inform the decision of which model to use when automatically modifying the spectral envelope of the speech according to the noise. We devised two methods, both involving cepstral coefficient modifications.
The first was applied during feature extraction while training the acoustic models, and the other when generating a voice using pre-trained TTS models. The latter has the advantage of being able to address fluctuating noise. To increase the intelligibility of synthetic speech at generation time we proposed a method for Mel cepstral coefficient modification based on the glimpse proportion measure, the most promising of the models of speech intelligibility that we evaluated. An extensive series of listening experiments demonstrated that this method brings significant intelligibility gains to TTS voices while not requiring additional recordings of clear or Lombard speech. To further improve intelligibility we combined our method with noise-independent enhancement approaches based on the acoustics of highly intelligible speech. This combined solution was as effective for stationary noise as for the challenging competing-speaker scenario, obtaining up to 4 dB of equivalent intensity gain. Finally, we proposed an extension to the speech enhancement paradigm to account not only for energetic masking of signals but also for linguistic confusability of words in sentences. We found that word-level confusability, a challenging value to predict, can be used as an additional prior to increase intelligibility even for simple enhancement methods like energy reallocation between words. These findings motivate further research into solutions that can tackle the effect of energetic masking on the auditory system as well as on higher levels of processing.
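The glimpse proportion idea mentioned in this abstract scores intelligibility as the fraction of spectro-temporal regions in which the speech is locally audible above the noise. A minimal sketch of that idea, with stated simplifications: the full measure operates on auditory-filterbank (gammatone) excitation patterns, whereas this sketch uses plain STFT magnitudes; the 3 dB local criterion is the commonly cited glimpse threshold, and the function name is illustrative, not from the thesis.

```python
import numpy as np

def glimpse_proportion(speech_stft, noise_stft, threshold_db=3.0):
    """Fraction of time-frequency cells where the local speech-to-noise
    ratio exceeds threshold_db -- each such cell counts as a 'glimpse'."""
    eps = 1e-12  # avoid log of zero in silent cells
    local_snr_db = 10.0 * np.log10(
        (np.abs(speech_stft) ** 2 + eps) / (np.abs(noise_stft) ** 2 + eps)
    )
    return float(np.mean(local_snr_db > threshold_db))
```

In the enhancement setting the abstract describes, a spectral-envelope modification would be chosen so as to raise this proportion for a given noise estimate, rather than simply boosting overall energy.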