100 research outputs found

    Lexical stress and lexical access: effects in read and spontaneous speech

    Get PDF
    This thesis examines three issues which are of importance in the study of auditory word recognition: the phonological unit which is used to access representations in the mental lexicon; the extent to which hearers can rely on words being identified before their acoustic offsets; and the role of context in auditory word recognition. Three hypotheses which are based on the predictions of the Cohort Model (Marslen-Wilson and Tyler 1980) are tested experimentally using the gating paradigm. First, the phonological access hypothesis claims that word onsets, rather than any other part of the word, are used to access representations in the mental lexicon. An alternative candidate which has been proposed as the initiator of lexical access is the stressed syllable. Second, the early recognition hypothesis states that polysyllabic words, and the majority of words heard in context, will be recognised before their acoustic offsets. Finally, the context-free hypothesis predicts that during the initial stages of the processing of words, no effects of context will be discernible. Experiment 1 tests all three predictions by manipulating aspects of carefully articulated, read speech. First, examination of the gating responses from three context conditions offers no support for the context-free hypothesis. Second, the high number of words which are identified before their acoustic offsets is consistent with the early recognition hypothesis. Finally, the phonological access hypothesis is tested by manipulation of the stress patterns of stimuli. The dependent variables which are examined relate to the processes of lexical access and lexical retrieval; stress differences are found on access measures but not on those relating to retrieval.
When the experiment is replicated with a group of subjects whose level of literacy is lower than that of the undergraduates who took part in the original experiment, differences are found in measures relating to contextual processing. Experiment 2 continues to examine the phonological access hypothesis, by manipulating speech style (read versus conversational) as well as stress pattern. Gated words, excised from the speech of six speakers, are presented in isolation. Words excised from read speech and words stressed on the first syllable elicit a greater number of responses which match the stimuli than conversational tokens and words with unstressed initial syllables. Intelligibility differences among the four conditions are also reported. Experiment 3 aims to investigate the processing of read and spontaneous tokens heard in context, while maintaining the manipulation of stress pattern. A subset of the words from Experiment 2 is presented in their original sentence contexts: the test words themselves, plus up to three subsequent words, are gated. Although the presence of preceding context generally enhances intelligibility, some words remain unrecognised by the end of the third subsequent word. An interaction between stress and speech style may be explained in terms of the unintelligibility of the preceding context. Several issues arising from Experiments 1, 2 and 3 are considered further. The characteristics of words which fail to be recognised before their offsets are examined using the statistical technique of regression; the contributions of phonetic and phonological aspects of stressed syllables are assessed; and a further experiment is reported which explores top-down processing in spontaneous speech, and which offers support for the interpretation of the results of Experiment 3 offered earlier.

    Single- and multi-microphone speech dereverberation using spectral enhancement

    Get PDF
    In speech communication systems, such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This signal degradation may lead to total unintelligibility of the speech and decreases the performance of automatic speech recognition systems. In the context of this work, reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation, and generally increase with increasing distance between the source and microphone. Conversely, early reverberation tends to improve the intelligibility of speech; in combination with the direct sound, it is sometimes referred to as the early speech component. Reduction of the detrimental effects of reflections is evidently of considerable practical importance, and is the focus of this dissertation. More specifically, the dissertation deals with dereverberation techniques, i.e., signal processing techniques to reduce the detrimental effects of reflections. In the dissertation, novel single- and multi-microphone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal.
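    That late-reverberant measure can be sketched concretely. Assuming a Polack-style exponential-decay statistical model (the kind of single-channel model the abstract describes), the late-reverberant spectral variance is a delayed and decayed copy of the reverberant power spectrum, and a Wiener-style gain then suppresses it. All function names and parameter values below are illustrative, not the dissertation's actual implementation:

```python
import numpy as np

def late_reverb_variance(spec_power, t60, frame_shift_s, n_late_frames):
    """Estimate the late-reverberant spectral variance per frame and bin,
    assuming an exponential-decay (Polack-style) statistical model.

    spec_power    : (frames, bins) power spectrogram of the reverberant signal
    t60           : reverberation time in seconds
    frame_shift_s : STFT frame shift in seconds
    n_late_frames : frames after which reflections count as "late"
    """
    delta = 3.0 * np.log(10.0) / t60  # decay constant implied by T60
    decay = np.exp(-2.0 * delta * n_late_frames * frame_shift_s)
    var = np.zeros_like(spec_power)
    # late reverberation = decayed copy of the power n_late_frames earlier
    var[n_late_frames:] = decay * spec_power[:-n_late_frames]
    return var

def suppression_gain(spec_power, late_var, floor=0.1):
    """Spectral gain that attenuates the estimated late reverberation,
    floored to limit musical-noise artefacts."""
    return np.maximum(1.0 - late_var / np.maximum(spec_power, 1e-12), floor)
```

    Applying the gain to each STFT frame and resynthesising yields an estimate of the early speech component; the gain floor trades reverberation suppression against processing artefacts.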
This measure, called spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work, an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator based on this model can only be used when the source-microphone distance is larger than the so-called critical distance. This is, crudely speaking, the distance where the direct sound power is equal to the total reflective power. A generalization of the statistical reverberation model in which the direct sound is incorporated is developed. This model requires one additional parameter that is related to the ratio between the direct sound energy and the sound energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation rather than the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor such as the so-called delay-and-sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as a combined design in which the spatial processor and the spectral enhancement technique are merged.
An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement of the perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in a noisy and reverberant environment, the received microphone signal contains not only the desired signal but also interferences such as room reverberation caused by the desired source, background noise, and a far-end echo signal resulting from sound produced by the loudspeaker. Usually, an acoustic echo canceller is used to cancel the far-end echo. Additionally, a post-processor is used to suppress background noise and residual echo, i.e., echo which could not be cancelled by the echo canceller. In this work a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo and background noise. The proposed structure and post-processor have low computational complexity and a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and afford a significant increase in listening comfort and speech intelligibility.
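    The echo-cancellation front end in such a hands-free system is conventionally an adaptive filter; a textbook normalised-LMS (NLMS) canceller illustrates the idea. This sketch shows the standard algorithm only, not the dissertation's proposed structure, and the parameter values are illustrative:

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Remove the far-end echo from the microphone signal with a
    normalised-LMS adaptive filter; in a full system the residual would
    then go to a post-processor for reverberation and noise suppression.

    far_end : loudspeaker (reference) signal
    mic     : microphone signal containing the echo
    """
    w = np.zeros(taps)                         # adaptive echo-path estimate
    err = np.zeros(len(mic))
    for n in range(taps - 1, len(mic)):
        x = far_end[n - taps + 1:n + 1][::-1]  # current and past far-end samples
        e = mic[n] - w @ x                     # subtract the echo estimate
        w += mu * e * x / (x @ x + eps)        # normalised LMS update
        err[n] = e
    return err
```

    With a short simulated echo path, the residual after convergence falls far below the echo level, which is what the post-processor then refines.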

    Intelligibility of speech addressed to children

    Get PDF
    SIGLE: Available from the British Library Document Supply Centre (BLDSC), DSC:D44476/83, United Kingdom

    Phonological reduction and intelligibility in task-oriented dialogue

    Get PDF

    Aspekte der Charakterisierung phonologischer Sprachstörungen vs. verzögerter Spracherwerb bei jordanischem Arabisch sprechenden Kindern

    Get PDF
    Bader S'da SI. Issues in the characterisation of phonological speech impairment vs. delayed acquisition in Jordanian Arabic-speaking children. Bielefeld (Germany): Bielefeld University; 2010. A study of the acquisition of Jordanian Arabic by young native speakers: children speaking or acquiring Jordanian Arabic with or without phonological impairments.

    The Potential Use of Slow-down Technology to Improve Pronunciation of English for International Communication

    Get PDF
    The focus of this research is on oral communication between L1 (first language) and L2 (second language) English users - to determine whether an algorithm which slows down speech can increase the intelligibility of speech between interlocutors for EIC (English for International Communication). The slow-down facility is a CALL tool which slows down speech without tonal distortion. It allows English language learners more processing time to hear individual phonemes as produced in the stream of connected speech, to help them hear and produce phonemes more accurately and thus more intelligibly. The study involved five tests, all concerned with the intelligibility of English speech. The first test looked at L2:L2 English communication and levels of receptive intelligibility, while Tests 2 and 3 focused on testing the slow-down for receptive communication – to help L2 users to process speech by slowing it down and thus making the speech signal more accessible. Tests 4 and 5 changed focus, testing the slow-down speech tool as a means of enhancing the intelligibility of L2 speech production, namely individual phoneme production, as little research has been carried out in this area and phoneme discrimination can greatly increase the intelligibility of an L2 speaker’s pronunciation. Test 5, the main test, used a qualitative analysis of a pre- and post-test and a number of questionnaires to assess subjects’ progress in developing intelligible English phoneme production across three groups: the Test Group, who used the slow-down speech tool; the Control Group, who undertook similar pronunciation training but without the application of the slow-down tool; and the Non-Interference Group, who received no formal pronunciation training whatsoever. The study also ascertained and evaluated the effects of other variables on the learning process, such as length of time learning English, daily use of English, attitudes to accents, and so forth.

    Context-aware speech synthesis: A human-inspired model for monitoring and adapting synthetic speech

    Get PDF
    The aim of this PhD thesis is to illustrate the development of a computational model for speech synthesis which mimics the behaviour of human speakers when they adapt their production to their communicative conditions. The PhD project was motivated by the observed differences between state-of-the-art synthetic speech and human production. In particular, synthesiser output does not exhibit any adaptation to the communicative context, such as environmental disturbances, listeners’ needs, or the meaning of the speech content, as human speech does. No evaluation is performed by standard synthesisers to check whether their output is suitable for the communication requirements. Inspired by Lindblom's Hyper- and Hypo-articulation (H&H) theory of speech production, the computational model of hyper- and hypo-articulation (C2H) is proposed. This novel computational model for automatic speech production is designed to monitor its output and to control the effort involved in the synthetic speech generation. Speech transformations are based on the hypothesis that low-effort attractors for a human speech production system can be identified. Such acoustic configurations are close to the minimum possible effort that a speaker can make in speech production. The interpolation/extrapolation along the key dimension of hypo/hyper-articulation can be motivated by energetic considerations of phonetic contrast. The complete reactive speech synthesis is enabled by adding a negative perception feedback loop to the speech production chain in order to constantly assess the communicative effectiveness of the proposed adaptation. The distance to the original communicative intents is the control signal that drives the speech transformations. A hidden Markov model (HMM)-based speech synthesiser along with the continuous adaptation of its statistical models is used to implement the C2H model.
A standard version of the synthesis software does not allow for transformations of speech during the parameter generation. Therefore, the generation algorithm of one of the most well-known speech synthesis frameworks, the HMM/DNN-based speech synthesis framework (HTS), is modified. The short-time implementation of the speech intelligibility index (SII), named the extended speech intelligibility index (eSII), is also chosen as the main perception measure in the feedback loop to control the transformation. The effectiveness of the proposed model is tested by performing acoustic analysis, objective, and subjective evaluations. A key assessment is to measure the control of the speech clarity in noisy conditions, and the similarities between the emerging modifications and human behaviour. Two objective scoring methods are used to assess the speech intelligibility of the implemented system: the speech intelligibility index (SII) and the index based upon the Dau measure (Dau). Results indicate that the intelligibility of C2H-generated speech can be continuously controlled. The effectiveness of reactive speech synthesis and of the phonetic-contrast-motivated transforms is confirmed by the acoustic and objective results. More precisely, in the maximum-strength hyper-articulation transformations, the improvement with respect to non-adapted speech is above 10% for all intelligibility indices and tested noise conditions.
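    The reactive control described here reduces to a simple negative-feedback sketch: raise the articulation effort until a perception measure of the adapted output reaches its target. The callable below is a hypothetical stand-in for scoring the synthesised, noise-corrupted signal with an eSII-style measure; this illustrates the control loop only, not the C2H implementation:

```python
def adapt_effort(intelligibility_fn, target=0.7, step=0.1, max_effort=1.0):
    """Increase hyper-articulation effort until the predicted
    intelligibility of the adapted speech reaches `target`, or the
    effort budget is exhausted.

    intelligibility_fn : maps an effort level in [0, 1] to a predicted
        intelligibility score in [0, 1]; a stand-in for evaluating the
        adapted synthetic signal in noise with a measure such as eSII.
    """
    effort = 0.0
    while intelligibility_fn(effort) < target and effort < max_effort:
        effort = min(effort + step, max_effort)  # bounded effort increase
    return effort
```

    With a toy linear model such as `0.4 + 0.5 * effort`, the loop settles at the smallest effort whose predicted score meets the target, which is the behaviour the abstract attributes to the feedback chain: no more effort than the communicative situation demands.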

    Korean Americans as Speakers of English: the Acquisition of General and Regional Features

    Get PDF
    This dissertation addresses Korean Americans as speakers of English and as a unified speech community, exploring the nature and extent of sociolinguistic stratification of the English used by Korean Americans in Philadelphia. The acquisition of three linguistic features is investigated: word-medial /t/ flapping, the use of discourse markers, and the regional feature of Philadelphia short a. Statistical analyses examine these features for the effects of linguistic factors and social factors such as age, sex, occupation, age of arrival in the US, length of stay in the US, and English education. Age of arrival shows a very strong effect on flapping: immigrants who arrived in the US as children and US-born speakers both showed a very high degree of flapping, while Korean-born adult immigrants acquired flapping to a much lesser degree. Style is also analyzed to determine whether speakers show variation along the formality continuum. In addition to production, the perceptual component of English use by the speakers is examined through a perception test. The perception test, administered to native English speakers, elicits judgments of English nativeness and ethnic identity of the Korean Americans. The results of the perception test are correlated with the production results of the linguistic features. In general, Korean Americans show varying degrees of acquisition of the three features according to sociolinguistic factors. Although the speakers exhibit stylistic variation, they have not acquired the Philadelphia dialectal feature of short a. The perception test reveals that English nativeness is accurately judged but that ethnic identification is problematic for listeners. The correlation of perception and production is positive in that an increase in the presence of the native linguistic features in the speech being judged is correlated with increased perception of the degree of English nativeness.
The three features examined are not taught through formal explicit instruction to either native or non-native English speakers, which implies that speakers must engage in face-to-face interaction with native speakers in order to acquire these native speech community norms.