
    New measurement techniques for the assessment of velopharyngeal function in cleft palate patients

    The present-day treatment of the cleft palate is very much a multi-disciplinary team approach, calling upon the skills of the plastic surgeon, orthodontist, maxillo-facial surgeon and speech therapist. The aesthetic result of modern plastic surgery on the lip and face is unquestionably successful; however, the improvement to speech due to changes in velopharyngeal function as a result of surgery is not so readily agreed upon. It is generally acknowledged that cleft repair should be carried out as early as possible after birth, followed by several years of developmental monitoring; however, considerable debate relating to the surgical technique employed and the long-term effect of surgery on speech development still abounds. This thesis undertakes to contribute to the debate on the efficacy of cleft repair in relation to speech function in the following manner. Firstly, a new instrument called a Nasal Resonometer has been specifically designed for use by speech therapists for the pre- and post-operative assessment of hyper- and hypo-nasal speech. Secondly, a new measurement technique involving the computer-assisted analysis of X-ray videofluoroscopy images of clinically significant aspects of velar function has been introduced. Several studies of patients attending the cleft repair clinics over a three-year period are presented. The correlations between objective Resonometer measurement, subjective speech-therapist analysis, velopharyngeal function and surgical technique are examined. The extensive clinical use of the Nasal Resonometer and the image analysis technique has proven to be a successful addition to routine cleft palate measurements. Further, the application of these measurements in specific studies has led to a clearer understanding of the effect of cleft palate surgery and has highlighted future areas of research.
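    The abstract does not specify how the Nasal Resonometer quantifies nasality, but two-channel instruments of this kind typically report a nasalance-style score: the proportion of acoustic energy captured by a nasal channel relative to the combined nasal and oral channels. The sketch below illustrates only that kind of ratio; it is an assumption for illustration, not the thesis's actual algorithm.

```python
import numpy as np

def nasalance(nasal: np.ndarray, oral: np.ndarray, eps: float = 1e-12) -> float:
    """Nasalance score: nasal energy as a fraction of total (nasal + oral) energy.

    `nasal` and `oral` are time-aligned signals from separate nasal and oral
    channels, as captured by a two-channel nasometer-style instrument.
    This ratio is an assumed stand-in for the Resonometer's measure.
    """
    e_nasal = float(np.sum(nasal.astype(np.float64) ** 2))
    e_oral = float(np.sum(oral.astype(np.float64) ** 2))
    return e_nasal / (e_nasal + e_oral + eps)

# Hypernasal speech yields scores well above typical norms on oral vowels;
# hyponasal speech yields abnormally low scores on nasal consonants.
```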

    Determination of articulatory parameters from speech waveforms


    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph (EGG) signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
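    Neither SIGMA nor YAGA is reproduced here, but the core idea of EGG-based GCI detection can be illustrated with a common baseline: glottal closures appear as sharp negative peaks in the differentiated EGG (dEGG) signal. A minimal sketch follows; the peak-height threshold and minimum-spacing heuristic are illustrative assumptions, not the thesis's method.

```python
import numpy as np
from scipy.signal import find_peaks

def gci_from_egg(egg: np.ndarray, fs: int, f0_max: float = 400.0) -> np.ndarray:
    """Estimate glottal closure instants (sample indices) from an EGG signal.

    Baseline method: GCIs appear as sharp negative peaks in the differentiated
    EGG (dEGG). A minimum spacing of one period at the highest plausible f0
    suppresses spurious detections; the height threshold is ad hoc.
    """
    degg = np.diff(egg.astype(np.float64))
    min_dist = max(1, int(fs / f0_max))          # min samples between GCIs
    peaks, _ = find_peaks(-degg, distance=min_dist,
                          height=0.2 * np.max(-degg))
    return peaks
```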

    An acoustic-phonetic approach in automatic Arabic speech recognition

    In a large-vocabulary speech recognition system, the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large-vocabulary isolated-word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can be uniquely represented at this level by using eight broad phonetic classes. When detailed vowel identification is introduced, the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence (V-UV-S) segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the three procedures mentioned above. Simple phonological constraints have been used to improve the accuracy of the segmentation process. The resultant sequence of labels is used for lexical access to retrieve the word, or a small set of words sharing the same broad phonetic labelling. When more than one word candidate is retrieved, a verification procedure is used to choose the most likely one.
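    The lexical-access scheme described above lends itself to a simple index: map each word's phoneme string to a broad label (consonants collapsed to a small set of classes, vowels kept phonemic) and group words by that label. A minimal sketch, in which the class inventory and phoneme symbols are illustrative assumptions rather than the thesis's actual four classes:

```python
from collections import defaultdict

# Illustrative broad-class inventory; vowels are deliberately absent so they
# pass through at the phonemic level, as in the word-recognition model above.
BROAD_CLASS = {
    "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "q": "STOP",
    "s": "FRIC", "z": "FRIC", "f": "FRIC", "x": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQUID", "r": "LIQUID", "w": "LIQUID", "j": "LIQUID",
}

def broad_label(phonemes: list[str]) -> tuple[str, ...]:
    """Map a phoneme sequence to its broad label; vowels pass through unchanged."""
    return tuple(BROAD_CLASS.get(p, p) for p in phonemes)

def build_lexicon(words: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Index the lexicon by broad label so recognition retrieves a small cohort."""
    lexicon: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for word, phonemes in words.items():
        lexicon[broad_label(phonemes)].append(word)
    return lexicon

# Usage: candidates = lexicon[broad_label(decoded_phonemes)]; when more than
# one word shares the label, a verification step chooses among them.
```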

    Nasality in automatic speaker verification


    Phase-Distortion-Robust Voice-Source Analysis

    This work concerns itself with the analysis of voiced speech signals, in particular the analysis of the glottal source signal. Following the source-filter theory of speech, the glottal signal is produced by the vibratory behaviour of the vocal folds and is modulated by the resonances of the vocal tract and the radiation characteristic of the lips to form the speech signal. As it is thought that the glottal source signal contributes much of the non-linguistic and prosodic information to speech, it is useful to develop techniques which can estimate and parameterise this signal accurately. Because of vocal tract modulation, estimating the glottal source waveform from the speech signal is a blind deconvolution problem which necessarily makes assumptions about the characteristics of both the glottal source and the vocal tract. A common assumption is that the glottal signal and/or vocal tract can be approximated by a parametric model. Other assumptions include the causality of the speech signal: the vocal tract is assumed to be a minimum-phase system while the glottal source is assumed to exhibit mixed-phase characteristics. However, as the literature review within this thesis will show, the error criteria utilised to determine the parameters are not robust to the conditions under which the speech signal is recorded, and are particularly degraded in the common scenario where low-frequency phase distortion is introduced. Those that are robust to this type of distortion are not well suited to the analysis of real-world signals. This research proposes a voice-source estimation and parameterisation technique, called the Power-spectrum-based determination of the Rd parameter (PowRd) method. Illustrated by theory and demonstrated by experiment, the new technique is robust to the time placement of the analysis frame and to the phase issues that are generally encountered during recording. The method assumes that the derivative glottal flow signal is approximated by the transformed Liljencrants-Fant model and that the vocal tract can be represented by an all-pole filter. Unlike many existing glottal source estimation methods, the PowRd method employs a new error criterion to optimise the parameters, which is also suitable for determining the optimal vocal-tract filter order. In addition to the issue of glottal source parameterisation, nonlinear phase recording conditions can also adversely affect the results of other speech processing tasks such as the estimation of the instant of glottal closure. In this thesis, a new glottal closing instant estimation algorithm is proposed which incorporates elements from state-of-the-art techniques and is specifically designed for operation upon speech recorded under nonlinear phase conditions. The new method, called the Fundamental RESidual Search or FRESS algorithm, is shown to estimate the glottal closing instant of voiced speech with superior precision and comparable accuracy to other existing methods over a large database of real speech signals under real and simulated recording conditions. An application of the proposed glottal source parameterisation method and glottal closing instant detection algorithm is a system which can analyse and re-synthesise voiced speech signals. This thesis describes perceptual experiments which show that, under linear and nonlinear recording conditions, the system produces synthetic speech which is generally preferred to speech synthesised with a state-of-the-art time-domain-based parameterisation technique.
    In sum, this work represents a movement towards flexible and robust voice-source analysis, with potential for a wide range of applications including speech analysis, modification and synthesis.
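    The PowRd method itself jointly fits the transformed Liljencrants-Fant model and an all-pole vocal-tract filter; the sketch below illustrates only its central phase-robustness idea, namely that matching in the power-spectral domain discards phase and so cannot be biased by low-frequency phase distortion. A simplified one-parameter Rosenberg-style pulse stands in for the LF model, and the error criterion and grid search are illustrative assumptions, not the thesis's formulation.

```python
import numpy as np

def rosenberg_pulse(n: int, open_quotient: float) -> np.ndarray:
    """One period of a simplified Rosenberg-style derivative glottal pulse.

    Stand-in for the transformed Liljencrants-Fant model used by PowRd; a
    single shape parameter replaces Rd for the purposes of this sketch.
    """
    t = np.arange(n) / n
    open_phase = t < open_quotient
    flow = np.where(open_phase,
                    0.5 * (1.0 - np.cos(np.pi * t / open_quotient)), 0.0)
    return np.diff(flow, append=0.0)  # derivative glottal flow

def power_spectrum_error(frame: np.ndarray, model: np.ndarray) -> float:
    """Log-power-spectrum distance; discarding phase makes the criterion
    insensitive to phase distortion introduced by the recording chain."""
    fa = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    fm = np.abs(np.fft.rfft(model, n=len(frame))) ** 2 + 1e-12
    fa, fm = fa / fa.sum(), fm / fm.sum()     # energy-normalise both spectra
    return float(np.mean((np.log(fa) - np.log(fm)) ** 2))

def fit_pulse(frame: np.ndarray, period: int) -> float:
    """Grid-search the shape parameter whose spectrum best matches the frame."""
    grid = np.linspace(0.3, 0.9, 25)
    errs = [power_spectrum_error(frame, rosenberg_pulse(period, oq))
            for oq in grid]
    return float(grid[int(np.argmin(errs))])
```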

    Phonological reduction and intelligibility in task-oriented dialogue


    Vowel normalisation: an interface between acoustic and linguistic descriptions of speaker characteristics in Australian English

    This thesis examines existing normalisation procedures against the background of a theoretical model of inter-speaker formant variability, which describes observed formant differences in three major categories: phonetic variation, non-uniform variation, and uniform variation. A new normalisation strategy based on this model is proposed, which involves the removal of the uniform and non-uniform components of inter-speaker variation in order to isolate phonetic variation. The nature of this non-uniformity is subject to empirical investigation. Following this strategy, the method adopted in this thesis is to first acquire a phonetically stable vowel database, which is then screened for phonetic variations through a rigorous phonetic control procedure. The resulting data, now considered to be phonetically homogeneous, are used for exploring two essential domains of inter-speaker variability that inform the design of a future normalisation procedure: (1) by applying uniform transformations using a variety of published scaling parameters, the most effective uniform scaling parameters are identified; (2) non-uniform inter-speaker variation patterns are analysed and compared with the published results of Fant (1975). A major discovery is that the non-uniform inter-speaker variation patterns obtained from phonetically controlled data are grossly different from those observed by Fant. The database comprises 594 vowels in the /h_d/ word context (11 phonemic monophthongs × 9 speakers × 6 repetitions); the speakers include 4 adult females, 3 adult males and 2 children (male).
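    As an example of the uniform transformations mentioned above, one widely published scaling parameter is the speaker's geometric-mean formant frequency, as in Nearey-style log-mean normalisation. A minimal sketch of that single-scale-factor normalisation follows; this particular choice of parameter is an illustrative assumption, one of several kinds the thesis compares, not necessarily the one it found most effective.

```python
import numpy as np

def log_mean_normalise(formants_hz: np.ndarray) -> np.ndarray:
    """Uniform scaling normalisation in the style of Nearey's log-mean method.

    `formants_hz` has shape (n_vowels, n_formants) for one speaker; every
    formant is divided by the speaker's geometric-mean formant frequency,
    removing a single multiplicative (uniform) scale factor per speaker.
    """
    log_f = np.log(formants_hz)
    scale = np.exp(log_f.mean())           # speaker-level geometric mean
    return formants_hz / scale

# What survives this normalisation is, by construction, the non-uniform and
# phonetic components of inter-speaker variation that the thesis analyses.
```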