    Robust speaker identification using artificial neural networks

    This research focuses on recognizing speakers from their speech samples. Numerous text-dependent and text-independent algorithms have been developed to recognize a speaker from his or her speech. In this thesis, we concentrate on recognizing the speaker from fixed text, i.e. the text-dependent case. The possibility of extending this method to variable text, i.e. the text-independent case, is also analyzed. Different feature extraction algorithms are employed, and their performance with an Artificial Neural Network as the data classifier on a fixed training set is analyzed. We find a way to combine these individual feature extraction algorithms by incorporating their interdependence. The efficiency of these algorithms is determined after the input speech is classified using the Back Propagation Algorithm. A special case of the Back Propagation Algorithm that improves the efficiency of the classification is also discussed.
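
As a rough illustration of the classification step, the following is a minimal sketch of plain back propagation on a one-hidden-layer network. It is not the thesis's implementation: the abstract does not describe the feature extraction algorithms or the special case of back propagation, so the two-dimensional feature vectors, the network size, the learning rate, and the name `train_backprop` are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(features, labels, hidden=3, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network with plain online back propagation."""
    rng = random.Random(seed)
    n_in = len(features[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden activations, then a single sigmoid output unit.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        out = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
        return h, out

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(features, labels):
            h, out = forward(x)
            total += 0.5 * (out - y) ** 2
            # Error terms for the squared-error loss, propagated backwards.
            delta_out = (out - y) * out * (1.0 - out)
            delta_h = [delta_out * w2[j] * h[j] * (1.0 - h[j])
                       for j in range(hidden)]
            # Gradient-descent updates (deltas were computed before any update).
            for j in range(hidden):
                w2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h[j] * x[i]
            b2 -= lr * delta_out
        losses.append(total / len(features))

    return (lambda x: forward(x)[1]), losses


# Hypothetical 2-D feature vectors for two speakers (label 0.0 and 1.0).
features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0.0, 0.0, 1.0, 1.0]
predict, losses = train_backprop(features, labels)
```

Here `predict(x)` returns a score between 0 and 1, with higher values indicating the second speaker. A real system would use one output unit per enrolled speaker and feature vectors produced by the extraction algorithms under comparison.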

    Virtual Exploration of the Human Vocal Tract

    Hypothesis: By simulating the acoustic field throughout the entire vocal tract, the evolution of speech sounds within the tract can be directly and quantitatively related to physical variations in the tract geometry. This insight into speech production could then be applied to a variety of fields where the ability to alter or investigate speech characteristics in a targeted way could be useful, for example in the teaching of speech science, in speech coaching, or as part of the planning of medical procedures. In this research, a bespoke acoustic simulation package has been produced using a continuous 3-dimensional Digital Waveguide Mesh (DWM), which can produce acoustic output throughout the entire simulation domain containing the tract at every time step. This package has been shown to reproduce formant frequencies for a variety of vocal tract shapes with an average mean absolute error of 10.12% at the lips, which is comparable to other research. These results have been investigated by comparing simulation output to recorded output from physical models. The simulation package has also been used to study the shifting of formant frequencies along the length of the tract during speech sound production, and the effect on formant frequencies of removing geometric features of the tract such as the piriform fossae. These studies have been compared to physical internal measurements of vocal tract models from living subjects, showing preliminary agreement, with further development required. A large emphasis has been placed on the accessibility of this research, with the production of several tools for visualising the data contained within, and with decisions made during the production of the simulation package itself.
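
To make the reported error figure concrete, here is a small sketch of how a mean absolute percentage error over formant frequencies could be computed. The abstract does not define the metric exactly, so this assumes each error is taken relative to the measured formant. The function names and the formant values are hypothetical.

```python
def formant_errors_percent(simulated_hz, measured_hz):
    """Absolute error of each simulated formant, as a percentage of the measured one."""
    if len(simulated_hz) != len(measured_hz):
        raise ValueError("formant lists must have the same length")
    return [abs(s - m) / m * 100.0 for s, m in zip(simulated_hz, measured_hz)]


def mean_absolute_error_percent(simulated_hz, measured_hz):
    """Mean of the per-formant absolute percentage errors."""
    errors = formant_errors_percent(simulated_hz, measured_hz)
    return sum(errors) / len(errors)


# Hypothetical values for the first three formants of one vocal tract shape.
measured = [500.0, 1500.0, 2500.0]
simulated = [550.0, 1350.0, 2500.0]
mae = mean_absolute_error_percent(simulated, measured)
```

Averaging this value again over several vocal tract shapes would give an "average mean absolute error" in the sense the abstract appears to use.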

    Estimating Housing Demand with an Application to Explaining Racial Segregation in Cities

    We present a three-stage estimation procedure to recover willingness to pay for housing attributes. In the first stage, we estimate a non-parametric hedonic home price function. Second, we recover each consumer's taste parameters for product characteristics using first order conditions for utility maximization. Finally, we estimate the distribution of household tastes as a function of household demographics. As an application of our methods, we compare alternative explanations for why blacks choose to live in center cities while whites suburbanize.
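
The three stages can be sketched on toy data as follows. This is an illustration under strong assumptions, not the authors' estimator: it uses a single housing attribute (size), a local linear kernel regression as the non-parametric hedonic step, a log utility `u = beta * log(size) + consumption` so that the first-order condition gives `beta = size * p'(size)`, and an ordinary least-squares regression on one demographic variable (income). All names and numbers are hypothetical.

```python
import math


def local_linear_slope(x0, xs, ys, bandwidth):
    """Stage 1: slope of a Gaussian-kernel local linear fit of price at x0."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return sxy / sxx


def ols(xs, ys):
    """Stage 3: simple least-squares regression, returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


# Hypothetical households: income, chosen house size, and observed price.
incomes = [1.0, 1.5, 2.0, 2.5, 3.0]
sizes = [2.0 * inc for inc in incomes]
prices = [100.0 + 50.0 * s for s in sizes]

# Stage 1: implicit (marginal) price of size at each household's choice.
marginal_prices = [local_linear_slope(s, sizes, prices, bandwidth=1.0)
                   for s in sizes]

# Stage 2: taste parameter from the assumed first-order condition.
tastes = [s * mp for s, mp in zip(sizes, marginal_prices)]

# Stage 3: relate tastes to demographics.
intercept, slope = ols(incomes, tastes)
```

With these toy numbers the price function is exactly linear, so the estimated marginal price is 50 everywhere and the taste for size rises by 100 per unit of income.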

    Modelling the effects of speech rate variation for automatic speech recognition

    Wrede B. Modelling the effects of speech rate variation for automatic speech recognition. Bielefeld (Germany): Bielefeld University; 2002. In automatic speech recognition it is widely observed that variations in speech rate severely degrade recognition performance. This is because standard stochastic speech recognition systems specialise on average speech rate. Although many approaches to modelling speech rate variation have been proposed, an integrated approach in a substantial system still has to be developed. General approaches to rate modelling are based on rate-dependent models, which are trained on rate-specific subsets of the training data. During decoding, a signal-based rate estimate is computed and used to select the set of rate-dependent models. While such approaches can reduce the word error rate significantly, they suffer from shortcomings such as the reduction of training data per model and expensive training and decoding procedures. However, phonetic investigations show that there is a systematic relationship between speech rate and the acoustic characteristics of speech. In fast speech a tendency towards reduction can be observed, which can be described in more detail as a centralisation effect and an increase in coarticulation. Centralisation means that the formant frequencies of vowels tend to shift towards the vowel space centre, while increased coarticulation denotes the tendency of the spectral features of a vowel to shift towards those of its phonemic neighbour. The goal of this work is to investigate whether knowledge of the systematic influence of speech rate variation on the acoustic features can be incorporated into speech rate modelling. An acoustic-phonetic analysis of a large corpus of spontaneous speech showed that both effects, centralisation and coarticulation, are more pronounced in fast speech.
    Several measures for these effects were developed and used in speech recognition experiments with rate-dependent models. A thorough investigation of rate-dependent models showed that duration-based and coarticulation-based measures yield significant performance increases. It was shown that, depending on the measure used, the models were adapted either to centralisation or to coarticulation. Further experiments showed that more detailed modelling with more rate classes yields a further improvement. It was also observed that a general basis for the models is needed before rate adaptation can be performed. A comparison with other sources of acoustic variation showed that the effects of speech rate are as severe as those of speaker variation and environmental noise. All these results show that a more substantial system that models rate variation accurately must focus on both durational and spectral effects. The systematic nature of the effects indicates that continuous modelling is possible.
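
The centralisation effect described above can be illustrated with a simple measure: the mean distance of vowel tokens from the vowel space centre in the (F1, F2) plane. The abstract does not specify the measures actually developed in the thesis, so this particular measure, the function names, and the formant values are assumptions made for the example.

```python
import math


def centroid(points):
    """Centre of a set of (F1, F2) points, used here as the vowel space centre."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def centralisation(tokens, centre):
    """Mean Euclidean distance (Hz) of vowel tokens from the centre.

    Smaller values mean the vowels lie closer to the centre, i.e. are more
    centralised.
    """
    return sum(math.hypot(f1 - centre[0], f2 - centre[1])
               for f1, f2 in tokens) / len(tokens)


# Hypothetical (F1, F2) targets in Hz for three corner vowels at normal rate.
reference = [(300.0, 2300.0), (700.0, 1300.0), (300.0, 900.0)]
centre = centroid(reference)

# Hypothetical fast-speech tokens: each vowel moved 30% of the way to the centre.
fast_tokens = [(centre[0] + 0.7 * (f1 - centre[0]),
                centre[1] + 0.7 * (f2 - centre[1]))
               for f1, f2 in reference]

normal_score = centralisation(reference, centre)
fast_score = centralisation(fast_tokens, centre)
```

In this toy case `fast_score` is 0.7 times `normal_score`. A score of this kind could in principle serve as a signal-derived cue for choosing among rate-dependent model sets, although the thesis may define its measures differently.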

    The Spectral Impact of the Hypopharyngeal Cavities on the Singing Voice
