41 research outputs found

    Theory, design and application of gradient adaptive lattice filters

    SIGLE LD:D48933/84 / BLDSC - British Library Document Supply Centre, United Kingdom

    Residual-excited linear predictive (RELP) vocoder system with TMS320C6711 DSK and vowel characterization

    Machine speech recognition is one of the most popular and complicated subjects in the current multimedia field. Linear predictive coding (LPC) is a useful technique for voice coding in speech analysis and synthesis. The first objective of this research was to establish a prototype of the residual-excited linear predictive (RELP) vocoder system in a real-time environment. Although its transmission rate is higher, the RELP vocoder produces synthesized speech of better quality than other vocoders, and it is relatively simple and robust to implement. The RELP vocoder uses residual signals as excitation rather than periodic pulses or white noise. It was implemented in C on a Texas Instruments TMS320C6711 DSP starter kit (DSK). Identifying vowel sounds is an important element in recognizing speech content. The second objective of this research was to explore a method of characterizing vowels by means of parameters extracted by the RELP vocoder, a method not previously known to have been used in speech recognition. Five English vowels were chosen as the experimental sample. Utterances of individual vowel sounds and of the vowel sounds in one-syllable words were recorded and saved as WAVE files. A large sample of 20-ms vowel segments was obtained from these utterances. The presented method uses 20 samples of a segment's LPC frequency response, spaced equally on a logarithmic scale, as a frequency response vector. The average of each vowel's vectors was calculated, and the Euclidean distances between the average vectors of the five vowels and an unknown vector were compared to classify the unknown vector into a vowel group. The results indicate that, when a vowel is uttered alone, the distance to its own average vector is smaller than the distances to the other vowels' average vectors.
    By examining a given vowel's frequency response against all known vowels' average vectors individually, one can determine to which vowel group it belongs. When a vowel is uttered with consonants, however, variances and covariances increase, and in some cases no distinct difference can be recognized between the distance to the vowel's own average vector and the distances to the other vowels' average vectors. Overall, the results of vowel characterization indicate that the RELP vocoder can identify and classify single vowel sounds.
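    The distance-based classification described above can be sketched in a few lines. This is an illustrative Python sketch, not the thesis code: the sampling rate, the 100 Hz lower bound of the frequency range and the function names are assumptions.

```python
import numpy as np

def lpc_freq_response_vector(lpc_coeffs, n_points=20, fs=8000.0):
    """Sample the LPC all-pole magnitude response (in dB) at n_points
    frequencies spaced equally on a logarithmic scale."""
    # log-spaced frequencies from an assumed 100 Hz up to Nyquist
    freqs = np.logspace(np.log10(100.0), np.log10(fs / 2), n_points)
    w = 2 * np.pi * freqs / fs
    # A(e^{jw}) = 1 + a1*e^{-jw} + ... + ap*e^{-jpw};  H = 1/A
    a = np.concatenate(([1.0], lpc_coeffs))
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ a
    return 20 * np.log10(np.abs(1.0 / A))

def classify_vowel(vector, averages):
    """Assign the unknown vector to the vowel whose average vector is
    nearest in Euclidean distance (nearest-centroid classification)."""
    dists = {v: np.linalg.norm(vector - avg) for v, avg in averages.items()}
    return min(dists, key=dists.get)
```

    In use, `averages` would map each of the five vowels to the mean of its training vectors, and an unknown 20-ms segment would be classified by its nearest average vector.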

    Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

    Speech signals radiated in confined spaces are subject to reverberation due to reflections of surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single- or multiple sensor as well as stationary and moving speakers. Bayesian methods benefit from – rather than being dictated by – appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach both the anechoic source signal and reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state-space of unknown variables. The remaining model parameters are estimated using sequential importance resampling. 
    The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of the measurement model, single- as well as multi-microphone processing is supported, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity where the physical size of microphone arrays is of no concern. The dissertation concludes with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, and an extension to subband processing for improved computational efficiency.
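    The Rao-Blackwellised scheme in the dissertation marginalises the source and channel analytically; the sequential importance resampling step that handles the remaining parameters can be illustrated with a minimal bootstrap filter for a scalar state. Everything here (the random-walk state model and the noise levels `q`, `r`) is a toy assumption, not the dissertation's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_filter(observations, n_particles=500, q=0.1, r=0.5):
    """Bootstrap sequential importance resampling (SIR) filter for a
    scalar random-walk state x_t = x_{t-1} + q*v_t observed as
    y_t = x_t + r*e_t. Returns the posterior-mean estimates."""
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # propagate each particle through the state transition
        particles = particles + q * rng.normal(size=n_particles)
        # weight by the Gaussian measurement likelihood
        w = np.exp(-0.5 * ((y - particles) / r) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))
        # multinomial resampling to combat weight degeneracy
        particles = particles[rng.choice(n_particles, n_particles, p=w)]
    return np.array(estimates)
```

    The resampling step is what keeps the particle cloud from collapsing onto a few dominant weights, which is the practical heart of sequential Monte Carlo methods.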

    Evaluation of glottal characteristics for speaker identification.

    Based on the assumption that the physical characteristics of people's vocal apparatus cause their voices to have distinctive characteristics, this thesis reports on investigations into the use of the long-term average glottal response for speaker identification. The long-term average glottal response is a new feature that is obtained by overlaying successive vocal tract responses within an utterance. The way in which the long-term average glottal response varies with accent and gender is examined using a population of 352 American English speakers from eight different accent regions. Descriptors are defined that characterize the shape of the long-term average glottal response. Factor analysis of the descriptors of the long-term average glottal responses shows that the most important factor contains significant contributions from descriptors comprised of the coefficients of cubics fitted to the long-term average glottal response. Discriminant analysis demonstrates that the long-term average glottal response is potentially useful for classifying speakers according to their gender, but is not useful for distinguishing American accents. The identification accuracy of the long-term average glottal response is compared with that obtained from vocal tract features. Identification experiments are performed using a speaker database containing utterances from twenty speakers of the digits zero to nine. Vocal tract features, which consist of cepstral coefficients, partial correlation coefficients and linear prediction coefficients, are shown to be more accurate than the long-term average glottal response. Despite analysis of the training data indicating that the long-term average glottal response was uncorrelated with the vocal tract features, various feature combinations gave insignificant improvements in identification accuracy. The effect of noise and distortion on speaker identification is examined for each of the features. 
    It is found that the identification performance of the long-term average glottal response is insensitive to noise compared with cepstral coefficients, partial correlation coefficients and the long-term average spectrum, but that it is highly sensitive to variations in the phase response of the speech transmission channel. Before reporting on the identification experiments, the thesis introduces speech production, speech models and the background to the various features used in the experiments. Investigations into the long-term average glottal response demonstrate that it approximates the glottal pulse convolved with the long-term average impulse response, a relationship verified using synthetic speech. Furthermore, the spectrum of the long-term average glottal response extracted from pre-emphasized speech is shown to be similar to the long-term average spectrum of pre-emphasized speech, but computationally much simpler to obtain.
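    The overlay averaging that produces the long-term average glottal response can be sketched as a frame-wise mean. This is a simplified illustration: the thesis aligns successive vocal tract responses within the utterance, whereas here a fixed `frame_len`/`hop` slicing stands in for that alignment.

```python
import numpy as np

def long_term_average_response(signal, frame_len, hop):
    """Illustrative overlay average: slice the utterance into
    fixed-length frames and average them sample by sample.
    Random components cancel in the mean, leaving the response
    common to all frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.mean(frames, axis=0)
```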

    Efficient digital techniques for speech processing

    Computationally efficient digital signal processing algorithms suited to speech signals are investigated. A new efficient time-domain algorithm for estimating the pitch period of voiced speech is presented. This algorithm has no multiply operations and can be implemented in integer arithmetic without scaling on a 16-bit microprocessor. The algorithm gives a low error rate at signal-to-noise ratios above 10 dB. Moreover, a good signal intensity estimate is obtained as a by-product of the algorithm. The importance of the zero-crossing count of a differentiated speech waveform is explored through a discrete mathematical analysis. The potential of this parameter is shown by its use in a new speaker verification system: the verification score obtained using this parameter in combination with the intensity compares well with the score obtained using only the pitch period parameter. These three parameters have also been compared in terms of their ability to discriminate between speakers. The computational effort needed to extract the zero-crossing count of differentiated speech is very small, and it can be extracted in real time using a microprocessor. An efficient way of creating reference templates using a nonlinear mapping technique to cater for intraspeaker variations is presented; results show that the speaker verification score improves when intraspeaker variations are considered in creating reference templates. A speaker-dependent digit recognition system has been implemented using Burg's partial correlation coefficients and their nonlinear transforms. The results show that the recognition score is 100 per cent with three or more Burg's coefficients, and that a simple 'city block' distance measure is adequate. Finally, a new computationally efficient multiplication technique which speeds multiplication at the expense of memory space is developed.
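    The multiply-free character of the zero-crossing parameter is easy to see in code: differentiation becomes a first difference, and counting crossings needs only comparisons. A minimal sketch (NumPy is used for brevity; an integer-only loop would serve equally well on the 16-bit hardware described):

```python
import numpy as np

def zero_crossings_of_difference(x):
    """Count sign changes of the first difference of x. The first
    difference stands in for analogue differentiation; the whole
    computation uses only subtractions and comparisons, so it needs
    no multiply instructions."""
    d = np.diff(x)                 # differentiate (first difference)
    signs = d >= 0                 # comparison only
    return int(np.count_nonzero(signs[1:] != signs[:-1]))
```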

    Computer Models for Musical Instrument Identification

    A particular aspect of the perception of sound is concerned with what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people are able to discern a piano tone from a violin tone or to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. First, parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.
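    The Line Spectrum Frequencies mentioned above are obtained from the LPC polynomial via its symmetric and antisymmetric extensions, whose roots lie on the unit circle. A minimal sketch of that standard conversion, assuming a stable LPC polynomial `a = [1, a1, ..., ap]` (the function name and tolerances are our own):

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to Line Spectrum
    Frequencies (radians in (0, pi)). P and Q are the sum and difference
    polynomials A(z) +/- z^-(p+1) A(z^-1); for a stable A their roots
    lie interleaved on the unit circle."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate((a, [0.0]))
    P = a_ext + a_ext[::-1]          # symmetric (palindromic) polynomial
    Q = a_ext - a_ext[::-1]          # antisymmetric polynomial
    angles = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # keep one angle per conjugate pair, dropping trivial roots at +/-1
        angles.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(angles))
```

    LSFs are popular as spectral envelope descriptors because they are bounded, ordered and quantise gracefully, which also makes them convenient features for classification.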

    Comparative Study And Analysis Of Quality Based Multibiometric Technique Using Fuzzy Inference System

    Biometrics is the science and technology of measuring and analyzing biological data, i.e. physical or behavioral traits, that can uniquely distinguish one person from others. Prior studies of biometric verification systems have shown that fusing several biometric sources outperforms single-biometric systems. However, a fusion approach that ignores the quality of the data used affects system performance: in some cases the fused system performs worse than either of the single systems. To overcome this limitation, this study proposes a quality-based fusion scheme built on a fuzzy inference system (FIS) that determines the optimum weight for combining the parameters of the fusion system under changing conditions. For this purpose, fusion systems combining two modalities, speech and lip traits, are examined. For the speech signal, Mel-frequency cepstral coefficients (MFCC) are used as features, while the region of interest (ROI) of the lip image is employed as the lip feature. A support vector machine (SVM) then serves as the classifier for the verification system. For validation, common fusion schemes, i.e. the minimum rule, maximum rule, simple sum rule and weighted sum rule, are compared with the proposed quality-based fusion scheme. In the experimental results at 35 dB speech SNR and 0.8 lip quality density, the EER percentages for the speech, lip, minimum rule, maximum rule, simple sum rule and weighted sum rule systems are 5.9210%, 37.2157%, 33.2676%, 31.1364%, 4.0112% and 14.9023% respectively, compared with 1.9974% and 1.9745% for the Sugeno-type and Mamdani-type FIS.
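    The fusion rules compared in the study are one-liners at the score level; only the weight selection distinguishes the fixed weighted sum from the quality-based scheme. The sketch below implements the common rules and replaces the fuzzy inference system with a deliberately crude linear SNR-to-weight mapping (the `quality_weight` function and its SNR bounds are illustrative assumptions, not the paper's FIS):

```python
import numpy as np

def fuse_scores(s_speech, s_lip, rule="weighted_sum", w=0.5):
    """Common score-level fusion rules. w is the weight on the speech
    score; a quality-based scheme would choose w from the measured
    signal quality instead of fixing it."""
    if rule == "min":
        return min(s_speech, s_lip)
    if rule == "max":
        return max(s_speech, s_lip)
    if rule == "sum":
        return 0.5 * (s_speech + s_lip)
    if rule == "weighted_sum":
        return w * s_speech + (1.0 - w) * s_lip
    raise ValueError(f"unknown rule: {rule}")

def quality_weight(snr_db, snr_min=5.0, snr_max=35.0):
    """Toy stand-in for the fuzzy inference system: map speech SNR
    linearly onto [0, 1] so that cleaner speech gets more weight."""
    return float(np.clip((snr_db - snr_min) / (snr_max - snr_min), 0.0, 1.0))
```

    A real FIS would encode such a mapping as fuzzy membership functions and rules over both modalities' quality measures; the point of the sketch is only where the quality information enters the fusion.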

    Nasality in automatic speaker verification
