26 research outputs found

    Distributing Recognition in Computational Paralinguistics

    Advanced signal processing techniques for pitch synchronous sinusoidal speech coders

    Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is and will remain the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high quality speech communication but also to minimise the required bandwidth for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders used in mobile and Internet applications employ a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps but, due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. In order to achieve this, innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal based coders, two of the most important elements are the correct formulation of pitch and voicing values from the input speech. The techniques introduced here have greatly improved these calculations, resulting in a higher quality PS sinusoidal speech coder than was previously available. 
A new quantisation method which is able to reduce the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates. EThOS - Electronic Theses Online Service, GB

    Investigation of the impact of high frequency transmitted speech on speaker recognition

    Thesis (MScEng)--Stellenbosch University, 2002. Some digitised pages may appear illegible due to the condition of the original hard copy. ENGLISH ABSTRACT: Speaker recognition systems have evolved to a point where near perfect performance can be obtained under ideal conditions, even if the system must distinguish between a large number of speakers. Under adverse conditions, such as when high noise levels are present or when the transmission channel deforms the speech, the performance is often less than satisfactory. This project investigated the performance of a popular speaker recognition system, which uses Gaussian mixture models, on speech transmitted over a high frequency channel. Initial experiments demonstrated very unsatisfactory results for the baseline system. We investigated a number of robust techniques, and implemented and applied some of them in an attempt to improve the performance of the speaker recognition system. The techniques we tested showed only slight improvements. We also investigated the effects of a high frequency channel and single sideband modulation on the speech features of speech processing systems. The effects that can deform the features, and therefore reduce the performance of speech systems, were identified. One of the effects that can greatly affect the performance of a speech processing system is noise. We investigated some speech enhancement techniques and, as a result, developed a new statistically based speech enhancement technique that employs hidden Markov models to represent the clean speech process.

    Evaluation of glottal characteristics for speaker identification.

    Based on the assumption that the physical characteristics of people's vocal apparatus cause their voices to have distinctive characteristics, this thesis reports on investigations into the use of the long-term average glottal response for speaker identification. The long-term average glottal response is a new feature that is obtained by overlaying successive vocal tract responses within an utterance. The way in which the long-term average glottal response varies with accent and gender is examined using a population of 352 American English speakers from eight different accent regions. Descriptors are defined that characterize the shape of the long-term average glottal response. Factor analysis of the descriptors of the long-term average glottal responses shows that the most important factor contains significant contributions from descriptors comprised of the coefficients of cubics fitted to the long-term average glottal response. Discriminant analysis demonstrates that the long-term average glottal response is potentially useful for classifying speakers according to their gender, but is not useful for distinguishing American accents. The identification accuracy of the long-term average glottal response is compared with that obtained from vocal tract features. Identification experiments are performed using a speaker database containing utterances from twenty speakers of the digits zero to nine. Vocal tract features, which consist of cepstral coefficients, partial correlation coefficients and linear prediction coefficients, are shown to be more accurate than the long-term average glottal response. Despite analysis of the training data indicating that the long-term average glottal response was uncorrelated with the vocal tract features, various feature combinations gave insignificant improvements in identification accuracy. The effect of noise and distortion on speaker identification is examined for each of the features. 
It is found that the identification performance of the long-term average glottal response is relatively insensitive to noise compared with cepstral coefficients, partial correlation coefficients and the long-term average spectrum, but that it is highly sensitive to variations in the phase response of the speech transmission channel. Before reporting on the identification experiments, the thesis introduces speech production, speech models and background to the various features used in the experiments. Investigations into the long-term average glottal response demonstrate that it approximates the glottal pulse convolved with the long-term average impulse response, and this relationship is verified using synthetic speech. Furthermore, the spectrum of the long-term average glottal response extracted from pre-emphasized speech is shown to be similar to the long-term average spectrum of pre-emphasized speech, but computationally much simpler.
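
The abstract describes the long-term average glottal response as the result of overlaying successive responses within an utterance. The sketch below is one plausible reading of that idea and not the thesis's actual procedure: it averages fixed-length segments of a signal, each starting at a given alignment index. The function name `overlay_average`, the `marks` list (hypothetical pitch marks) and the fixed segment `length` are all assumptions made for illustration.

```python
def overlay_average(signal, marks, length):
    """Average fixed-length segments of `signal`, one per start index in `marks`.

    `marks` are assumed alignment points (for example, pitch marks found by
    some other method); how the thesis locates them is not stated in the abstract.
    """
    total = [0.0] * length
    count = 0
    for start in marks:
        # Skip segments that would fall outside the signal.
        if start < 0 or start + length > len(signal):
            continue
        for k in range(length):
            total[k] += signal[start + k]
        count += 1
    if count == 0:
        raise ValueError("no complete segment available")
    # Dividing by the number of overlaid segments gives the average shape.
    return [value / count for value in total]


if __name__ == "__main__":
    toy_signal = [1.0, 2.0, 3.0, 0.0, 3.0, 2.0, 1.0, 0.0]
    print(overlay_average(toy_signal, marks=[0, 4], length=3))
```

Averaging aligned segments tends to reinforce what is common to every period and to cancel what varies between them, which is consistent with the abstract's remark that the feature is comparatively insensitive to noise.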

    Biometric Systems

    Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECG) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study.

    Algorithms for data mining

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 81-89). Data of massive size are now available in a wide variety of fields and come with great promise. In theory, these massive data sets allow data mining and exploration on a scale previously unimaginable. However, in practice, it can be difficult to apply classic data mining techniques to such massive data sets due to their sheer size. In this thesis, we study three algorithmic problems in data mining with consideration to the analysis of massive data sets. Our work is both theoretical and experimental - we design algorithms and prove guarantees for their performance and also give experimental results on real data sets. The three problems we study are: 1) finding a matrix of low rank that approximates a given matrix, 2) clustering high-dimensional points into subsets whose points lie in the same subspace, and 3) clustering objects by pairwise similarities/distances. By Grant J. Wang. Ph.D.
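
To make the first of the three problems concrete, here is a minimal sketch of a rank-1 approximation computed by power iteration. This is a textbook technique chosen for illustration and is not claimed to be the algorithm developed in the thesis. The names `rank_one_approximation` and `reconstruct`, and the fixed iteration count, are assumptions.

```python
import math


def rank_one_approximation(matrix, iterations=100):
    """Return (sigma, u, v) such that sigma * u * v^T approximates `matrix`.

    Uses alternating power iteration. It assumes the starting vector is not
    orthogonal to the dominant right singular vector.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    u = [0.0] * rows
    sigma = 0.0
    for _ in range(iterations):
        # u <- A v, then normalise.
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0, [0.0] * rows, v
        u = [x / norm_u for x in u]
        # v <- A^T u; its norm is the current singular value estimate.
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        if sigma == 0.0:
            return 0.0, u, [0.0] * cols
        v = [x / sigma for x in v]
    return sigma, u, v


def reconstruct(sigma, u, v):
    """Build the rank-1 matrix sigma * u * v^T as a list of rows."""
    return [[sigma * ui * vj for vj in v] for ui in u]


if __name__ == "__main__":
    a = [[2.0, 4.0], [1.0, 2.0]]  # already rank 1, so it is recovered closely
    print(reconstruct(*rank_one_approximation(a)))
```

For a rank-k approximation, a common approach is to subtract the rank-1 term and repeat, or to use a truncated singular value decomposition from a numerical library. Both scale poorly to the massive data sets the abstract has in mind, which is the motivation the abstract gives for studying faster algorithms.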

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010