
70 research outputs found

A comparison of features for large population speaker identification

Author: Baloyi Norman Tinyiko
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2000

Bibliography: leaves 95-104. Speech recognition systems all have one criterion in common: they perform better in a controlled environment using clean speech. Though performance can be excellent, even exceeding human capabilities for clean speech, systems fail when presented with speech data from more realistic environments such as telephone channels. The differences in recognizer performance between clean and noisy environments are extreme, and this is one of the major obstacles to producing commercial recognition systems for use in normal environments. It is the lack of performance of speaker recognition systems with telephone channels that this work addresses. The human auditory system is a speech recognizer with excellent performance, especially in noisy environments. Since humans ignore noise better than any machine, auditory-based methods are promising approaches because they attempt to model the working of the human auditory system. These methods have been shown to outperform more conventional signal processing schemes for speech recognition, speech coding, word recognition and phone classification tasks. Since speaker identification has received a lot of attention in speech processing because of the real-world applications awaiting it, it is attractive to evaluate its performance using auditory models as features. Firstly, this study aims at improving the results for speaker identification. The improvements were made through the use of parameterized feature sets together with the application of cepstral mean removal for channel equalization. The study is further extended to compare an auditory-based model, the Ensemble Interval Histogram (EIH), with mel-scale features, which were shown to perform almost error-free in clean speech. The previous studies showing EIH to be more robust to noise were conducted on speaker-dependent, small-population, isolated-word tasks and are now extended to speaker-independent, larger-population, continuous speech.
This study investigates whether the EIH representation is more resistant to telephone noise than mel-cepstrum, as was shown in the previous studies, when, for the first time, it is applied to the speaker identification task using a state-of-the-art Gaussian mixture model system.
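
The cepstral mean removal step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: it only assumes the common textbook form, in which the time average of each cepstral coefficient is subtracted from every frame, and it relies on the usual idealization that a fixed channel adds a constant offset in the cepstral domain. The function name and the toy data are hypothetical.

```python
def cepstral_mean_removal(frames):
    """Subtract the per-coefficient mean over time from each cepstral frame.

    `frames` is a list of equal-length lists, one list of cepstral
    coefficients per analysis frame (hypothetical input layout).
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data: the "channel" is modelled as a constant cepstral offset (assumption).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel_offset = [0.5, -1.0]
through_channel = [[c + h for c, h in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_removal(clean)
normalized_channel = cepstral_mean_removal(through_channel)
```

Under this idealized model the two normalized sequences coincide, which is why the technique is used for channel equalization; real telephone channels also add noise that mean removal does not cancel.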

Cape Town University OpenUCT

Evaluation of preprocessors for neural network speaker verification

Author: Salleh Sheikh-Hussain
Publication venue: The University of Edinburgh
Publication date: 01/01/1997

Edinburgh Research Archive

Proceedings: Voice Technology for Interactive Real-Time Command/Control Systems Application

Author: Breaux Robert
Curran P. Mike
Huff Edward M.

Speech understanding among researchers and managers, current developments in voice technology, and an exchange of information concerning government voice technology efforts are discussed

NASA Technical Reports Server

Communication aids for the vocally handicapped using voice synthesis technology, an LCD text display and a single-chip microcomputer

Author: Rolander Clas I.
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1984

Digital Repository @ Iowa State University (ISU)

Characterization of Speakers for Improved Automatic Speech Recognition

Author: Lincoln Michael
Publication venue: School of Information Systems. University of East Anglia
Publication date: 01/01/1999

Automatic speech recognition technology is becoming increasingly widespread in many applications. For dictation tasks, where a single talker is to use the system for long periods of time, the high recognition accuracies obtained are in part due to the user performing a lengthy enrolment procedure to ‘tune’ the parameters of the recogniser to their particular voice characteristics and speaking style. Interactive speech systems, where the speaker is using the system for only a short period of time (for example to obtain information), do not have the luxury of long enrolments and have to adapt rapidly to new speakers and speaking styles. This thesis discusses the variations between speakers and speaking styles which result in decreased recognition performance when there is a mismatch between the talker and the system's models. An unsupervised method to rapidly identify and normalise differences in vocal tract length is presented and shown to give improvements in recognition accuracy for little computational overhead. Two unsupervised methods of identifying speakers with similar speaking styles are also presented. The first, a data-driven technique, is shown to accurately classify British and American accented speech, and is also used to improve recognition accuracy by clustering groups of similar talkers. The second uses the phonotactic information available within pronunciation dictionaries to model British and American accented speech. This model is then used to rapidly and accurately classify speakers.

CiteSeerX

Edinburgh Research Archive

OpenGrey Repository

Underwater noise due to precipitation

Author: Crum Lawrence A.
Jensen Leif Bjørnø
Prosperetti Andrea
Pumphrey Hugh C.
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/1989

Crossref

Online Research Database In Technology

Hierarchical methods for large population speaker identification using telephone speech

Author: Lerato Lerato
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2003

This study focuses on speaker identification (SiD). Several problems, such as acoustic noise, channel noise, speaker variability, a large population of known speakers within the system and many others, limit good SiD performance. The SiD system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form the speaker template known as a speaker model. As the number of speakers enrolling into the system gets larger, more models accumulate and interspeaker confusion results. This study proposes hierarchical methods which aim to split the large population of enrolled speakers into smaller groups of model databases in order to minimise interspeaker confusion.
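
The idea of splitting enrolled speakers into smaller groups can be sketched as a two-stage search: first pick the closest group, then compare only against the speakers in that group. This is an illustrative sketch and not the method of the thesis. It assumes each speaker model is reduced to a single mean feature vector and each group is represented by the centroid of its members; the names `centroid`, `nearest`, `identify` and the toy groups are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def nearest(query, candidates):
    """Return the key in `candidates` whose vector is closest to `query`."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))


def identify(query, groups):
    """Two-stage search: choose a group by centroid, then a speaker within it."""
    centroids = {g: centroid(list(members.values())) for g, members in groups.items()}
    best_group = nearest(query, centroids)
    return best_group, nearest(query, groups[best_group])


# Toy enrolment data (assumption): one mean vector per speaker, two groups.
groups = {
    "A": {"spk1": [0.0, 0.0], "spk2": [1.0, 0.0]},
    "B": {"spk3": [10.0, 10.0], "spk4": [11.0, 10.0]},
}
result = identify([10.8, 9.9], groups)
```

Only the models in the selected group are scored in the second stage, so fewer competing models are compared against the test utterance. The trade-off is that an error in the first stage cannot be recovered in the second.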

Cape Town University OpenUCT

Proceedings of Nordic Acoustical Meeting, NAM '86, Aalborg, Denmark, August 20-22, 1986

Publication venue: Aalborg University
Publication date: 01/01/1986

VBN

Theoretical and experimental investigation of the insertion loss of a dissipative muffler

Author: Pommer Christian
Tarnow Viggo
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/1986

Crossref

Online Research Database In Technology

Automatic syllable detection for vowel landmarks

Author: Howitt Andrew Wilson
Publication date: 01/01/2000

By Andrew Wilson Howitt. Supervised by Kenneth N. Stevens. Also issued as Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 192-200).

DSpace@MIT