8 research outputs found

    Hierarchical methods for large population speaker identification using telephone speech

    This study focuses on speaker identification (SID). Several problems, such as acoustic noise, channel noise, speaker variability, a large population of known speakers within the system and many others, limit good SID performance. The SID system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form the speaker template known as a speaker model. As the number of speakers enrolling into the system grows, more models accumulate and inter-speaker confusion results. This study proposes hierarchical methods which aim to split the large population of enrolled speakers into smaller groups of model databases in order to minimise inter-speaker confusion.
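    The two-stage idea can be illustrated with a short sketch. The Python code below is an assumed, simplified version (per-speaker GMMs summarised by the mean of their component means and grouped with k-means); it is not the exact algorithm, features or configuration used in the thesis:

        # Hypothetical two-stage (hierarchical) speaker identification sketch.
        # Grouping criterion, model order and features are illustrative assumptions.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.mixture import GaussianMixture

        def train_speaker_models(features_per_speaker, n_components=8):
            """Train one diagonal-covariance GMM per enrolled speaker from its MFCC frames."""
            models = {}
            for spk, feats in features_per_speaker.items():
                models[spk] = GaussianMixture(n_components=n_components,
                                              covariance_type="diag").fit(feats)
            return models

        def group_models(models, n_groups=4):
            """Split the enrolled population into smaller groups by clustering a
            compact summary of each speaker model (here: the mean of its GMM means)."""
            spk_ids = list(models)
            summaries = np.stack([models[s].means_.mean(axis=0) for s in spk_ids])
            km = KMeans(n_clusters=n_groups, n_init=10).fit(summaries)
            groups = {}
            for spk, g in zip(spk_ids, km.labels_):
                groups.setdefault(int(g), []).append(spk)
            return groups, km.cluster_centers_

        def identify(test_feats, models, groups, centroids):
            """Stage 1: pick the closest group; stage 2: score only that group's models."""
            summary = test_feats.mean(axis=0)
            g = int(np.argmin(np.linalg.norm(centroids - summary, axis=1)))
            scores = {spk: models[spk].score(test_feats) for spk in groups[g]}
            return max(scores, key=scores.get)

    Because only the models in the selected group are scored against the test utterance, the scoring cost grows roughly with the group size rather than with the full enrolled population.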

    Open-set Speaker Identification

    This study is motivated by the growing need for effective extraction of intelligence and evidence from audio recordings in the fight against crime, a need made ever more apparent with the recent expansion of criminal and terrorist organisations. The main focus is to enhance the open-set speaker identification process within speaker identification systems, which are affected by noisy audio data obtained under uncontrolled environments such as in the street, in restaurants or other places of business. Consequently, two investigations are initially carried out: the effects of environmental noise on the accuracy of open-set speaker recognition, thoroughly covering relevant conditions in the considered application areas (variable training data length, background noise and real-world noise), and the effects of short and varied duration reference data in open-set speaker recognition. The investigations led to a novel method termed “vowel boosting” to enhance the reliability of speaker identification when operating with varied duration speech data under uncontrolled conditions. Vowels naturally contain more speaker-specific information, so emphasising this natural phenomenon in the speech data enables better identification performance. The traditional state-of-the-art GMM-UBM and i-vector approaches are used to evaluate “vowel boosting”. The proposed approach boosts the impact of the vowels on the speaker scores, which improves the recognition accuracy for the specific case of open-set identification with short and varied duration of speech material.
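    The scoring step behind “vowel boosting” can be sketched as a per-frame weighting on top of a GMM-UBM log-likelihood ratio. The form below is an assumption for illustration (a fixed weight for frames flagged as vowels, e.g. by a phone recogniser); the study's actual formulation may differ:

        # Assumed frame-weighted GMM-UBM scoring to illustrate vowel boosting.
        import numpy as np

        def vowel_boosted_llr(frames, is_vowel, speaker_gmm, ubm_gmm, boost=2.0):
            """Log-likelihood ratio where frames flagged as vowels are weighted
            by `boost` and all other frames by 1."""
            spk_ll = speaker_gmm.score_samples(frames)   # per-frame log-likelihoods
            ubm_ll = ubm_gmm.score_samples(frames)
            w = np.where(is_vowel, boost, 1.0)
            return float(np.sum(w * (spk_ll - ubm_ll)) / np.sum(w))

    For open-set identification, the best boosted score over the enrolled models would then be compared against a threshold to decide whether the test speaker belongs to the known set at all.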

    Evaluation of preprocessors for neural network speaker verification


    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications able to operate in real-world environments, such as mobile communication services and smart homes.
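    As a pointer to what the search-oriented material deals with, a generic log-domain Viterbi decoder over a small HMM can be written in a few lines; the sketch below is purely illustrative and is not taken from the book:

        # Generic log-domain Viterbi decoding over an HMM (illustrative only).
        import numpy as np

        def viterbi(log_init, log_trans, log_emit):
            """log_init: (S,) initial log-probabilities, log_trans: (S, S)
            transition log-probabilities, log_emit: (T, S) per-frame emission
            log-likelihoods. Returns the most likely state sequence."""
            T, S = log_emit.shape
            delta = log_init + log_emit[0]          # best score ending in each state
            back = np.zeros((T, S), dtype=int)      # backpointers
            for t in range(1, T):
                cand = delta[:, None] + log_trans   # cand[i, j]: score of ending in j via prev state i
                back[t] = np.argmax(cand, axis=0)
                delta = cand.max(axis=0) + log_emit[t]
            path = [int(np.argmax(delta))]          # trace back the best path
            for t in range(T - 1, 0, -1):
                path.append(int(back[t, path[-1]]))
            return path[::-1]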

    Speech recognition on DSP: algorithm optimization and performance analysis.

    Yuan Meng. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 85-91). Abstracts in English and Chinese.
    Chapter 1 Introduction
        1.1 History of ASR development
        1.2 Fundamentals of automatic speech recognition
            1.2.1 Classification of ASR systems
            1.2.2 Automatic speech recognition process
        1.3 Performance measurements of ASR
            1.3.1 Recognition accuracy
            1.3.2 Complexity
            1.3.3 Robustness
        1.4 Motivation and goal of this work
        1.5 Thesis outline
    Chapter 2 Signal processing techniques for front-end
        2.1 Basic feature extraction principles
            2.1.1 Pre-emphasis
            2.1.2 Frame blocking and windowing
            2.1.3 Discrete Fourier Transform (DFT) computation
            2.1.4 Spectral magnitudes
            2.1.5 Mel-frequency filterbank
            2.1.6 Logarithm of filter energies
            2.1.7 Discrete Cosine Transformation (DCT)
            2.1.8 Cepstral weighting
            2.1.9 Dynamic featuring
        2.2 Practical issues
            2.2.1 Review of practical problems and solutions in ASR applications
            2.2.2 Model of environment
            2.2.3 End-point detection (EPD)
            2.2.4 Spectral subtraction (SS)
    Chapter 3 HMM-based acoustic modeling
        3.1 HMMs for ASR
        3.2 Output probabilities
        3.3 Viterbi search engine
        3.4 Isolated word recognition (IWR) and connected word recognition (CWR)
            3.4.1 Isolated word recognition
            3.4.2 Connected word recognition (CWR)
    Chapter 4 DSP for embedded applications
        4.1 Classification of embedded systems (DSP, ASIC, FPGA, etc.)
        4.2 Description of hardware platform
        4.3 I/O operation for real-time processing
        4.4 Fixed-point algorithm on DSP
    Chapter 5 ASR algorithm optimization
        5.1 Methodology
        5.2 Floating-point to fixed-point conversion
        5.3 Computational complexity consideration
            5.3.1 Feature extraction techniques
            5.3.2 Viterbi search module
        5.4 Memory requirements consideration
    Chapter 6 Experimental results and performance analysis
        6.1 Cantonese isolated word recognition (IWR)
            6.1.1 Execution time
            6.1.2 Memory requirements
            6.1.3 Recognition performance
        6.2 Connected word recognition (CWR)
            6.2.1 Execution time consideration
            6.2.2 Recognition performance
        6.3 Summary and discussion
    Chapter 7 Implementation of practical techniques
        7.1 End-point detection (EPD)
        7.2 Spectral subtraction (SS)
        7.3 Experimental results
            7.3.1 Isolated word recognition (IWR)
            7.3.2 Connected word recognition (CWR)
        7.4 Results
    Chapter 8 Conclusions and future work
        8.1 Summary and conclusions
        8.2 Suggestions for future research
    Appendices
        A Interpolation of data entries without floating point, divides or conditional branches
        B Vocabulary for Cantonese isolated word recognition task
    Bibliography
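    The front-end steps listed in Chapter 2 (pre-emphasis, frame blocking and windowing, DFT magnitudes, mel filterbank, log energies and DCT) can be summarised in a compact floating-point sketch. Frame sizes, filter count and other parameters below are illustrative assumptions rather than the thesis configuration, and cepstral weighting and dynamic features are omitted:

        # Illustrative floating-point MFCC front-end following the usual pipeline;
        # all parameter values are assumptions, not the thesis configuration.
        import numpy as np

        def mel_filterbank(n_filters, n_fft, sr):
            """Triangular filters spaced evenly on the mel scale."""
            mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
            inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
            pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
            bins = np.floor((n_fft + 1) * pts / sr).astype(int)
            fb = np.zeros((n_filters, n_fft // 2 + 1))
            for i in range(1, n_filters + 1):
                l, c, r = bins[i - 1], bins[i], bins[i + 1]
                fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
                fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
            return fb

        def mfcc(signal, sr=8000, frame_len=200, hop=80, n_fft=256, n_filters=23, n_ceps=13):
            # 1. Pre-emphasis.
            sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
            # 2. Frame blocking and Hamming windowing.
            n_frames = 1 + (len(sig) - frame_len) // hop
            idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
            frames = sig[idx] * np.hamming(frame_len)
            # 3-4. DFT computation and spectral magnitudes.
            mag = np.abs(np.fft.rfft(frames, n_fft))
            # 5-6. Mel-frequency filterbank and logarithm of filter energies.
            logfb = np.log(mag @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
            # 7. DCT-II to decorrelate; keep the first n_ceps coefficients.
            k = np.arange(n_filters)
            basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_filters))
            return logfb @ basis.T

    The thesis then converts such a front-end to fixed-point arithmetic for the DSP (Chapter 5); that conversion is not shown here.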

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, published every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies.

    Comparison of MFCC, LPCC and PLP features for the determination of a speaker's gender
