22 research outputs found

Speaker recognition using frequency filtered spectral energies

Author: Hernando Pericás Francisco Javier
Publication venue: FONDAZIONE UGO BORDONI
Publication date: 01/01/1999

The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first- or second-order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discriminating between speakers. Speaker identification results better than those obtained with mel-cepstrum have been achieved on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, which had been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database.

Peer reviewed. Postprint (published version).
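The frequency filtering described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the second-order FIR filter H(z) = z - z^-1 applied along the band index, with zeros assumed beyond both ends of the band sequence. The filter choice, the zero padding, the function name `frequency_filter`, and the sample values are all assumptions made for the example.

```python
def frequency_filter(log_fbe):
    """Filter one frame of log filter-bank energies along the frequency axis.

    Assumed filter: H(z) = z - z^-1, i.e. out[k] = x[k+1] - x[k-1],
    with zeros assumed beyond both ends of the band sequence.
    """
    padded = [0.0] + list(log_fbe) + [0.0]
    # padded[k + 2] is x[k+1] and padded[k] is x[k-1] in the original indexing.
    return [padded[k + 2] - padded[k] for k in range(len(log_fbe))]


frame = [1.0, 2.0, 4.0, 7.0]  # hypothetical log energies for four bands
print(frequency_filter(frame))  # [2.0, 3.0, 5.0, -4.0]
```

The output has one value per band, so the filtered parameters keep their frequency interpretation, unlike cepstral coefficients.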

UPCommons. Portal del coneixement obert de la UPC

Improving the robustness of the usual FBE-based ASR front-end

Author: Hernando Pericás Francisco Javier
Macho D
Nadeu Camprubí Climent
Publication venue: Mergablum
Publication date: 01/01/2000

All speech recognition systems require some form of signal representation that parametrically models the temporal evolution of the spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands that are often distributed on a mel scale. The computation of those filter-bank energies (FBEs) always includes smoothing of basic spectral measurements and non-linear amplitude compression. A variety of linear transformations are typically applied to this time-frequency representation prior to the Hidden Markov Model (HMM) pattern-matching stage of recognition. In the paper, we discuss some robustness issues involved in both the computation of the FBEs and the subsequent linear transformations, and present alternative techniques that can improve robustness in additive noise conditions. In particular, the root non-linearity, a voicing-dependent FBE computation technique, and a time and frequency filtering (tiffing) technique are considered. Recognition results for the Aurora database are shown to illustrate the potential of these alternative techniques for enhancing the robustness of speech recognition systems.

Peer reviewed. Postprint (published version).
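As an illustration of the amplitude compression step mentioned in this abstract, the sketch below contrasts conventional log compression with a root non-linearity. It is a minimal example under stated assumptions: the exponent `gamma=0.1`, the log floor `1e-10`, the function names, and the sample energies are hypothetical and are not taken from the paper.

```python
import math


def log_compress(energies, floor=1e-10):
    """Conventional log compression; the floor avoids log(0) and is an assumption."""
    return [math.log(max(e, floor)) for e in energies]


def root_compress(energies, gamma=0.1):
    """Root compression: raise each non-negative band energy to a small power."""
    return [e ** gamma for e in energies]


band_energies = [0.0, 1.0, 1024.0]  # hypothetical filter-bank energies
print(log_compress(band_energies))
print(root_compress(band_energies))
```

For a band energy of zero the root output is simply 0, whereas the log output is determined entirely by the chosen floor. That bounded behaviour at low energies may be one reason root compression is of interest in additive noise.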

UPCommons. Portal del coneixement obert de la UPC

DWT and LPC based feature extraction methods for isolated word recognition

Author: AE Rosenberg
B Kotnik
DS Pallett
F Itakura
H Hermansky
J Xu
JN Gowdy
K Wang
KP Soman
L Rabiner
M Gupta
M Krishnan
MJF Gales
Navnath S Nehe
NS Nehe
O Farooq
Raghunath S Holambe
S Mallat
SB Davis
SF Boll
Y Hao
Z Tufekci
Publication venue: Springer Science and Business Media LLC

Crossref

Wavelet-based techniques for speech recognition

Author: Omar Farooq
Publication date: 01/01/2002

In this thesis, new wavelet-based techniques have been developed for extracting features from speech signals for automatic speech recognition (ASR). One advantage of the wavelet transform over the short-time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for the time-frequency transformation of these signals. In addition, it has compactly supported basis functions, which reduces the amount of computation compared with the STFT, where an overlapping window is needed. [Continues.]
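To make the idea of wavelet-based feature extraction concrete, the sketch below computes log subband energies from a multi-level discrete wavelet transform. It is only an illustration: the Haar wavelet, the three decomposition levels, the log floor, and the function names are assumptions chosen for brevity, and the thesis abstract does not state which wavelet or features were actually used.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail). A trailing odd sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def subband_log_energies(frame, levels=3, floor=1e-10):
    """Log energy of each detail subband plus the final approximation.

    The frame should hold at least 2**levels samples.
    """
    features = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(math.log(max(sum(d * d for d in detail), floor)))
    features.append(math.log(max(sum(a * a for a in approx), floor)))
    return features


frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # hypothetical speech frame
print(subband_log_energies(frame))
```

Each Haar basis function spans only a few samples, which is the compact support the abstract refers to: no overlapping analysis window is required within the frame.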

Loughborough University Institutional Repository

Practical Issues of Building Robust HMM Models Using HTK and SPHINX Systems

Author: Gregor Rozinaj
Juraj Kacur
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Crossref

Speaker recognition by frequency filtering of spectral energies estimated with hybrid methods

Author: Hernando Pericás Francisco Javier
Nadeu Camprubí Climent
Publication venue: Universidad Politecnica de Madrid - University Library
Publication date: 01/01/2000

Two ways of obtaining more robust parameters for speaker recognition have been explored: the hybridization of spectral analysis techniques and the frequency filtering of band energies. Frequency filtering has been shown to be an efficient representation for speech recognition, and it can approximately equalize the cepstral variance, enhancing the spectral oscillations that are most effective for discriminating between speakers. Good identification results have been obtained on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and the filter bank in the spectral analysis stage has been explored. The combination of these techniques has yielded good verification results on the POLYCOST telephone database.

Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Speech recognition in noise using weighted matching algorithms

Author: Becerra Yoma Nestor
Publication venue: The University of Edinburgh
Publication date: 01/01/1998

Edinburgh Research Archive

Speech Recognition

Publication venue: IntechOpen
Publication date: 20/04/2021

Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

Speech recognition on DSP: algorithm optimization and performance analysis.

Author: Yuan Meng
Publication venue: Chinese University of Hong Kong
Publication date: 01/01/2004

Yuan Meng. Thesis (M.Phil.), Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 85-91). Abstracts in English and Chinese.

Contents:

Chapter 1: Introduction
  1.1 History of ASR development
  1.2 Fundamentals of automatic speech recognition
    1.2.1 Classification of ASR systems
    1.2.2 Automatic speech recognition process
  1.3 Performance measurements of ASR
    1.3.1 Recognition accuracy
    1.3.2 Complexity
    1.3.3 Robustness
  1.4 Motivation and goal of this work
  1.5 Thesis outline
Chapter 2: Signal processing techniques for front-end
  2.1 Basic feature extraction principles
    2.1.1 Pre-emphasis
    2.1.2 Frame blocking and windowing
    2.1.3 Discrete Fourier Transform (DFT) computation
    2.1.4 Spectral magnitudes
    2.1.5 Mel-frequency filterbank
    2.1.6 Logarithm of filter energies
    2.1.7 Discrete Cosine Transformation (DCT)
    2.1.8 Cepstral Weighting
    2.1.9 Dynamic featuring
  2.2 Practical issues
    2.2.1 Review of practical problems and solutions in ASR applications
    2.2.2 Model of environment
    2.2.3 End-point detection (EPD)
    2.2.4 Spectral subtraction (SS)
Chapter 3: HMM-based Acoustic Modeling
  3.1 HMMs for ASR
  3.2 Output probabilities
  3.3 Viterbi search engine
  3.4 Isolated word recognition (IWR) & Connected word recognition (CWR)
    3.4.1 Isolated word recognition
    3.4.2 Connected word recognition (CWR)
Chapter 4: DSP for embedded applications
  4.1 Classification of embedded systems (DSP, ASIC, FPGA, etc.)
  4.2 Description of hardware platform
  4.3 I/O operation for real-time processing
  4.4 Fixed point algorithm on DSP
Chapter 5: ASR algorithm optimization
  5.1 Methodology
  5.2 Floating-point to fixed-point conversion
  5.3 Computational complexity consideration
    5.3.1 Feature extraction techniques
    5.3.2 Viterbi search module
  5.4 Memory requirements consideration
Chapter 6: Experimental results and performance analysis
  6.1 Cantonese isolated word recognition (IWR)
    6.1.1 Execution time
    6.1.2 Memory requirements
    6.1.3 Recognition performance
  6.2 Connected word recognition (CWR)
    6.2.1 Execution time consideration
    6.2.2 Recognition performance
  6.3 Summary & discussion
Chapter 7: Implementation of practical techniques
  7.1 End-point detection (EPD)
  7.2 Spectral subtraction (SS)
  7.3 Experimental results
    7.3.1 Isolated word recognition (IWR)
    7.3.2 Connected word recognition (CWR)
  7.4 Results
Chapter 8: Conclusions and future work
  8.1 Summary and Conclusions
  8.2 Suggestions for future research
Appendices
  A: Interpolation of data entries without floating point, divides or conditional branches
  B: Vocabulary for Cantonese isolated word recognition task
Bibliography

CUHK Digital Repository