Search CORE

713 research outputs found

A Comparison of Front-Ends for Bitstream-Based ASR over IP

Author: Díaz de María Fernando
Gallardo Antolín Ascensión
Gómez Cajas D. F.
Peláez Moreno Carmen
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006
Field of study

Automatic speech recognition (ASR) is called to play a relevant role in the provision of spoken interfaces for IP-based applications. However, as a consequence of the transit of the speech signal over these particular networks, ASR systems need to face two new challenges: the impoverishment of the speech quality due to the compression needed to fit the channel capacity and the inevitable occurrence of packet losses. In this framework, bitstream-based approaches that obtain the ASR feature vectors directly from the coded bitstream, avoiding the speech decoding process, have been proposed ([S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223–233. A. Gallardo-Antolín, C. Pelàez-Moreno, F. Díaz-de-María, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear. H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591–604. C. Peláez-Moreno, A. Gallardo-Antolín, F. Díaz-de-María, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3(2) (2001) 209–218], among others) to improve the robustness of ASR systems. LSP (Line Spectral Pairs) are the preferred set of parameters for the description of the speech spectral envelope in most of the modern speech coders. Nevertheless, LSP have proved to be unsuitable for ASR, and they must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP to cepstrum transformations in a simulated VoIP (voice over IP) environment which includes two of the most popular codecs used in that network (G.723.1 and G.729) and several network conditions. In particular, we compare ‘pseudocepstrum’ [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195–199], an approximated but straightforward transformation of LSP into LP cepstral coefficients, with a more computationally demanding but exact one. Our results show that pseudocepstrum is preferable when network conditions are good or computational resources low, while the exact procedure is recommended when network conditions become more adverse.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Visual identification by signature tracking

Author: Munich Mario E.
Perona Pietro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2003
Field of study

We propose a new camera-based biometric: visual signature identification. We discuss the importance of the parameterization of the signatures in order to achieve good classification results, independently of variations in the position of the camera with respect to the writing surface. We show that affine arc-length parameterization performs better than conventional time and Euclidean arc-length ones. We find that the system verification performance is better than 4 percent error on skilled forgeries and 1 percent error on random forgeries, and that its recognition performance is better than 1 percent error rate, comparable to the best camera-based biometrics

Caltech Authors

Speech Recognition Under Noise Conditions: Compensation Methods

Author: Angel de la Torre
Antonio J. Rubio
Carmen Benitez
Javier Ramirez Luz Garcia
Jose C. Segura
Publication venue: 'IntechOpen'
Publication date: 01/06/2007
Field of study

IntechOpen

Identifying hidden contexts

Author: Zliobaite Indre
Publication venue: Springer LNAI
Publication date: 24/05/2011
Field of study

In this study we investigate how to identify hidden contexts from the data in classification tasks. Contexts are artifacts in the data, which do not predict the class label directly. For instance, in speech recognition task speakers might have different accents, which do not directly discriminate between the spoken words. Identifying hidden contexts is considered as data preprocessing task, which can help to build more accurate classifiers, tailored for particular contexts and give an insight into the data structure. We present three techniques to identify hidden contexts, which hide class label information from the input data and partition it using clustering techniques. We form a collection of performance measures to ensure that the resulting contexts are valid. We evaluate the performance of the proposed techniques on thirty real datasets. We present a case study illustrating how the identified contexts can be used to build specialized more accurate classifiers

Bournemouth University Research Online

Harmonic Change Detection from Musical Audio

Author: Bernardes de Almeida Gilberto
Ramoneda Franco Pedro
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/1997
Field of study

In this dissertation, we advance an enhanced method for computing Harte et al.’s [31] Harmonic Change Detection Function (HCDF). HCDF aims to detect harmonic transitions in musical audio signals. HCDF is crucial both for the chord recognition in Music Information Retrieval (MIR) and a wide range of creative applications. In light of recent advances in harmonic description and transformation, we depart from the original architecture of Harte et al.’s HCDF, to revisit each one of its component blocks, which are evaluated using an exhaustive grid search aimed to identify optimal parameters across four large style-specific musical datasets. Our results show that the newly proposed methods and parameter optimization improve the detection of harmonic changes, by 5.57% (f-score) with respect to previous methods. Furthermore, while guaranteeing recall values at > 99%, our method improves precision by 6.28%. Aiming to leverage novel strategies for real-time harmonic-content audio processing, the optimized HCDF is made available for Javascript and the MAX and Pure Data multimedia programming environments. Moreover, all the data as well as the Python code used to generate them, are made available.<br /

Repositorio Universidad de Zaragoza

Knowledge Resources in Automatic Speech Recognition and Understanding for Romanian Language

Author: Corneliu Octavian Dumitru
Diana Mihaela Militaru
Inge Gavat
Publication venue: 'IntechOpen'
Publication date: 01/11/2008
Field of study

IntechOpen

Crossref

Noise-robust speaker recognition using reduced Multiconditional Gaussian Mixture models

Author: Berger Pedro de Azevedo
D’Almeida Frederico Quadros
Nascimento Francisco Assis de Oliveira
Silva Lúcio Martins da
Publication venue: 'ABEAT - Associacao Brasileira de Especialistas em Alta Tecnologia'
Publication date: 01/01/2008
Field of study

Multiconditional Modeling is widely used to create noise-robust speaker recognition systems. However, the approach is computationally intensive. An alternative is to optimize the training condition set in order to achieve maximum noise robustness while using the smallest possible number of noise conditions during training. This paper establishes the optimal conditions for a noise-robust training model by considering audio material at different sampling rates and with different coding methods. Our results demonstrate that using approximately four training noise conditions is sufficient to guarantee robust models in the 60 dB to 10 dB Signal-to-Noise Ratio (SNR) range

Repositório Institucional da Universidade de Brasília