66 research outputs found

Reconeixement dels dígits catalans utilitzant models de Markov continus (Recognition of Catalan digits using continuous Markov models)

Author: Garrigosa Rivas Sara
Moreno Bilbao M. Asunción
Publication venue: Branca d'Estudiants de l'IEEE de Barcelona
Publication date: 01/01/1993

Peer Reviewed

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Revistes Catalanes amb Accés Obert

Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition

Author: Hernando Pericás Francisco Javier
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1997

Dynamic speech features are routinely used in current speech recognition systems in combination with short-term (static) spectral features. Although many existing speech recognition systems do not weight the two kinds of features, some weighting seems advisable in order to increase the recognition accuracy of the system. When this weighting is performed, it is either tuned manually or consists simply of compensating the variances. The aim of this paper is to propose a method to automatically estimate an optimum state-dependent stream weighting in a continuous density hidden Markov model (CDHMM) recognition system by means of a maximum-likelihood based training algorithm. Unlike other works, it is shown that simple constraints on the new weighting parameters make it possible to apply the maximum-likelihood criterion to this problem. Experimental results in speaker-independent digit recognition show a substantial increase in recognition accuracy.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC
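
The state-dependent stream weighting described in the abstract above can be written in the form commonly used for multi-stream CDHMMs. This is only a sketch, not the paper's own formulation: the symbols $b_{js}$ and $\gamma_{js}$ and the sum-to-one constraint are assumptions chosen for illustration, since the abstract does not say which constraints the authors use.

```latex
\log b_j(\mathbf{o}_t) = \sum_{s=1}^{S} \gamma_{js} \, \log b_{js}(\mathbf{o}_{ts}),
\qquad \gamma_{js} \ge 0, \quad \sum_{s=1}^{S} \gamma_{js} = 1
```

Here $b_{js}$ is the output density of stream $s$ (for example, static or dynamic features) in state $j$, and $\gamma_{js}$ is the weight that a training procedure would estimate for that state and stream.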

Speech Recognition Using Vector Quantization through Modified K-meansLBG Algorithm

Author: Doye D. D.
Sonkamble Balwant A.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/07/2012

In vector quantization, the main task is to generate a good codebook. The distortion measure between the original pattern and the reconstructed pattern should be minimal. In this paper, a proposed algorithm called the Modified K-meansLBG algorithm is used to obtain a good codebook. The system has shown good performance on limited-vocabulary tasks. Keywords: K-means algorithm, LBG algorithm, Vector Quantization, Speech Recognition

International Institute for Science, Technology and Education (IISTE): E-Journals
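
The codebook-generation step mentioned in the abstract above can be illustrated with the textbook LBG splitting procedure. This is a minimal sketch under stated assumptions: it is the standard LBG algorithm, not the paper's modified variant, and the names `lbg_codebook` and `mean_distortion`, the additive split `eps`, and the squared Euclidean distortion are all hypothetical choices made for illustration.

```python
import numpy as np

def lbg_codebook(data, size, eps=0.01, n_iter=20):
    """Grow a codebook by repeated splitting, refining each stage with k-means steps.

    The codebook doubles at every stage, so the result has the smallest
    power-of-two number of codewords that is >= size.
    """
    data = np.asarray(data, dtype=float)
    codebook = data.mean(axis=0, keepdims=True)  # start from the global centroid
    while codebook.shape[0] < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(n_iter):
            # Squared Euclidean distance from every vector to every codeword.
            dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dist.argmin(axis=1)
            for k in range(codebook.shape[0]):
                members = data[nearest == k]
                if len(members):  # leave a codeword unchanged if its cell is empty
                    codebook[k] = members.mean(axis=0)
    return codebook

def mean_distortion(data, codebook):
    """Average squared distance between each vector and its nearest codeword."""
    data = np.asarray(data, dtype=float)
    dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.min(axis=1).mean()
```

In a recognizer of this kind, `data` would typically hold one feature vector per speech frame, and `mean_distortion` gives a rough way to compare codebooks of different sizes.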

Improving the robustness of the usual FBE-based ASR front-end

Author: Hernando Pericás Francisco Javier
Macho D
Nadeu Camprubí Climent
Publication venue: Mergablum
Publication date: 01/01/2000

All speech recognition systems require some form of signal representation that parametrically models the temporal evolution of the spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands which are often distributed on a mel scale. The computation of those filterbank energies (FBEs) always includes smoothing of basic spectral measurements and non-linear amplitude compression. A variety of linear transformations are typically applied to this time-frequency representation prior to the Hidden Markov Model (HMM) pattern-matching stage of recognition. In the paper, we will discuss some robustness issues involved in both the computation of the FBEs and the subsequent linear transformations, presenting alternative techniques that can improve robustness in additive noise conditions. In particular, the root non-linearity, a voicing-dependent FBE computation technique and a time & frequency filtering (tiffing) technique will be considered. Recognition results for the Aurora database will be shown to illustrate the potential application of these alternative techniques for enhancing the robustness of speech recognition systems.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Recognition of numbers by using demisyllables and hidden Markov models

Author: Bonafonte Cávez Antonio
Lleida Solano Eduardo
Mariño Acebal José Bernardo
Monte Moreno Enrique
Moreno Bilbao M. Asunción
Nadeu Camprubí Climent
Publication venue: Elsevier BV
Publication date: 01/01/1990

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Using RASTA in task independent TANDEM feature extraction

Author: Aradilla Guillermo
Dines John
Sivadas Sunil
Publication venue: Martigny, Switzerland
Publication date: 10/03/2006

In this work, we investigate the use of the RASTA filter in the TANDEM feature extraction method when trained with task-independent data. The RASTA filter removes the linear distortion introduced by the communication channel, as demonstrated by an 18% relative improvement on the Numbers 95 task. Further studies yielded a relative improvement of 35% over the basic PLP features by combining TANDEM features and conventional PLP features.

Infoscience - École polytechnique fédérale de Lausanne

The impact in forensic voice comparison of lack of calibration and of mismatched conditions between the known-speaker recording and the relevant-population sample recordings

Author: Morrison Geoffrey Stewart
Publication venue: Elsevier BV
Publication date: 01/02/2018

In a 2017 New South Wales case, a forensic practitioner conducted a forensic voice comparison using a Gaussian mixture model – universal background model (GMM-UBM). The practitioner did not report the results of empirical tests of the performance of this system under conditions reflecting those of the case under investigation. The practitioner trained the model for the numerator of the likelihood ratio using the known-speaker recording, but trained the model for the denominator of the likelihood ratio (the UBM) using high-quality audio recordings, not recordings which reflected the conditions of the known-speaker recording. There was therefore a difference in the mismatch between the numerator model and the questioned-speaker recording versus the mismatch between the denominator model and the questioned-speaker recording. In addition, the practitioner did not calibrate the output of the system. The present paper empirically tests the performance of a replication of the practitioner’s system. It also tests a system in which the UBM was trained on known-speaker-condition data and which was empirically calibrated. The performance of the former system was very poor, and the performance of the latter was substantially better.

Crossref

Aston Publications Explorer
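
The numerator and denominator models named in the abstract above can be illustrated with a toy score computation. This is a minimal sketch under stated assumptions, not the practitioner's system or the paper's replication: the functions `gmm_logpdf` and `log_likelihood_ratio`, the diagonal covariances, and the per-frame averaging are hypothetical choices, and the result is an uncalibrated score rather than a calibrated likelihood ratio.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Per-frame log density under a diagonal-covariance Gaussian mixture.

    x: (frames, dims); weights: (K,); means, variances: (K, dims).
    """
    x = np.asarray(x, dtype=float)[:, None, :]
    means = np.asarray(means, dtype=float)[None, :, :]
    variances = np.asarray(variances, dtype=float)[None, :, :]
    # Log density of each frame under each mixture component.
    log_comp = -0.5 * (np.log(2 * np.pi * variances)
                       + (x - means) ** 2 / variances).sum(axis=2)
    log_comp = log_comp + np.log(np.asarray(weights, dtype=float))[None, :]
    # Log-sum-exp over components, computed stably.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]

def log_likelihood_ratio(frames, speaker_gmm, ubm):
    """Mean per-frame log LR: numerator is the speaker model, denominator the UBM."""
    return float(np.mean(gmm_logpdf(frames, *speaker_gmm) - gmm_logpdf(frames, *ubm)))
```

Each model is passed as a `(weights, means, variances)` tuple. A positive score means the questioned-speaker frames fit the known-speaker model better than the background model; the mismatch the abstract describes arises when the two models are trained on recordings made under different conditions.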

Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

Author: Bouvrie Jake
Chikkerur Sharat
Ezzat Tony
Kouh Minjoon
Poggio Tomaso
Rifkin Ryan
Schutte Ken
Publication date: 01/01/2007

A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.

CiteSeerX

DSpace@MIT