ATVS-UAM ALBAYZIN-VL08 System description
Actas de las V Jornadas en Tecnología del Habla (JTH 2008)
The ATVS submission to ALBAYZIN-VL08 will consist of different combinations of a set of acoustic and phonotactic subsystems that our group has developed over the last few years. Most of these subsystems have already been evaluated in the NIST LRE 07 evaluation. At the time of writing, some details of our submission are still undefined; we therefore briefly describe our systems and the intended combinations to be submitted, but these settings should not be taken as final. As acoustic subsystems we will use a GMM SuperVectors subsystem and a GLDS-SVM subsystem, while the phonotactic subsystem will be a PhoneSVM system. We are still deciding on the best fusion strategy and the best combination of subsystems. Output scores will be submitted as log-likelihood ratio (logLR) scores in an application-independent way. Open-set detection thresholds will be set to the Bayes thresholds in all cases, and the same logLR sets will probably be submitted to the closed- and open-set conditions. This work was funded by the Spanish Ministry of Science and Technology under project TEC2006-13170-C02-01.
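As a rough illustration of what setting a detection threshold to the Bayes threshold means for logLR scores, the sketch below computes the threshold from a target prior and error costs. The function names, the default costs, and the use of natural logarithms are assumptions made for this example; the abstract does not state the priors or costs used in the evaluation.

```python
import math


def bayes_threshold(p_target, c_miss=1.0, c_fa=1.0):
    """Bayes decision threshold on a natural-log likelihood ratio.

    p_target, c_miss and c_fa are illustrative parameters (assumed here),
    standing for the target prior and the costs of a miss and a false alarm.
    """
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must be strictly between 0 and 1")
    return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))


def decide(log_lr, p_target, c_miss=1.0, c_fa=1.0):
    """Accept the target hypothesis when the logLR reaches the threshold."""
    return log_lr >= bayes_threshold(p_target, c_miss, c_fa)
```

Because the threshold depends only on the prior and the costs, the same set of calibrated logLR scores can be reused under different operating conditions, which is consistent with submitting one logLR set to several conditions.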
A GAUSSIAN MIXTURE MODEL-BASED SPEAKER RECOGNITION SYSTEM
Human beings have many unique features, and one of them is the voice. Speaker recognition is the use of a system to distinguish and identify a person from his or her voice. A speaker recognition system (SRS) can be used as an authentication technique in addition to conventional authentication methods. This paper presents an overview of voice signal characteristics and speaker recognition techniques. It also discusses the advantages and problems of current SRSs. Because a voice-based SRS is the only biometric system that allows users to authenticate remotely, a robust SRS is needed.
ATVS-UAM NIST LRE 2009 System Description
Official contribution of the National Institute of Standards and Technology; not subject to copyright in the United States.
ATVS-UAM submits a fast, light and efficient single system. The use of a task-adapted, non-speech-recognition-based VAD (in addition to the NIST conversation labels) and gender-dependent total variability compensation allows our submitted system to obtain excellent development results on SRE08 data with exceptional computational efficiency. To test the influence of the VAD on the evaluation results, an equivalent contrastive system has been submitted that differs only in replacing the ATVS VAD labels with those publicly contributed by BUT. In all contributed systems, two gender-independent calibrations have been trained, one with telephone-only data and one with mic data (either mic-tel, tel-mic or mic-mic). The submitted systems have been designed for English speech in an application-independent way, with all results interpretable as calibrated likelihood ratios that can be properly evaluated with Cllr. Sample development results with English SRE08 data are 0.53% (male) and 1.11% (female) EER in tel-tel data (optimistic, as all English speakers in SRE08 are included in the total variability matrices), rising to between 3.5% (tel-tel) and 5.1% (tel-mic) EER in pessimistic cross-validation experiments (25% of test speakers totally excluded from development data in each xval set). The submitted system is extremely light in computational resources, running 77 times faster than real time. Moreover, once VAD and feature extraction (the heaviest components of our system) are performed, training and testing run 5300 and 2950 times faster than real time, respectively.
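For readers unfamiliar with Cllr, the following is a minimal sketch of the standard log-likelihood-ratio cost computed from target and non-target scores. It assumes natural-log likelihood ratios as input; the function names are hypothetical and this is not the authors' evaluation code.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost in bits, from natural-log LR scores.

    Targets are penalised for low LRs and non-targets for high LRs;
    the two average penalties are weighted equally.
    """
    if not target_llrs or not nontarget_llrs:
        raise ValueError("both score lists must be non-empty")
    tar = sum(_softplus(-s) for s in target_llrs) / len(target_llrs)
    non = sum(_softplus(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (tar + non) / math.log(2.0)
```

A system that always outputs a logLR of 0 (no information) scores a Cllr of 1 bit, so values well below 1 indicate scores that are both discriminative and well calibrated.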
Support vector regression in NIST SRE 2008 multichannel core task
Actas de las V Jornadas en Tecnología del Habla (JTH 2008)
This paper explores two alternatives for speaker verification using the Generalized Linear Discriminant Sequence (GLDS) kernel: classical Support Vector Classification (SVC), and Support Vector Regression (SVR), recently proposed by the authors as a more robust approach for telephone speech. In this work we address a more challenging environment, the NIST SRE 2008 multichannel core task, where strong mismatch is introduced by the use of different microphones and recordings from interviews. Channel compensation based on Nuisance Attribute Projection (NAP) has also been investigated in order to analyze its impact on both approaches. Experiments show that, although both techniques show a significant improvement over SVC-GLDS when NAP is used, SVR is also robust to channel mismatch even when channel compensation is not used. This avoids the need for a considerable set of training data adapted to the operational scenario, which is rarely available. Results show similar performance for SVR-GLDS without NAP and SVC-GLDS with NAP. Moreover, the SVR-GLDS results are promising, since other configurations and channel compensation methods can further improve performance. This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01.
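The projection step of NAP can be sketched as follows, assuming the nuisance (channel) directions have already been estimated elsewhere. The helper names and the plain-list vector representation are choices made for this illustration only; the abstract does not describe how the authors estimate or apply the projection.

```python
import math


def orthonormalize(directions, tol=1e-12):
    """Gram-Schmidt: turn nuisance directions into an orthonormal basis."""
    basis = []
    for d in directions:
        v = list(d)
        for u in basis:
            proj = sum(a * b for a, b in zip(u, v))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:  # skip directions already spanned by the basis
            basis.append([a / norm for a in v])
    return basis


def nap_project(x, basis):
    """Apply P = I - U U^T to x, with the orthonormal columns of U in `basis`."""
    out = list(x)
    for u in basis:
        coeff = sum(a * b for a, b in zip(u, x))
        out = [a - coeff * b for a, b in zip(out, u)]
    return out
```

For example, with nuisance directions `[[1, 0, 0], [1, 1, 0]]`, projecting `[3, 4, 5]` removes the first two coordinates and leaves only the component outside the nuisance subspace.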