
    Evaluating automatic speaker recognition systems: an overview of the NIST speaker recognition evaluations (1996-2014)

    Automatic speaker recognition systems show interesting properties, such as speed of processing and repeatability of results, in contrast to speaker recognition by humans. But they are usable only if they are reliable. Testability, the ability to extensively evaluate the goodness of the speaker detector's decisions, therefore becomes critical. Over the last 20 years, the US National Institute of Standards and Technology (NIST) has organized a series of text-independent Speaker Recognition Evaluations (SRE), providing the necessary speech data and evaluation protocols. Those evaluations have become not just a periodic benchmark test, but also a meeting point for a collaborative community of scientists deeply involved in the cycle of evaluations, enabling tremendous progress in an especially complex task where the speaker information is spread across different information levels (acoustic, prosodic, linguistic…) and is strongly affected by speaker-intrinsic and speaker-extrinsic variability factors. In this paper, we outline how the evaluations progressively challenged the technology with new speaking conditions and sources of variability, and how the scientific community responded to those demands. Finally, NIST SREs will be shown to be not free of shortcomings, and future challenges for speaker recognition assessment will also be discussed.

    Cross-entropy analysis of the information in forensic speaker recognition

    Proceedings of Odyssey 2008: The Speaker and Language Recognition Workshop, Stellenbosch, South Africa. In this work we analyze the average information supplied by a forensic speaker recognition system in an information-theoretic way. The objective is the transparent reporting of the performance of the system in terms of information, according to the needs of transparency and testability in forensic science. This analysis allows the derivation of a proper measure of goodness for forensic speaker recognition, the empirical cross-entropy (ECE), following previous work in the literature. We also propose an intuitive representation, namely the ECE plot, which allows forensic scientists to explain the average information given by the evidence analysis process in a clear and intuitive way. Such a representation allows the forensic scientist to assess the evidence evaluation process independently of the prior information, which is the province of the court. Fact finders may then check the average information given by the evidence analysis once the prior information is incorporated. An experimental example following the NIST SRE 2006 protocol is presented in order to highlight the adequacy of the proposed framework for the forensic inferential process. An example of the presentation in court of the average information supplied by the forensic analysis of the speech evidence is also provided, simulating a real case. This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01.
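
    As an illustration of the metric, here is a minimal sketch of how empirical cross-entropy could be computed from a set of likelihood ratios, assuming same-source and different-source LR arrays as inputs; variable names and example values are illustrative, not from the paper.

        import numpy as np

        def empirical_cross_entropy(lr_ss, lr_ds, prior):
            """Empirical cross-entropy (bits) of a set of LRs, at a given
            prior probability for the same-source hypothesis."""
            lr_ss = np.asarray(lr_ss, dtype=float)  # same-source LRs
            lr_ds = np.asarray(lr_ds, dtype=float)  # different-source LRs
            odds = prior / (1.0 - prior)            # prior odds
            # Each term is -log2 of the posterior of the true hypothesis.
            pen_ss = np.mean(np.log2(1.0 + 1.0 / (lr_ss * odds)))
            pen_ds = np.mean(np.log2(1.0 + lr_ds * odds))
            return prior * pen_ss + (1.0 - prior) * pen_ds

        # Synthetic LRs for illustration; an ECE plot sweeps the prior
        # log-odds axis, e.g. from -2.5 to +2.5.
        lr_ss = [12.0, 4.0, 30.0, 0.7]
        lr_ds = [0.05, 0.4, 1.5, 0.02]
        priors = 1.0 / (1.0 + 10.0 ** -np.linspace(-2.5, 2.5, 51))
        curve = [empirical_cross_entropy(lr_ss, lr_ds, p) for p in priors]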

    Linguistically-constrained formant-based i-vectors for automatic speaker recognition

    Published in Speech Communication, vol. 76 (2016), DOI 10.1016/j.specom.2015.11.002. This paper presents a large-scale study of the discriminative abilities of formant frequencies for automatic speaker recognition. Exploiting both the static and dynamic information in formant frequencies, we present linguistically-constrained formant-based i-vector systems providing well-calibrated likelihood ratios per comparison of the occurrences of the same isolated linguistic units in two given utterances. As a first result, the reported analysis of the discriminative and calibration properties of the different linguistic units provides useful insights, for instance, to forensic phonetic practitioners. Furthermore, it is shown that the set of most discriminative units varies from speaker to speaker. Secondly, linguistically-constrained systems are combined at score level through average and logistic regression speaker-independent fusion rules, exploiting the different speaker-distinguishing information spread among the different linguistic units. Testing on the English-only trials of the core condition of the NIST 2006 SRE (24,000 voice comparisons of 5-minute telephone conversations from 517 speakers, 219 male and 298 female), we report equal error rates of 9.57% and 12.89% for male and female speakers respectively, using only formant frequencies as speaker-discriminative information. Additionally, when the formant-based system is fused with a cepstral i-vector system, we obtain relative improvements of ∼6% in EER (from 6.54% to 6.13%) and ∼15% in minDCF (from 0.0327 to 0.0279), compared to the cepstral system alone. This work has been supported by the Spanish Ministry of Economy and Competitiveness (project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz, TEC2012-37585-C02-01). The authors would also like to thank SRI for providing the Decipher phonetic transcriptions of the NIST 2004, 2005 and 2006 SREs, which made this work possible.
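
    To make the fusion step concrete, here is a hedged sketch of speaker-independent score-level fusion, using scikit-learn's logistic regression in place of the paper's actual fusion training; the scores are synthetic stand-ins for the formant-based and cepstral system outputs.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Synthetic per-trial scores from two systems (stand-ins only).
        rng = np.random.default_rng(0)
        n = 1000
        labels = rng.integers(0, 2, n)                     # 1 = target trial
        s_formant = 1.0 * labels + rng.normal(0, 1.0, n)   # weaker system
        s_cepstral = 2.0 * labels + rng.normal(0, 1.0, n)  # stronger system
        scores = np.column_stack([s_formant, s_cepstral])

        # Average fusion: equal weight on both score streams.
        fused_avg = scores.mean(axis=1)

        # Logistic-regression fusion: learn per-system weights and an offset
        # on training trials; the linear part is a calibrated fused score.
        model = LogisticRegression().fit(scores, labels)
        fused_llr = scores @ model.coef_.ravel() + model.intercept_[0]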

    Forensic automatic speaker recognition: fiction or science?

    Proceedings of Interspeech 2008, Brisbane, Australia. Includes the PowerPoint presentation given at the conference. Hollywood films and CSI-like shows depict a technology landscape far from reality, both in forensic speaker recognition and in other identification-of-the-source forensic areas. Lay people are used to good-looking scientist-investigators performing voice identifications ("we got a match!") or smart, fancy devices producing voice transformations that let one actor instantaneously talk with the voice of another. Simultaneously, forensic identification science is facing a global challenge, impelled firstly by progressively higher requirements for admissibility of expert testimony in court, and secondly by the transparent and testable nature of DNA typing, now seen as the new gold-standard model of a scientifically defensible approach to be emulated by all other identification-of-the-source areas. In this presentation we show how forensic speaker recognition can comply with the requirements of transparency and testability in forensic science. This will lead to fulfilling the court requirements about role separation between scientists and judges or juries, and bring about integration in a forensically adequate framework in which the scientist provides the appropriate information necessary to the court's decision processes.

    Information-theoretical comparison of evidence evaluation methods for score-based biometric systems

    Paper presented at the Seventh International Conference on Forensic Inference and Statistics, The University of Lausanne, Switzerland, August 2008. Biometric systems are a powerful tool in many forensic disciplines for helping scientists evaluate the weight of the evidence. However, rising requirements of admissibility in forensic science demand scientific methods for testing the accuracy of the forensic evidence evaluation process. In this work we analyze and compare several evidence evaluation methods for score-based biometric systems. For all of them, the score given by the system is transformed into a likelihood ratio (LR) which expresses the weight of the evidence. The accuracy of each LR computation method is assessed by classical Tippett plots. We also propose measuring accuracy in terms of the average information given by the evidence evaluation process, by means of Empirical Cross-Entropy (ECE) plots. Preliminary results are presented using a voice biometric system and the NIST SRE 2006 experimental protocol.
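
    As one concrete instance of a score-to-LR transformation (the work compares several; this Gaussian density-ratio method is only an assumed illustration):

        import numpy as np
        from scipy.stats import norm

        def score_to_lr(score, ss_scores, ds_scores):
            """Turn a system score into an LR by modelling the same-source
            and different-source score distributions as Gaussians."""
            p_ss = norm.pdf(score, np.mean(ss_scores), np.std(ss_scores))
            p_ds = norm.pdf(score, np.mean(ds_scores), np.std(ds_scores))
            return p_ss / p_ds

        # Synthetic training scores for illustration:
        rng = np.random.default_rng(0)
        ss = rng.normal(2.0, 1.0, 500)   # same-source scores
        ds = rng.normal(-1.0, 1.0, 500)  # different-source scores
        print(score_to_lr(1.5, ss, ds))  # LR > 1 favours same-source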

    Gaussian Mixture Models of Between-Source Variation for Likelihood Ratio Computation from Multivariate Data

    Franco-Pedroso J, Ramos D, Gonzalez-Rodriguez J (2016) Gaussian Mixture Models of Between-Source Variation for Likelihood Ratio Computation from Multivariate Data. PLoS ONE 11(2): e0149958. doi:10.1371/journal.pone.0149958. In forensic science, trace evidence found at a crime scene and on a suspect has to be evaluated from the measurements performed on it, usually in the form of multivariate data (for example, several chemical compounds or physical characteristics). In order to assess the strength of that evidence, the likelihood ratio framework is being increasingly adopted. Several methods have been derived to obtain likelihood ratios directly from univariate or multivariate data by modelling both the variation appearing between observations (or features) coming from the same source (within-source variation) and that appearing between observations coming from different sources (between-source variation). In the widely used multivariate kernel likelihood ratio, the within-source distribution is assumed to be normally distributed and constant across sources, and the between-source variation is modelled through a kernel density function (KDF). In order to better fit the observed distribution of the between-source variation, this paper presents a different approach in which a Gaussian mixture model (GMM) is used instead of a KDF. As will be shown, this approach provides better-calibrated likelihood ratios, as measured by the log-likelihood-ratio cost (Cllr), in experiments performed on freely available forensic datasets involving different types of trace evidence: inks, glass fragments and car paints. JFP received funding from the Ministerio de Economia y Competitividad (http://www.mineco.gob.es/) through the project "CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Senal de Voz", grant number TEC2012-37585-C02-01. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
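
    A simplified sketch of such a two-level likelihood ratio follows, with a GMM between-source model and a shared Gaussian within-source covariance; it uses the closed-form Gaussian marginals that arise under those assumptions, and the data, dimensions and within-source covariance W are illustrative assumptions rather than the paper's experimental setup.

        import numpy as np
        from scipy.stats import multivariate_normal as mvn
        from sklearn.mixture import GaussianMixture

        def gmm_lr(x, y, gmm, W):
            """LR that x and y share a common (unknown) source mean.
            Between-source: GMM (weights w, means m, covariances C).
            Within-source: Gaussian with constant covariance W."""
            num, px, py = 0.0, 0.0, 0.0
            for w, m, C in zip(gmm.weights_, gmm.means_, gmm.covariances_):
                # Marginals: integrating out the source mean gives cov C + W.
                px += w * mvn.pdf(x, m, C + W)
                py += w * mvn.pdf(y, m, C + W)
                # Same-source joint of (x, y) per component: jointly Gaussian
                # with covariance [[C+W, C], [C, C+W]].
                num += w * mvn.pdf(np.concatenate([x, y]),
                                   np.concatenate([m, m]),
                                   np.block([[C + W, C], [C, C + W]]))
            return num / (px * py)

        # Synthetic 2-D example: fit the between-source GMM on source means.
        rng = np.random.default_rng(0)
        gmm = GaussianMixture(n_components=3, random_state=0).fit(
            rng.normal(0, 3, (200, 2)))
        W = 0.5 * np.eye(2)  # assumed within-source covariance
        print(gmm_lr(rng.normal(0, 1, 2), rng.normal(0, 1, 2), gmm, W))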

    Von Mises-Fisher models in the total variability subspace for language recognition

    I. Lopez-Moreno, D. Ramos, J. Gonzalez-Dominguez, and J. Gonzalez-Rodriguez, "Von Mises-Fisher models in the total variability subspace for language recognition", IEEE Signal Processing Letters, vol. 18, no. 12, pp. 705-708, October 2011. This letter proposes a new modeling approach for the total variability subspace within a language recognition task. Motivated by previous work in directional statistics, von Mises-Fisher distributions are used to assign language-conditioned probabilities to language data, assumed to be spherically distributed in this subspace. The two proposed methods use kernel density functions or finite mixture models of such distributions. Experiments conducted on NIST LRE 2009 show that the proposed techniques significantly outperform the baseline cosine distance approach in most of the considered experimental conditions, including different speech conditions, durations and the presence of unseen languages. This work was supported by the Ministerio de Ciencia e Innovación under FPI Grant TEC2009-14719-C02-01 and the Cátedra UAM-Telefónica.
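
    A minimal sketch of the von Mises-Fisher log-density on length-normalized total-variability vectors, the building block of both proposed methods, follows; the dimensionality and concentration values are illustrative assumptions.

        import numpy as np
        from scipy.special import ive  # exponentially scaled Bessel I

        def vmf_logpdf(x, mu, kappa):
            """Log-density of a von Mises-Fisher distribution on the unit
            sphere; x and mu are unit-norm, kappa is the concentration."""
            d = len(x)
            v = d / 2.0 - 1.0
            # Stable log I_v(kappa): I_v(k) = ive(v, k) * exp(k).
            log_bessel = np.log(ive(v, kappa)) + kappa
            log_norm = (v * np.log(kappa)
                        - (d / 2.0) * np.log(2 * np.pi) - log_bessel)
            return log_norm + kappa * float(mu @ x)

        # Synthetic length-normalized total-variability vectors:
        rng = np.random.default_rng(0)
        x = rng.normal(size=400); x /= np.linalg.norm(x)
        mu = rng.normal(size=400); mu /= np.linalg.norm(mu)
        print(vmf_logpdf(x, mu, kappa=50.0))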

    Likelihood ratio calibration in a transparent and testable forensic speaker recognition framework

    D. Ramos, J. González-Rodríguez, and J. Ortega-García, "Likelihood Ratio Calibration in a Transparent and Testable Forensic Speaker Recognition Framework", in The Speaker and Language Recognition Workshop (Odyssey), San Juan, Puerto Rico, 2006, pp. 1-8. A recently reopened debate about the infallibility of some classical forensic disciplines is leading to new requirements in forensic science. Standardization of procedures, proficiency testing, transparency in the scientific evaluation of the evidence and testability of the system and protocols are emphasized in order to guarantee the scientific objectivity of the procedures. These ideas are exploited in this paper in order to move towards an appropriate framework for the use of forensic speaker recognition in courts. Evidence is interpreted using the Bayesian approach, as a scientific and logical methodology, in a two-stage approach based on the similarity-typicality pair, which facilitates transparency in the process. The concept of calibration as a way of reporting reliable and accurate opinions is also addressed in depth, with experimental results illustrating its effects. The testability of the system is then accomplished by the use of the NIST SRE 2005 evaluation protocol. Recently proposed application-independent evaluation techniques (Cllr and APE curves) are finally addressed as a proper way of presenting results of proficiency testing in courts, as these evaluation metrics clearly show the influence of calibration errors on the accuracy of the inferential decision process. This work has been supported by the Spanish Ministry for Science and Technology under project TIC2003-09068-C02-01.
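
    For reference, the application-independent Cllr metric cited above can be computed from target and non-target likelihood ratios as sketched below (array values are illustrative):

        import numpy as np

        def cllr(lr_target, lr_nontarget):
            """Log-likelihood-ratio cost in bits. Well-calibrated, informative
            LRs give Cllr well below 1; LR = 1 everywhere gives exactly 1."""
            c_tar = np.mean(np.log2(1.0 + 1.0 / np.asarray(lr_target, float)))
            c_non = np.mean(np.log2(1.0 + np.asarray(lr_nontarget, float)))
            return 0.5 * (c_tar + c_non)

        print(cllr([10.0, 5.0, 0.8], [0.1, 0.3, 2.0]))
        print(cllr([1.0, 1.0], [1.0, 1.0]))  # uninformative system -> 1.0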

    ATVS-UAM NIST LRE 2009 System Description

    ATVS-UAM submits a fast, light and efficient single system. The use of a task-adapted, non-speech-recognition-based VAD (apart from NIST conversation labels) and gender-dependent total variability compensation technology allows the submitted system to obtain excellent development results on SRE08 data with exceptional computational efficiency. In order to test the influence of VAD on the evaluation results, a contrastive, otherwise equivalent system has been submitted in which the ATVS VAD labels are replaced with the publicly contributed BUT ones. In all contributed systems, two gender-independent calibrations have been trained, with telephone-only and mic (either mic-tel, tel-mic or mic-mic) data respectively. The submitted systems have been designed for English speech in an application-independent way, all results being interpretable as calibrated likelihood ratios to be properly evaluated with Cllr. Sample development results with English SRE08 data are 0.53% (male) and 1.11% (female) EER on tel-tel data (optimistic, as all English speakers in SRE08 are included in the total variability matrices), rising to 3.5% (tel-tel) and 5.1% (tel-mic) EER in pessimistic cross-validation experiments (25% of test speakers totally excluded from development data in each cross-validation set). The submitted system is extremely light in computational resources, running 77 times faster than real time. Moreover, once VAD and feature extraction are performed (the heaviest components of the system), training and testing run at 5300 and 2950 times faster than real time respectively.

    dOTM: a mechanism for distributing centralized multi-party video conferencing in the cloud

    One of the key factors for an application to take advantage of cloud computing is the ability to scale in an efficient, fast and reliable way. In centralized multi-party video conferencing, dynamically scaling a running conversation is a complex problem. In this paper we propose a methodology to divide the Multipoint Control Unit (the video conferencing server) into simpler units, broadcasters. Each broadcaster receives the media from one participant, processes it and forwards it to the rest. These broadcasters can be distributed among a group of CPUs. Using this methodology, video conferencing systems can scale in a more granular way, improving deployment efficiency.
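
    A minimal sketch of the broadcaster decomposition described above: one broadcaster per participant, placed on a CPU and fanning the processed stream out to the other participants. The names and the round-robin placement policy are illustrative assumptions, not details from the paper.

        from dataclasses import dataclass, field

        @dataclass
        class Broadcaster:
            """Handles one participant's media: receive, process, fan out."""
            participant: str
            host: str                      # CPU/host the broadcaster runs on
            subscribers: list = field(default_factory=list)

            def forward(self, packet):
                processed = packet  # placeholder for per-stream processing
                return {sub: processed for sub in self.subscribers}

        def build_conference(participants, hosts):
            """One broadcaster per participant, round-robin across hosts."""
            conference = {}
            for i, p in enumerate(participants):
                b = Broadcaster(p, hosts[i % len(hosts)])
                b.subscribers = [q for q in participants if q != p]
                conference[p] = b
            return conference

        conf = build_conference(["alice", "bob", "carol"], ["cpu0", "cpu1"])
        print(conf["alice"].forward(b"frame"))  # fan-out to bob and carol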