
13 research outputs found

A Generative Model for Score Normalization in Speaker Recognition

Author: Brummer Niko
Swart Albert
Publication date: 28/09/2017

We propose a theoretical framework for thinking about score normalization, which confirms that normalization is not needed under (admittedly fragile) ideal conditions. If, however, these conditions are not met, e.g. under data-set shift between training and runtime, our theory reveals dependencies between scores that could be exploited by strategies such as score normalization. Indeed, it has been demonstrated repeatedly in experiments that various ad hoc score normalization recipes do work. We present a first attempt at using probability theory to design a generative score-space normalization model, which yields improvements similar to those of ZT-norm on the text-dependent RSR 2015 database.
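ZT-norm, the baseline named above, is one of the ad hoc recipes the abstract refers to. The following is a minimal Python sketch of the classical Z-norm, T-norm and ZT-norm steps, assuming the raw trial score and all cohort scores have already been computed. It illustrates the baseline only, not the generative normalization model proposed in the paper, and all function and argument names are hypothetical.

```python
from statistics import mean, stdev


def z_norm(score, enroll_vs_zcohort):
    """Z-norm: standardize a raw trial score with the mean and standard
    deviation of the enrollment model scored against impostor (Z-cohort)
    segments. These statistics depend only on the enrollment side."""
    return (score - mean(enroll_vs_zcohort)) / stdev(enroll_vs_zcohort)


def t_norm(score, test_vs_tcohort):
    """T-norm: standardize a score with the mean and standard deviation of
    the test segment scored against impostor (T-cohort) models. These
    statistics depend only on the test side."""
    return (score - mean(test_vs_tcohort)) / stdev(test_vs_tcohort)


def zt_norm(score, enroll_vs_zcohort, test_vs_tcohort, tcohort_vs_zcohort):
    """ZT-norm: Z-norm the trial score, then T-norm it against T-cohort
    scores that have themselves been Z-normalized.

    tcohort_vs_zcohort[i] holds the scores of T-cohort model i against the
    Z-cohort segments, in the same order as test_vs_tcohort.
    """
    z_score = z_norm(score, enroll_vs_zcohort)
    z_cohort = [
        z_norm(s, zc) for s, zc in zip(test_vs_tcohort, tcohort_vs_zcohort)
    ]
    return t_norm(z_score, z_cohort)
```

For example, zt_norm(2.0, [0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [[0.0, 1.0, 2.0]] * 3) returns 0.0: after Z-normalization the trial score (1.0) sits exactly at the mean of the Z-normalized T-cohort scores.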

arXiv.org e-Print Archive

Crossref

Measuring, refining and calibrating speaker and language information extracted from speech

Author: Brummer Niko
Publication venue: Stellenbosch : University of Stellenbosch
Publication date: 01/12/2010

Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2010.
ENGLISH ABSTRACT: We propose a new methodology, based on proper scoring rules, for evaluating the goodness of pattern recognizers with probabilistic outputs. The recognizers of interest take an input known to belong to one of a discrete set of classes and output a calibrated likelihood for each class. This generalizes the traditional use of proper scoring rules to evaluate the goodness of probability distributions. A recognizer whose outputs are well-calibrated probability distributions can be applied to make cost-effective Bayes decisions over a range of applications with different cost functions. A recognizer with likelihood outputs can additionally be employed for a wide range of prior distributions over the classes to be recognized. We use automatic speaker recognition and automatic spoken language recognition as prototypes of this type of pattern recognizer. The traditional evaluation methods in these fields, as represented by the series of NIST Speaker and Language Recognition Evaluations, evaluate hard decisions made by the recognizers. This makes these recognizers cost- and prior-dependent. The proposed methodology generalizes that of the NIST evaluations, allowing for the evaluation of recognizers that are intended to be usefully applied over a wide range of applications with variable priors and costs. The proposal includes a family of evaluation criteria, each member of which is formed by a proper scoring rule. We emphasize two members of this family: (i) a non-strict scoring rule, which directly represents the error rate at a given prior; and (ii) the strict logarithmic scoring rule, which represents information content or, equivalently, summarized error rate or expected cost over a wide range of applications.
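The two emphasized criteria can be illustrated for the binary case of speaker verification, where the recognizer outputs one log-likelihood ratio (LLR) per trial. The Python sketch below assumes natural-log LLRs and unit costs for the error rate. It uses the standard Bayes threshold for a minimum-expected-cost decision, and the prior-0.5 average of the logarithmic scoring rule expressed in bits, commonly called C_llr in the speaker recognition literature. The function names are hypothetical, and the code is an illustration rather than the thesis's own implementation.

```python
import math


def bayes_threshold(p_target, c_miss=1.0, c_fa=1.0):
    """LLR threshold of the minimum-expected-cost Bayes decision:
    accept the target hypothesis when llr > log(c_fa (1 - P) / (c_miss P))."""
    return math.log(c_fa * (1.0 - p_target) / (c_miss * p_target))


def bayes_error_rate(tar_llrs, non_llrs, p_target):
    """Prior-weighted empirical error rate of hard decisions at one prior,
    with unit costs. This is a non-strict rule: only the side of the
    threshold on which each LLR falls matters, not its magnitude."""
    t = bayes_threshold(p_target)
    p_miss = sum(s <= t for s in tar_llrs) / len(tar_llrs)
    p_fa = sum(s > t for s in non_llrs) / len(non_llrs)
    return p_target * p_miss + (1.0 - p_target) * p_fa


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(tar_llrs, non_llrs):
    """Logarithmic scoring rule averaged over target and non-target trials
    at a prior of 0.5, in bits. Every LLR magnitude affects the result."""
    c_tar = sum(softplus(-s) for s in tar_llrs) / len(tar_llrs)
    c_non = sum(softplus(s) for s in non_llrs) / len(non_llrs)
    return (c_tar + c_non) / (2.0 * math.log(2.0))
```

An uninformative system that always outputs LLR = 0 scores cllr = 1.0 bit, while confident, correct LLRs drive it toward 0. The threshold moves with the prior: bayes_threshold(0.5) is 0, and lowering p_target raises it. This is how one set of LLRs can serve applications with different priors and costs.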
We further show how to form a family of secondary evaluation criteria which, when contrasted with the primary criteria, provide an analysis of the goodness of calibration of the recognizers' likelihoods. Finally, we show how to use the logarithmic scoring rule as an objective function for the discriminative training of fusion and calibration of speaker and language recognizers.
AFRIKAANS SUMMARY: We show how to represent, measure, calibrate and optimize the uncertainty in the output of automatic speaker recognition and language recognition systems. This makes the existing technology more accurate, more effective and more generally applicable.
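Using the logarithmic scoring rule as a training objective amounts to prior-weighted logistic regression. The Python sketch below assumes linear fusion (llr = w . s + b over one score per subsystem) trained by plain gradient descent. The optimizer, initialization, learning rate, step count and function names are illustrative assumptions, not taken from the thesis.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _weighted_trials(tar_scores, non_scores, p_target):
    """Yield (score_vector, label_sign, weight): targets +1, non-targets -1.
    Each class receives total weight p_target or 1 - p_target."""
    for s in tar_scores:
        yield s, 1.0, p_target / len(tar_scores)
    for s in non_scores:
        yield s, -1.0, (1.0 - p_target) / len(non_scores)


def objective(w, b, tar_scores, non_scores, p_target=0.5):
    """Prior-weighted logarithmic scoring rule (cross-entropy, in nats) of
    the fused LLRs w . s + b. At p_target = 0.5, dividing by log(2) gives
    C_llr in bits."""
    offset = math.log(p_target / (1.0 - p_target))  # prior log-odds
    total = 0.0
    for s, y, weight in _weighted_trials(tar_scores, non_scores, p_target):
        llr = sum(wi * si for wi, si in zip(w, s)) + b
        total += weight * softplus(-y * (llr + offset))
    return total


def train_fusion(tar_scores, non_scores, p_target=0.5, lr=0.1, steps=2000):
    """Fit llr = w . s + b by fixed-step gradient descent on objective().

    With a single score per trial this reduces to affine calibration.
    """
    dim = len(tar_scores[0])
    w, b = [1.0] * dim, 0.0
    offset = math.log(p_target / (1.0 - p_target))
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for s, y, weight in _weighted_trials(tar_scores, non_scores, p_target):
            llr = sum(wi * si for wi, si in zip(w, s)) + b
            # derivative of weight * softplus(-y * (llr + offset)) w.r.t. llr
            g = -y * sigmoid(-y * (llr + offset)) * weight
            grad_w = [gw + g * si for gw, si in zip(grad_w, s)]
            grad_b += g
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

A quasi-Newton optimizer would typically need far fewer iterations than this fixed-step loop; plain gradient descent is used here only to keep the sketch dependency-free.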

Stellenbosch University SUNScholar Repository

A Speaker Verification Backend with Robust Performance across Conditions

Author: Brummer Niko
Ferrer Luciana
McLaren Mitchell
Publication date: 02/02/2021

In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to result in systems that work poorly on conditions different from those used to train the calibration model. We propose to modify the standard backend by introducing an adaptive calibrator that uses duration and other automatically extracted side information to adapt to the conditions of the inputs. The backend is trained discriminatively to optimize binary cross-entropy. When trained on a number of diverse datasets that are labeled only with respect to speaker, the proposed backend consistently, and in some cases dramatically, improves calibration compared to the standard PLDA approach on a number of held-out datasets, some of which are markedly different from the training data. Discrimination performance is also consistently improved. We show that joint training of the PLDA and the adaptive calibrator is essential: the same benefits cannot be achieved when freezing PLDA and fine-tuning the calibrator. To our knowledge, the results in this paper are the first evidence in the literature that it is possible to develop a speaker verification system with robust out-of-the-box performance on a large variety of conditions.
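The abstract does not specify the form of the adaptive calibrator. Purely as a hypothetical illustration, the Python sketch below makes the scale and offset of an affine calibration depend linearly on a side-information vector (for example, log durations of the enrollment and test audio), and evaluates the prior-weighted binary cross-entropy that the abstract names as the training objective. The class, its parameters and the affine-in-side-information form are assumptions; the joint PLDA and calibrator training described in the paper is not shown.

```python
import math
from dataclasses import dataclass


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


@dataclass
class SideInfoCalibrator:
    """Hypothetical calibrator whose scale and offset are affine functions
    of a side-information vector q:

        llr = (a0 + a . q) * score + (b0 + b . q)
    """
    a0: float
    a: list[float]
    b0: float
    b: list[float]

    def __call__(self, score: float, side_info: list[float]) -> float:
        scale = self.a0 + sum(ai * qi for ai, qi in zip(self.a, side_info))
        shift = self.b0 + sum(bi * qi for bi, qi in zip(self.b, side_info))
        return scale * score + shift


def binary_cross_entropy(llrs, labels, p_target=0.5):
    """Prior-weighted binary cross-entropy (in nats) of calibrated LLRs.
    labels: 1 for target trials, 0 for non-target trials."""
    offset = math.log(p_target / (1.0 - p_target))  # prior log-odds
    tar = [softplus(-(x + offset)) for x, lab in zip(llrs, labels) if lab == 1]
    non = [softplus(x + offset) for x, lab in zip(llrs, labels) if lab == 0]
    return p_target * sum(tar) / len(tar) + (1.0 - p_target) * sum(non) / len(non)
```

Setting a and b to zero recovers a single global affine calibration, so in this sketch the standard backend is the special case in which side information is ignored.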

arXiv.org e-Print Archive

CONICET Digital

BAT System Description for NIST LRE 2015

Author: Brummer Niko
Burget Lukas
Cumani Sandro
Fer Radek
Glembek Ondrej
Grezl Frantisek
Karafiat Martin
Kesiraju Santosh
Li Ruizhi
Mallidi Sri Harish
Matejka Pavel
Novotny Ondrej
Ondel Lucas
Pesan Jan
Plchot Oldrich
Swart Albert
Vesely Karel
Publication venue: ISCA

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
