
    On the Distribution of Speaker Verification Scores: Generative Models for Unsupervised Calibration

    Speaker verification systems whose outputs can be interpreted as log-likelihood ratios (LLRs) allow for cost-effective decisions by comparing the system outputs to application-defined thresholds that depend only on prior information. Classifiers often produce uncalibrated scores and require additional processing to produce well-calibrated LLRs. Recently, generative score calibration models have been proposed that achieve calibration performance close to that of state-of-the-art discriminative techniques in supervised scenarios, while also allowing for unsupervised training. The effectiveness of these methods, however, strongly depends on their ability to correctly model the target and non-target score distributions. In this work we propose theoretically grounded and accurate models for characterizing the distribution of scores of speaker verification systems. Our approach is based on tied Generalized Hyperbolic distributions and overcomes many limitations of Gaussian models. Experimental results on different NIST benchmarks, using different utterance representation front-ends and different back-end classifiers, show that our method is effective not only in supervised scenarios, but also in unsupervised tasks characterized by a very low proportion of target trials.
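    The application-defined threshold mentioned above is the standard Bayes decision threshold, which depends only on the target prior and the (application-chosen) miss and false-alarm costs. A minimal sketch, assuming well-calibrated LLR scores (function names are illustrative, not from the paper):

    ```python
    import math

    def bayes_threshold(p_target, c_miss=1.0, c_fa=1.0):
        """Bayes decision threshold for LLR scores: depends only on the
        target prior and the miss/false-alarm costs, not on the scores."""
        return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))

    def decide(llr, p_target, c_miss=1.0, c_fa=1.0):
        """Accept the target hypothesis when the calibrated LLR
        exceeds the Bayes threshold."""
        return llr >= bayes_threshold(p_target, c_miss, c_fa)
    ```

    With equal costs and a prior of 0.5 the threshold is 0; lowering the target prior raises the threshold, so rarer targets require stronger evidence to accept. This is exactly why calibration matters: the rule above is only cost-optimal when the scores are true LLRs.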

    Fast Scoring of Full Posterior PLDA Models


    Analysis of Large-Scale SVM Training Algorithms for Language and Speaker Recognition

    This paper compares a set of large-scale support vector machine (SVM) training algorithms for language and speaker recognition tasks. We analyze five approaches for training phonetic and acoustic SVM models for language recognition. We compare the performance of these approaches as a function of the training time required by each of them to reach convergence, and we discuss their scalability towards large corpora. Two of these algorithms can be used in speaker recognition to train an SVM that classifies pairs of utterances as either belonging to the same speaker or to two different speakers. Our results show that the accuracy of these algorithms is asymptotically equivalent, but that they behave differently with respect to the time required to converge. Some of these algorithms not only scale linearly with the training set size, but are also able to give their best results after just a few iterations. State-of-the-art performance has been obtained on the female subset of the NIST 2010 Speaker Recognition Evaluation extended core test using a single SVM system.
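    As an illustration of the linear-time stochastic training behaviour described above, the sketch below implements a minimal Pegasos-style sub-gradient SVM trainer together with a symmetric pair feature that turns a same/different-speaker trial into a binary classification problem. This is a hypothetical stand-in, not the paper's implementation; `pegasos_train` and `pair_feature` are illustrative names:

    ```python
    import numpy as np

    def pegasos_train(X, y, lam=0.01, epochs=50, seed=0):
        """Minimal Pegasos-style stochastic sub-gradient linear SVM.
        X: (n, d) feature matrix; y: labels in {-1, +1}.
        Each epoch costs O(n * d), so training scales linearly with n."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w, t = np.zeros(d), 0
        for _ in range(epochs):
            for i in rng.permutation(n):
                t += 1
                eta = 1.0 / (lam * t)          # decaying step size
                margin = y[i] * (w @ X[i])     # margin under current weights
                w *= (1.0 - eta * lam)         # shrink: regularizer step
                if margin < 1.0:               # hinge-loss sub-gradient step
                    w += eta * y[i] * X[i]
        return w

    def pair_feature(u, v):
        """Symmetric feature for a trial on two utterance vectors, so one
        linear SVM can score same-speaker vs different-speaker pairs."""
        return u * v
    ```

    The element-wise product keeps the feature symmetric in the two utterances, which is a natural requirement for a pairwise trial classifier.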

    Factorized Sub-Space Estimation for Fast and Memory Effective I-vector Extraction

    Most of the state-of-the-art speaker recognition systems use a compact representation of spoken utterances referred to as an i-vector. Since the "standard" i-vector extraction procedure requires large memory structures and is relatively slow, new approaches have recently been proposed that are able to obtain either accurate solutions at the expense of an increase in computational load, or fast approximate solutions, which are traded for lower memory costs. We propose a new approach particularly useful for applications that need to minimize their memory requirements. Our solution not only dramatically reduces the memory needs for i-vector extraction, but is also fast and accurate compared to recently proposed approaches. Tested on the female part of the tel-tel extended NIST 2010 evaluation trials, our approach substantially improves the performance with respect to the fastest but inaccurate eigen-decomposition approach, while using much less memory than other methods.

    A Generative Model for Duration-Dependent Score Calibration


    I-vector transformation and scaling for PLDA-based speaker recognition

    This paper proposes a density model transformation for speaker recognition systems based on i-vectors and Probabilistic Linear Discriminant Analysis (PLDA) classification. The PLDA model assumes that the i-vectors are distributed according to the standard normal distribution, whereas it is well known that this is not the case. Experiments have shown that the i-vectors are better modeled, for example, by a heavy-tailed distribution, and that significant improvements in classification performance can be obtained by whitening and length-normalizing the i-vectors. In this work we propose to transform the i-vectors, extracted without regard to the classifier that will be used, so that their distribution becomes more suitable for discriminating speakers with PLDA. This is performed by means of a sequence of affine and non-linear transformations whose parameters are obtained by Maximum Likelihood (ML) estimation on the training set. The second contribution of this work is the reduction of the mismatch between the development and test i-vector distributions by means of a scaling factor tuned for the estimated i-vector distribution, rather than by means of a blind length normalization. Our tests performed on the NIST SRE-2010 and SRE-2012 evaluation sets show that improvements in their cost functions on the order of 10% can be obtained for both evaluation datasets.
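    The whitening and length normalization baseline that the paper's learned transformations improve upon can be sketched as follows. This is a minimal NumPy version under the usual assumptions (whitening parameters estimated on a development set, full-rank covariance); it is an illustration of the standard recipe, not the paper's code:

    ```python
    import numpy as np

    def whiten_and_length_norm(ivectors, dev_ivectors):
        """Whiten i-vectors using development-set statistics, then project
        each one onto the unit sphere (length normalization)."""
        mu = dev_ivectors.mean(axis=0)
        cov = np.cov(dev_ivectors - mu, rowvar=False)
        # Symmetric whitening matrix from the eigen-decomposition
        # of the development-set covariance.
        vals, vecs = np.linalg.eigh(cov)
        W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
        x = (ivectors - mu) @ W
        # Length normalization: every i-vector gets unit Euclidean norm.
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    ```

    The length-normalization step is "blind" in the sense criticized above: it forces every vector to the same radius regardless of the estimated i-vector distribution, which is precisely the mismatch the paper's tuned scaling factor addresses.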

    Fast and Memory Effective I-Vector Extraction Using a Factorized Sub-Space

    Most of the state-of-the-art speaker recognition systems use a compact representation of spoken utterances referred to as i-vectors. Since the "standard" i-vector extraction procedure requires large memory structures and is relatively slow, new approaches have recently been proposed that are able to obtain either accurate solutions at the expense of an increase in computational load, or fast approximate solutions, which are traded for lower memory costs. We propose a new approach particularly useful for applications that need to minimize their memory requirements. Our solution not only dramatically reduces the storage needs for i-vector extraction, but is also fast. Tested on the female part of the tel-tel extended NIST 2010 evaluation trials, our approach substantially improves the performance with respect to the fastest but inaccurate eigen-decomposition approach, while using much less memory than any other known method.
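    For context on where the memory cost comes from, the "standard" MAP i-vector point estimate can be sketched as below: the per-component M x M matrices T_c^T Sigma_c^-1 T_c that the standard extractor precomputes and stores are what the factorized sub-space approach avoids. A minimal NumPy sketch assuming diagonal UBM covariances (names and layout are illustrative, not the paper's code):

    ```python
    import numpy as np

    def extract_ivector(T, N, F, Sigma):
        """'Standard' MAP i-vector point estimate.
        T:     (C*D, M) total-variability matrix (C components, D features)
        N:     (C,)     zero-order Baum-Welch statistics
        F:     (C*D,)   centered first-order statistics
        Sigma: (C*D,)   diagonal UBM covariances."""
        C = N.shape[0]
        CD, M = T.shape
        D = CD // C
        Tn = T / Sigma[:, None]                  # Sigma^-1 T, reused below
        # Posterior precision L = I + sum_c N_c * T_c^T Sigma_c^-1 T_c.
        # The C per-component M x M blocks are the large structures the
        # standard extractor keeps in memory.
        L = np.eye(M)
        for c in range(C):
            Tc = T[c * D:(c + 1) * D]
            Tcn = Tn[c * D:(c + 1) * D]
            L += N[c] * (Tcn.T @ Tc)
        # i-vector = L^-1 T^T Sigma^-1 F (posterior mean).
        return np.linalg.solve(L, Tn.T @ F)
    ```

    With hundreds of components and i-vector dimensions of several hundred, storing all C blocks dominates memory, which is the cost the factorized sub-space estimation targets.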