
    Training pairwise Support Vector Machines with large scale datasets

    We recently presented an efficient approach for training a Pairwise Support Vector Machine (PSVM) with a suitable kernel for a quite large speaker recognition task. Rather than estimating an SVM model per class according to the “one versus all” discriminative paradigm, the PSVM approach classifies pairs of examples as belonging or not to the same class. Training a PSVM with a large amount of data, however, is a memory- and computation-intensive task, because the number of training pairs grows quadratically with the number of training patterns. This paper proposes an approach that allows discarding the training pairs that do not essentially contribute to the set of Support Vectors (SVs) of the training set. This selection of training pairs is feasible because, as we show, the number of SVs grows not quadratically with the number of pairs, but only linearly with the number of speakers in the training set. Our approach dramatically reduces the memory and computational complexity of PSVM training, making it possible to use large datasets including many speakers. It has been assessed on the extended core conditions of the NIST 2012 Speaker Recognition Evaluation. The results show that the accuracy of the trained PSVMs increases with the training set size, and that the Cprimary of a PSVM trained with a small subset of the i-vector pairs is 10-30% better than the one obtained by a generative model trained on the complete set of i-vectors.
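    The pairwise formulation above can be made concrete with a small sketch. The following is a hedged illustration (using scikit-learn's generic SVC rather than the authors' solver, with toy data and an illustrative absolute-difference pair feature) of how training pairs are built, how their number grows quadratically, and how the resulting support-vector count can be inspected:

        # A minimal sketch, not the authors' code: pairs of vectors are
        # labeled +1 ("same class") or -1 ("different class"), and only the
        # pairs retained as support vectors shape the decision boundary.
        import itertools
        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        n_speakers, utts_per_speaker, dim = 20, 5, 50

        # Toy i-vectors: one cluster centre per speaker (stand-in for real data).
        centres = rng.normal(size=(n_speakers, dim))
        ivectors = np.repeat(centres, utts_per_speaker, axis=0) \
            + 0.3 * rng.normal(size=(n_speakers * utts_per_speaker, dim))
        labels = np.repeat(np.arange(n_speakers), utts_per_speaker)

        # The number of pairs grows quadratically with the number of patterns...
        idx_pairs = list(itertools.combinations(range(len(ivectors)), 2))
        X = np.array([np.abs(ivectors[i] - ivectors[j]) for i, j in idx_pairs])
        y = np.array([1 if labels[i] == labels[j] else -1 for i, j in idx_pairs])

        svm = SVC(kernel="linear", C=1.0).fit(X, y)
        # ...while the paper's key observation is that the support-vector
        # count stays far below the pair count.
        print(f"{len(idx_pairs)} pairs, {svm.n_support_.sum()} support vectors")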

    IITG-Indigo System for NIST 2016 SRE Challenge

    This paper describes the speaker verification (SV) system submitted to the NIST 2016 speaker recognition evaluation (SRE) challenge by the Indian Institute of Technology Guwahati (IITG) under the fixed training condition task. Various SV systems were developed through idea-level collaboration with two other Indian institutions. Unlike previous SREs, the focus this time was on developing an SV system using non-target-language speech data and a small amount of unlabeled data from the target language/dialects. To address these novel challenges, we explored the fusion of systems built with different features, data conditioning, and classifiers. On the NIST 2016 SRE evaluation data, the presented fused system achieved an actual detection cost function (actDCF) of 0.81 and an equal error rate (EER) of 12.91%. Post-evaluation, we explored a recently proposed pairwise support vector machine classifier and applied adaptive S-norm to the decision scores before fusion. With these changes, the final system achieves an actDCF of 0.67 and an EER of 11.63%.
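    The adaptive S-norm applied post-evaluation can be sketched as follows. This is a hedged illustration of standard adaptive symmetric score normalization, not the IITG implementation; the function name, the cohort-score inputs, and the top_n value are assumptions. Each trial score is shifted and scaled by the statistics of the top-scoring impostor-cohort scores of its enrollment and test sides, and the two normalized scores are averaged:

        import numpy as np

        def adaptive_s_norm(score, enroll_cohort_scores, test_cohort_scores, top_n=200):
            """Normalize a raw trial score with adaptive S-norm.

            enroll_cohort_scores / test_cohort_scores: scores of the enrollment
            (resp. test) utterance against a cohort of impostor utterances.
            """
            e_top = np.sort(enroll_cohort_scores)[-top_n:]  # most competitive impostors
            t_top = np.sort(test_cohort_scores)[-top_n:]
            return 0.5 * ((score - e_top.mean()) / e_top.std()
                          + (score - t_top.mean()) / t_top.std())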

    Large scale training of Pairwise Support Vector Machines for speaker recognition

    State-of-the-art systems for text-independent speaker recognition use as their features a compact representation of a speaker utterance known as the “i-vector”. We recently presented an efficient approach for training a Pairwise Support Vector Machine (PSVM) with a suitable kernel for i-vector pairs on a quite large speaker recognition task. Rather than estimating an SVM model per speaker, according to the “one versus all” discriminative paradigm, the PSVM approach classifies a trial, consisting of a pair of i-vectors, as belonging or not to the same speaker. Training a PSVM with a large amount of data, however, is a memory- and computation-intensive task, because the number of training pairs grows quadratically with the number of training i-vectors. This paper demonstrates that only a very small subset of the training pairs is necessary to train the original PSVM model, and proposes two approaches that allow discarding most of the non-essential training pairs without harming the accuracy of the model. This dramatically reduces the memory and computational resources needed for training, which becomes feasible with large datasets including many speakers. We have assessed these approaches on the extended core conditions of the NIST 2012 Speaker Recognition Evaluation. Our results show that the accuracy of a PSVM trained with a sufficient number of speakers is 10-30% better than that obtained by a PLDA model, depending on the testing conditions. Since PSVM accuracy increases with the training set size, but PSVM training does not scale well to large numbers of speakers, our selection techniques become relevant for training accurate discriminative classifiers.
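    One generic way to realize the pair-selection idea is a decomposition loop that trains on manageable chunks of pairs and carries forward only the support vectors. The sketch below is an assumed illustration of this standard heuristic (again with scikit-learn's SVC), not necessarily either of the two selection approaches proposed in the paper:

        import numpy as np
        from sklearn.svm import SVC

        def chunked_pair_selection(X, y, chunk_size=2000, C=1.0):
            """X: pairwise features; y: +1 same-speaker / -1 different-speaker.

            Assumes each chunk contains both classes. Returns a PSVM-style
            model trained only on the pairs that survived as support vectors.
            """
            kept_X = np.empty((0, X.shape[1]))
            kept_y = np.empty((0,), dtype=y.dtype)
            for start in range(0, len(X), chunk_size):
                cand_X = np.vstack([kept_X, X[start:start + chunk_size]])
                cand_y = np.concatenate([kept_y, y[start:start + chunk_size]])
                svm = SVC(kernel="linear", C=C).fit(cand_X, cand_y)
                # Discard pairs that did not become support vectors.
                kept_X, kept_y = cand_X[svm.support_], cand_y[svm.support_]
            return SVC(kernel="linear", C=C).fit(kept_X, kept_y)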