
Speech Frame Selection for Spoofing Detection with an Application to Partially Spoofed Audio-Data

Author: Kumar A. Kishore
Pal Monisankha
Paul Dipjyoti
Saha Goutam
Sahidullah Md
Publication venue: Springer Science and Business Media LLC
Publication date: 03/01/2021

In this paper, we introduce a frame selection strategy for improved detection of spoofed speech. A countermeasure (CM) system typically uses a Gaussian mixture model (GMM) based classifier to compute log-likelihood scores, and the average log-likelihood ratio over all speech frames of a test utterance is used as the score for decision making. As opposed to this standard approach, we propose to score the test utterance using only selected speech frames. We present two simple and computationally efficient frame selection strategies based on the log-likelihood ratios of the individual frames. Performance is evaluated with constant-Q cepstral coefficients as the front-end features and a two-class GMM as the back-end classifier. We conduct experiments on the speech corpora from the ASVspoof 2015, 2017, and 2019 challenges. The results show that the proposed scoring techniques substantially outperform the conventional scoring technique on both the development and evaluation sets of the ASVspoof 2015 corpus. We did not observe a noticeable performance gain on the ASVspoof 2017 and ASVspoof 2019 corpora. We further conducted experiments with partially spoofed data, where the spoofed data are created by augmenting natural speech with spoofed speech. In this scenario, the proposed methods demonstrate considerable performance improvement over the baseline.
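To make the contrast between whole-utterance scoring and frame-selective scoring concrete, here is a minimal sketch in plain Python. It assumes the per-frame log-likelihoods under the genuine and spoof GMMs have already been computed. The abstract does not specify the paper's two selection rules, so `selected_frames_score` uses a hypothetical stand-in (keep the frames with the largest |LLR|), and all function names and the toy LLR values are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def frame_llrs(ll_genuine: Sequence[float], ll_spoof: Sequence[float]) -> List[float]:
    """Per-frame log-likelihood ratio: log p(x_t | genuine) - log p(x_t | spoof)."""
    if len(ll_genuine) != len(ll_spoof):
        raise ValueError("both models must score the same frames")
    return [g - s for g, s in zip(ll_genuine, ll_spoof)]


def average_llr_score(llrs: Sequence[float]) -> float:
    """Conventional utterance score: mean LLR over all frames."""
    return sum(llrs) / len(llrs)


def selected_frames_score(llrs: Sequence[float], keep_fraction: float = 0.5) -> float:
    """HYPOTHETICAL selection rule (not the paper's): average only the
    frames with the largest |LLR|.

    Frames whose LLR is close to zero carry little evidence either way,
    so dropping them lets strongly spoofed (or strongly genuine) frames
    dominate the utterance score.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, math.ceil(keep_fraction * len(llrs)))
    chosen = sorted(llrs, key=abs, reverse=True)[:k]
    return sum(chosen) / k


# Toy utterance (invented values): mostly uninformative frames
# plus a short, strongly spoofed stretch.
llrs = [0.1, -0.2, 0.05, -3.0, -2.5, 0.0]
print(average_llr_score(llrs))           # about -0.925
print(selected_frames_score(llrs, 0.5))  # about -1.9
```

With positive scores favouring genuine speech, the selective score in this toy case is pushed further toward the spoof decision because the near-zero frames no longer dilute the short spoofed segment. This illustrates why frame selection can help on partially spoofed audio, though the actual gains depend on the corpus, as the abstract reports.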

INRIA a CCSD electronic archive server

HAL Descartes

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

Author: Bishop Somer
Kim So Hyun
Kumar Manoj
Lord Catherine
Narayanan Shrikanth
Pal Monisankha
Park Tae Jin
Peri Raghuveer
Publication date: 19/07/2020

Most speaker diarization systems that use x-vector embeddings are vulnerable to noisy environments and lack domain robustness. Earlier work on speaker diarization using a generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN network to improve diarization robustness and enable rapid generalization across various challenging domains. To this end, we take the pre-trained encoder from ClusterGAN and fine-tune it with a prototypical loss under the meta-learning paradigm (meta-ClusterGAN, or MCGAN). Experiments are conducted on CALLHOME telephonic conversations, AMI meeting data, the DIHARD II development set (a challenging multi-domain corpus), and two child-clinician interaction corpora (ADOS, BOSCC) related to the autism spectrum disorder domain. Extensive analyses of the experimental data investigate the effectiveness of the proposed ClusterGAN and MCGAN embeddings over x-vectors. The results show that the proposed embeddings with a normalized maximum eigengap spectral clustering (NME-SC) back-end consistently outperform the state-of-the-art Kaldi x-vector diarization system. Finally, we employ embedding fusion with x-vectors to further improve diarization performance. Using the proposed fused embeddings, we achieve relative diarization error rate (DER) improvements of 6.67% to 53.93% over x-vectors on the aforementioned datasets. In addition, on telephonic data, the MCGAN embeddings perform better than x-vectors and ClusterGAN embeddings at estimating the number of speakers and at diarizing short speech segments.

Comment: Submitted to IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
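The fine-tuning step relies on a prototypical loss. The sketch below shows the standard formulation from prototypical networks in plain Python: each speaker's prototype is the mean of its support embeddings, and each query is classified by a softmax over negative squared Euclidean distances to the prototypes. It only computes the loss value (no gradients). The abstract does not give MCGAN's episode sizes, distance metric, or encoder details, so the 2-D toy vectors and speaker labels here are hypothetical stand-ins for ClusterGAN encoder outputs.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Prototype of each speaker = mean of that speaker's support embeddings."""
    protos: Dict[str, List[float]] = {}
    for speaker, vecs in support.items():
        dim = len(vecs[0])
        protos[speaker] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def prototypical_loss(
    support: Dict[str, List[Vector]], queries: List[Tuple[Vector, str]]
) -> float:
    """Mean negative log-probability of each query's true speaker, where
    p(speaker | query) is a softmax over negative squared distances."""
    protos = compute_prototypes(support)
    speakers = list(protos)
    total = 0.0
    for emb, speaker in queries:
        logits = [-squared_euclidean(emb, protos[s]) for s in speakers]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[speakers.index(speaker)]
    return total / len(queries)


# Hypothetical episode: two speakers, 2-D embeddings.
support = {"spk_a": [[0.0, 0.0], [2.0, 0.0]], "spk_b": [[10.0, 10.0]]}
queries = [([1.0, 0.0], "spk_a"), ([9.0, 10.0], "spk_b")]
print(prototypical_loss(support, queries))  # close to 0: queries sit near their own prototypes
```

Minimizing this loss over many sampled episodes pulls embeddings of the same speaker toward their prototype and away from other speakers' prototypes, which is the property a clustering back-end such as NME-SC benefits from.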

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

Author: Pal Monisankha
Publication date: 21/06/2023

Ezid
