Effects of Waveform PMF on Anti-Spoofing Detection
In the context of detecting identity impersonation in speaker recognition, we observed that the waveform probability mass function (PMF) of genuine speech differs significantly from the PMF of identity-theft extracts. This holds for synthesized or converted speech as well as for replayed speech. In this work, we mainly ask whether this observation has a significant impact on spoofing detection performance. In a second step, we aim to reduce the waveform distribution gap between authentic speech and spoofing speech. We propose a genuinization of the spoofing speech (by analogy with Gaussianisation), i.e., a transformation that yields spoofing speech whose PMF is close to the PMF of genuine speech. Our genuinization is evaluated on the ASVspoof 2019 challenge datasets, using the baseline system provided by the challenge organizers. With constant Q cepstral coefficient (CQCC) features, genuinization degrades the baseline system's performance by a factor of 10, which shows that the waveform distribution can have a large impact on spoofing detection performance. However, by "playing" with all configurations, we also observed different behaviors, including performance improvements in specific cases. This leads us to conclude that the waveform distribution plays an important role and must be taken into account by anti-spoofing systems.
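The abstract does not spell out how genuinization is performed. As an assumption, the sketch below uses rank-based quantile (histogram) matching, a standard way to force one signal's sample distribution onto another's, in the same spirit as Gaussianisation. It treats waveforms as integer (quantized) samples; `estimate_pmf` and `genuinize` are hypothetical names, not the authors' code.

```python
from collections import Counter


def estimate_pmf(samples):
    """Empirical PMF of integer-valued (quantized) waveform samples."""
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in sorted(counts.items())}


def genuinize(spoof, genuine):
    """Rank-based quantile mapping (an assumed stand-in for genuinization).

    Each spoof sample is replaced by the genuine sample found at the same
    empirical quantile, so the output PMF approximates the genuine PMF while
    the rank order of the spoof samples is preserved.
    """
    if not spoof or not genuine:
        raise ValueError("both signals must be non-empty")
    target = sorted(genuine)
    n, m = len(spoof), len(target)
    # Stable sort: tied spoof values keep their time order.
    order = sorted(range(n), key=lambda i: spoof[i])
    out = [0] * n
    for rank, idx in enumerate(order):
        out[idx] = target[rank * m // n]
    return out


spoof = [0, 5, -3, 5, 2]
genuine = [-2, -1, 0, 1, 2]
print(genuinize(spoof, genuine))  # [-1, 1, -2, 2, 0]
```

When both signals have the same length, the output is exactly a reordering of the genuine samples, so the two PMFs coincide. One side effect of this simple design is that tied spoof values (the two `5`s above) can be mapped to different genuine values.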
Time-Domain Based Embeddings for Spoofed Audio Representation
Anti-spoofing is the task of speech authentication, that is, distinguishing
genuine human speech from spoofed speech. The main focus of this paper
is to suggest new representations for genuine and spoofed speech, based on the
probability mass function (PMF) estimation of the audio waveforms' amplitude.
We introduce a new feature extraction method for speech audio signals: unlike
traditional methods, our method is based on direct processing of time-domain
audio samples. The PMF is utilized by designing a feature extractor based on
different PMF distances and similarity measures. As an additional step, we use
filter-bank preprocessing, which significantly affects the discriminative
characteristics of the features and facilitates convenient visualization of
possible clustering of spoofing attacks. Furthermore, we use diffusion maps to
reveal the underlying manifold on which the data lies.
The suggested embeddings allow the use of simple linear separators to achieve
decent performance. In addition, we present a convenient way to visualize the
data, which helps to assess the efficiency of different spoofing techniques.
The experimental results show the potential of using multi-channel PMF based
features for the anti-spoofing task, in addition to the benefits of using
diffusion maps both as an analysis tool and as an embedding tool.
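The abstract names "different PMF distances and similarity measures" without listing them. The sketch below assumes three common choices (Kullback-Leibler divergence, Hellinger distance, total variation) and assumes features are distances from an utterance's amplitude PMF to a set of reference PMFs, for example class-average PMFs. Applying it per filter-bank channel would give a multi-channel variant; that step and the diffusion-map embedding are omitted. All names and parameters (`n_bins`, `eps`) are hypothetical.

```python
import math


def amplitude_pmf(samples, n_bins=64, eps=1e-6):
    """Histogram PMF of amplitudes in [-1, 1], with additive smoothing eps."""
    counts = [0.0] * n_bins
    for x in samples:
        # Map x in [-1, 1] to a bin index, clamping out-of-range values.
        idx = int((x + 1.0) / 2.0 * n_bins)
        counts[min(max(idx, 0), n_bins - 1)] += 1.0
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """KL(p || q); infinite if q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def hellinger(p, q):
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))


def total_variation(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def pmf_features(samples, references, n_bins=64):
    """One feature per (reference PMF, distance measure) pair."""
    p = amplitude_pmf(samples, n_bins)
    feats = []
    for ref in references:
        for dist in (kl_divergence, hellinger, total_variation):
            feats.append(dist(p, ref))
    return feats
```

A linear separator could then be trained on these feature vectors. The smoothing term keeps the KL divergence finite when a reference PMF has empty bins.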
Dichotomy between Clustering Performance and Minimum Distortion . . .
In many signals, such as speech, bio-signals, and protein chains, there is a dependency between consecutive vectors. Because this dependency is limited in duration, such data can be called Piecewise-Dependent Data (PDD). Clustering frequently requires minimizing a given distance function. In this paper we show that in PDD clustering there is a contradiction between the desire for high resolution (short segments and low distance) and high accuracy (long segments and high distortion), i.e., meaningful clustering.
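A toy illustration of the trade-off, under assumptions that are not from the paper: scalar data from two sources (true labels +1 and -1), segments of fixed length, and a hypothetical two-cluster rule that labels every sample in a segment by the sign of the segment mean. Distortion is the mean squared deviation from each segment's mean.

```python
def mean_distortion(x, seg_len):
    """Average squared deviation of each sample from its segment mean."""
    total = 0.0
    for start in range(0, len(x), seg_len):
        seg = x[start:start + seg_len]
        mu = sum(seg) / len(seg)
        total += sum((v - mu) ** 2 for v in seg)
    return total / len(x)


def sign_accuracy(x, labels, seg_len):
    """Label all samples of a segment by the sign of its mean (+1 / -1) and
    return the fraction of samples that match the true labels."""
    correct = 0
    for start in range(0, len(x), seg_len):
        seg = x[start:start + seg_len]
        pred = 1 if sum(seg) / len(seg) >= 0 else -1
        correct += sum(1 for lab in labels[start:start + seg_len] if lab == pred)
    return correct / len(x)


# Noisy observations of a +1 source followed by a -1 source (toy data).
x = [1.2, -0.3, 0.9, 1.1, -1.0, 0.4, -1.2, -0.8]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
for seg_len in (1, 4):
    print(seg_len, mean_distortion(x, seg_len), sign_accuracy(x, labels, seg_len))
```

With one-sample segments the distortion is zero but the noisy samples are mislabeled (accuracy 0.75); with four-sample segments aligned to the sources, accuracy reaches 1.0 while the distortion becomes positive.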
EXTENDED BIC CRITERION FOR MODEL SELECTION
Model selection is commonly based on some variant of the BIC or of code-length criteria such as minimum message length (MML) and minimum description length (MDL). In either case the criterion is split into two terms: one for the model (data code length/model complexity) and one for the data given the model (message length/data likelihood). For problems such as change detection, unsupervised segmentation or data clustering, it is common practice for the model term to comprise only a sum of sub-model terms. In this paper it is shown that the full model complexity must also account for the number of sub-models and for the labels that assign data to each sub-model. From this analysis we derive an extended BIC approach (EBIC) for this class of problem. Results with artificial data are given to illustrate the properties of this procedure. (IDIAP-RR-02-42)
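To make the distinction concrete, the "sum of sub-model terms" practice for $K$ sub-models with $d_k$ free parameters each and $N$ data points corresponds to the usual BIC (written here as a score to be maximized):

```latex
\mathrm{BIC}(K) = \log p\bigl(X \mid \hat{\theta}_1, \dots, \hat{\theta}_K\bigr)
                - \sum_{k=1}^{K} \frac{d_k}{2} \log N
```

As a purely illustrative assumption (not the EBIC derived in the report), one way to also charge for the number of sub-models and for the labels is to subtract a code length $\mathcal{L}(K)$ for the integer $K$ and the cost of coding each of the $N$ labels uniformly over $K$ sub-models:

```latex
\mathrm{BIC}_{\text{labels}}(K) = \mathrm{BIC}(K) - \mathcal{L}(K) - N \log K
```

The extra $N \log K$ term grows with both the amount of data and the number of sub-models, which is exactly what a pure sum of per-sub-model terms leaves out.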
What Is Better: GMM of Two . . .
In this report, we provide a theoretical discussion of temporal data cluster analysis: does the data come from one source or two, and is it better to cluster the data into two clusters or leave it as one? Here we analyse only the simplest case: data coming from two symmetric Gaussian probability density functions (pdfs), i.e., with the same variance and the same absolute value of the mean, and with the same prior probability per Gaussian. The data consists of segments of a-priori known length. It will be shown that if the data belongs to two different Gaussian models, the likelihood of two clusters is always higher than or equal to that of a GMM with two Gaussians, for any mean, variance, and segment length. If the data belongs to the GMM, the likelihood of two clusters might be either higher or lower than that of the GMM.
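The two likelihoods being compared can be sketched as follows, under stated assumptions: the symmetric components are $N(+\mu, \sigma^2)$ and $N(-\mu, \sigma^2)$, the "two clusters" model assigns each whole segment to one component (cluster priors and label cost ignored), and the GMM scores every sample independently with priors 0.5. The report's exact definitions may differ; function names are hypothetical.

```python
import math


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def two_cluster_loglik(segments, mu, sigma):
    """Hard assignment: each whole segment goes to whichever of N(+mu, sigma^2)
    or N(-mu, sigma^2) gives it the higher likelihood."""
    total = 0.0
    for seg in segments:
        pos = sum(log_normal(x, mu, sigma) for x in seg)
        neg = sum(log_normal(x, -mu, sigma) for x in seg)
        total += max(pos, neg)
    return total


def gmm_loglik(segments, mu, sigma):
    """Two-component GMM with equal priors 0.5, evaluated sample by sample
    (log-sum-exp for numerical stability)."""
    total = 0.0
    for seg in segments:
        for x in seg:
            a = log_normal(x, mu, sigma)
            b = log_normal(x, -mu, sigma)
            m = max(a, b)
            total += m + math.log(0.5 * (math.exp(a - m) + math.exp(b - m)))
    return total


segs = [[0.8, 1.3, 0.9], [-1.1, -0.7, -1.4]]
print(two_cluster_loglik(segs, 1.0, 1.0), gmm_loglik(segs, 1.0, 1.0))
```

Varying the segment length and generating `segs` either from two separate sources or from the mixture itself is a quick way to explore the two regimes the report analyses.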