20 research outputs found

    Effects of Waveform PMF on Anti-Spoofing Detection

    In the context of detecting identity impersonation in speaker recognition, we observed that the waveform probability mass function (PMF) of genuine speech differs significantly from the PMF of identity-theft extracts. This is true for synthesized or converted speech as well as for replayed speech. In this work, we mainly ask whether this observation has a significant impact on spoofing detection performance. In a second step, we want to reduce the distribution gap of waveforms between authentic speech and spoofing speech. We propose a genuinization of the spoofing speech (by analogy with Gaussianisation), i.e. to obtain spoofing speech with a PMF close to the PMF of genuine speech. Our genuinization is evaluated on ASVspoof 2019 challenge datasets, using the baseline system provided by the challenge organization. In the case of constant Q cepstral coefficients (CQCC) features, the genuinization leads to a degradation of the baseline system performance by a factor of 10, which shows a potentially large impact of the distribution of waveforms on spoofing detection performance. However, by "playing" with all configurations, we also observed different behaviors, including performance improvements in specific cases. This leads us to conclude that waveform distribution plays an important role and must be taken into account by anti-spoofing systems.

    Time-Domain Based Embeddings for Spoofed Audio Representation

    Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. The main focus of this paper is to suggest new representations for genuine and spoofed speech, based on the probability mass function (PMF) estimation of the audio waveforms' amplitude. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, our method is based on direct processing of time-domain audio samples. The PMF is utilized by designing a feature extractor based on different PMF distances and similarity measures. As an additional step, we used filter-bank preprocessing, which significantly affects the discriminative characteristics of the features and facilitates convenient visualization of possible clustering of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data lies. The suggested embeddings allow the use of simple linear separators to achieve decent performance. In addition, we present a convenient way to visualize the data, which helps to assess the efficiency of different spoofing techniques. The experimental results show the potential of using multi-channel PMF-based features for the anti-spoofing task, in addition to the benefits of using diffusion maps both as an analysis tool and as an embedding tool.
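
    The idea of an amplitude PMF and a distance between two PMFs can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: 16-bit PCM input, a uniform binning into `n_bins` bins, and the choice of total variation and Jensen-Shannon divergence as the distance measures. The function names are hypothetical.

```python
import math
from collections import Counter


def waveform_pmf(samples, n_bins=16, lo=-32768, hi=32767):
    """Estimate a PMF over uniform amplitude bins (assumes integer PCM samples)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    width = (hi - lo + 1) / n_bins
    counts = Counter()
    for s in samples:
        b = int((s - lo) / width)
        # Clamp out-of-range samples into the first or last bin.
        counts[max(0, min(b, n_bins - 1))] += 1
    total = len(samples)
    return [counts.get(b, 0) / total for b in range(n_bins)]


def total_variation(p, q):
    """Total variation distance between two PMFs on the same bins (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 for identical PMFs, 1 for disjoint ones)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

    A feature vector for an utterance could then be formed, for example, from the distances between its PMF and one or more reference PMFs; how the paper actually builds its features is not specified in the abstract.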

    Speech database and protocol validation using waveform entropy

    Dichotomy between Clustering Performance and Minimum Distortion . . .

    In many signals, such as speech, bio-signals, protein chains, etc., there is a dependency between consecutive vectors. As the dependency is limited in duration, such data can be called Piecewise-Dependent Data (PDD). In clustering it is frequently necessary to minimize a given distance function. In this paper we show that in PDD clustering there is a contradiction between the desire for high resolution (short segments and low distance) and high accuracy (long segments and high distortion), i.e. meaningful clustering.

    EXTENDED BIC CRITERION FOR MODEL SELECTION

    Model selection is commonly based on some variation of the BIC or minimum message length criteria, such as MML and MDL. In either case the criterion is split into two terms: one for the model (data code length/model complexity) and one for the data given the model (message length/data likelihood). For problems such as change detection, unsupervised segmentation or data clustering, it is common practice for the model term to comprise only a sum of sub-model terms. In this paper it is shown that the full model complexity must also take into account the number of sub-models and the labels which assign data to each sub-model. From this analysis we derive an extended BIC approach (EBIC) for this class of problem. Results with artificial data are given to illustrate the properties of this procedure. (IDIAP-RR-02-42)
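
    For orientation, the conventional two-term BIC that the abstract takes as its starting point can be sketched as follows. This is only the standard criterion, with a model term that sums the parameter counts of the sub-models; the additional terms of the EBIC (number of sub-models and label assignment) are not given in the abstract and are deliberately not reproduced here. The 1-D Gaussian sub-model and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_loglik(xs):
    """Maximised log-likelihood of 1-D data under a single Gaussian (MLE mean and variance)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    var = max(var, 1e-12)  # guard against zero variance in degenerate segments
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def bic(segments, params_per_segment=2):
    """Conventional BIC: -2 * log-likelihood + k * ln(n), lower is better.

    The model term only sums sub-model parameters (mean and variance per segment).
    """
    n = sum(len(s) for s in segments)
    loglik = sum(gaussian_loglik(s) for s in segments)
    k = params_per_segment * len(segments)
    return -2.0 * loglik + k * math.log(n)
```

    With this sketch, a candidate change point is accepted when the BIC of the two-segment model is lower than that of the single-segment model, e.g. `bic([left, right]) < bic([left + right])`.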

    What Is Better: GMM of Two . . .

    In this report, we provide a theoretical discussion on temporal data cluster analysis: does the data come from one source or two sources; is it better to cluster the data into two clusters or leave it as one cluster. Here we analyse only the simplest case: when the data comes from two symmetric Gaussian probability density functions (pdfs), i.e., with the same variance and the same absolute value of the mean, with the same prior probability per Gaussian. The data consists of segments with an a priori known segment length. It will be shown that if the data belongs to two different Gaussian models, the likelihood of two clusters is always higher than or equal to that of a GMM with two Gaussians for any mean, variance, and segment length. If the data belongs to the GMM, the likelihood of two clusters might be either higher or lower than that of the GMM.
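
    The two quantities being compared can be made concrete with a small sketch. The definitions below are assumptions, since the abstract does not spell them out: the "two clusters" likelihood scores each whole segment under whichever of the two Gaussians (means +mu and -mu, common sigma) fits it better, with no prior term, while the GMM likelihood scores every sample under the equal-weight mixture. The function names are hypothetical and the code does not reproduce the report's derivation.

```python
import math


def log_normal(x, mean, sigma):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def two_cluster_loglik(segments, mu, sigma):
    """Each segment is assigned as a whole to the better-fitting Gaussian (+mu or -mu)."""
    total = 0.0
    for seg in segments:
        plus = sum(log_normal(x, mu, sigma) for x in seg)
        minus = sum(log_normal(x, -mu, sigma) for x in seg)
        total += max(plus, minus)
    return total


def gmm_loglik(segments, mu, sigma):
    """Each sample is scored under the equal-weight mixture of the two Gaussians."""
    total = 0.0
    for seg in segments:
        for x in seg:
            a = log_normal(x, mu, sigma) + math.log(0.5)
            b = log_normal(x, -mu, sigma) + math.log(0.5)
            m = max(a, b)
            # log-sum-exp for numerical stability
            total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total
```

    Under these assumed definitions, segments whose samples all lie on one side of zero favour the two-cluster score, whereas a segment that mixes samples from both Gaussians can favour the GMM score, which matches the qualitative behaviour described in the abstract.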