229 research outputs found

    A Monte-Carlo Method For Score Normalization in Automatic Speaker Verification Using Kullback-Leibler Distances

    In this paper, we propose a new score normalization technique in Automatic Speaker Verification (ASV): the D-Norm. The main advantage of this score normalization is that it does not need any additional speech data or external speaker population, as opposed to state-of-the-art approaches. The D-Norm is based on the use of Kullback-Leibler (KL) distances in an ASV context. In a first step, we estimate the KL distances with a Monte-Carlo method and we show experimentally that they are correlated with the verification scores. In a second step, we use this correlation to implement a score normalization procedure, the D-Norm. We analyse its performance and compare it to that of a conventional normalization, the Z-Norm. The results show that the performance of the D-Norm is comparable to that of the Z-Norm. We conclude with a discussion of the results and of the applications of this work.
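    The abstract does not give implementation details; the following is a minimal sketch (with toy models and hypothetical function names) of how a Kullback-Leibler distance between a client GMM and a world GMM could be estimated by Monte-Carlo sampling, the quantity that the D-Norm correlates with verification scores.

```python
# Minimal sketch (not the authors' code): Monte-Carlo estimate of the
# Kullback-Leibler distance between a client GMM and a world GMM.
# All model parameters below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM evaluated at the rows of x."""
    log_comp = []
    for w, m, v in zip(weights, means, variances):
        ll = -0.5 * np.sum((x - m) ** 2 / v + np.log(2 * np.pi * v), axis=1)
        log_comp.append(np.log(w) + ll)
    return np.logaddexp.reduce(np.stack(log_comp), axis=0)

def gmm_sample(n, weights, means, variances):
    """Draw n samples from a diagonal-covariance GMM."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return means[comp] + rng.normal(size=(n, means.shape[1])) * np.sqrt(variances[comp])

def mc_kl(p, q, n=10000):
    """Monte-Carlo KL(p || q): average log-ratio under samples drawn from p."""
    x = gmm_sample(n, *p)
    return np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q))

# Toy 2-D client and world models (weights, means, variances).
client = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [2.0, 1.0]]), np.ones((2, 2)))
world = (np.array([0.7, 0.3]), np.array([[0.5, 0.2], [3.0, 2.0]]), np.ones((2, 2)) * 1.5)

# Symmetrised KL distance, usable as a normalization factor for raw scores.
d_kl = mc_kl(client, world) + mc_kl(world, client)
print(f"Monte-Carlo KL distance: {d_kl:.3f}")
```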

    Zero-resource audio-only spoken term detection based on a combination of template matching techniques

    Keywords: spoken term detection, template matching, unsupervised learning, posterior features.
    Spoken term detection is a well-known information retrieval task that seeks to extract contentful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module comprises the cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison to further improve robustness to speech variability. This solution notably differs from more traditional train-and-test methods that, while shown to be very accurate, rely upon the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.
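    As a rough illustration of the template matching building block (not the paper's implementation, which uses a segmental DTW variant followed by a self-similarity matrix comparison), the sketch below slides a query template over an utterance and scores each window with a plain DTW pass over frame features; all features and names are synthetic placeholders.

```python
# Illustrative sketch only: DTW matching of a spoken query template against
# windows of a longer utterance, at the level of frame features.
import numpy as np

def cosine_dist(a, b):
    """Pairwise cosine distance between two feature sequences (frames x dims)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def dtw_cost(query, segment):
    """Normalized DTW alignment cost between a query and a candidate segment."""
    d = cosine_dist(query, segment)
    n, m = d.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)

# Toy features standing in for MFCCs or posteriorgrams (frames x dims).
rng = np.random.default_rng(1)
query = rng.normal(size=(40, 13))
utterance = rng.normal(size=(500, 13))

# Slide the query over the utterance and keep the best-matching window.
costs = [dtw_cost(query, utterance[s:s + 60]) for s in range(0, 440, 10)]
best = int(np.argmin(costs)) * 10
print(f"Best candidate start frame: {best}, cost {min(costs):.3f}")
```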

    Supplementary material to the article: Estimating the structural segmentation of popular music pieces under regularity constraints

    This document gathers descriptions of the structural segmentation systems considered in the IEEE/ACM TASLP paper by the same authors.

    Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm

    Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to "chorus", "verse", "solo", etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm, introduced by (Marmoret et al., 2020, 2022b). The CBM algorithm is a dynamic programming algorithm that segments self-similarity matrices, which are a standard description used in MSA and in numerous other applications. In this work, self-similarity matrices are computed from the feature representation of an audio signal and time is sampled at the bar scale. This study examines three different standard similarity functions for the computation of self-similarity matrices. Results show that, under optimal conditions, the proposed algorithm achieves a level of performance competitive with supervised state-of-the-art methods while only requiring knowledge of bar positions. In addition, the algorithm is made open-source and is highly customizable.
    Comment: 19 pages, 13 figures, 11 tables, 1 algorithm; published in Transactions of the International Society for Music Information Retrieval
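    The following sketch (assumed, not taken from the released toolbox) shows how a bar-scale self-similarity matrix can be computed from frame features, with the cosine kernel as one example of the similarity functions the abstract compares.

```python
# Sketch: bar-synchronous self-similarity matrix from frame-level features.
# Bar boundaries, feature dimensions and kernels are illustrative choices.
import numpy as np

def barwise_features(frame_features, bar_boundaries):
    """Average frame features within each bar (bar_boundaries in frame indices)."""
    return np.stack([frame_features[s:e].mean(axis=0)
                     for s, e in zip(bar_boundaries[:-1], bar_boundaries[1:])])

def self_similarity(bars, kernel="cosine"):
    """Bar-by-bar self-similarity matrix under a chosen similarity function."""
    if kernel == "cosine":
        x = bars / np.linalg.norm(bars, axis=1, keepdims=True)
        return x @ x.T
    if kernel == "rbf":
        d2 = ((bars[:, None, :] - bars[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / d2.mean())
    raise ValueError(kernel)

rng = np.random.default_rng(2)
frames = rng.normal(size=(2000, 12))                      # e.g. chroma-like features
bars = barwise_features(frames, np.arange(0, 2001, 40))   # 50 bars of 40 frames
ssm = self_similarity(bars, kernel="cosine")
print(ssm.shape)  # (50, 50); block structure along the diagonal suggests sections
```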

    Convolutive Block-Matching Segmentation Algorithm with Application to Music Structure Analysis

    Music Structure Analysis (MSA) consists of representing a song in sections (such as "chorus", "verse", "solo", etc.), and can be seen as the retrieval of a simplified organization of the song. This work presents a new algorithm devoted to MSA, called the Convolutive Block-Matching (CBM) algorithm. In particular, the CBM algorithm is a dynamic programming algorithm operating on autosimilarity matrices, a standard tool in MSA. In this work, autosimilarity matrices are computed from the feature representation of an audio signal, and time is sampled at the bar scale. We study three different similarity functions for the computation of autosimilarity matrices. We report that the proposed algorithm achieves a level of performance competitive with that of supervised state-of-the-art methods on 3 out of 4 metrics, while being fully unsupervised.
    Comment: 4 pages, 5 figures, 1 table. Submitted at ICASSP 2023. The associated toolbox is available at https://gitlab.inria.fr/amarmore/autosimilarity_segmentatio
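    To make the dynamic-programming idea concrete, here is a generic sketch of block segmentation of an autosimilarity matrix; the block score used below (within-block similarity mass minus a per-segment cost) is a simple placeholder, not the convolutive score of the CBM algorithm.

```python
# Dynamic-programming skeleton for segmenting an autosimilarity matrix into
# diagonal blocks; the block score is a placeholder for illustration only.
import numpy as np

def block_score(ssm, start, end, seg_penalty=1.0):
    """Placeholder score: within-block similarity mass minus a per-segment cost."""
    return ssm[start:end, start:end].sum() - seg_penalty

def segment(ssm, max_len=16):
    """Return optimal bar-index boundaries by dynamic programming."""
    n = ssm.shape[0]
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            s = best[start] + block_score(ssm, start, end)
            if s > best[end]:
                best[end], prev[end] = s, start
    # Backtrack boundaries from the end of the song.
    bounds, i = [n], n
    while i > 0:
        i = prev[i]
        bounds.append(i)
    return bounds[::-1]

# Toy autosimilarity with two homogeneous sections of 10 bars each.
ssm = np.kron(np.eye(2), np.ones((10, 10)))
print(segment(ssm))  # expected boundaries: [0, 10, 20]
```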

    Methodological and musicological investigation of the System & Contrast model for musical form description

    The semiotic description of music structure aims at representing the high-level organization of music pieces in a concise, generic and reproducible way, as a low-rate stream of arbitrary symbols from a limited alphabet, resulting in a sequence of "semiotic units". In this context, the purpose of the System & Contrast model is to address the internal organization of the semiotic units. In this report, the System & Contrast model is approached from different angles in relation to various disciplines: cognitive psychology, music analysis and information theory. After establishing a number of links between the System & Contrast model and other approaches to music structure, the model is illustrated on studio-based popular music pieces, as well as on music from the classical Viennese period.

    Well-posedness of the permutation problem in sparse filter estimation with lp minimization

    Convolutive source separation is often done in two stages: 1) estimation of the mixing filters and 2) estimation of the sources. Traditional approaches suffer from the ambiguities of arbitrary permutations and scaling in each frequency bin of the estimated filters and/or sources, which are usually corrected by taking into account some special properties of the filters/sources. This paper focuses on the filter permutation problem in the absence of scaling, investigating the possible use of the temporal sparsity of the filters as a property enabling permutation correction. Theoretical and experimental results highlight the potential as well as the limits of sparsity as a hypothesis for obtaining a well-posed permutation problem.
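    A toy illustration of the studied principle, under strong simplifying assumptions (two sources, synthetic filters, a greedy bin-by-bin search rather than the paper's analysis): choosing the per-frequency permutation that minimizes an lp norm (p < 1) of the time-domain filters exploits their temporal sparsity to resolve the permutation ambiguity.

```python
# Toy sketch: repair per-bin source permutations by minimizing an lp norm of
# the time-domain filters. Everything here is synthetic and illustrative.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_freq, n_src = 64, 2

# Ground-truth sparse filters (time domain), one per source.
true_filters = np.zeros((n_src, n_freq))
true_filters[0, [3, 17]] = [1.0, -0.5]
true_filters[1, [8, 40]] = [0.8, 0.6]
spectra = np.fft.fft(true_filters, axis=1)

# Corrupt with a random source permutation in roughly half of the bins.
permuted = spectra.copy()
for f in range(n_freq):
    if rng.random() < 0.5:
        permuted[:, f] = spectra[::-1, f]

def lp_norm(x, p=0.5):
    return np.sum(np.abs(x) ** p)

# Greedy repair: for each bin, keep the source assignment that lowers the
# lp norm of the reconstructed time-domain filters.
current = permuted.copy()
for f in range(n_freq):
    candidates = []
    for perm in itertools.permutations(range(n_src)):
        trial = current.copy()
        trial[:, f] = current[list(perm), f]
        candidates.append((lp_norm(np.fft.ifft(trial, axis=1)), trial))
    current = min(candidates, key=lambda c: c[0])[1]

print("lp norm before:", lp_norm(np.fft.ifft(permuted, axis=1)))
print("lp norm after: ", lp_norm(np.fft.ifft(current, axis=1)))
```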

    Likelihood ratio adjustment for the compensation of model mismatch in speaker verification

    This article presents a method for adjusting speaker verification thresholds based on a Gaussian model of the distributions of the log-likelihood ratio. The article states the assumptions under which this model is valid, describes several threshold adjustment methods, and illustrates their benefits and limitations through verification experiments on a database of 20 speakers.
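    As a hedged sketch of this kind of procedure (not the article's exact method), the snippet below models client and impostor log-likelihood-ratio scores as two Gaussians estimated from development data and takes the crossing point of the two densities as the adjusted decision threshold.

```python
# Sketch: threshold adjustment from Gaussian models of client and impostor
# log-likelihood-ratio scores. Scores below are synthetic toy data.
import numpy as np

def gaussian_crossing(mu_c, sigma_c, mu_i, sigma_i):
    """Threshold where N(mu_c, sigma_c) and N(mu_i, sigma_i) densities are equal."""
    # Solve a*t^2 + b*t + c = 0 obtained from equating the two log-densities.
    a = 1 / (2 * sigma_i**2) - 1 / (2 * sigma_c**2)
    b = mu_c / sigma_c**2 - mu_i / sigma_i**2
    c = (mu_i**2 / (2 * sigma_i**2) - mu_c**2 / (2 * sigma_c**2)
         + np.log(sigma_i / sigma_c))
    if abs(a) < 1e-12:                      # equal variances: single crossing
        return -c / b
    roots = np.roots([a, b, c])
    # Keep the root lying between the two means.
    return [r for r in roots if min(mu_i, mu_c) <= r <= max(mu_i, mu_c)][0]

# Toy LLR scores standing in for development data.
rng = np.random.default_rng(4)
client_scores = rng.normal(1.5, 0.8, 500)
impostor_scores = rng.normal(-1.0, 1.0, 5000)

threshold = gaussian_crossing(client_scores.mean(), client_scores.std(),
                              impostor_scores.mean(), impostor_scores.std())
print(f"Adjusted decision threshold: {threshold:.3f}")
```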

    Robust adaptation of HMM models for text-dependent speaker verification

    When deploying a secure system based on speaker verification, the limited amount of training data is usually a critical issue: the enrollment procedure must be fast and user-friendly. Incremental training of HMM speaker models, based on a MAP (Maximum A Posteriori) adaptation technique, is used in order to make enrollment more robust with only one or two utterances of the client password. This paper presents the improvements which can be achieved in terms of verification performance and stability of the decision thresholds. Our results highlight the benefits of MAP adaptation in conjunction with a synchronous alignment approach.
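    For reference, a minimal sketch of the standard relevance-factor MAP rule for adapting Gaussian means, the kind of update such incremental enrollment builds on; state alignment and the variance/weight updates of a full HMM recipe are omitted, and all numbers are toy data.

```python
# Sketch of MAP mean adaptation with a relevance factor (illustrative only).
import numpy as np

def map_adapt_means(prior_means, frames, posteriors, relevance=16.0):
    """
    prior_means : (n_gauss, dim) means of the prior (world/background) model
    frames      : (n_frames, dim) enrollment feature vectors
    posteriors  : (n_frames, n_gauss) occupation probabilities of each Gaussian
    """
    counts = posteriors.sum(axis=0)                        # soft counts per Gaussian
    first_order = posteriors.T @ frames                    # weighted sum of frames
    ml_means = first_order / np.maximum(counts, 1e-8)[:, None]
    alpha = (counts / (counts + relevance))[:, None]       # data-dependent weight
    return alpha * ml_means + (1.0 - alpha) * prior_means  # shrink towards the prior

# Toy example: 3 Gaussians, 20 enrollment frames of dimension 4.
rng = np.random.default_rng(5)
prior = rng.normal(size=(3, 4))
frames = rng.normal(loc=0.5, size=(20, 4))
post = rng.dirichlet(np.ones(3), size=20)
print(map_adapt_means(prior, frames, post))
```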

    Signal modeling with Non Uniform Topology lattice filters

    This article presents a new class of constrained and specialized Auto-Regressive (AR) processes. They are derived from lattice filters in which some reflection coefficients are forced to zero at a priori locations. Optimizing the filter topology makes it possible to build parametric spectral models that have a greater number of poles than the number of parameters needed to describe their locations. These NUT (Non-Uniform Topology) models are assessed by evaluating the reduction of modeling error with respect to conventional AR models.
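    The sketch below illustrates the underlying idea (it is not the authors' estimator): an order-6 lattice filter whose reflection coefficients are forced to zero everywhere except at two chosen positions still defines a degree-6 AR polynomial, i.e. six poles described by only two parameters.

```python
# Sketch: sparse lattice (reflection-coefficient) parameterization of an AR model.
import numpy as np

def reflection_to_ar(reflection):
    """Step-up recursion: reflection coefficients -> AR polynomial [1, a1, ..., aP]."""
    a = np.array([1.0])
    for k in reflection:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([a, [0.0]])[::-1]
    return a

# Order-6 lattice with only two non-zero reflection coefficients (positions 2 and 6).
reflection = np.zeros(6)
reflection[1] = -0.7
reflection[5] = 0.4
ar = reflection_to_ar(reflection)

poles = np.roots(ar)
print("AR coefficients:", np.round(ar, 3))
print("Number of poles:", len(poles), "from", np.count_nonzero(reflection), "parameters")
```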