A Monte-Carlo Method For Score Normalization in Automatic Speaker Verification Using Kullback-Leibler Distances
In this paper, we propose a new score normalization technique for Automatic Speaker Verification (ASV): the D-Norm. The main advantage of this score normalization is that it requires neither additional speech data nor an external speaker population, as opposed to the state-of-the-art approaches. The D-Norm is based on the use of Kullback-Leibler (KL) distances in an ASV context. In a first step, we estimate the KL distances with a Monte-Carlo method and experimentally show that they are correlated with the verification scores. In a second step, we use this correlation to implement a score normalization procedure, the D-Norm. We analyse its performance and compare it to that of a conventional normalization, the Z-Norm. The results show that the performance of the D-Norm is comparable to that of the Z-Norm. We conclude with a discussion of the results and the applications of this work.
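The Monte-Carlo estimation step the abstract mentions can be illustrated with a minimal sketch: draw samples from one model and average the log-density ratio. This uses 1-D Gaussians rather than the speaker GMMs of the paper, purely so the estimate can be checked against the closed form; all function names are illustrative.

```python
import numpy as np

def mc_kl_gaussians(mu_p, sig_p, mu_q, sig_q, n=100_000, seed=0):
    """Monte-Carlo estimate of KL(p || q): sample from p and average
    log p(x) - log q(x). Toy 1-D Gaussian stand-in for the paper's models."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_p, sig_p, size=n)
    log_p = -0.5 * ((x - mu_p) / sig_p) ** 2 - np.log(sig_p)
    log_q = -0.5 * ((x - mu_q) / sig_q) ** 2 - np.log(sig_q)
    return float(np.mean(log_p - log_q))

def exact_kl_gaussians(mu_p, sig_p, mu_q, sig_q):
    """Closed-form KL(p || q) for 1-D Gaussians, as a sanity check."""
    return (np.log(sig_q / sig_p)
            + (sig_p**2 + (mu_p - mu_q)**2) / (2 * sig_q**2) - 0.5)
```

For GMMs no closed form exists, which is precisely why the paper resorts to Monte-Carlo sampling.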
Zero-resource audio-only spoken term detection based on a combination of template matching techniques
Keywords: spoken term detection, template matching, unsupervised learning, posterior features. Spoken term detection is a well-known information retrieval task that seeks to extract contentful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module comprises the cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison to further improve robustness to speech variability. This solution notably differs from more traditional train-and-test methods that, while shown to be very accurate, rely upon the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features and Gaussian posteriorgrams, as well as French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.
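The core primitive behind template matching of spoken queries is dynamic time warping over frame-level features. The sketch below is plain DTW with Euclidean frame distances, not the segmental variant or the self-similarity comparison the paper cascades on top; it only shows the alignment-cost recursion.

```python
import numpy as np

def dtw_cost(query, template):
    """Dynamic time warping between two feature sequences (frames x dims),
    returning the length-normalized alignment cost. Euclidean frame
    distance here; cosine is also common for posteriorgram features."""
    n, m = len(query), len(template)
    # Pairwise frame-to-frame distance matrix.
    d = np.linalg.norm(query[:, None, :] - template[None, :, :], axis=-1)
    # Accumulated-cost table with the classic three-way recursion.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j],
                                              acc[i, j - 1],
                                              acc[i - 1, j - 1])
    return acc[n, m] / (n + m)
```

A query is then detected wherever the warping cost against a stretch of the search audio falls below a threshold.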
Supplementary material to the article: Estimating the structural segmentation of popular music pieces under regularity constraints
This document gathers descriptions of the structural segmentation systems considered in the IEEE/ACM TASLP paper by the same authors.
Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm
Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to "chorus", "verse", "solo", etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm, introduced in (Marmoret et al., 2020, 2022b). The CBM algorithm is a dynamic programming algorithm that segments self-similarity matrices, which are a standard description used in MSA and in numerous other applications. In this work, self-similarity matrices are computed from the feature representation of an audio signal, and time is sampled at the bar scale. This study examines three different standard similarity functions for the computation of self-similarity matrices. Results show that, in optimal conditions, the proposed algorithm achieves a level of performance competitive with supervised state-of-the-art methods while only requiring knowledge of bar positions. In addition, the algorithm is made open-source and is highly customizable.
Comment: 19 pages, 13 figures, 11 tables, 1 algorithm; published in Transactions of the International Society for Music Information Retrieval.
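The self-similarity matrices that the CBM algorithm segments can be sketched in a few lines: compare every bar-level feature vector against every other one with a similarity function. Cosine similarity below is one standard choice (the study compares three); the function name and shapes are illustrative, not the toolbox's API.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix of a (bars x dims) feature array.
    Blocks of high similarity along the diagonal correspond to
    homogeneous sections ("chorus", "verse", ...)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)  # guard zero vectors
    return normed @ normed.T
```

The dynamic program then scores candidate block boundaries on this matrix and picks the segmentation with maximal total score.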
Convolutive Block-Matching Segmentation Algorithm with Application to Music Structure Analysis
Music Structure Analysis (MSA) consists of representing a song in sections (such as "chorus", "verse", "solo", etc.), and can be seen as the retrieval of a simplified organization of the song. This work presents a new algorithm, called the Convolutive Block-Matching (CBM) algorithm, devoted to MSA. In particular, the CBM algorithm is a dynamic programming algorithm that operates on autosimilarity matrices, a standard tool in MSA. In this work, autosimilarity matrices are computed from the feature representation of an audio signal, and time is sampled at the bar scale. We study three different similarity functions for the computation of autosimilarity matrices. We report that the proposed algorithm achieves a level of performance competitive with that of supervised state-of-the-art methods on three out of four metrics, while being fully unsupervised.
Comment: 4 pages, 5 figures, 1 table; submitted at ICASSP 2023. The associated toolbox is available at https://gitlab.inria.fr/amarmore/autosimilarity_segmentatio
Methodological and musicological investigation of the System & Contrast model for musical form description
The semiotic description of music structure aims at representing the high-level organization of music pieces in a concise, generic and reproducible way, as a low-rate stream of arbitrary symbols from a limited alphabet, which results in a sequence of "semiotic units". In this context, the purpose of the System & Contrast model is to address the internal organization of the semiotic units. In this report, the System & Contrast model is approached from different angles in relation to varied disciplines: cognitive psychology, music analysis and information theory. After establishing a number of links between the System & Contrast model and other approaches to music structure, the model is illustrated on studio-based popular music pieces, as well as on music from the classical Viennese period.
Well-posedness of the permutation problem in sparse filter estimation with lp minimization
Convolutive source separation is often done in two stages: 1) estimation of the mixing filters and 2) estimation of the sources. Traditional approaches suffer from ambiguities of arbitrary permutation and scaling in each frequency bin of the estimated filters and/or the sources, which are usually corrected by taking into account some special properties of the filters/sources. This paper focuses on the filter permutation problem in the absence of scaling, investigating the possible use of the temporal sparsity of the filters as a property enabling permutation correction. Theoretical and experimental results highlight the potential as well as the limits of sparsity as a hypothesis to obtain a well-posed permutation problem.
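The sparsity criterion at the heart of this idea can be sketched simply: among candidate time-domain filter sets produced by different per-bin permutation hypotheses, prefer the one with the smallest l_p cost, since p <= 1 rewards filters whose energy is concentrated on few taps. Both helper functions are hypothetical illustrations, not the paper's formulation.

```python
import numpy as np

def lp_cost(filters, p=0.5):
    """Sparsity measure: sum of |h|^p over all filter taps.
    With p <= 1, sparse tap patterns score lower than diffuse ones."""
    return float(np.sum(np.abs(filters) ** p))

def pick_permutation(candidates, p=0.5):
    """Keep the candidate filter set (one per permutation hypothesis)
    with the smallest l_p cost, i.e. the sparsest in time."""
    return min(candidates, key=lambda h: lp_cost(h, p))
```

A wrong bin permutation mixes two filters' frequency responses, which typically smears energy across many time-domain taps and thus raises the l_p cost.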
Likelihood ratio adjustment for the compensation of model mismatch in speaker verification
This article presents a method for adjusting speaker verification thresholds based on a Gaussian model of the distributions of the log-likelihood ratio. The article states the assumptions under which this model is valid, describes several threshold adjustment methods, and illustrates their benefits and limits through verification experiments on a database of 20 speakers.
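Under a Gaussian model of the log-likelihood-ratio scores, one simple adjustment rule places the threshold where the two fitted densities cross. The sketch below assumes equal priors and equal variances, in which case the crossing point is the midpoint of the two means; this is one possible rule for illustration, not the article's exact method.

```python
import numpy as np

def adjusted_threshold(client_scores, impostor_scores):
    """Fit a Gaussian to each score distribution and return the
    equal-prior, equal-variance crossing point of the two densities,
    i.e. the midpoint of the fitted means."""
    mu_c = np.mean(client_scores)
    mu_i = np.mean(impostor_scores)
    return 0.5 * (mu_c + mu_i)
```

Model mismatch shifts both score distributions, and re-estimating the threshold from the fitted means compensates for that shift.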
Robust adaptation of HMM models for text-dependent speaker verification
When deploying a secure system based on speaker verification, the limited amount of training data is usually critical. Indeed, the enrollment procedure must be fast and user-friendly. An incremental training of HMM speaker models, based on a MAP (Maximum A Posteriori) adaptation technique, is used in order to make the enrollment more robust with only one or two utterances of the client password. This paper presents the improvements which can be achieved, in terms of verification performance and stability of the decision thresholds. Our results highlight the benefits of MAP adaptation in conjunction with a synchronous alignment approach.
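The MAP adaptation idea with scarce enrollment data can be sketched with the standard mean-update formula: interpolate between the prior (world-model) mean and the sample mean of the few enrollment frames, weighted by a relevance factor. This shows the generic formula only, applied to a single Gaussian mean rather than full HMM parameters; the tau value is illustrative.

```python
import numpy as np

def map_adapt_mean(prior_mean, data, tau=10.0):
    """MAP update of a Gaussian mean: with n observations, the adapted
    mean is (tau * prior + n * sample_mean) / (tau + n). With little
    data the prior dominates, which is what makes one- or two-utterance
    enrollment robust."""
    n = len(data)
    return (tau * prior_mean + n * np.mean(data, axis=0)) / (tau + n)
```

With n = tau the prior and the data contribute equally; as n grows the estimate converges to the plain sample mean.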
Signal modeling with Non Uniform Topology lattice filters
This article presents a new class of constrained and specialized Auto-Regressive (AR) processes. They are derived from lattice filters in which some reflection coefficients are forced to zero at a priori locations. Optimizing the filter topology makes it possible to build parametric spectral models that have a greater number of poles than the number of parameters needed to describe their locations. These NUT (Non-Uniform Topology) models are assessed by evaluating the reduction of modeling error with respect to conventional AR models.
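The link between reflection coefficients and AR poles is the standard step-up (Levinson) recursion, sketched below. Forcing some reflection coefficients to zero, as in the NUT models, still produces a full-order AR polynomial while reducing the number of free parameters; the function name is illustrative.

```python
import numpy as np

def reflection_to_ar(k):
    """Step-up recursion: convert reflection coefficients k_1..k_p to
    direct-form AR coefficients a_1..a_p, with A(z) = 1 + a_1 z^-1 + ...
    Each stage m updates a <- [a + k_m * reversed(a), k_m]."""
    a = np.array([])
    for km in k:
        a = np.concatenate([a + km * a[::-1], [km]])
    return a
```

For example, the zeroed middle stage in k = [0.5, 0, 0.3] still yields a three-coefficient (three-pole) polynomial described by only two free parameters.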