Robust language recognition via adaptive language factor extraction
This paper presents a technique to adapt an acoustically based
language classifier to the background conditions and speaker
accents. This adaptation improves language classification on
a broad spectrum of TV broadcasts. The core of the system
consists of an iVector-based setup in which language and channel
variabilities are modeled separately. The subsequent language
classifier (the backend) operates on the language factors,
i.e. those features in the extracted iVectors that explain the observed
language variability. The proposed technique adapts the
language variability model to the background conditions and
to the speaker accents present in the audio. The effect of the
adaptation is evaluated on a 28-hour corpus composed of documentaries and monolingual as well as multilingual broadcast
news shows. Consistent improvements in the automatic identification
of Flemish (Belgian Dutch), English and French are demonstrated for all broadcast types.
Factor analysis for speaker segmentation and improved speaker diarization
Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that large differences between speaker factor vectors indicate a different speaker. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining, detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis system. Experiments on the COST278 multilingual broadcast news database show relative reductions of 50% in boundary detection errors. The speaker error rate is reduced by 8% relative.
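The boundary detection described in this abstract can be illustrated with a minimal sketch. It assumes the frame-wise speaker factors are already available as a NumPy array; the function names, the window length, and the covariance estimate (taken from frame-to-frame differences as a rough stand-in for phonetic variability) are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def change_scores(factors, win=50, cov=None, reg=1e-6):
    """Mahalanobis distance between the mean speaker factors of the
    windows to the left and right of each frame (hypothetical helper)."""
    factors = np.asarray(factors, dtype=float)
    n_frames, dim = factors.shape
    if cov is None:
        # Assumption: frame-to-frame differences mostly reflect
        # within-speaker (e.g. phonetic) variability.
        cov = np.cov(np.diff(factors, axis=0), rowvar=False)
    cov_inv = np.linalg.inv(np.atleast_2d(cov) + reg * np.eye(dim))
    scores = np.zeros(n_frames)
    for t in range(win, n_frames - win):
        d = factors[t - win:t].mean(axis=0) - factors[t:t + win].mean(axis=0)
        scores[t] = np.sqrt(d @ cov_inv @ d)
    return scores

def pick_boundaries(scores, threshold, min_gap):
    """Greedy peak picking: take the strongest scores first and keep a
    candidate only if it lies at least min_gap frames from kept ones."""
    boundaries = []
    for t in np.argsort(scores)[::-1]:
        if scores[t] <= threshold:
            break
        if all(abs(int(t) - b) >= min_gap for b in boundaries):
            boundaries.append(int(t))
    return sorted(boundaries)

# Synthetic example: two "speakers" of 200 frames each, 3 factors per frame.
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 3))
speaker_b = rng.normal(3.0, 1.0, size=(200, 3))
demo_factors = np.vstack([speaker_a, speaker_b])
demo_boundaries = pick_boundaries(change_scores(demo_factors), threshold=2.0, min_gap=50)
```

In this toy setting the single detected boundary falls close to frame 200, where the synthetic speaker change occurs. A real system would estimate the covariance from training data rather than from the recording itself.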
Adaptive speaker diarization of broadcast news based on factor analysis
The introduction of factor analysis techniques in a speaker diarization system enhances its performance by facilitating the use of speaker-specific information, by improving the suppression of nuisance factors such as phonetic content, and by facilitating various forms of adaptation. This paper describes a state-of-the-art iVector-based diarization system which employs factor analysis and adaptation on all levels. The diarization modules relevant for this work are the speaker segmentation, which searches for speaker boundaries, and the speaker clustering, which aims at grouping speech segments of the same speaker. The speaker segmentation relies on speaker factors which are extracted on a frame-by-frame basis using eigenvoices. We incorporate soft voice activity detection into this extraction process: because speaker change detection should be based on speaker information only, non-speech frames are disregarded by applying speech posteriors. Potential speaker boundaries are inserted at positions where rapid changes in speaker factors are witnessed. By employing Mahalanobis distances, the effect of the phonetic content can be further reduced, which results in more accurate speaker boundaries. This iVector-based segmentation significantly outperforms more common segmentation methods based on the Bayesian Information Criterion (BIC) or speech activity marks. The speaker clustering employs two-step Agglomerative Hierarchical Clustering (AHC): after initial BIC clustering, the second clustering stage is realized by either an iVector Probabilistic Linear Discriminant Analysis (PLDA) system or Cosine Distance Scoring (CDS) of extracted speaker factors. The segmentation system is made adaptive on a file-by-file basis by iterating the diarization process using eigenvoice matrices adapted (unsupervised) on the output of the previous iteration.
Assuming that for most use cases material similar to the recording in question is readily available, unsupervised domain adaptation of the speaker clustering is possible as well. We obtain this by expanding the eigenvoice matrix used during speaker factor extraction for the CDS clustering stage with a small set of new eigenvoices that, in combination with the initial generic eigenvoices, models the recurring speakers and acoustic conditions more accurately. Experiments on the COST278 multilingual broadcast news database show that adaptive speaker segmentation generates significantly more accurate speaker boundaries, which also results in more accurate clustering. The obtained speaker error rate (SER) is reduced by a further 13% relative, to 7.4%, via domain adaptation of the CDS clustering. (C) 2017 Elsevier Ltd. All rights reserved.
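The CDS-based clustering stage mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: each segment is represented by one speaker factor vector, a cluster is represented by the mean of its members' vectors, and merging stops at a fixed similarity threshold. The function names and the threshold value are hypothetical; the abstract does not specify these details.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two speaker factor vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ahc_cosine(vectors, stop_threshold=0.5):
    """Agglomerative hierarchical clustering with cosine scoring.

    Repeatedly merges the two clusters whose mean vectors are most
    similar, until the best similarity drops below stop_threshold
    (an assumed stopping rule). Returns lists of segment indices.
    """
    vectors = np.asarray(vectors, dtype=float)
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        means = [vectors[c].mean(axis=0) for c in clusters]
        best, pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine_similarity(means[i], means[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < stop_threshold:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy example: segments 0 and 1 point in one direction, 2 and 3 in another.
segments = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
demo_clusters = ahc_cosine(segments)
```

Cosine scoring ignores vector length and compares directions only, which is why no covariance model is needed at this stage. In the system described above, the factors themselves would come from the (adapted) eigenvoice matrix.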
Model-based speech/non-speech segmentation of a heterogeneous multilingual TV broadcast collection
Multimedia Information Retrieval systems normally comprise a preprocessor that performs a speech/non-speech (SNS)
segmentation of the audio stream. The goal of such a segmentation is to divide the audio into intervals that need a lexical transcription and intervals that just need some categorization in terms of jingle, applause, etc. In this paper, a baseline SNS system that was trained on monolingual BN data is evaluated on a multilingual BN corpus and on a heterogeneous corpus composed of diverse TV shows including discussions, soaps, animation films, etc. It appears that the system exhibits serious deficiencies when confronted with such out-of-domain data. The heterogeneous corpus in particular, characterized by many short speaker turns and a rich palette of non-speech intervals, turns out to be challenging. However, employing a proper SNS information criterion, it is demonstrated that enhancing the acoustic representation of the audio, creating a richer music model and performing a file-wise adaptation of the acoustic models can significantly increase the performance. Complex architectures permitting explicit duration modeling and re-segmentation of the speech parts after speaker change detection, on the other hand, do not seem to help.
On the inadequacy of relaxation theories in interpreting the Hertzian and ultra-Hertzian spectra of polar liquids
The Hertzian, ultra-Hertzian and far-infrared spectra (frequencies between 1 megahertz and 6,000 gigahertz) of many simple polar liquids, pure and in solution in non-polar solvents, are interpreted in terms of the autocorrelation function Φ(t) of the dipole moment as defined by GLARUM and COLE. We show that the generally accepted expression for Φ(t) (relaxation theory) is not entirely satisfactory from a theoretical point of view and that it cannot describe the experimental results for wavelengths shorter than one millimetre. Building on the work of STEELE, COLE and Van VLECK, a new expression for Φ(t) is then established that introduces a new parameter: the correlation time of the angular velocity of the dipole moment. The theoretical spectral profiles thus obtained are compared with our experimental results, and satisfactory agreement is found.
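For orientation, the relaxation-theory expression referred to in this abstract is presumably the Debye form; that identification is an assumption, since the abstract does not write it out. In that model the dipole autocorrelation decays exponentially with a single relaxation time. The symbols below (relaxation time, static and high-frequency permittivities) are introduced here for illustration, and internal-field corrections are neglected:

```latex
\Phi(t) = e^{-t/\tau_D}, \qquad
\frac{\varepsilon^*(\omega) - \varepsilon_\infty}{\varepsilon_s - \varepsilon_\infty}
  = \int_0^\infty \left(-\frac{d\Phi}{dt}\right) e^{-i\omega t}\,dt
  = \frac{1}{1 + i\omega\tau_D}
```

With this form the absorption, proportional to the product of frequency and the loss ε''(ω), tends to a constant plateau at high frequency instead of falling off. This is one commonly cited reason why a purely exponential Φ(t) cannot describe sub-millimetre data. The new expression proposed by the authors is not given in the abstract and is not reproduced here.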
N° 10 — Pierre DESPLANQUES
Desplanques Pierre, Bernard Danièle, Lepagnot-Leca Françoise. N° 10 — Pierre DESPLANQUES. In: Témoins et acteurs des politiques de l'éducation depuis la Libération. Tome 3 - Inventaire de quarante entretiens. Paris : Institut national de recherche pédagogique, 2002. pp. 35-38. (Témoins et acteurs des politiques de l'éducation, 1