12 research outputs found
Speech and Speaker Recognition for Home Automation: Preliminary Results
International audience. In voice-controlled multi-room smart homes, ASR and speaker identification systems face distant-speech conditions, which have a significant impact on performance. Regarding voice command recognition, this paper presents an approach which dynamically selects the best channel and adapts models to the environmental conditions. The method has been tested on data recorded with 11 elderly and visually impaired participants in a real smart home. The voice command recognition error rate was 3.2% in the off-line condition and 13.2% in the online condition. For speaker identification, performance was very speaker dependent. However, we show a high correlation between performance and training size. The main difficulty was the very short utterance duration in comparison to state-of-the-art studies. Moreover, speaker identification performance depends on the size of the adaptation corpus, so users must record enough data before using the system.
Fiabilité de la comparaison des voix dans le cadre judiciaire
It is common to see voice recordings being presented as a forensic trace in court. Generally, a forensic expert is asked to analyse both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the new “golden standard” in forensic sciences. The LR not only supports one of the hypotheses but also quantifies the strength of its support. However, the LR is subject to some practical limitations arising from its estimation process. This is particularly true when Automatic Speaker Recognition (ASpR) systems are considered, as they output a score in all situations regardless of the case-specific conditions. Indeed, several factors are not taken into account by the estimation process, such as the quality and quantity of information in both voice recordings, their phonological content, or the speakers' intrinsic characteristics. All these factors call into question the validity and reliability of FVC. In this thesis, we wish to address these issues. First, we analyse how the phonetic content of a pair of voice recordings affects FVC accuracy. We show that oral vowels, nasal vowels and nasal consonants carry more speaker-specific information than averaged phonemic content. In contrast, plosives, liquids and fricatives do not have a significant impact on LR accuracy. This investigation demonstrates the importance of the phonemic content and highlights interesting differences between inter-speaker and intra-speaker effects. A further study examines the speaker-specific information carried by each vowel, based on formant parameters and without any use of an ASpR system. This study revealed interesting differences between vowels in terms of the quantity of speaker information.
The results clearly show the importance of intra-speaker variability effects in FVC reliability estimation. Second, we investigate an approach to predict the LR reliability based only on the pair of voice recordings. We define a homogeneity criterion (NHM) able to measure the presence of relevant information and the homogeneity of this information between the pair of voice recordings. We expect the lowest homogeneity values to be correlated with the lowest LR accuracy measures, and conversely for high values. The results showed the value of the homogeneity measure for FVC reliability. Our studies also reported large differences in behaviour between FVC genuine and impostor trials. The results confirmed the importance of intra-speaker variability effects in FVC reliability estimation. The main takeaway of this thesis is that averaging the system behaviour over a large number of factors (speaker, duration, content...) potentially hides many important details. For a better understanding of the FVC approach and/or an ASpR system, it is mandatory to explore the behaviour of the system at an as-detailed-as-possible scale (the devil lies in the details).
In judicial proceedings, voice recordings are increasingly presented as evidence. In general, a scientific expert is called upon to establish whether the voice excerpt in question was spoken by a given suspect (prosecution hypothesis) or not (defence hypothesis). This process is known as “Forensic Voice Comparison (FVC)”. Since the emergence of the DNA typing model, the Bayesian approach has become the new “golden standard” in forensic science. In this approach, the expert expresses the result of the analysis as a likelihood ratio (LR).
This ratio not only favours one of the hypotheses (“prosecution” or “defence”) but also provides the weight of that decision. Although the LR is theoretically sufficient to summarise the result, in practice it is subject to certain limitations arising from its estimation process. This is particularly true when automatic speaker recognition (ASpR) systems are used. These systems produce a score in all situations without taking into account the specific conditions of the case under study. Several factors are almost always ignored by the estimation process, such as the quality and quantity of information in the two voice recordings, the consistency of the information between the two recordings, their phonetic content, or the intrinsic characteristics of the speakers. All these factors call into question the notion of reliability in forensic voice comparison. In this thesis, we address this problem in the context of automatic systems (ASpR) on two main points. The first is to establish a hierarchical scale of the phonetic categories of speech sounds according to the amount of speaker-specific information they contain. This study shows the importance of phonetic content: it highlights interesting differences between phonemes and the strong influence of intra-speaker variability. These results were confirmed by a complementary study on oral vowels based on formant parameters, independently of any speaker recognition system. The second point is to implement an approach for predicting the reliability of the LR from the two recordings of a voice comparison, without resorting to an ASpR system.
To this end, we defined a homogeneity measure (NHM) capable of estimating the quantity of information and the homogeneity of that information between the two recordings considered. Our hypothesis is that homogeneity is directly correlated with the degree of reliability of the LR. The results confirmed this hypothesis, with the NHM measure strongly correlated with the LR reliability measure. Our work also revealed significant differences in the behaviour of NHM between target comparisons and impostor comparisons. It further showed that the “brute force” approach (relying on a large number of comparisons) is not sufficient to ensure a sound evaluation of reliability in FVC. Indeed, certain variability factors can induce local system behaviours tied to particular situations. For a better understanding of the FVC approach and/or an ASpR system, it is necessary to explore the behaviour of the system at as detailed a scale as possible (the devil is in the details).
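For reference, the likelihood ratio that this abstract relies on is conventionally written as below. The symbols E, H_p and H_d are notation assumed here for illustration; the abstract itself does not give a formula.

```latex
\mathrm{LR} = \frac{p(E \mid H_p)}{p(E \mid H_d)}
```

Here E stands for the evidence (the pair of voice recordings), H_p for the prosecution (same-speaker) hypothesis and H_d for the defence (different-speakers) hypothesis. A value above 1 supports H_p, a value below 1 supports H_d, and the distance from 1 expresses the strength of that support.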
Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons
International audience. It is common to see voice recordings being presented as a forensic trace in court. Generally, a forensic expert is asked to analyze both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the golden standard in forensic sciences. The LR not only supports one of the hypotheses but also quantifies the strength of its support. However, the LR is subject to some practical limitations arising from its estimation process. This is particularly true when Automatic Speaker Recognition (ASpR) systems are considered, as they output a score in all situations regardless of the case-specific conditions. Indeed, several factors are not taken into account by the estimation process, such as the quality and quantity of information in both voice recordings, their phonological content, or the speakers' intrinsic characteristics. In our recent study, we showed the importance of the phonemic content and highlighted interesting differences between inter-speaker and intra-speaker effects. In this article, we take our previous analysis a step further and investigate the impact of rhythm variation separately on target and non-target trials.
Suivre le rythme de tes paroles
International audience. Following the rhythm of your speech. Various temporal measures, such as the duration of vowels and consonants, have been proposed to characterize the rhythm of speech and thus to classify languages, dialects or idiolects. This study focuses on this last role of the temporal parameters of speech, using the FABIOLE database. Designed for voice comparison, the database is built from media broadcasts (TV and radio). It allows us to study the within-speaker and between-speaker variability of certain temporal parameters, in search of an idiolect. The results show that the share of variability that can be attributed to the speaker reaches 45% for the variance of the duration of unvoiced segments and 42% for the total percentage of voiced segments. Thus, these temporal measurements depend on the speaker much more strongly than the formant parameters do.
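One way to read "the share of variability that can be attributed to the speaker" is as the between-speaker sum of squares divided by the total sum of squares in a one-way decomposition. The sketch below illustrates that reading only; the abstract does not say which statistical model the authors used, and the function name and the toy data are hypothetical.

```python
from statistics import fmean


def speaker_variance_share(measurements):
    """Return SS_between / SS_total for a dict mapping speaker -> list of values.

    This is a plain one-way decomposition (eta squared), assumed here for
    illustration; it is not necessarily the method used in the study.
    """
    all_values = [v for values in measurements.values() for v in values]
    grand_mean = fmean(all_values)

    # Total variability of every measurement around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    # Variability of the speaker means around the grand mean,
    # weighted by the number of measurements per speaker.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in measurements.values()
    )

    if ss_total == 0:
        return 0.0  # no variability at all, so nothing to attribute
    return ss_between / ss_total


# Hypothetical toy data: a temporal parameter measured twice per speaker.
toy = {"spk_a": [1.0, 3.0], "spk_b": [5.0, 7.0]}
share = speaker_variance_share(toy)
```

A share close to 1 means that most of the spread comes from differences between speakers, while a share close to 0 means that recordings of the same speaker differ as much as recordings of different speakers.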
Impact of rhythm on forensic voice comparison reliability
International audience. It is common to see voice recordings being presented as a forensic trace in court. Generally, a forensic expert is asked to analyze both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the new golden standard in forensic sciences. The LR not only supports one of the hypotheses but also quantifies the strength of its support. However, the LR is subject to some practical limitations arising from its estimation process. This is particularly true when Automatic Speaker Recognition (ASpR) systems are considered, as they output a score in all situations regardless of the case-specific conditions. Indeed, several factors are not taken into account by the estimation process, such as the quality and quantity of information in both voice recordings, their phonological content, or the speakers' intrinsic characteristics. All these factors call into question the validity and reliability of FVC. In our recent study, we showed that intra-speaker variability explains 2/3 of the system losses. In this article, we investigate the relations between intra-speaker variability and rhythmic parameters.
A Unified Joint Model to Deal With Nuisance Variabilities in the i-Vector Space
International audience
An information theory based data-homogeneity measure for voice comparison
International audience
Comparaison des voix dans le cadre judiciaire : influence du contenu phonétique
International audience. In forensic voice comparison, the Bayesian approach has become the new "golden standard". In this approach, the expert expresses the results as a single number, the likelihood ratio (LR). This article examines the influence of phonetic content on the reliability of the LR. We are particularly interested in the amount of speaker-specific information carried by the different speech sounds. This study highlights important differences between phonemes and, above all, the strong influence of intra-speaker variability. ABSTRACT: Phonetic content impact on forensic voice comparison. Forensic Voice Comparison (FVC) is increasingly using the likelihood ratio (LR). This article focuses on the impact of phonemic content on FVC performance and variability. The results demonstrate the importance of the phonemic content and highlight interesting differences between inter-speaker and intra-speaker effects. KEYWORDS: speaker recognition, voice comparison, forensics, reliability, phonemic content.
Phonetic content impact on Forensic Voice Comparison
International audience