246 research outputs found

    Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition

    Get PDF
    In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability and the source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in match cases; (2) the benefit of combining these features with the mel frequency cepstral coefficients to exploit their discrimination power under uncontrolled conditions (mismatch cases). Consequently, the proposed invariant features result in a performance improvement as demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to the GMM-UBM speaker recognition systems based on MFCC features

    Identification using face regions: Application and assessment in forensic scenarios

    Full text link
    This is the author’s version of a work that was accepted for publication in Forensic Science International. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Forensic Science International, 23, 1-3, (2013) DOI: 10.1016/j.forsciint.2013.08.020This paper reports an exhaustive analysis of the discriminative power of the different regions of the human face on various forensic scenarios. In practice, when forensic examiners compare two face images, they focus their attention not only on the overall similarity of the two faces. They carry out an exhaustive morphological comparison region by region (e.g., nose, mouth, eyebrows, etc.). In this scenario it is very important to know based on scientific methods to what extent each facial region can help in identifying a person. This knowledge obtained using quantitative and statical methods on given populations can then be used by the examiner to support or tune his observations. In order to generate such scientific knowledge useful for the expert, several methodologies are compared, such as manual and automatic facial landmarks extraction, different facial regions extractors, and various distances between the subject and the acquisition camera. Also, three scenarios of interest for forensics are considered comparing mugshot and Closed-Circuit TeleVision (CCTV) face images using MORPH and SCface databases. One of the findings is that depending of the acquisition distances, the discriminative power of the facial regions change, having in some cases better performance than the full face

    VOICE BIOMETRICS UNDER MISMATCHED NOISE CONDITIONS

    Get PDF
    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker. This is found to adversely affect the performance of speaker recognition in terms of accuracy. To address the above problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations, based on the two main classes of speaker recognition (i.e. verification/ open-set identification), it is shown that the proposed approach can significantly improve the performance accuracy under mismatched noise conditions. In order to further improve the recognition accuracy in severe mismatch conditions, an approach to enhancing the above stated method is proposed. This, which involves providing a closer adjustment of the reference speaker models to the noise condition in the test utterance, is shown to considerably increase the accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated. The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes

    Reconnaissance automatique du locuteur par des GMM à grande marge

    Get PDF
    Depuis plusieurs dizaines d'années, la reconnaissance automatique du locuteur (RAL) fait l'objet de travaux de recherche entrepris par de nombreuses équipes dans le monde. La majorité des systèmes actuels sont basés sur l'utilisation des Modèles de Mélange de lois Gaussiennes (GMM) et/ou des modèles discriminants SVM, i.e., les machines à vecteurs de support. Nos travaux ont pour objectif général la proposition d'utiliser de nouveaux modèles GMM à grande marge pour la RAL qui soient une alternative aux modèles GMM génératifs classiques et à l'approche discriminante état de l'art GMM-SVM. Nous appelons ces modèles LM-dGMM pour Large Margin diagonal GMM. Nos modèles reposent sur une récente technique discriminante pour la séparation multi-classes, qui a été appliquée en reconnaissance de la parole. Exploitant les propriétés des systèmes GMM utilisés en RAL, nous présentons dans cette thèse des variantes d'algorithmes d'apprentissage discriminant des GMM minimisant une fonction de perte à grande marge. Des tests effectués sur les tâches de reconnaissance du locuteur de la campagne d'évaluation NIST-SRE 2006 démontrent l'intérêt de ces modèles en reconnaissance.Most of state-of-the-art speaker recognition systems are based on Gaussian Mixture Models (GMM), trained using maximum likelihood estimation and maximum a posteriori (MAP) estimation. The generative training of the GMM does not however directly optimize the classification performance. For this reason, discriminative models, e.g., Support Vector Machines (SVM), have been an interesting alternative since they address directly the classification problem, and they lead to good performances. Recently a new discriminative approach for multiway classification has been proposed, the Large Margin Gaussian mixture models (LM-GMM). As in SVM, the parameters of LM-GMM are trained by solving a convex optimization problem. However they differ from SVM by using ellipsoids to model the classes directly in the input space, instead of half-spaces in an extended high-dimensional space. While LM-GMM have been used in speech recognition, they have not been used in speaker recognition (to the best of our knowledge). In this thesis, we propose simplified, fast and more efficient versions of LM-GMM which exploit the properties and characteristics of speaker recognition applications and systems, the LM-dGMM models. In our LM-dGMM modeling, each class is initially modeled by a GMM trained by MAP adaptation of a Universal Background Model (UBM) or directly initialized by the UBM. The models mean vectors are then re-estimated under some Large Margin constraints. We carried out experiments on full speaker recognition tasks under the NIST-SRE 2006 core condition. The experimental results are very satisfactory and show that our Large Margin modeling approach is very promising

    Development and demonstration of an on-board mission planner for helicopters

    Get PDF
    Mission management tasks can be distributed within a planning hierarchy, where each level of the hierarchy addresses a scope of action, and associated time scale or planning horizon, and requirements for plan generation response time. The current work is focused on the far-field planning subproblem, with a scope and planning horizon encompassing the entire mission and with a response time required to be about two minutes. The far-feld planning problem is posed as a constrained optimization problem and algorithms and structural organizations are proposed for the solution. Algorithms are implemented in a developmental environment, and performance is assessed with respect to optimality and feasibility for the intended application and in comparison with alternative algorithms. This is done for the three major components of far-field planning: goal planning, waypoint path planning, and timeline management. It appears feasible to meet performance requirements on a 10 Mips flyable processor (dedicated to far-field planning) using a heuristically-guided simulated annealing technique for the goal planner, a modified A* search for the waypoint path planner, and a speed scheduling technique developed for this project

    Navigating the Stars: Norway, the European Economic Area and the European Union. CEPS Paperback. February 2002

    Get PDF
    This study expertly assesses the evolving relationship between Norway and the European Union, the centrepiece of which is the European Economic Area (EEA). Faced with an increasingly outdated network of relationships with the EU, Norway finds itself marginalised from policy-making and subject instead to policy-taking. This report evaluates Norway’s position in relation to the ‘future of Europe’ debate as well as a range of hypothetical options that Norway may contemplate, focusing on several key policy areas including the single market, the macroeconomic agenda, justice and home affairs, and foreign security and defence policies

    Biometrics

    Get PDF
    Biometrics uses methods for unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science, particularly, biometrics is used as a form of identity access management and access control. It is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem. The book chapters are divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide comprehensive reference and text on human authentication and people identity verification from both physiological, behavioural and other points of view. It aims to publish new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor Dr. Jucheng Yang, and many of the guest editors, such as Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park, Dr. Sook Yoon and so on, who also made a significant contribution to the book

    Exploring variabilities through factor analysis in automatic acoustic language recognition

    Get PDF
    La problématique traitée par la Reconnaissance de la Langue (LR) porte sur la définition découverte de la langue contenue dans un segment de parole. Cette thèse se base sur des paramètres acoustiques de courte durée, utilisés dans une approche d adaptation de mélanges de Gaussiennes (GMM-UBM). Le problème majeur de nombreuses applications du vaste domaine de la re- problème connaissance de formes consiste en la variabilité des données observées. Dans le contexte de la Reconnaissance de la Langue (LR), cette variabilité nuisible est due à des causes diverses, notamment les caractéristiques du locuteur, l évolution de la parole et de la voix, ainsi que les canaux d acquisition et de transmission. Dans le contexte de la reconnaissance du locuteur, l impact de la variabilité solution peut sensiblement être réduit par la technique d Analyse Factorielle (Joint Factor Analysis, JFA). Dans ce travail, nous introduisons ce paradigme à la Reconnaissance de la Langue. Le succès de la JFA repose sur plusieurs hypothèses. La première est que l information observée est décomposable en une partie universelle, une partie dépendante de la langue et une partie de variabilité, qui elle est indépendante de la langue. La deuxième hypothèse, plus technique, est que la variabilité nuisible se situe dans un sous-espace de faible dimension, qui est défini de manière globale.Dans ce travail, nous analysons le comportement de la JFA dans le contexte d un dispositif de LR du type GMM-UBM. Nous introduisons et analysons également sa combinaison avec des Machines à Vecteurs Support (SVM). Les premières publications sur la JFA regroupaient toute information qui est amélioration nuisible à la tâche (donc ladite variabilité) dans un seul composant. Celui-ci est supposé suivre une distribution Gaussienne. Cette approche permet de traiter les différentes sortes de variabilités d une manière unique. En pratique, nous observons que cette hypothèse n est pas toujours vérifiée. Nous avons, par exemple, le cas où les données peuvent être groupées de manière logique en deux sous-parties clairement distinctes, notamment en données de sources téléphoniques et d émissions radio. Dans ce cas-ci, nos recherches détaillées montrent un certain avantage à traiter les deux types de données par deux systèmes spécifiques et d élire comme score de sortie celui du système qui correspond à la catégorie source du segment testé. Afin de sélectionner le score de l un des systèmes, nous avons besoin d un analyses détecteur de canal source. Nous proposons ici différents nouveaux designs pour engendrées de tels détecteurs automatiques. Dans ce cadre, nous montrons que les facteurs de variabilité (du sous-espace) de la JFA peuvent être utilisés avec succès pour la détection de la source. Ceci ouvre la perspective intéressante de subdiviser les5données en catégories de canal source qui sont établies de manière automatique. En plus de pouvoir s adapter à des nouvelles conditions de source, cette propriété permettrait de pouvoir travailler avec des données d entraînement qui ne sont pas accompagnées d étiquettes sur le canal de source. L approche JFA permet une réduction de la mesure de coûts allant jusqu à généraux 72% relatives, comparé au système GMM-UBM de base. En utilisant des systèmes spécifiques à la source, suivis d un sélecteur de scores, nous obtenons une amélioration relative de 81%.Language Recognition is the problem of discovering the language of a spoken definitionutterance. This thesis achieves this goal by using short term acoustic information within a GMM-UBM approach.The main problem of many pattern recognition applications is the variability of problemthe observed data. In the context of Language Recognition (LR), this troublesomevariability is due to the speaker characteristics, speech evolution, acquisition and transmission channels.In the context of Speaker Recognition, the variability problem is solved by solutionthe Joint Factor Analysis (JFA) technique. Here, we introduce this paradigm toLanguage Recognition. The success of JFA relies on several assumptions: The globalJFA assumption is that the observed information can be decomposed into a universalglobal part, a language-dependent part and the language-independent variabilitypart. The second, more technical assumption consists in the unwanted variability part to be thought to live in a low-dimensional, globally defined subspace. In this work, we analyze how JFA behaves in the context of a GMM-UBM LR framework. We also introduce and analyze its combination with Support Vector Machines(SVMs).The first JFA publications put all unwanted information (hence the variability) improvemen tinto one and the same component, which is thought to follow a Gaussian distribution.This handles diverse kinds of variability in a unique manner. But in practice,we observe that this hypothesis is not always verified. We have for example thecase, where the data can be divided into two clearly separate subsets, namely datafrom telephony and from broadcast sources. In this case, our detailed investigations show that there is some benefit of handling the two kinds of data with two separatesystems and then to elect the output score of the system, which corresponds to the source of the testing utterance.For selecting the score of one or the other system, we need a channel source related analyses detector. We propose here different novel designs for such automatic detectors.In this framework, we show that JFA s variability factors (of the subspace) can beused with success for detecting the source. This opens the interesting perspectiveof partitioning the data into automatically determined channel source categories,avoiding the need of source-labeled training data, which is not always available.The JFA approach results in up to 72% relative cost reduction, compared to the overall resultsGMM-UBM baseline system. Using source specific systems followed by a scoreselector, we achieve 81% relative improvement.AVIGNON-Bib. numérique (840079901) / SudocSudocFranceF
    corecore