117 research outputs found

    Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

    There have been a number of studies on extracting bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states in order to improve the performance of text-dependent speaker verification (TD-SV); however, only moderate success has been achieved. A recent study [1] presented a time-contrastive learning (TCL) concept to exploit the non-stationarity of brain signals for the classification of brain states. Speech signals have a similar non-stationarity property, and TCL has the further advantage of requiring no labeled data. We therefore present a TCL-based BN feature extraction method. The method uniformly partitions each speech utterance in a training dataset into a predefined number of multi-frame segments. Each segment in an utterance corresponds to one class, and class labels are shared across utterances. DNNs are then trained to discriminate all speech frames among the classes, thereby exploiting the temporal structure of speech. In addition, we propose a segment-based unsupervised clustering algorithm to re-assign class labels to the segments. TD-SV experiments were conducted on the RedDots challenge database. The TCL-DNNs were trained using speech data of fixed pass-phrases that were excluded from the TD-SV evaluation set, so the learned features can be considered phrase-independent. We compare the performance of the proposed TCL-BN feature with that of short-time cepstral features and of BN features extracted from DNNs discriminating speakers, pass-phrases, speaker+pass-phrase combinations, and monophones whose labels and boundaries are generated by three different automatic speech recognition (ASR) systems. Experimental results show that the proposed TCL-BN outperforms cepstral features and speaker+pass-phrase discriminant BN features, and that its performance is on par with that of the ASR-derived BN features. Moreover, ...
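    The segment-labeling step at the core of TCL is simple to state in code. Below is a minimal sketch of it, assuming utterances arrive as frame-level feature matrices; the function name and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def tcl_labels(utterance_frames, num_classes):
    """Assign time-contrastive class labels to the frames of one utterance.

    The utterance is uniformly partitioned into `num_classes` contiguous
    multi-frame segments; all frames in segment k receive label k. Labels
    are shared across utterances, so segment k of every utterance forms
    one class for the DNN to discriminate.
    """
    num_frames = len(utterance_frames)
    # np.array_split yields num_classes near-equal contiguous index blocks.
    segments = np.array_split(np.arange(num_frames), num_classes)
    labels = np.empty(num_frames, dtype=np.int64)
    for k, idx in enumerate(segments):
        labels[idx] = k
    return labels

# Example: a 100-frame utterance partitioned into 10 TCL classes,
# so frames 0-9 get label 0, frames 10-19 get label 1, and so on.
frames = np.random.randn(100, 20)  # 100 frames of 20-dim features
labels = tcl_labels(frames, num_classes=10)
```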

    Text-Independent Speaker Identification using Statistical Learning

    The proliferation of voice-activated devices and systems and of over-the-phone bank transactions has made our daily affairs much easier in recent times. The ease these systems offer also calls for them to be fail-safe against impersonators. Given the sensitive information that might be shared on these systems, it is imperative that security be an utmost concern during the development stages. Vital systems like these should incorporate the ability to discriminate between the actual speaker and an impersonator; that functionality is the focus of this thesis. Several methods have been proposed for building such systems, with some success recorded so far. However, given the vital role these systems play in securing critical information, efforts continue to be made to reduce their probability of error. This thesis therefore employs statistical learning methods, which have proven highly accurate and efficient in various other applications. The methods used are Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs), which have become the de facto techniques for designing speaker identification systems. The effectiveness of a support vector machine depends on the type of kernel used; several kernels have been proposed for achieving better results, and this thesis introduces a kernel that serves as an alternative to the already defined ones. Other factors, including the number of components used in modeling the GMM, also affect the performance of the system; these factors are investigated in this thesis and encouraging results were obtained
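    As a concrete illustration of the GMM half of such a system, the following is a minimal sketch of maximum-likelihood speaker identification using scikit-learn. Feature extraction (e.g., cepstral features) is assumed to happen elsewhere, and all names are illustrative rather than taken from the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=16):
    """Fit one GMM per enrolled speaker on that speaker's feature frames.

    features_by_speaker: dict mapping speaker id -> (n_frames, n_dims) array.
    """
    models = {}
    for speaker, feats in features_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        models[speaker] = gmm.fit(feats)
    return models

def identify(models, test_feats):
    """Return the speaker whose GMM gives the highest average
    log-likelihood on the test frames (maximum-likelihood rule)."""
    scores = {spk: gmm.score(test_feats) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```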

    Improving speaker recognition by biometric voice deconstruction

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have necessarily been replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from the components resulting from the deconstruction of the voice into its glottal-source and vocal-tract estimates, will enhance recognition rates compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions
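    The abstract does not spell out its deconstruction pipeline, but one standard way to obtain rough vocal-tract and glottal-source estimates is LPC inverse filtering. The sketch below shows only that textbook baseline, not the authors' method; the frame and model order are illustrative.

```python
import numpy as np
from scipy.signal import lfilter
import librosa

def lpc_deconstruct(frame, order=18):
    """Crude source-filter separation of one voiced speech frame.

    The LPC polynomial approximates the vocal-tract (all-pole) filter;
    inverse filtering the frame with it leaves a prediction residual
    that serves as a rough glottal-source estimate.
    """
    a = librosa.lpc(frame.astype(float), order=order)  # vocal-tract estimate
    residual = lfilter(a, [1.0], frame)                # glottal-source estimate
    return a, residual
```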

    Multi-media personal identity verification


    Discriminative Methods for Model Optimization in Speaker Verification

    The growing need for secure authentication systems has motivated interest in effective Speaker Verification (SV) algorithms. This need for high-performance algorithms capable of achieving low error rates has opened several lines of research. In this work we investigate, from a discriminative point of view, a set of methodologies for improving on the performance of state-of-the-art SV systems. In a first approach, we investigate hyper-parameter optimization that explicitly considers the trade-off between false-acceptance and false-rejection errors. This objective can be achieved by maximizing the area under the Receiver Operating Characteristic (ROC) curve. We argue that parameter optimization should not be limited to a single operating point; a more robust strategy is to optimize the parameters so as to increase the Area Under the Curve (AUC), so that all operating points are improved. We study how to optimize the parameters using the mathematical representation of the area under the ROC curve based on the Wilcoxon-Mann-Whitney (WMW) statistic, computed with the generalized probabilistic descent algorithm. We also analyze the effect on, and the improvements in, metrics such as the detection error tradeoff (DET) curve, the Equal Error Rate (EER), and the minimum value of the detection cost function (minDCF).
In a second approach, we treat the speech signal as a combination of attributes carrying information about the speaker, the channel, and the noise. Conventional verification systems train single generic models for all cases and handle variations of these attributes either by using factor analysis or by not considering them explicitly. We propose a new methodology that partitions the data space according to these attributes and trains separate models for each partition; the partitions can be obtained according to each attribute. We show how to train these models discriminatively so as to maximize the separation between them. Furthermore, the design of algorithms robust to noisy conditions plays a key role in allowing SV systems to operate in real-world conditions, and we extend our methodologies to mitigate the effects of noise in such conditions. For our first approach, when noise is present the operating point may no longer be a single point, or it may shift unpredictably; we show that our ROC-AUC maximization methodology is more robust than conventional classifiers even when the noise is not explicitly modeled. Moreover, noise may occur at different signal-to-noise ratios (SNRs), degrading system performance, so it is natural to consider an efficient decomposition of speech signals that accounts for attributes such as SNR, noise type, and channel. We argue that, instead of addressing the problem with a unified model, a decomposition of the feature space into partitions based on these attributes can provide better results; the attributes can represent different channels and noise conditions. We analyze the potential of these methodologies to improve on the state of the art by reducing error while controlling the operating points and mitigating the effects of noise
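    For reference, the WMW estimator of the AUC that this optimization builds on has the standard form below, where G and I denote the sets of genuine and impostor trial scores; replacing the indicator with a sigmoid surrogate (slope beta, a common choice for gradient-based training) makes it differentiable.

```latex
\mathrm{AUC} \approx \frac{1}{|G|\,|I|}\sum_{i \in G}\sum_{j \in I} \mathbb{1}\!\left(s_i > s_j\right),
\qquad
\widehat{\mathrm{AUC}}_{\sigma} = \frac{1}{|G|\,|I|}\sum_{i \in G}\sum_{j \in I} \frac{1}{1 + e^{-\beta (s_i - s_j)}}
```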

    Security of multimodal biometric systems against spoof attacks

    A biometric system is essentially a pattern recognition system used in an adversarial environment. Like any conventional security system, a biometric system is exposed to malicious adversaries who can manipulate data to make the system ineffective by compromising its integrity. Current theory and design methods of biometric systems do not take the vulnerability to such adversary attacks into account; whether classical design methods lead to secure systems is therefore an open problem. To make biometric systems secure, it is necessary to understand and evaluate the threats and thus develop effective countermeasures and robust system designs, both technical and procedural, where necessary. Accordingly, extending the theory and design methods of biometric systems is mandatory to safeguard their security and reliability in adversarial environments. In this thesis, we provide some contributions in this direction. Among all the potential attacks discussed in the literature, spoof attacks are one of the main threats against the security of biometric systems for identity recognition. Multimodal biometric systems are commonly believed to be intrinsically more robust to spoof attacks than systems based on a single biometric trait, as they combine information coming from different biometric traits. However, recent works have questioned this belief and shown that multimodal systems can be misled by an attacker (impostor) even by spoofing only one of the biometric traits. We therefore first provide a detailed review of state-of-the-art work on multimodal biometric systems under spoof attacks. The scope of state-of-the-art results is very limited, since they were obtained under a very restrictive “worst-case” hypothesis, where the attacker is assumed to be able to fabricate a perfect replica of a biometric trait whose matching score distribution is identical to that of genuine traits. Thus, we investigate the validity of the “worst-case” hypothesis using a large set of real spoof attacks and provide empirical evidence that the “worst-case” scenario is not representative of real spoof attacks: its suitability may depend on the specific biometric trait, the matching algorithm, and the techniques used to counterfeit the spoofed traits. We then propose a security evaluation methodology for biometric systems under spoof attacks that can be used in real applications, as it does not require fabricating fake biometric traits, it allows the designer to take into account the different possible qualities of fake traits used by different attackers, and it exploits only information on genuine and impostor samples, which is collected for the training of a biometric system anyway. Our methodology evaluates performance under a simulated spoof attack using a model of the fake score distribution that explicitly takes into account different degrees of quality of the fake biometric traits. In particular, we propose two models of the match score distribution of fake traits that take into account the different factors which can affect it: the particular spoofed biometric, the sensor, the matching algorithm, the technique used to construct the fake biometrics, and the skills of the attacker. All these factors are summarized in a single parameter that we call “attack strength”.
Further, we propose an extension of our security evaluation method that ranks several biometric score fusion rules according to their relative robustness against spoof attacks, allowing the designer to choose the most robust rule according to the method's prediction. We then present an empirical analysis, using datasets of face and fingerprint images including real spoofed traits, showing that our proposed models provide a good approximation of the fake traits' score distribution and that our method thus provides an adequate estimate of the security of biometric systems against spoof attacks. We also use our method to show how to evaluate the security of different multimodal systems on publicly available benchmark datasets without spoof attacks. Our experimental results show that the robustness of multimodal biometric systems to spoof attacks strongly depends on the particular matching algorithm, the score fusion rule, and the attack strength of the fake traits. Finally, considering a multimodal system based on face and fingerprint biometrics, we present evidence that the proposed methodology is capable of correctly ranking score fusion rules under spoof attacks
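    To make the “attack strength” idea concrete, the sketch below simulates fake match scores by interpolating, sample by sample, between impostor and genuine scores: strength 0 reproduces the impostor distribution (a useless fake) and strength 1 the genuine one (the classical worst case). This is an illustrative convex-combination model in the spirit of the thesis, not its exact formulation.

```python
import numpy as np

def simulate_fake_scores(genuine, impostor, attack_strength, rng=None):
    """Simulate match scores of spoofed traits without fabricating fakes.

    genuine, impostor: 1-D arrays of real match scores.
    attack_strength in [0, 1] controls how close the simulated fake
    score distribution is to the genuine one.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = min(len(genuine), len(impostor))
    g = rng.choice(genuine, size=n)   # resample genuine scores
    i = rng.choice(impostor, size=n)  # resample impostor scores
    return (1.0 - attack_strength) * i + attack_strength * g
```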

    Analyzing and Applying Cryptographic Mechanisms to Protect Privacy in Applications

    Privacy-Enhancing Technologies (PETs) emerged as a technology-based response to the increased collection and storage of data and the associated threats to individuals' privacy in modern applications. They rely on a variety of cryptographic mechanisms that allow some computation to be performed without directly obtaining knowledge of plaintext information. However, many challenges have so far prevented effective real-world usage in many existing applications. For one, some mechanisms leak information or have been proposed outside of the security models established within the cryptographic community, leaving open how effective they are at protecting privacy in various applications. Additionally, a major challenge causing PETs to remain largely academic is their practicality, in both efficiency and usability: cryptographic mechanisms introduce substantial overhead, which is often prohibitive, and, due to a lack of high-level tools, they are very hard for outsiders to integrate. In this thesis, we move towards making PETs more effective and practical in protecting privacy in numerous applications. We take a two-sided approach: first analyzing the effective security of candidate mechanisms (cryptanalysis) and then building constructions and tools (cryptographic engineering) for practical use in emerging machine learning applications crucial to modern use cases. In the process, we incorporate an interdisciplinary perspective for analyzing mechanisms and collaboratively build privacy-preserving architectures with requirements from the application domains' experts.
Cryptanalysis. While mechanisms like Homomorphic Encryption (HE) or Secure Multi-Party Computation (SMPC) provably leak no additional information, Encrypted Search Algorithms (ESAs) and Randomization-only Two-Party Computation (RoTPC) possess additional properties that require cryptanalysis to determine effective privacy protection. ESAs allow for search on encrypted data, an important functionality in many applications. The most efficient ESAs possess some form of well-defined information leakage, which is cryptanalyzed via a breadth of so-called leakage attacks proposed in the literature. However, it is difficult to assess their practical effectiveness given that previous evaluations were closed-source, used restricted data, and made assumptions about (among other things) the query distribution, because real-world query data is very hard to find. For these reasons, we re-implement known leakage attacks in an open-source framework and perform a systematic empirical re-evaluation using a variety of new data sources that, for the first time, contain real-world query data. We obtain many more complete as well as novel results, in which attacks work much better or much worse than expected based on previous evaluations. RoTPC mechanisms require cryptanalysis as they do not rely on established techniques and security models, instead obfuscating messages using only randomizations. A prominent example is a privacy-preserving scalar product protocol by Lu et al. (IEEE TPDS'13). We show that this protocol is formally insecure and that this translates to practical insecurity by presenting attacks that even allow testing for certain inputs, making the case for more scrutiny of RoTPC protocols used as PETs. This part of the thesis is based on the following two publications:
[KKM+22] S. KAMARA, A. KATI, T. MOATAZ, T. SCHNEIDER, A. TREIBER, M. YONLI. “SoK: Cryptanalysis of Encrypted Search with LEAKER - A framework for LEakage AttacK Evaluation on Real-world data”. In: 7th IEEE European Symposium on Security and Privacy (EuroS&P’22). Full version: https://ia.cr/2021/1035. Code: https://encrypto.de/code/LEAKER. IEEE, 2022, pp. 90–108. Appendix A.
[ST20] T. SCHNEIDER, A. TREIBER. “A Comment on Privacy-Preserving Scalar Product Protocols as proposed in “SPOC””. In: IEEE Transactions on Parallel and Distributed Systems (TPDS) 31.3 (2020). Full version: https://arxiv.org/abs/1906.04862. Code: https://encrypto.de/code/SPOCattack, pp. 543–546. CORE Rank A*. Appendix B.
Cryptographic Engineering. Given the above cryptanalysis results, we investigate using the leakage-free and provably secure cryptographic mechanisms of HE and SMPC to protect privacy in machine learning applications. As much of the cryptographic community has focused on PETs for neural network applications, we focus on two other important applications and models: speaker recognition and sum-product networks. We particularly show the efficiency of our solutions in realistic scenarios and provide tools usable by non-domain experts. In speaker recognition, a user's voice data is matched with reference data stored at the service provider. Using HE and SMPC, we build the first privacy-preserving speaker recognition system that includes the state-of-the-art technique of cohort score normalization, using cohort pruning via SMPC. Then, we build a privacy-preserving speaker recognition system relying solely on SMPC, which we show outperforms previous solutions based on HE by a factor of up to 4000x. We show that both of our solutions comply with specific standards for biometric information protection and thus are effective and practical PETs for speaker recognition. Sum-Product Networks (SPNs) are noteworthy probabilistic graphical models that, like neural networks, also need efficient methods for privacy-preserving inference as a PET. We present CryptoSPN, which uses SMPC for privacy-preserving inference of SPNs and (due to a combination of machine learning and cryptographic techniques, and contrary to most works on neural networks) even hides the network structure. Our implementation is integrated into the prominent SPN framework SPFlow and evaluates medium-sized SPNs within seconds. This part of the thesis is based on the following three publications:
[NPT+19] A. NAUTSCH, J. PATINO, A. TREIBER, T. STAFYLAKIS, P. MIZERA, M. TODISCO, T. SCHNEIDER, N. EVANS. “Privacy-Preserving Speaker Recognition with Cohort Score Normalisation”. In: 20th Conference of the International Speech Communication Association (INTERSPEECH’19). Online: https://arxiv.org/abs/1907.03454. International Speech Communication Association (ISCA), 2019, pp. 2868–2872. CORE Rank A. Appendix C.
[TNK+19] A. TREIBER, A. NAUTSCH, J. KOLBERG, T. SCHNEIDER, C. BUSCH. “Privacy-Preserving PLDA Speaker Verification using Outsourced Secure Computation”. In: Speech Communication 114 (2019). Online: https://encrypto.de/papers/TNKSB19.pdf. Code: https://encrypto.de/code/PrivateASV, pp. 60–71. CORE Rank B. Appendix D.
[TMW+20] A. TREIBER, A. MOLINA, C. WEINERT, T. SCHNEIDER, K. KERSTING. “CryptoSPN: Privacy-preserving Sum-Product Network Inference”. In: 24th European Conference on Artificial Intelligence (ECAI’20). Full version: https://arxiv.org/abs/2002.00801. Code: https://encrypto.de/code/CryptoSPN. IOS Press, 2020, pp. 1946–1953. CORE Rank A. Appendix E.
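    In plaintext form, the cohort score normalization underlying [NPT+19] standardizes the raw verification score against scores obtained on a cohort of non-target speakers. The sketch below shows a common symmetric variant (S-norm) with top-K cohort pruning, computed in the clear; the privacy-preserving system performs an equivalent computation under SMPC, and the parameter names here are illustrative.

```python
import numpy as np

def s_norm(score, enroll_cohort_scores, test_cohort_scores, top_k=200):
    """Symmetric cohort score normalization with top-K cohort pruning.

    enroll_cohort_scores: scores of the enrolled speaker against cohort models.
    test_cohort_scores:   scores of the test utterance against cohort models.
    Only the top_k highest cohort scores on each side are kept (pruning);
    the raw score is then standardized against both pruned sets.
    """
    e = np.sort(enroll_cohort_scores)[-top_k:]
    t = np.sort(test_cohort_scores)[-top_k:]
    return 0.5 * ((score - e.mean()) / e.std() + (score - t.mean()) / t.std())
```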
Overall, this thesis contributes to a broader security analysis of cryptographic mechanisms and new systems and tools to effectively protect privacy in various sought-after applications

    Unattended acoustic sensor systems for noise monitoring in national parks

    Detection and classification of transient acoustic signals is a difficult problem, often complicated by factors such as the variety of sources that may be encountered, the presence of strong interference, and substantial variations in the acoustic environment. Furthermore, most applications of transient detection and classification, such as speech recognition and environmental monitoring, require online detection and classification of these transient events. This is even more crucial for applications such as environmental monitoring, which is often done at remote locations where it is infeasible to set up a large, general-purpose processing system; instead, some type of custom-designed system is needed that is power efficient yet able to run the necessary signal processing algorithms in near real time. In this thesis, we describe a custom-designed environmental monitoring system (EMS) built specifically for monitoring air traffic and other sources of interest in national parks. More specifically, this thesis focuses on the capabilities of the EMS and on how transient detection, classification, and tracking are implemented on it. The Sparse Coefficient State Tracking (SCST) transient detection and classification algorithm was implemented on the EMS board in order to detect and classify transient events. This algorithm was chosen because it was designed for this particular application and was shown to have superior performance compared to other algorithms commonly used for transient detection and classification. The SCST algorithm was implemented on an Artix 7 FPGA, with parts of the algorithm running as dedicated custom logic and other parts running sequentially on a soft-core processor. This thesis explains the partitioning and pipelining of the algorithm. Each of the partitions was tested independently to verify its functionality with respect to the overall system. Furthermore, the entire SCST algorithm was tested in the field on actual acoustic data, and the performance of this implementation was evaluated using receiver operating characteristic (ROC) curves and confusion matrices. In this test, the FPGA implementation of SCST achieved acceptable source detection and classification results despite a difficult dataset and limited training data. The tracking of acoustic sources is done through successive direction of arrival (DOA) angle estimation using a wideband extension of the Capon beamforming algorithm. This algorithm was also implemented on the EMS in order to provide real-time DOA estimates for the detected sources, and was likewise partitioned into several stages, with some stages implemented in custom logic and others as software running on the soft-core processor. Just as with SCST, each partition of the beamforming algorithm was verified independently, and then a full-system test was conducted to evaluate whether it could track an airborne source. For the full-system test, a model airplane was flown at various trajectories relative to the EMS and the trajectories estimated by the system were compared to the ground truth. Although the accuracy of the DOA estimates could not be evaluated in this test, it was shown that the algorithm was able to approximately recover the general trajectory of a moving source, which is sufficient for our application since only a general heading of the acoustic sources is desired
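    The narrowband Capon (MVDR) spatial spectrum underlying the DOA stage is standard: P(theta) = 1 / (a^H R^-1 a), maximized over candidate angles. Below is a minimal NumPy sketch for a uniform linear array; the wideband extension used on the EMS, and its FPGA partitioning, are omitted, and all names are illustrative.

```python
import numpy as np

def capon_spectrum(R, angles_deg, num_sensors, spacing_wavelengths=0.5):
    """Narrowband Capon spatial spectrum P(theta) = 1 / (a^H R^-1 a).

    R: (M, M) sample covariance matrix of the array snapshots.
    Returns the spectrum at each candidate angle; the DOA estimate
    is the angle at which the spectrum peaks.
    """
    R_inv = np.linalg.inv(R)
    m = np.arange(num_sensors)
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        # Steering vector of a uniform linear array at angle theta.
        a = np.exp(-2j * np.pi * spacing_wavelengths * m * np.sin(theta))
        spectrum.append(1.0 / np.real(a.conj() @ R_inv @ a))
    return np.array(spectrum)

# Usage: angles = np.linspace(-90, 90, 361)
#        doa = angles[np.argmax(capon_spectrum(R, angles, num_sensors=8))]
```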

    Information Theoretic Methods For Biometrics, Clustering, And Stemmatology

    This thesis consists of four parts, three of which study issues related to the theory and applications of biometric systems, and one of which focuses on clustering. We establish an information theoretic framework and the fundamental trade-off between the utility and the security of biometric systems. Utility includes person identification and secret binding, while template protection, privacy, and secrecy leakage are the security issues addressed. A general model of biometric systems is proposed, in which secret binding and the use of passwords are incorporated. The system model captures major biometric system designs, including biometric cryptosystems, cancelable biometrics, secret-binding and secret-generating systems, and salted biometric systems. In addition to attacks at the database, information leakage from communication links between sensor modules and databases is considered. A general information theoretic rate outer bound is derived for characterizing and comparing the fundamental capacity, security risks, and benefits of different system designs. We establish connections between linear codes and biometric systems, so that one can directly apply a vast literature of coding theory for various noise and source random processes to achieve good performance in biometric systems. We develop two biometrics based on laser Doppler vibrometry (LDV) signals and electrocardiogram (ECG) signals. In both cases, changes in the statistics of biometric traits of the same individual are the major challenge, obstructing many methods from producing satisfactory results. We propose a robust feature selection method that specifically accounts for changes in statistics; it yields the best results in both LDV and ECG biometrics in terms of equal error rates in authentication scenarios. Finally, we address a different kind of learning problem from data, called clustering. Instead of having a set of training data with true labels known, as in identification problems, we study the problem of grouping data points without given labels, and its application to computational stemmatology. Since the problem itself has no true answer, it is in general ill-posed unless some regularization or norm is set to define the quality of a partition. We propose the use of the minimum description length (MDL) principle for graph-based clustering. In the MDL framework, each data partitioning is viewed as a description of the data points, and the description that minimizes the total number of bits needed to describe the data points and the model itself is considered the best model. We show that on synthesized data MDL clustering works well and fits natural intuitions of how data should be clustered. Furthermore, we develop a computational stemmatology method based on MDL, which achieves the best performance level on a large dataset
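    The generic two-part MDL objective behind this clustering view can be stated as follows (standard notation; the thesis's specific coding scheme for partitions is not reproduced here): choose the partition M that minimizes the total description length of model and data.

```latex
\hat{M} \;=\; \arg\min_{M}\;
\underbrace{L(M)}_{\text{bits for the partition}}
\;+\;
\underbrace{L(D \mid M)}_{\text{bits for the data given the partition}}
```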

    Calculating likelihood ratios for forensic speaker comparisons using phonetic and linguistic parameters

    The research presented in this thesis examines the calculation of numerical likelihood ratios using phonetic and linguistic parameters derived from a corpus of recordings of speakers of Southern Standard British English. The research serves as an investigation into the development of the numerical likelihood ratio as a medium for framing forensic speaker comparison conclusions. The thesis begins by investigating which parameters are claimed to be the most useful speaker discriminants according to expert opinion, and in turn examines four of these ‘selected/valued’ parameters individually in relation to intra- and inter-speaker variation, their capacities as speaker discriminants, and the potential strength of evidence they yield. The four parameters analyzed are articulation rate, fundamental frequency, long-term formant distributions, and the incidence of clicks (velaric ingressive plosives). The final portion of the thesis considers the combination of the four parameters under a numerical likelihood ratio framework in order to provide an overall likelihood ratio. The contributions of this research are threefold. Firstly, the thesis presents for the first time a comprehensive survey of current forensic speaker comparison practices around the world. Secondly, it expands the phonetic literature by providing acoustic and auditory analysis, as well as population statistics, for four phonetic and linguistic parameters that survey participants identified as effective speaker discriminants. Thirdly, it contributes to the forensic speech science and forensic likelihood-ratio literature by considering what steps can be taken to conceptually align forensic speaker comparison with more developed areas of forensic science (e.g., DNA) by creating a human-based (auditory and acoustic-phonetic) forensic speaker comparison system
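    The numerical likelihood ratio at the center of the thesis has the standard forensic form below, where E is the evidence and H_ss, H_ds are the same-speaker and different-speaker hypotheses. A naive overall ratio multiplies the four per-parameter ratios, which assumes independence between parameters; the thesis's actual combination strategy may differ.

```latex
\mathrm{LR} = \frac{p(E \mid H_{\mathrm{ss}})}{p(E \mid H_{\mathrm{ds}})},
\qquad
\mathrm{LR}_{\text{overall}} = \prod_{k=1}^{4} \mathrm{LR}_{k}
```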