
    Audio-coupled video content understanding of unconstrained video sequences

    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information has not been applied together for solving such a complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework to study the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of objects, activities and the environment present. The second module uses audio information only, and recognises activities and environment. Both approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system is robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis, we use a hierarchical approach of multi-stage classification so that difficult classification tasks can be decomposed into simpler, smaller tasks. When combining the two modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines the advantages of both feature-level and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. 
Finally, we propose a decision correction algorithm, showing that further steps towards combining multi-modal classification information with semantic knowledge generate the best possible results.
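The contrast between feature-level and decision-level fusion can be sketched in a few lines. This toy example is an illustrative assumption only (the function names, class labels and weights are invented, and the thesis's actual fusion algorithm is not shown); it merely marks where in the pipeline each fusion level operates:

```python
# Toy sketch of the two fusion levels; not the thesis's actual algorithm.

def feature_level_fusion(audio_feats, video_feats):
    """Concatenate modality feature vectors before a single classifier."""
    return audio_feats + video_feats

def decision_level_fusion(scores_audio, scores_video, w=0.5):
    """Weighted combination of per-class scores from two independent classifiers."""
    return {c: w * scores_audio[c] + (1 - w) * scores_video[c]
            for c in scores_audio}

# Feature level: one joint vector, one classifier downstream.
joint = feature_level_fusion([0.2, 0.8], [0.1, 0.5, 0.9])

# Decision level: each modality is classified first, then scores are merged.
fused = decision_level_fusion({"indoor": 0.7, "outdoor": 0.3},
                              {"indoor": 0.4, "outdoor": 0.6})
best = max(fused, key=fused.get)  # "indoor": 0.55 vs 0.45
```

A hybrid algorithm like the one the abstract mentions would sit between these two extremes, keeping some feature-level information while exploiting per-modality decisions.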

    Discriminative methods for model optimisation in Speaker Verification

    The growing need for secure authentication systems has motivated interest in effective Speaker Verification (SV) algorithms. This need for high-performance algorithms capable of achieving low error rates has opened several research directions. In this work we investigate, from a discriminative point of view, a set of methodologies to improve on the state-of-the-art performance of SV systems. In a first approach, we investigate hyper-parameter optimisation that explicitly considers the trade-off between false acceptance and false rejection errors. This objective can be achieved by maximising the area under the Receiver Operating Characteristic (ROC) curve. We argue that parameter optimisation should not be limited to a single operating point; a more robust strategy is to optimise the parameters to increase the Area Under the Curve (AUC), so that all operating points are improved. We study how to optimise the parameters using the mathematical representation of the area under the ROC curve based on the Wilcoxon-Mann-Whitney (WMW) statistic, computed with the generalised probabilistic descent algorithm. We also analyse the effects on, and improvements in, metrics such as the detection error tradeoff (DET) curve, the Equal Error Rate (EER) and the minimum value of the detection cost function (minDCF). In a second approach, we treat the speech signal as a combination of attributes carrying information about the speaker, the channel and the noise. 
Conventional verification systems train single generic models for all cases and handle variations of these attributes either by using factor analysis or by not considering them explicitly. We propose a new methodology that partitions the data space according to these characteristics and trains separate models for each partition. The partitions can be obtained according to each attribute. We show how to train the models discriminatively and effectively so as to maximise the separation between them. Moreover, the design of algorithms robust to noisy conditions plays a key role in allowing SV systems to operate in real conditions, and we extend our methodologies to mitigate the effects of noise in such conditions. For our first approach, when noise is present, the operating point may no longer be a single point, or it may shift unpredictably. We show that our ROC-AUC maximisation methodology is more robust than conventional classifiers even when the noise is not explicitly modelled. In addition, noise may occur at different signal-to-noise ratios (SNRs), which can degrade system performance. It is therefore worth considering an efficient decomposition of the speech signals that takes into account attributes such as SNR, noise type and channel type. We argue that, instead of addressing the problem with a unified model, partitioning the feature space based on such attributes can provide better results; these attributes can represent different channels and noise conditions. 
We have analysed the potential of these methodologies to improve on the state-of-the-art performance of SV systems by reducing the error, while also controlling the operating points and mitigating the effects of noise.
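The link between the WMW statistic and the AUC can be illustrated directly: the statistic counts the fraction of (target, impostor) score pairs that are ranked correctly, which equals the area under the ROC curve. A minimal sketch follows (the scores are made up, and the actual optimisation via the generalised probabilistic descent algorithm is not shown):

```python
# Sketch: AUC via the Wilcoxon-Mann-Whitney statistic -- the fraction of
# (target, impostor) score pairs ranked correctly. Scores are illustrative.

def wmw_auc(target_scores, impostor_scores):
    pairs = 0
    correct = 0.0
    for t in target_scores:
        for i in impostor_scores:
            pairs += 1
            if t > i:
                correct += 1.0
            elif t == i:
                correct += 0.5  # ties count as half a correct ranking
    return correct / pairs

auc = wmw_auc([2.0, 1.5, 0.9], [1.0, 0.2, 0.1])  # 8 of 9 pairs ranked correctly
```

Because this pairwise count is what AUC-maximising training optimises, every threshold (operating point) of the ROC curve benefits at once, rather than a single one.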

    Vulnerability assessment in the use of biometrics in unsupervised environments

    In the last few decades, we have witnessed large-scale deployment of biometric systems in different applications, replacing traditional recognition methods such as passwords and tokens. Biometric systems are now part of our daily lives. On a personal scale, authentication to our electronic devices (smartphones, tablets, laptops, etc.) relies on biometric characteristics to grant access. Moreover, we access our bank accounts and perform various types of payments and transactions using the biometric sensors integrated into our devices. Different organizations, companies, and institutions use biometric-based solutions for access control. On the national scale, police authorities and border control measures use biometric recognition devices for individual identification and verification purposes. Biometric systems are therefore relied upon to provide secure recognition, where only the genuine user can be recognized as being himself. Moreover, the biometric system should ensure that an individual cannot be identified as someone else. In the literature, there are a surprising number of experiments that show the possibility of stealing someone's biometric characteristics and using them to create an artificial biometric trait with which an attacker can claim the identity of the genuine user. There have also been real cases of people who successfully fooled biometric recognition systems in airports and on smartphones [1]–[3]. This urges the necessity to investigate the potential threats and propose countermeasures that ensure high levels of security and user convenience. Consequently, performing security evaluations is vital to identify: (1) the security flaws in biometric systems, (2) the possible threats that may target the defined flaws, and (3) measurements that describe the technical competence of the biometric system security. 
Identifying the system vulnerabilities leads to proposing adequate security solutions that assist in achieving higher integrity. This thesis aims to investigate the vulnerability of the fingerprint modality to presentation attacks in unsupervised environments, and then to implement mechanisms that detect those attacks and avoid misuse of the system. To achieve these objectives, the thesis is carried out in three phases. In the first phase, the generic biometric system scheme is studied by analyzing the vulnerable points, with special attention to the vulnerability to presentation attacks. The study reviews the literature on presentation attacks and the corresponding solutions, i.e. presentation attack detection mechanisms, for six biometric modalities: fingerprint, face, iris, vascular, handwritten signature, and voice. Moreover, it provides a new taxonomy for presentation attack detection mechanisms. The proposed taxonomy helps to comprehend the issue of presentation attacks and how the literature has tried to address it, and represents a starting point for new investigations proposing novel presentation attack detection mechanisms. In the second phase, an evaluation methodology is developed from two sources: (1) the ISO/IEC 30107 standard, and (2) the Common Evaluation Methodology of the Common Criteria. The developed methodology characterizes two main aspects of a presentation attack detection mechanism: (1) the resistance of the mechanism to presentation attacks, and (2) the corresponding threat of the studied attack. The first aspect is addressed by showing the mechanism's technical capabilities and how they influence the security and ease of use of the biometric system. The second is addressed by performing a vulnerability assessment considering all the factors that affect the attack potential. Finally, a data collection is carried out, including 7128 fingerprint videos of bona fide and attack presentations. 
The data is collected using two sensing technologies, two presentation scenarios, and seven attack species. The database is used to develop dynamic presentation attack detection mechanisms that exploit fingerprint spatio-temporal features. In the final phase, a set of novel presentation attack detection mechanisms is developed, exploiting the dynamic features caused by natural fingerprint phenomena such as perspiration and elasticity. The evaluation results show an efficient capability to detect attacks where, in some configurations, the mechanisms are capable of eliminating some attack species and mitigating the rest while keeping user convenience at a high level.
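The intuition behind such dynamic features can be shown with a toy statistic: a live finger perspires and deforms, so simple frame statistics drift over a short video, while many artefacts stay nearly static. The frame representation and statistic below are illustrative assumptions only, not the spatio-temporal features actually developed in the thesis:

```python
# Toy liveness cue: drift of mean frame intensity across a fingerprint video.
# Frames are flattened pixel lists here; the real features are far richer.

def temporal_intensity_drift(frames):
    """Mean absolute change of the average frame intensity over the sequence."""
    means = [sum(f) / len(f) for f in frames]
    diffs = [abs(b - a) for a, b in zip(means, means[1:])]
    return sum(diffs) / len(diffs)

live_drift = temporal_intensity_drift([[100, 110], [104, 114], [108, 118]])
fake_drift = temporal_intensity_drift([[100, 110], [100, 110], [100, 110]])
```

A detector would threshold such a temporal statistic: a drifting sequence suggests a live finger, a static one suggests a presentation attack instrument.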

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, as a means of identity management have become commonplace nowadays in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted pre-processing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality, namely: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.

    Automatic speaker recognition: modelling, feature extraction and effects of clinical environment

    Speaker recognition is the task of establishing the identity of an individual based on his/her voice. It has significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The speaker recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech, and the features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state-of-the-art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the Expectation Maximization (EM) algorithm to build the speaker models. The most frequently used features are the Mel Frequency Cepstral Coefficients (MFCC). This thesis investigated areas of possible improvement in the field of speaker recognition. The identified drawbacks of current speaker recognition systems included slow convergence rates of the modelling techniques and the features' sensitivity to changes due to aging of speakers, use of alcohol and drugs, changing health conditions and mental state. The thesis proposed a new method of deriving the Gaussian mixture model parameters, called the EM-ITVQ algorithm. The EM-ITVQ showed a significant improvement in equal error rates and higher convergence rates when compared to the classical GMM based on the expectation maximization (EM) method. It was demonstrated that features based on the nonlinear model of speech production (TEO-based features) provided better performance compared to the conventional MFCC features. For the first time, the effect of clinical depression on speaker verification rates was tested, and it was demonstrated that speaker verification results deteriorate if the speakers are clinically depressed. 
The deterioration was demonstrated using conventional (MFCC) features. The thesis also showed that when the MFCC features are replaced with features based on the nonlinear model of speech production (TEO-based features), the detrimental effect of clinical depression on speaker verification rates can be reduced.
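The GMM/EM baseline that the thesis improves on can be sketched in one dimension. This is a minimal illustration under stated assumptions (toy data, crude quartile initialisation, two components); the EM-ITVQ variant itself is not shown:

```python
import math

# Minimal 1-D, two-component GMM fitted with classical EM: the baseline
# modelling approach described above (the thesis's EM-ITVQ is not shown).

def em_gmm_1d(data, iters=50):
    data = sorted(data)
    n = len(data)
    # crude initialisation from the lower and upper quartile points
    mu = [data[n // 4], data[3 * n // 4]]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return mu, var, pi

# Two well-separated clusters around 0 and 5.
samples = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
mu, var, pi = em_gmm_1d(samples)
```

In a speaker model, each sample would be an MFCC (or TEO-based) feature vector rather than a scalar, and the number of components would be much larger, but the E/M alternation is the same.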

    Technology for the Future: In-Space Technology Experiments Program, part 2

    The purpose of the Office of Aeronautics and Space Technology (OAST) In-Space Technology Experiments Program (In-STEP) 1988 Workshop was to identify and prioritize technologies that are critical for future national space programs and require validation in the space environment, and to review current NASA (In-Reach) and industry/university (Out-Reach) experiments. A prioritized list of the critical technology needs was developed for the following eight disciplines: structures; environmental effects; power systems and thermal management; fluid management and propulsion systems; automation and robotics; sensors and information systems; in-space systems; and humans in space. This is part two of two parts and contains the critical technology presentations for the eight theme elements and a summary listing of critical space technology needs for each theme.

    NASA space station automation: AI-based technology review

    Research and development projects in automation for the Space Station are discussed. Artificial Intelligence (AI) based automation technologies are planned to enhance crew safety through a reduced need for EVA, increase crew productivity through the reduction of routine operations, increase space station autonomy, and augment space station capability through the use of teleoperation and robotics. AI technology will also be developed for the servicing of satellites at the Space Station, system monitoring and diagnosis, space manufacturing, and the assembly of large space structures.

    Voice Biometrics under Mismatched Noise Conditions

    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker, which is found to adversely affect the accuracy of speaker recognition. To address this problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations based on the two main classes of speaker recognition (verification and open-set identification), it is shown that the proposed approach can significantly improve accuracy under mismatched noise conditions. To further improve recognition accuracy in severe mismatch conditions, an enhancement of the above method is proposed. This enhancement, which provides a closer adjustment of the reference speaker models to the noise condition in the test utterance, is shown to considerably increase accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach in open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated. 
The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
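The basic T-Norm idea referred to above can be sketched in a few lines: the raw score of a test utterance against the claimed speaker's model is normalised by the statistics of the same utterance scored against a cohort of impostor models, compensating for utterance-dependent (e.g. noise-induced) score shifts. The numbers are illustrative, and the thesis's noise-matched enhancement is not shown:

```python
# Illustrative T-Norm: normalise a raw verification score using cohort statistics.

def t_norm(raw_score, cohort_scores):
    """cohort_scores: the test utterance scored against impostor cohort models."""
    mean = sum(cohort_scores) / len(cohort_scores)
    var = sum((s - mean) ** 2 for s in cohort_scores) / len(cohort_scores)
    return (raw_score - mean) / (var ** 0.5)

# A raw score well above the cohort distribution maps to a large normalised score.
z = t_norm(2.5, [1.0, 0.5, 1.5, 1.0])
```

Because the cohort scores are computed on the very utterance under test, any noise that shifts all scores up or down is largely cancelled, which is why score normalisation helps under mismatched conditions.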

    Trustworthy Biometric Verification under Spoofing Attacks: Application to the Face Mode

    The need for automation of the identity recognition process in a vast number of applications has resulted in great advancement of biometric systems in recent years. Yet, many studies indicate that these systems suffer from vulnerabilities to spoofing (presentation) attacks: a weakness that may compromise their usage in many cases. Face verification systems are among the most attractive spoofing targets, due to the easy access to face images of users and the simplicity of manufacturing a spoofing attack. Many counter-measures to spoofing have been proposed in the literature. They are based on different cues used to distinguish between real accesses and spoofing attacks. The task of detecting spoofing attacks is most often considered a binary classification problem, with real accesses as the positive class and spoofing attacks as the negative class. The main objective of this thesis is to put the problem of anti-spoofing in a wider context, with an accent on its cooperation with a biometric verification system. In such a context, it is important to adopt an integrated perspective on biometric verification and anti-spoofing. In this thesis we identify and address three points where integration of the two systems is of interest. The first integration point is situated at input-level. Here, we are concerned with providing unified information that both the verification and anti-spoofing systems use: the samples used to enroll clients in the system, as well as the identity claims of the client at query time. We design two anti-spoofing schemes, one with a generative and one with a discriminative approach, which we refer to as client-specific, as opposed to the traditional client-independent ones. The proposed methods are applied on several case studies for the face mode. 
At the second integration point, situated at output-level, we address the issue of combining the outputs of the biometric verification and anti-spoofing systems in order to achieve an optimal combined decision about an input sample. We adopt a multiple expert fusion approach and investigate several fusion methods, comparing the verification performance and robustness to spoofing of the fused systems. The third integration point is associated with the evaluation process. The integrated perspective implies three types of inputs for the biometric system: real accesses, zero-effort impostors and spoofing attacks. We propose an evaluation methodology for biometric verification systems under spoofing attacks, called the Expected Performance and Spoofability (EPS) framework, which accounts for all three types of input and the error rates associated with them. Within this framework, we propose the EPS Curve (EPSC), which enables unbiased comparison of systems. Overall, the experimental results prove the integration to be beneficial for creating trustworthy face verification systems. At input-level, the results show the advantage of the client-specific approaches over the client-independent ones. At output-level, they present a comparison of the fusion methods. The case studies are furthermore used to demonstrate the EPS framework and its potential in the evaluation of biometric verification systems under spoofing attacks. The source code for the full set of methods is available as free software, as a satellite package to the free signal processing and machine learning toolbox Bob. It can be used to reproduce the results of the face mode case studies presented in this thesis, as well as to perform additional analysis and improve the proposed methods. Furthermore, it can be used to design case studies applying the proposed methods to other biometric modes.
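As a minimal illustration of output-level integration, the simplest decision-level rule is an AND of the two subsystem decisions: a sample is accepted only if it passes both the verifier and the anti-spoofing check. The scores and thresholds below are arbitrary assumptions; the thesis compares richer score-fusion methods than this:

```python
# AND-rule fusion of a verification score and an anti-spoofing (liveness) score.
# Thresholds are illustrative; real systems tune them on development data.

def fused_decision(verif_score, liveness_score, t_verif=0.5, t_live=0.5):
    """Accept only samples that pass both the verifier and the liveness check."""
    return verif_score >= t_verif and liveness_score >= t_live

accept = fused_decision(0.8, 0.9)  # genuine access: high match, high liveness
reject = fused_decision(0.8, 0.1)  # spoof: high match score, low liveness
```

The second call shows why fusion matters: a spoofing attack can produce a high verification score yet still be rejected by the combined system.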