252 research outputs found

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, a max-pooling loss trained LSTM with a randomly initialized network performs better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
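    The max-pooling loss idea above can be sketched as follows: for a keyword utterance, cross-entropy is applied only at the frame with the highest keyword posterior, while background utterances train every frame. This is a minimal NumPy sketch of an assumed two-class formulation; the function name and the small log floor are illustrative, not from the paper.

```python
import numpy as np

def max_pooling_loss(frame_logits, is_keyword):
    """Sketch of a max-pooling loss for keyword spotting.

    frame_logits: array of shape (T, 2) with per-frame unnormalised
    scores; column 1 is assumed to be the keyword class.
    """
    # Per-frame softmax over the two classes (numerically stabilised).
    z = frame_logits - frame_logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    if is_keyword:
        # Back-propagate only through the frame whose keyword
        # posterior is maximal.
        t = int(np.argmax(p[:, 1]))
        return -np.log(p[t, 1] + 1e-12)
    # Background utterance: every frame targets the background class.
    return float(-np.log(p[:, 0] + 1e-12).mean())
```

    Pooling the loss over frames this way frees the network from needing accurate frame-level alignments: the single most confident frame stands in for the whole keyword occurrence.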

    Deep fingerprint classification network

    Fingerprints are among the most widely used biometrics for personal recognition. However, fake fingerprints have become a major threat to the security of this biometric. This paper proposes an efficient deep fingerprint classification network (DFCN) model to accurately classify real versus fake fingerprints. The model's parameters are extensively evaluated. A total of 512 images from the ATVS-FFp_DB dataset are employed. The proposed DFCN achieves a high classification accuracy of 99.22%, successfully classifying fingerprint images into the two categories. Comparisons with state-of-the-art approaches are also provided.

    Capturing scattered discriminative information using a deep architecture in acoustic scene classification

    Frequently misclassified pairs of classes that share many common acoustic properties exist in acoustic scene classification (ASC). To distinguish such pairs of classes, trivial details scattered throughout the data could be vital clues. However, these details are less noticeable and are easily removed by conventional non-linear activations (e.g. ReLU). Furthermore, design choices that emphasize trivial details can easily lead to overfitting if the system is not sufficiently generalized. In this study, based on an analysis of the ASC task's characteristics, we investigate various methods to capture discriminative information while mitigating the overfitting problem. We adopt a max feature map method to replace conventional non-linear activations in a deep neural network, thereby applying an element-wise comparison between different filters of a convolution layer's output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power. Various experiments are conducted using the detection and classification of acoustic scenes and events 2020 task1-a dataset to validate the proposed methods. Our results show that the proposed system consistently outperforms the baseline, where the single best performing system has an accuracy of 70.4% compared to 65.1% for the baseline.
    Comment: Submitted to DCASE2020 workshop
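    The max feature map replacement for ReLU described above can be sketched as follows: the channel axis is split in half and an element-wise maximum is taken between the two halves, so only the more active of each filter pair survives. This is a minimal NumPy illustration assuming a channels-last layout; the function name is ours, not from the paper.

```python
import numpy as np

def max_feature_map(x):
    """Max feature map (MFM) activation over the last (channel) axis.

    Splits the channels into two halves and keeps the element-wise
    maximum, halving the channel count instead of zeroing values the
    way ReLU does.
    """
    c = x.shape[-1]
    assert c % 2 == 0, "channel count must be even"
    a, b = x[..., : c // 2], x[..., c // 2:]
    return np.maximum(a, b)

# Toy feature map: batch=1, 2x2 spatial, 4 channels -> 2 channels out.
x = np.arange(16, dtype=np.float32).reshape(1, 2, 2, 4)
y = max_feature_map(x)
```

    Because MFM selects between competing filters rather than thresholding at zero, small negative or low-magnitude responses can survive when their paired filter is even weaker, which is how the subtle details the paper targets avoid being discarded.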

    Face presentation attack detection on mobile devices (Detecção de ataques de apresentação por faces em dispositivos móveis)

    Advisors: Anderson de Rezende Rocha, Fernanda Alcântara Andaló. Dissertation (Master's), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: With the widespread use of biometric authentication systems, such as those based on face recognition, comes the exploitation of simple attacks at the sensor level that can undermine the effectiveness of these technologies in real-world setups. One example of such an attack takes place when an impostor, aiming at unlocking someone else's smartphone, deceives the device's built-in face recognition system by presenting a printed image of the genuine user's face. In this work, we study the problem of automatically detecting presentation attacks against face authentication methods on mobile devices, considering the use case of fast device unlocking and the hardware constraints of such devices. We do not assume the existence of any extra sensors or user intervention, relying only on the image captured by the device's front camera. Our contributions lie on multiple aspects of the problem. Firstly, we collect RECOD-MPAD, a new presentation-attack dataset that is tailored to the mobile-device setup and built to have real-world variations in lighting, including outdoor and low-light sessions, in contrast to existing public datasets. Secondly, to enrich the understanding of how far we can go with purely software-based methods when tackling this problem, we adopt a solely data-driven approach, differently from handcrafted methods in prior art that focus on specific aspects of the problem, and propose three different ways of training a deep convolutional neural network to detect presentation attacks: training with whole aligned faces, training with multi-resolution patches (regions of interest), and training with a multi-objective loss function crafted specifically for the problem. By using a lightweight architecture as the core of our network, we ensure that our solution can be efficiently embedded in smartphones on the market in 2017. Additionally, we provide a careful analysis that considers several user-disjoint and cross-factor protocols, highlighting some of the problems with current datasets and approaches. Experiments on the OULU-NPU benchmark, which was recently used in an international competition, suggest that our methods compare favorably with the state of the art and would be among the top performers in the competition, even under low memory usage and limited computational resources. Finally, to further enhance the model's efficacy and discriminability in the target setup of user authentication for mobile devices, we propose an effective way to leverage the available gallery of user data on the device and adapt the decision-making process to the user's and device's own characteristics.
    Mestrado em Ciência da Computação (Master of Science in Computer Science)
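    The multi-resolution patch training mentioned in the abstract could look roughly like the sampler below: square crops of several sizes are drawn from the face image so the classifier sees both fine texture (small patches) and broader context (large patches). The patch sizes, counts, and function name are hypothetical illustrations, not values taken from the thesis.

```python
import numpy as np

def random_patches(image, sizes=(32, 48, 64), n_per_size=2, rng=None):
    """Hypothetical multi-resolution patch sampler.

    image: H x W (x C) array; returns n_per_size square crops for each
    size in `sizes`, drawn at uniformly random positions.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    patches = []
    for s in sizes:
        for _ in range(n_per_size):
            y = int(rng.integers(0, h - s + 1))
            x = int(rng.integers(0, w - s + 1))
            patches.append(image[y:y + s, x:x + s])
    return patches
```

    Training on such crops also acts as a form of data augmentation: one face image yields many labeled examples, which helps on small presentation-attack datasets.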