
    Real-Time Deep Learning-Based Face Recognition System

    This research proposes real-time deep learning-based face recognition algorithms using MATLAB and Python. Face recognition is the process of identifying people from facial images, and the technology is applied broadly in biometrics, information security, access control to restricted areas, etc. A facial recognition system is built in two steps: first, facial features are extracted; second, the extracted features are classified. Deep learning, specifically the convolutional neural network (CNN), has recently driven much of the progress in face recognition. CNNs are one of the main deep learning approaches and have shown excellent performance in many fields when large amounts of training data are available (such as ImageNet). However, with limited hardware and insufficient training data, high performance is difficult to achieve. Therefore, in this work, transfer learning is used to improve the performance of the face recognition system even with a small number of images. Two pre-trained models are used: GoogLeNet CNN (in MATLAB) and FaceNet (in Python). Transfer learning fine-tunes the last layer of the CNN model for the new classification task. FaceNet provides a unified system for face verification (is this the same person?), recognition (who is this person?), and clustering (find common people among these faces) by learning a Euclidean embedding per image with a deep convolutional network.
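    The transfer-learning step described above amounts to freezing a pre-trained CNN and retraining only its final classification layer on the new, smaller face dataset. Below is a minimal sketch of that idea in Python with PyTorch, using torchvision's GoogLeNet as a stand-in for the pre-trained model; the layer names, optimizer choice, and the number of identities (NUM_IDENTITIES) are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: fine-tune only the last layer of a pre-trained GoogLeNet for
# a new face-identity classification task (transfer learning).
import torch
import torch.nn as nn
from torchvision import models

NUM_IDENTITIES = 10  # assumed number of identities in the new face dataset

model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)

# Freeze all pre-trained weights; only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for the new classification task.
model.fc = nn.Linear(model.fc.in_features, NUM_IDENTITIES)

# Optimize only the parameters of the new final layer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(images, labels):
    """One gradient step on a mini-batch of face images and identity labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```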

    Learning Convolutional Neural Network For Face Verification

    Convolutional neural networks (ConvNets) have improved the state of the art in many applications. Face recognition tasks, for example, have seen significantly improved performance thanks to ConvNets. However, less attention has been given to video-based face recognition. Here, we make three contributions along these lines.

    First, we proposed a ConvNet-based system for long-term face tracking in videos. By taking advantage of deep learning models pre-trained on big data, we developed a novel system for accurate video face tracking in unconstrained environments where various people and objects move in and out of the frame. In the proposed system, we presented a Detection-Verification-Tracking (DVT) method which accomplishes long-term face tracking through the collaboration of face detection, face verification, and (short-term) face tracking. An online-trained detector based on cascaded convolutional neural networks localizes all faces appearing in the frames, an online-trained face verifier based on deep convolutional neural networks and similarity metric learning decides whether any face, and which one, corresponds to the query person, and an online-trained tracker follows the face from frame to frame. When validated on a sitcom episode and a TV show, the DVT method outperforms tracking-learning-detection (TLD) and face-TLD in terms of recall and precision. The proposed system has been tested on many other types of videos and shows very promising results.

    Second, as the availability of a large-scale training dataset has a significant effect on the performance of ConvNet-based recognition methods, we presented an automatic video collection approach for generating a large-scale video training dataset. We designed a procedure for generating a face verification dataset from videos based on the long-term face tracking algorithm, DVT. In this procedure, face streams are collected from videos and labeled automatically, without human annotation. Using this procedure, we assembled a widely scalable dataset, FaceSequence, which includes 1.5M streams capturing ~500K individuals. A key distinction between this dataset and existing video datasets is that FaceSequence is generated from publicly available videos and labeled automatically, and is therefore widely scalable at no annotation cost.

    Lastly, we introduced a stream-based ConvNet architecture for the video face verification task. The proposed network is designed to optimize a differentiable error function, referred to as the stream loss, using unlabeled temporal face sequences. Using the unlabeled video dataset FaceSequence, we trained our network to minimize the stream loss. The network achieves verification accuracy comparable to the state of the art on the LFW and YTF datasets with much smaller model complexity. Compared to VGG, our method demonstrates a significant improvement in TAR/FAR, which is notable given that the VGG dataset is highly purified and contains only a small amount of label noise. We also fine-tuned the network using the IJB-A dataset; the validation results show competitive verification accuracy compared with the best previous video face verification results.
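    The DVT method above alternates between short-term tracking while the face stays in view and re-detection plus verification when the track is lost. The sketch below shows one plausible control flow for that loop; the detector, verifier, and short-term tracker are passed in as callables and stand in for the cascaded-CNN detector, metric-learning verifier, and online tracker named in the abstract, which are not specified in detail here.

```python
# Sketch of the Detection-Verification-Tracking (DVT) control flow for
# long-term tracking of a single query person across a video.
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) face bounding box

def dvt_track(
    frames,                                         # iterable of video frames
    detect: Callable[[object], List[Box]],          # all face boxes in a frame
    verify: Callable[[object, Box], bool],          # is this face the query person?
    track: Callable[[object, Box], Optional[Box]],  # short-term tracker update
) -> List[Optional[Box]]:
    """Return the query person's box (or None) for every frame."""
    trajectory: List[Optional[Box]] = []
    current: Optional[Box] = None
    for frame in frames:
        if current is not None:
            # Short-term tracking while the face remains in view.
            current = track(frame, current)
        if current is None:
            # Track lost: re-detect all faces, then let the verifier pick
            # the detection (if any) that matches the query person.
            candidates = detect(frame)
            matches = [box for box in candidates if verify(frame, box)]
            current = matches[0] if matches else None
        trajectory.append(current)
    return trajectory
```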

    Deep learning architectures for Computer Vision

    Deep learning has become part of many state-of-the-art systems in multiple disciplines (especially computer vision and speech processing). In this thesis, convolutional neural networks are used to solve the problem of recognizing people in images, both for verification and for identification. Two different architectures, AlexNet and VGG19, both winners of the ILSVRC, have been fine-tuned and tested on four datasets: Labeled Faces in the Wild, FaceScrub, YouTubeFaces, and Google UPC, a dataset generated at the UPC. Finally, with the features extracted from these fine-tuned networks, several verification algorithms have been tested, including Support Vector Machines, Joint Bayesian, and the Advanced Joint Bayesian formulation. The results of this work show that an area under the Receiver Operating Characteristic curve of 99.6% can be obtained, close to state-of-the-art performance.
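    The verification stage described above takes features extracted from the fine-tuned networks, decides for each image pair whether the two faces show the same person, and reports the area under the ROC curve. The sketch below illustrates one such pipeline with an SVM; representing a pair by the absolute difference of its two feature vectors is an assumption on our part, not necessarily the thesis's exact formulation.

```python
# Sketch: SVM-based face verification on CNN features, evaluated by ROC AUC.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def pair_features(feats_a: np.ndarray, feats_b: np.ndarray) -> np.ndarray:
    """Turn two batches of CNN features into per-pair descriptors."""
    return np.abs(feats_a - feats_b)

def evaluate_verifier(train_a, train_b, train_labels, test_a, test_b, test_labels):
    """feats_*: (n_pairs, feature_dim) CNN features; labels: 1 = same identity."""
    svm = SVC(kernel="linear", probability=True)
    svm.fit(pair_features(train_a, train_b), train_labels)
    scores = svm.predict_proba(pair_features(test_a, test_b))[:, 1]
    return roc_auc_score(test_labels, scores)  # area under the ROC curve
```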

    A Discriminatively Learned CNN Embedding for Person Re-identification

    We revisit two popular convolutional neural network (CNN) models in person re-identification (re-ID): the verification model and the classification model. The two models have their respective advantages and limitations due to their different loss functions. In this paper, we shed light on how to combine the two models to learn more discriminative pedestrian descriptors. Specifically, we propose a new siamese network that simultaneously computes an identification loss and a verification loss. Given a pair of training images, the network predicts the identities of the two images and whether they belong to the same identity. Our network thus learns a discriminative embedding and a similarity measurement at the same time, making full use of the annotations. Albeit simple, the learned embedding improves the state-of-the-art performance on two public person re-ID benchmarks. Further, we show that our architecture can also be applied to image retrieval.
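    The core idea above is a shared CNN that embeds both images of a pair, with a softmax identity classifier on each image (identification loss) and a binary "same / different" classifier on the pair (verification loss), trained jointly. The sketch below illustrates that combination; the ResNet-18 backbone, the squared-difference pair representation, and the loss weighting are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: siamese network trained with a joint identification + verification loss.
import torch.nn as nn
from torchvision import models

class SiameseIDVerif(nn.Module):
    def __init__(self, num_identities: int, embed_dim: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # pooled features serve as the embedding
        self.backbone = backbone
        self.id_head = nn.Linear(embed_dim, num_identities)  # identification head
        self.verif_head = nn.Linear(embed_dim, 2)             # same / different head

    def forward(self, img1, img2):
        f1 = self.backbone(img1)
        f2 = self.backbone(img2)
        id_logits1 = self.id_head(f1)
        id_logits2 = self.id_head(f2)
        verif_logits = self.verif_head((f1 - f2) ** 2)  # pair descriptor: squared difference
        return id_logits1, id_logits2, verif_logits

def combined_loss(model, img1, img2, label1, label2, same_label, lam=1.0):
    """Identification loss on both images plus a weighted verification loss on the pair."""
    ce = nn.CrossEntropyLoss()
    id1, id2, verif = model(img1, img2)
    return ce(id1, label1) + ce(id2, label2) + lam * ce(verif, same_label)
```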