40 research outputs found

    Face Hallucination via Deep Neural Networks.

    Get PDF
    We firstly address aligned low-resolution (LR) face images (i.e. 16X16 pixels) by designing a discriminative generative network, named URDGN. URDGN is composed of two networks: a generative model and a discriminative model. We introduce a pixel-wise L2 regularization term to the generative model and exploit the feedback of the discriminative network to make the upsampled face images more similar to real ones. We present an end-to-end transformative discriminative neural network (TDN) devised for super-resolving unaligned tiny face images. TDN embeds spatial transformation layers to enforce local receptive fields to line-up with similar spatial supports. To upsample noisy unaligned LR face images, we propose decoder-encoder-decoder networks. A transformative discriminative decoder network is employed to upsample and denoise LR inputs simultaneously. Then we project the intermediate HR faces to aligned and noise-free LR faces by a transformative encoder network. Finally, high-quality hallucinated HR images are generated by our second decoder. Furthermore, we present an end-to-end multiscale transformative discriminative neural network (MTDN) to super-resolve unaligned LR face images of different resolutions in a unified framework. We propose a method that explicitly incorporates structural information of faces into the face super-resolution process by using a multi-task convolutional neural network (CNN). Our method not only uses low-level information (i.e. intensity similarity), but also middle-level information (i.e. face structure) to further explore spatial constraints of facial components from LR inputs images. We demonstrate that supplementing residual images or feature maps with additional facial attribute information can significantly reduce the ambiguity in face super-resolution. To explore this idea, we develop an attribute-embedded upsampling network. In this manner, our method is able to super-resolve LR faces by a large upscaling factor while reducing the uncertainty of one-to-many mappings remarkably. We further push the boundaries of hallucinating a tiny, non-frontal face image to understand how much of this is possible by leveraging the availability of large datasets and deep networks. To this end, we introduce a novel Transformative Adversarial Neural Network (TANN) to jointly frontalize very LR out-of-plane rotated face images (including profile views) and aggressively super-resolve them by 8X, regardless of their original poses and without using any 3D information. Besides recovering an HR face images from an LR version, this thesis also addresses the task of restoring realistic faces from stylized portrait images, which can also be regarded as face hallucination

    Face Recognition and Facial Attribute Analysis from Unconstrained Visual Data

    Get PDF
    Analyzing human faces from visual data has been one of the most active research areas in the computer vision community. However, it is a very challenging problem in unconstrained environments due to variations in pose, illumination, expression, occlusion and blur between training and testing images. The task becomes even more difficult when only a limited number of images per subject is available for modeling these variations. In this dissertation, different techniques for performing classification of human faces as well as other facial attributes such as expression, age, gender, and head pose in uncontrolled settings are investigated. In the first part of the dissertation, a method for reconstructing the virtual frontal view from a given non-frontal face image using Markov Random Fields (MRFs) and an efficient variant of the Belief Propagation (BP) algorithm is introduced. In the proposed approach, the input face image is divided into a grid of overlapping patches and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade (LK) algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it does not require manually selected facial landmarks as well as no head pose estimation is needed. In the second part, the task of face recognition in unconstrained settings is formulated as a domain adaptation problem. The domain shift is accounted for by deriving a latent subspace or domain, which jointly characterizes the multifactor variations using appropriate image formation models for each factor. The latent domain is defined as a product of Grassmann manifolds based on the underlying geometry of the tensor space, and recognition is performed across domain shift using statistics consistent with the tensor geometry. More specifically, given a face image from the source or target domain, multiple images of that subject are first synthesized under different illuminations, blur conditions, and 2D perturbations to form a tensor representation of the face. The orthogonal matrices obtained from the decomposition of this tensor, where each matrix corresponds to a factor variation, are used to characterize the subject as a point on a product of Grassmann manifolds. For cases with only one image per subject in the source domain, the identity of target domain faces is estimated using the geodesic distance on product manifolds. When multiple images per subject are available, an extension of kernel discriminant analysis is developed using a novel kernel based on the projection metric on product spaces. Furthermore, a probabilistic approach to the problem of classifying image sets on product manifolds is introduced. Understanding attributes such as expression, age class, and gender from face images has many applications in multimedia processing including content personalization, human-computer interaction, and facial identification. To achieve good performance in these tasks, it is important to be able to extract pertinent visual structures from the input data. In the third part of the dissertation, a fully automatic approach for performing classification of facial attributes based on hierarchical feature learning using sparse coding is presented. The proposed approach is generative in the sense that it does not use label information in the process of feature learning. As a result, the same feature representation can be applied for different tasks such as expression, age, and gender classification. Final classification is performed by linear SVM trained with the corresponding labels for each task. The last part of the dissertation presents an automatic algorithm for determining the head pose from a given face image. The face image is divided into a regular grid and represented by dense SIFT descriptors extracted from the grid points. Random Projection (RP) is then applied to reduce the dimension of the concatenated SIFT descriptor vector. Classification and regression using Support Vector Machine (SVM) are combined in order to obtain an accurate estimate of the head pose. The advantage of the proposed approach is that it does not require facial landmarks such as the eye and mouth corners, the nose tip to be extracted from the input face image as in many other methods

    Recursive Copy and Paste GAN: Face Hallucination from Shaded Thumbnails.

    Full text link
    Existing face hallucination methods based on convolutional neural networks (CNNs) have achieved impressive performance on low-resolution (LR) faces in a normal illumination condition. However, their performance degrades dramatically when LR faces are captured in non-uniform illumination conditions. This paper proposes a Recursive Copy and Paste Generative Adversarial Network (Re-CPGAN) to recover authentic high-resolution (HR) face images while compensating for non-uniform illumination. To this end, we develop two key components in our Re-CPGAN: internal and recursive external Copy and Paste networks (CPnets). Our internal CPnet exploits facial self-similarity information residing in the input image to enhance facial details; while our recursive external CPnet leverages an external guided face for illumination compensation. Specifically, our recursive external CPnet stacks multiple external Copy and Paste (EX-CP) units in a compact model to learn normal illumination and enhance facial details recursively. By doing so, our method offsets illumination and upsamples facial details progressively in a coarse-to-fine fashion, thus alleviating the ambiguity of correspondences between LR inputs and external guided inputs. Furthermore, a new illumination compensation loss is developed to capture illumination from the external guided face image effectively. Extensive experiments demonstrate that our method achieves authentic HR images in a uniform illumination condition with a 16x magnification factor and outperforms state-of-the-art methods qualitatively and quantitatively

    A multi-agent system for the classification of gender and age from images

    Get PDF
    [EN] The automatic classification of human images on the basis of age range and gender can be used in audiovisual content adaptation for Smart TVs or marquee advertising. Knowledge about users is used by publishing agencies and departments regulating TV content; on the basis of this information (age, gender) they are able to provide content that suits the interests of users. To this end, the creation of a highly precise image pattern recognition system is necessary, this may be one of the greatest challenges faced by computer technology in the last decades. These recognition systems must apply different pattern recognition techniques, in order to distinct gender and age in the images. In this work, we propose a multi-agent system that integrates different techniques for the acquisition, preprocessing and processing of images for the classification of age and gender. The system has been tested in an office building. Thanks to the use of a multi-agent system which allows to apply different workflows simultaneously, the performance of different methods could be compared (each flow with a different configuration). Experimental results have confirmed that a good preprocessing stage is necessary if we want the classification methods to perform well (Fisherfaces, Eigenfaces, Local Binary Patterns, Multilayer perceptron). The Fisherfaces method has proved to be more effective than MLP and the training time was shorter. In terms of the classification of age, Fisherfaces offers the best results in comparison to the rest of the system’s classifiers. The use of filters has allowed to reduce dimensionality, as a result the workload was reduced, a great advantage in a system that performs classification in real time

    Aprendendo representações específicas para a face de cada pessoa

    Get PDF
    Orientadores: Alexandre Xavier Falcão, Anderson de Rezende RochaTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Os seres humanos são especialistas natos em reconhecimento de faces, com habilidades que excedem em muito as dos métodos automatizados vigentes, especialmente em cenários não controlados, onde não há a necessidade de colaboração por parte do indivíduo sendo reconhecido. No entanto, uma característica marcante do reconhecimento de face humano é que nós somos substancialmente melhores no reconhecimento de faces familiares, provavelmente porque somos capazes de consolidar uma grande quantidade de experiência prévia com a aparência de certo indivíduo e de fazer uso efetivo dessa experiência para nos ajudar no reconhecimento futuro. De fato, pesquisadores em psicologia têm até mesmo sugeridos que a representação interna que fazemos das faces pode ser parcialmente adaptada ou otimizada para rostos familiares. Enquanto isso, a situação análoga no reconhecimento facial automatizado | onde um grande número de exemplos de treinamento de um indivíduo está disponível | tem sido muito pouco explorada, apesar da crescente relevância dessa abordagem na era das mídias sociais. Inspirados nessas observações, nesta tese propomos uma abordagem em que a representação da face de cada pessoa é explicitamente adaptada e realçada com o intuito de reconhecê-la melhor. Apresentamos uma coleção de métodos de aprendizado que endereça e progressivamente justifica tal abordagem. Ao aprender e operar com representações específicas para face de cada pessoa, nós somos capazes de consistentemente melhorar o poder de reconhecimento dos nossos algoritmos. Em particular, nós obtemos resultados no estado da arte na base de dados PubFig83, uma desafiadora coleção de imagens instituída e tornada pública com o objetivo de promover o estudo do reconhecimento de faces familiares. Nós sugerimos que o aprendizado de representações específicas para face de cada pessoa introduz uma forma intermediária de regularização ao problema de aprendizado, permitindo que os classificadores generalizem melhor através do uso de menos |, porém mais relevantes | características faciaisAbstract: Humans are natural face recognition experts, far outperforming current automated face recognition algorithms, especially in naturalistic, \in-the-wild" settings. However, a striking feature of human face recognition is that we are dramatically better at recognizing highly familiar faces, presumably because we can leverage large amounts of past experience with the appearance of an individual to aid future recognition. Researchers in psychology have even suggested that face representations might be partially tailored or optimized for familiar faces. Meanwhile, the analogous situation in automated face recognition, where a large number of training examples of an individual are available, has been largely underexplored, in spite of the increasing relevance of this setting in the age of social media. Inspired by these observations, we propose to explicitly learn enhanced face representations on a per-individual basis, and we present a collection of methods enabling this approach and progressively justifying our claim. By learning and operating within person-specific representations of faces, we are able to consistently improve performance on both the constrained and the unconstrained face recognition scenarios. In particular, we achieve state-of-the-art performance on the challenging PubFig83 familiar face recognition benchmark. We suggest that such person-specific representations introduce an intermediate form of regularization to the problem, allowing the classifiers to generalize better through the use of fewer | but more relevant | face featuresDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã

    Emotion Recognition by Video: A review

    Full text link
    Video emotion recognition is an important branch of affective computing, and its solutions can be applied in different fields such as human-computer interaction (HCI) and intelligent medical treatment. Although the number of papers published in the field of emotion recognition is increasing, there are few comprehensive literature reviews covering related research on video emotion recognition. Therefore, this paper selects articles published from 2015 to 2023 to systematize the existing trends in video emotion recognition in related studies. In this paper, we first talk about two typical emotion models, then we talk about databases that are frequently utilized for video emotion recognition, including unimodal databases and multimodal databases. Next, we look at and classify the specific structure and performance of modern unimodal and multimodal video emotion recognition methods, talk about the benefits and drawbacks of each, and then we compare them in detail in the tables. Further, we sum up the primary difficulties right now looked by video emotion recognition undertakings and point out probably the most encouraging future headings, such as establishing an open benchmark database and better multimodal fusion strategys. The essential objective of this paper is to assist scholarly and modern scientists with keeping up to date with the most recent advances and new improvements in this speedy, high-influence field of video emotion recognition

    QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

    Get PDF
    The concerns about individuals security have justified the increasing number of surveillance cameras deployed both in private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, instead of feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant details about the subjects that took part in a crime depends on the manual inspection of recordings. As such, the current goal of the research community is the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at designing a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris or gait) without requiring human intervention in the process, as well as devising biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process. Regarding the first goal, the analysis of the data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, and thus their discriminability is not sufficient for recognition purposes. In the literature, diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring high-resolution imagery at a distance, particularly when using a master-slave configuration. In the master-slave configuration, the video acquired by a typical surveillance camera is analyzed for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged at high-resolution by the PTZ camera. Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed at providing effective solutions to the typical challenges of this strategy, restraining its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first proposal is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera. The second proposal is a camera scheduling method for determining - in real-time - the sequence of acquisitions that maximizes the number of different targets obtained, while minimizing the cumulative transition time. In order to achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches of the human monitoring field to develop a fully automated surveillance capable of acquiring biometric data at a distance and without human cooperation, designated as QUIS-CAMPI system. The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of the state-of-the-art biometric recognition approaches shows that these approaches attain almost ideal recognition rates in unconstrained data. However, this performance is incongruous with the recognition rates observed in surveillance scenarios. Taking into account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a distance ranging from 5 to 40 meters and without human intervention in the acquisition process. This set allows to objectively assess the performance of state-of-the-art biometric recognition methods in data that truly encompass the covariates of surveillance scenarios. As such, this set was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal regards a method for detecting corrupted features in biometric signatures inferred by a redundancy analysis algorithm. The second proposal is a caricature-based face recognition approach capable of enhancing the recognition performance by automatically generating a caricature from a 2D photo. The experimental evaluation of these methods shows that both approaches contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivíduos tem justificado o crescimento do número de câmaras de vídeo-vigilância instaladas tanto em espaços privados como públicos. Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente capaz de inferir em tempo real informações sobre os indivíduos observados. Assim, apesar de a vídeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda confinado à disponibilização de vídeos que têm que ser manualmente inspecionados para extrair informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal desafio da comunidade científica é o desenvolvimento de sistemas automatizados capazes de monitorizar e identificar indivíduos em ambientes de vídeo-vigilância. Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento biométrico aos ambientes de vídeo-vigilância. De forma mais especifica, pretende-se 1) conceber um sistema de vídeo-vigilância que consiga adquirir dados biométricos a longas distâncias (e.g., imagens da cara, íris, ou vídeos do tipo de passo) sem requerer a cooperação dos indivíduos no processo; e 2) desenvolver métodos de reconhecimento biométrico robustos aos fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas. No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas típicos de vídeo-vigilância mostra que, devido à distância de captura, os traços biométricos amostrados não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis. Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave. Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse (e.g. carros, pessoas) a partir do vídeo adquirido por uma câmara de vídeo-vigilância e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados com esta estratégia, impedindo assim o seu uso em ambientes de vídeo-vigilância. Deste modo, esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de vídeo-vigilância usando uma câmara PTZ assistida por uma câmara típica de vídeo-vigilância. O primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara master para o ângulo da câmara PTZ (slave) sem o auxílio de outros dispositivos óticos. O segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações que maximiza o número de diferentes sujeitos observados e simultaneamente minimiza o tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os dois métodos propostos foram combinados com os avanços alcançados na área da monitorização de humanos para assim desenvolver o primeiro sistema de vídeo-vigilância completamente automatizado e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação dos indivíduos no processo, designado por sistema QUIS-CAMPI. O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho não é corroborado pelos resultados observados em ambientes de vídeo-vigilância, o que sugere que os conjuntos de dados atuais não contêm verdadeiramente os fatores de degradação típicos dos ambientes de vídeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biométricos atuais, esta tese introduz um novo conjunto de dados biométricos (imagens da face e vídeos do tipo de passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho dos métodos do estado-da-arte no reconhecimento de indivíduos em imagens/vídeos capturados num ambiente real de vídeo-vigilância. Como tal, este conjunto foi utilizado para promover a primeira competição de reconhecimento biométrico em ambientes não controlados. Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9 métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para aumentar a robustez aos fatores de degradação observados em ambientes de vídeo-vigilância. O primeiro é um método para detetar características corruptas em assinaturas biométricas através da análise da redundância entre subconjuntos de características. O segundo é um método de reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma única foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir as taxas de erro em dados adquiridos de forma não controlada
    corecore