2,556 research outputs found

    Bias in Deep Learning and Applications to Face Analysis

    Deep learning has driven progress in the field of face analysis, resulting in the integration of these models into multiple aspects of society. Even though the majority of research has focused on optimizing standard evaluation metrics, recent work has exposed the bias of such algorithms as well as the dangers of their unaccountable use. In this thesis, we explore the bias of deep learning models in the discriminative and the generative setting. We begin by investigating the bias of face analysis models with regard to different demographics. To this end, we collect KANFace, a large-scale video and image dataset of faces captured "in-the-wild". The rich set of annotations allows us to expose the demographic bias of deep learning models, which we mitigate by using adversarial learning to debias the deep representations. Furthermore, we explore neural augmentation as a strategy for training fair classifiers. We propose a style-based multi-attribute transfer framework that is able to synthesize photo-realistic faces of the underrepresented demographics. This is achieved by introducing a multi-attribute extension to Adaptive Instance Normalisation that captures the multiplicative interactions between the representations of different attributes. Focusing on bias in gender recognition, we showcase the efficacy of the framework in training classifiers that are fairer than those obtained with generative and fairness-aware methods. In the second part, we focus on bias in deep generative models. In particular, we start by studying the generalization of generative models to images of unseen attribute combinations. To this end, we extend the conditional Variational Autoencoder by introducing a multilinear conditioning framework. The proposed method is able to synthesize unseen attribute combinations by modeling the multiplicative interactions between the attributes.
Lastly, in order to control protected attributes, we investigate controlled image generation without training on a labelled dataset. We leverage pre-trained Generative Adversarial Networks that are trained in an unsupervised fashion and exploit the clustering that occurs in the representation space of intermediate layers of the generator. We show that these clusters capture semantic attribute information, and we condition image synthesis on the cluster assignment using Implicit Maximum Likelihood Estimation.
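For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of standard, single-style Adaptive Instance Normalisation: each content channel is normalised to zero mean and unit variance, then rescaled to the mean and standard deviation of the matching style channel. It is not the thesis's multi-attribute extension, whose details the abstract does not give. The function names `adain` and `adain_channel`, the `eps` stabiliser, and the toy feature values are assumptions made for illustration only.

```python
from statistics import fmean, pstdev

def adain_channel(content, style, eps=1e-5):
    """Apply standard AdaIN to one channel, given as a flat list of activations."""
    c_mean, c_std = fmean(content), pstdev(content)
    s_mean, s_std = fmean(style), pstdev(style)
    # Normalise the content activations, then re-scale and shift to the style statistics.
    return [s_std * (x - c_mean) / (c_std + eps) + s_mean for x in content]

def adain(content, style, eps=1e-5):
    """Apply AdaIN channel by channel; both inputs are lists of channels."""
    return [adain_channel(c, s, eps) for c, s in zip(content, style)]

# Toy feature maps (illustrative values): two channels, four activations each.
content = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.5, 11.0, 11.5]]
style = [[5.0, 5.0, 7.0, 7.0], [-1.0, 0.0, 1.0, 2.0]]
out = adain(content, style)
```

After the call, each channel of `out` keeps the relative ordering of the content activations while its mean and standard deviation approximately match those of the corresponding style channel.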

    Similarity learning for person re-identification and semantic video retrieval

    Many computer vision problems boil down to learning a good visual similarity function that scores how likely two instances are to share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification and Semantic Video Retrieval. Person Re-Identification aims to maintain the identity of an individual in diverse locations across different non-overlapping camera views. Starting with two cameras, we propose a novel appearance model based on visual word co-occurrence to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and for variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Tested on various benchmarks, our method demonstrates higher accuracy than the state of the art. Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers make up a significant part of the semantic gap between user-defined queries and the archive video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data.
Our experiments demonstrate that introducing similarity learning components effectively compensates for noise and error in previous stages and results in better performance on both aerial and ground surveillance videos. Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while retaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, the training framework could be incorporated into Person Re-Identification and related problems.
2019-07-09T00:00:00

    Deep Visual Unsupervised Domain Adaptation for Classification Tasks: A Survey

    Fusion features ensembling models using Siamese convolutional neural network for kinship verification

    Family is one of the most important entities in the community. Mining genetic information from facial images is increasingly used in a wide range of real-world applications, making family member tracing and kinship analysis remarkably easy, inexpensive, and fast compared with deoxyribonucleic acid (DNA) profiling. However, building reliable models for kinship recognition is still hampered by insufficient determination of familial features, unstable reference cues of kinship, and the genetic factors that influence family features. This research proposes enhanced methods for extracting and selecting effective familial features that provide evidence of kinship, thereby improving kinship verification accuracy from facial images. First, a Convolutional Neural Network based on the Optimized Local Raw Pixels Similarity Representation (OLRPSR) method is developed to improve accuracy by generating a new matrix representation that removes irrelevant information. Second, the Siamese Convolutional Neural Network and Fusion of the Best Overlapping Blocks (SCNN-FBOB) method is proposed to track and identify the most informative kinship cues in order to achieve higher accuracy. Third, the Siamese Convolutional Neural Network and Ensembling Models Based on Selecting Best Combination (SCNN-EMSBC) method is introduced to overcome the weak performance of individual images and classifiers. To evaluate the proposed methods, a series of experiments is conducted on two popular kinship benchmark databases, KinFaceW-I and KinFaceW-II, and the results are compared against state-of-the-art algorithms from the literature. The SCNN-EMSBC method achieves promising results, with average accuracies of 92.42% and 94.80% on KinFaceW-I and KinFaceW-II, respectively.
These results significantly improve kinship verification performance and outperform the state-of-the-art algorithms for image-based kinship verification.