VoxCeleb2: Deep Speaker Recognition
The objective of this paper is speaker recognition under noisy and
unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale
audio-visual speaker recognition dataset collected from open-source media.
Using a fully automated pipeline, we curate VoxCeleb2 which contains over a
million utterances from over 6,000 speakers. This is several times larger than
any publicly available speaker recognition dataset.
Second, we develop and compare Convolutional Neural Network (CNN) models and
training strategies that can effectively recognise identities from voice under
various conditions. The models trained on the VoxCeleb2 dataset surpass the
performance of previous works on a benchmark dataset by a significant margin.
Comment: To appear in Interspeech 2018. The audio-visual dataset can be
downloaded from http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 .
1806.05622v2: minor fixes; 5 pages
Recent Advances in Deep Learning Techniques for Face Recognition
In recent years, researchers have proposed many deep learning (DL) methods
for various tasks, and face recognition (FR) in particular has made an enormous leap
using these techniques. Deep FR systems benefit from the hierarchical
architecture of the DL methods to learn discriminative face representation.
Therefore, DL techniques significantly improve state-of-the-art performance on
FR systems and encourage diverse and efficient real-world applications. In this
paper, we present a comprehensive analysis of various FR systems that leverage
the different types of DL techniques, and for the study, we summarize 168
recent contributions from this area. We discuss the papers related to different
algorithms, architectures, loss functions, activation functions, datasets,
challenges, improvement ideas, current and future trends of DL-based FR
systems. We provide a detailed discussion of various DL methods to understand
the current state-of-the-art, and then we discuss various activation and loss
functions for the methods. Additionally, we summarize different datasets used
widely for FR tasks and discuss challenges related to illumination, expression,
pose variations, and occlusion. Finally, we discuss improvement ideas, current
and future trends of FR tasks.
Comment: 32 pages and citation: M. T. H. Fuad et al., "Recent Advances in Deep
Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp.
99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613
Towards multi-modal face recognition in the wild
Face recognition aims to use facial appearance to identify or verify human individuals, and has been one of the fundamental research areas in computer vision. Over the past few decades, face recognition has drawn significant attention due to its potential use in biometric authentication, surveillance, security, robotics and so on. Many existing face recognition methods are evaluated on faces collected in labs, and do not generalize well in reality. Compared with faces captured in labs, faces in the wild inherently follow a multi-modal distribution. This multi-modality leads to significant intra-class variations, and usually requires a large number of labeled samples to cover the wide range of modalities. These difficulties make unconstrained face recognition even more challenging, and leave a considerable gap between laboratory research and industrial practice. To bridge the gap, this thesis focuses on multi-modal face recognition in unconstrained environments.
This thesis introduces several approaches to address these challenges. The approaches fall into two research directions. The first direction explores a series of deep learning based methods for handling the large intra-class variations in multi-modal face recognition. The combination of modalities in the wild is unpredictable, and is thus difficult to define explicitly in advance. It is desirable to design a framework that adapts to the modality-driven variations in specific scenarios. To this end, a Deep Neural Network (DNN) is adopted as the basis, as a DNN learns the feature representation and the classifier directly with reference to the specific target objective. To begin with, we aim to learn a part-based facial representation with deep neural networks to address face verification in the wild. In particular, the proposed framework consists of two deliberately designed components: a Deep Mixture Model (DMM) to find accurate patch correspondence and a Convolutional Fusion Network (CFN) to learn the fusion of multiple patch-specific facial features. This framework is specifically designed to handle local distortions caused by modalities such as pose and illumination. The next work introduces the conditional partition of the sample space into deep learning to tackle face recognition with regard to modalities in a general sense. Without any prior knowledge of modality, the proposed network learns the hidden modalities of faces, based on which the initial sample space is partitioned so that modality-specific feature representations can be learnt accordingly.
The other direction is Semi-Supervised Learning with videos, which tackles the shortage of labeled training samples. In particular, a novel Semi-Supervised Learning strategy is proposed for the problem of celebrity identification by harvesting the “confident” unlabeled samples from the vast video sources. The video context information is used to iteratively enrich the diversity of the initial labeled set so that the performance of the learnt classifier can be gradually improved. In this thesis, all these works are evaluated with extensive experiments in the corresponding sections. The connections and differences among the three approaches are further discussed in the conclusion section.
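The idea of iteratively harvesting "confident" unlabeled samples can be illustrated with a generic self-training loop. The sketch below is not the method of the thesis: it ignores video context entirely, and the nearest-centroid classifier, the softmax-over-distances confidence score, the `threshold` value and the function names are all assumptions chosen only to keep the example small.

```python
import math


def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=10):
    """Generic self-training (illustrative assumption, not the thesis method).

    labeled:   list of (vector, label) pairs
    unlabeled: list of vectors
    Returns the enlarged labeled list and the vectors that stayed unlabeled.
    """
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # Fit: one centroid per class from the current labeled set.
        by_class = {}
        for x, y in labeled:
            by_class.setdefault(y, []).append(x)
        centroids = {y: centroid(xs) for y, xs in by_class.items()}

        confident, remaining = [], []
        for x in unlabeled:
            # Confidence: softmax over negative distances to the centroids.
            dists = {y: math.dist(x, c) for y, c in centroids.items()}
            d_min = min(dists.values())
            weights = {y: math.exp(-(d - d_min)) for y, d in dists.items()}
            total = sum(weights.values())
            label = max(weights, key=weights.get)
            if weights[label] / total >= threshold:
                confident.append((x, label))  # harvest with its pseudo-label
            else:
                remaining.append(x)

        if not confident:
            break  # nothing new to add, so further rounds would not change anything
        labeled.extend(confident)
        unlabeled = remaining
    return labeled, unlabeled
```

Each round refits the classifier on the enlarged labeled set, so samples that were ambiguous early on may become confident later; samples that never cross the threshold are simply left unlabeled.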