Cross-resolution Face Recognition via Identity-Preserving Network and Knowledge Distillation
Cross-resolution face recognition has become a challenging problem for modern
deep face recognition systems. It aims at matching a low-resolution probe image
with high-resolution gallery images registered in a database. Existing methods
mainly leverage prior information from high-resolution images by either
reconstructing facial details with super-resolution techniques or learning a
unified feature space. To address this challenge, this paper proposes a new
approach that enforces the network to focus on the discriminative information
stored in the low-frequency components of a low-resolution image. A
cross-resolution knowledge distillation paradigm is first employed as the
learning framework. Then, an identity-preserving network, WaveResNet, and a
wavelet similarity loss are designed to capture low-frequency details and boost
performance. Finally, an image degradation model is conceived to simulate more
realistic low-resolution training data. Extensive experimental results show
that the proposed method consistently outperforms the baseline model and other
state-of-the-art methods across a variety of image resolutions.
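The abstract does not give the distillation objective in closed form; as a rough illustration of the cross-resolution knowledge distillation idea (the function names and the cosine-distance choice are assumptions, not taken from the paper), a teacher trained on HR faces can supervise a student that only sees LR inputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def distillation_loss(teacher_emb, student_emb):
    """Cross-resolution distillation: pull the student's LR embedding
    toward the teacher's HR embedding of the same face."""
    return 1.0 - cosine_similarity(teacher_emb, student_emb)
```

In practice the teacher embedding would come from a frozen HR network and the student from the LR branch (here WaveResNet), with the wavelet similarity loss added on top.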
Cross-Quality LFW: A Database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments
Real-world face recognition applications often deal with suboptimal image
quality or resolution due to different capturing conditions such as various
subject-to-camera distances, poor camera settings, or motion blur. This
characteristic has an unignorable effect on performance. Recent
cross-resolution face recognition approaches used simple, arbitrary, and
unrealistic down- and up-scaling techniques to measure robustness against
real-world edge-cases in image quality. Thus, we propose a new standardized
benchmark dataset and evaluation protocol derived from the famous Labeled Faces
in the Wild (LFW). In contrast to previous derivatives, which focus on pose,
age, similarity, and adversarial attacks, our Cross-Quality Labeled Faces in
the Wild (XQLFW) maximizes the quality difference. Images are synthetically
degraded only when necessary, and in a realistic manner. Our proposed dataset is
then used to further investigate the influence of image quality on several
state-of-the-art approaches. With XQLFW, we show that these models perform
differently in cross-quality cases, and hence, the generalizing capability is
not accurately predicted by their performance on LFW. Additionally, we report
baseline accuracy with recent deep learning models explicitly trained for
cross-resolution applications and evaluate the susceptibility to image quality.
To encourage further research in cross-resolution face recognition and incite
the assessment of image quality robustness, we publish the database and code
for evaluation.
Comment: 9 pages, 4 figures, 2 tables
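For reference, the "simple, arbitrary, and unrealistic" down- and up-scaling the authors criticise can be sketched as nearest-neighbour pixel replication (a hypothetical minimal example, not the XQLFW degradation pipeline):

```python
def degrade(image, factor):
    """Naive degradation: nearest-neighbour downsample by `factor`,
    then upsample back to the original size by pixel replication."""
    h, w = len(image), len(image[0])
    small = [[image[i * factor][j * factor]
              for j in range(w // factor)]
             for i in range(h // factor)]
    return [[small[min(i // factor, len(small) - 1)]
                  [min(j // factor, len(small[0]) - 1)]
             for j in range(w)]
            for i in range(h)]
```

Such artificial degradation preserves none of the blur, noise, or compression artifacts of genuine surveillance imagery, which is the gap XQLFW is built to expose.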
Generative Adversarial Network and Its Application in Aerial Vehicle Detection and Biometric Identification System
In recent years, generative adversarial networks (GANs) have shown great potential in advancing the state of the art in many areas of computer vision, most notably in image synthesis and manipulation tasks. A GAN is a generative model that simultaneously trains a generator and a discriminator in an adversarial manner to produce realistic synthetic data by capturing the underlying data distribution. Owing to its powerful ability to generate high-quality and visually pleasing results, we apply it to super-resolution and image-to-image translation techniques to address vehicle detection in low-resolution aerial images and cross-spectral cross-resolution iris recognition. First, we develop a Multi-scale GAN (MsGAN) with multiple intermediate outputs, which progressively learns the details and features of high-resolution aerial images at different scales. The upscaled super-resolved aerial images are then fed to a You Only Look Once version 3 (YOLO-v3) object detector, and the detection loss is jointly optimized along with a super-resolution loss to emphasize target vehicles sensitive to the super-resolution process. Another problem remains when detection takes place at night or in a dark environment, which requires an infrared (IR) detector; training such a detector needs a large number of IR images. To address these challenges, we develop a GAN-based joint cross-modal super-resolution framework in which low-resolution (LR) IR images are translated and super-resolved to high-resolution (HR) visible (VIS) images before applying detection. This approach significantly improves the accuracy of aerial vehicle detection by leveraging the benefits of super-resolution techniques in a cross-modal domain. Second, to increase the performance and reliability of deep learning-based biometric identification systems, we focus on developing conditional GAN (cGAN) based cross-spectral cross-resolution iris recognition and offer two different frameworks.
The first approach trains a cGAN to jointly translate and super-resolve LR near-infrared (NIR) iris images to HR VIS iris images, so that cross-spectral cross-resolution iris matching is performed at the same resolution and within the same spectrum. In the second approach, we design a coupled GAN (cpGAN) architecture to project both VIS and NIR iris images into a low-dimensional embedding domain. The goal of this architecture is to ensure maximum pairwise similarity between the feature vectors from the two iris modalities of the same subject. We have also proposed a pose attention-guided coupled profile-to-frontal face recognition network to learn discriminative and pose-invariant features in an embedding subspace. To show that the feature vectors learned by this deep subspace can be used for tasks beyond recognition, we implement a GAN architecture that is able to reconstruct a frontal face from its corresponding profile face. This capability can be used in various face analysis tasks, such as emotion detection and expression tracking, where having a frontal face image can improve accuracy and reliability. Overall, our research has shown its efficacy by achieving new state-of-the-art results through extensive experiments on publicly available datasets reported in the literature.
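The joint optimisation of the multi-scale super-resolution loss and the detection loss described in this abstract can be sketched schematically; the per-scale L1 reconstruction term and the loss weight below are illustrative assumptions, not the actual MsGAN/YOLO-v3 formulation:

```python
def multiscale_sr_loss(outputs, targets):
    """Mean absolute error summed over the intermediate scales,
    mirroring MsGAN's progressive multi-scale supervision."""
    total = 0.0
    for out, tgt in zip(outputs, targets):
        n = sum(len(row) for row in tgt)
        total += sum(abs(o - t)
                     for out_row, tgt_row in zip(out, tgt)
                     for o, t in zip(out_row, tgt_row)) / n
    return total

def joint_loss(sr_loss, detection_loss, weight=0.1):
    """Joint objective: the detection loss is optimised alongside the
    super-resolution loss (the weight is a hypothetical choice)."""
    return sr_loss + weight * detection_loss
```

The coupling is what makes the super-resolved output "detection-aware": gradients from the detector flow back into the generator.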
CCFace: Classification Consistency for Low-Resolution Face Recognition
In recent years, deep face recognition methods have demonstrated impressive
results on in-the-wild datasets. However, these methods have shown a
significant decline in performance when applied to real-world low-resolution
benchmarks like TinyFace or SCFace. To address this challenge, we propose a
novel classification consistency knowledge distillation approach that transfers
the learned classifier from a high-resolution model to a low-resolution
network. This approach helps in finding discriminative representations for
low-resolution instances. To further improve the performance, we designed a
knowledge distillation loss using the adaptive angular penalty inspired by the
success of the popular angular margin loss function. The adaptive penalty
reduces overfitting on low-resolution samples and alleviates the convergence
issue of the model integrated with data augmentation. Additionally, we utilize
an asymmetric cross-resolution learning approach based on the state-of-the-art
semi-supervised representation learning paradigm to improve discriminability on
low-resolution instances and prevent them from forming a cluster. Our proposed
method outperforms state-of-the-art approaches on low-resolution benchmarks,
with a three percent improvement on TinyFace while maintaining performance on
high-resolution benchmarks.
Comment: 2023 IEEE International Joint Conference on Biometrics (IJCB)
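The angular-margin penalty that inspires CCFace's distillation loss can be illustrated as follows; the linear adaptive schedule shown is a hypothetical stand-in for the paper's actual adaptive penalty:

```python
import math

def angular_margin_logit(cos_theta, margin):
    """ArcFace-style penalty: add an angular margin to the target-class
    logit before softmax, so the decision boundary is tightened."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return math.cos(theta + margin)

def adaptive_margin(base_margin, sample_quality):
    """Hypothetical adaptive schedule: shrink the penalty for low-quality
    (low-resolution) samples, as CCFace motivates, to reduce overfitting.
    `sample_quality` is assumed to lie in [0, 1]."""
    return base_margin * sample_quality
```

With a fixed margin, hard low-resolution samples dominate the gradient; scaling the margin down for them is one way to ease the convergence issue the abstract mentions.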
Person Recognition in Low-Quality Imagery.
PhD thesis.
Person recognition aims to recognise and track the same individuals over space and time with
subtle identity class information in automatically detected person images captured by unconstrained
camera views. There are multi-source visual biometrical cues for person identity recognition.
Specifically, compared to other widely-used cues that tend to change easily over time and
space, facial appearance is considered a more reliable non-intrusive visual cue. Person
recognition, especially face recognition, enables a wide range of practical applications,
ranging from law enforcement and information security to business, entertainment and
e-commerce. However, person recognition under realistic application scenarios remains significantly
challenging, mainly due to the usual low resolutions (LR) of the images captured by
low-quality cameras with unconstrained distances between cameras and people. Compared to
the high-resolution (HR) images, LR person images contain far fewer fine-grained discriminative
details for robust identity recognition. To tackle the challenge of person recognition on
low-resolution imagery data, one effective approach is to utilise the super resolution (SR) methods
to recover or enhance the image details that are beneficial for identity recognition. However,
this thesis reveals that conventional SR models suffer from significant performance drop when
applied to low-quality LR person images, especially the natively captured surveillance facial
images. Moreover, as the SR and identity recognition models advance independently, direct super
resolution is less compatible with identity recognition, and hence has minor benefit or even
negative effect for low-resolution person recognition.
To tackle the above problems, this thesis explores person recognition methods with improved
generalisation ability to realistic low-quality person images, by adopting dedicated
super-resolution algorithms. More specifically, this thesis addresses the issues of person face recognition
and body recognition in low-resolution images as follows:
Chapter 3 Whilst recent person face recognition techniques have made significant progress
on recognising constrained high-resolution web images, the same cannot be said on natively
unconstrained low-resolution images at large scales. This chapter systematically examines this
under-studied person face recognition problem, and introduces a novel Complement Super-Resolution
and Identity (CSRI) joint deep learning method with a unified end-to-end network architecture.
The proposed learning mechanism is dedicated to overcoming the inherent challenge of genuine
low resolution: the absence of HR facial images paired with native LR faces,
typically required for optimising image super-resolution models. This is realised by transferring
the super-resolving knowledge from good-quality HR web images to the genuine LR facial
data subject to the face identity label constraints of native LR faces in every mini-batch training.
This chapter further constructs a new large-scale dataset TinyFace of native unconstrained
low-resolution face images from selected public datasets. The extensive experiments show that
there is a significant gap between the reported person face recognition performances on popular
benchmarks and the results on TinyFace, and the advantages of the proposed CSRI over a variety
of state-of-the-art face recognition and super-resolution deep models on solving this largely ignored
person face recognition scenario. However, the lack of supervision in pixel space leads to
low-fidelity super-resolved images, which may hinder further downstream facial analysis
applications.
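The CSRI training signal described above mixes super-resolution supervision from synthetic web pairs with identity supervision on native LR faces within each mini-batch; a schematic sketch (the field names and the weight `beta` are illustrative, not from the thesis):

```python
def csri_batch_loss(batch, beta=1.0):
    """Each mini-batch mixes synthetic web pairs (with an HR target, so an
    SR reconstruction loss applies) and native LR faces (identity loss only).
    Samples are dicts; only those with an HR target contribute to SR loss."""
    sr_loss = sum(s["sr_loss"] for s in batch if s["has_hr_target"])
    id_loss = sum(s["id_loss"] for s in batch)
    return sr_loss + beta * id_loss
```

This is how super-resolving knowledge learned on good-quality web images transfers to genuine LR data that has no HR counterpart: the native samples constrain the shared network only through their identity labels.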
Chapter 4 Although Chapter 3 introduced a more advanced joint-learning scheme for person
face recognition by super-resolution, one can by no means claim that the proposed method
solves the real-world low-resolution face recognition problem, which remains a significantly
challenging task. In terms of human understanding, when people are faced with a
challenging face identity recognition task, they often make decisions by selecting discriminative
facial features. If a recognition model can be optimised with results that can be explained in
a human-understandable way, such an interpretable model may have the potential to shed light
on discriminative facial features selection for better identity recognition. To achieve this, recognising
faces from high-fidelity super-resolved outputs could be a viable approach. However,
existing facial super-resolution methods focus mostly on improving “artificially down-sampled”
low-resolution (LR) imagery. Such SR models, although strong at handling artificial LR images,
often suffer from significant performance drop on genuine LR test data. Previous unsupervised
domain adaptation (UDA) methods address this issue by training a model using unpaired genuine
LR and HR data as well as cycle consistency loss formulation. However, this renders the model
overstretched with two tasks: consistifying the visual characteristics and enhancing the image
resolution. Importantly, this makes the end-to-end model training ineffective due to the difficulty
of back-propagating gradients through two concatenated CNNs. To solve this problem, in this
chapter, a method that joins the advantages of conventional SR and UDA models is formulated.
Specifically, the optimisations for characteristics consistifying and image super-resolving are
separated and controlled by introducing Characteristic Regularisation (CR) between them. This
task split makes the model training more effective and computationally tractable, and enables the
high-fidelity super resolution process on genuine low-resolution faces.
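The task split introduced in this chapter can be pictured as a three-stage pipeline; the callables below are placeholders for the actual networks, not the thesis's architecture:

```python
def characteristic_regularised_sr(lr_image, consistify, regularise, super_resolve):
    """Sketch of the Chapter 4 split: first map genuine-LR characteristics
    toward the artificial-LR domain, apply Characteristic Regularisation (CR)
    to the intermediate, then run a conventional SR model."""
    adapted = consistify(lr_image)    # characteristic consistifying
    adapted = regularise(adapted)     # CR constrains the intermediate
    return super_resolve(adapted)     # standard super-resolution
```

Because each stage is optimised for a single task, gradients no longer have to flow through two concatenated CNNs at once, which is the training bottleneck the chapter identifies.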
Chapter 5 Although the facial appearance is a more reliable visual cue for person recognition,
it is often challenging or even impossible to detect the facial region in images captured by
unconstrained low-quality cameras, where the faces can be of extreme poses, blur, distortion, or
even invisible in the human back-view images. In such cases, the person body recognition is
an important aspect for identity recognition and tracking. However, person images captured by
unconstrained surveillance cameras often have low resolutions (LR). This causes the resolution
mismatch problem when matched against the high-resolution (HR) gallery images, negatively
affecting the performance of person body recognition. An effective approach is to leverage image
super-resolution (SR) along with body recognition in a joint learning manner. However, this
scheme is limited due to dramatically more difficult gradients backpropagation during training.
This chapter introduces a novel model training regularisation method, called Inter-Task Association
Critic (INTACT), to address this fundamental problem. Specifically, INTACT discovers the
underlying association knowledge between image SR and person body recognition, and leverages
it as an extra learning constraint for enhancing the compatibility of SR model with person body
recognition in HR image space. This is realised by parameterising the association constraint,
which can be automatically learned from the training data. Extensive experiments validate the
superiority of INTACT over the state-of-the-art approaches on the cross-resolution person body
recognition task using five standard datasets.
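The parameterised association constraint at the heart of INTACT can be illustrated with a simple bilinear score over feature vectors; the form and parameters here are illustrative assumptions, not the thesis's formulation:

```python
def association_score(sr_feat, recog_feat, weights):
    """A parameterised association constraint, sketched as a bilinear score
    sum_ij w[i][j] * sr_feat[i] * recog_feat[j]. INTACT learns such
    parameters from training data and uses the score as an extra
    regulariser tying the SR model to the recognition model."""
    return sum(w * s * r
               for row, s in zip(weights, sr_feat)
               for w, r in zip(row, recog_feat))
```

Maximising such a learned score during SR training keeps the super-resolved features compatible with the recogniser's HR feature space, rather than merely visually plausible.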
Chapter 6 draws conclusions and suggests future work on open questions arising from the
studies of this thesis.