Circle Loss: A Unified Perspective of Pair Similarity Optimization
This paper provides a pair similarity optimization viewpoint on deep feature
learning, aiming to maximize the within-class similarity $s_p$ and minimize the
between-class similarity $s_n$. We find that a majority of loss functions,
including the triplet loss and the softmax plus cross-entropy loss, embed $s_n$
and $s_p$ into similarity pairs and seek to reduce $(s_n - s_p)$. Such an
optimization manner is inflexible, because the penalty strength on every single
similarity score is restricted to be equal. Our intuition is that if a
similarity score deviates far from the optimum, it should be emphasized. To
this end, we simply re-weight each similarity to highlight the less-optimized
similarity scores. The result is the Circle loss, named for its circular
decision boundary.
The Circle loss has a unified formula for two elemental deep feature learning
approaches, i.e. learning with class-level labels and pair-wise labels.
Analytically, we show that the Circle loss offers a more flexible optimization
approach towards a more definite convergence target, compared with loss
functions that optimize $(s_n - s_p)$. Experimentally, we demonstrate the
superiority of the Circle loss on a variety of deep feature learning tasks. On
face recognition, person re-identification, as well as several fine-grained
image retrieval datasets, the achieved performance is on par with the state of
the art.
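As a hedged illustration of the re-weighting idea above, here is a minimal PyTorch sketch of the unified Circle loss formula over one anchor's similarity scores; the margin m = 0.25 and scale gamma = 256 are assumed settings, and pair construction is omitted.

```python
import torch
import torch.nn.functional as F

def circle_loss(sp: torch.Tensor, sn: torch.Tensor,
                m: float = 0.25, gamma: float = 256.0) -> torch.Tensor:
    """Circle loss over within-class similarities sp and between-class
    similarities sn (1-D tensors of cosine similarities in [-1, 1])."""
    # Self-paced weights: scores far from their optimum (O_p = 1 + m,
    # O_n = -m) get larger weights; detached so they act as constants.
    ap = torch.clamp_min(1 + m - sp.detach(), 0.0)   # alpha_p = [O_p - s_p]_+
    an = torch.clamp_min(sn.detach() + m, 0.0)       # alpha_n = [s_n - O_n]_+
    delta_p, delta_n = 1 - m, m                      # decision margins

    logit_p = -gamma * ap * (sp - delta_p)
    logit_n = gamma * an * (sn - delta_n)
    # log(1 + sum_j exp(logit_n[j]) * sum_i exp(logit_p[i]))
    return F.softplus(torch.logsumexp(logit_n, dim=0) +
                      torch.logsumexp(logit_p, dim=0))
```

Because the weights are themselves linear in the similarity scores, the decision boundary in the $(s_n, s_p)$ plane becomes a circular arc, which gives the loss its name.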
Super-Identity Convolutional Neural Network for Face Hallucination
Face hallucination is a generative task to super-resolve the facial image
with low resolution while human perception of face heavily relies on identity
information. However, previous face hallucination approaches largely ignore
facial identity recovery. This paper proposes the Super-Identity Convolutional
Neural Network (SICNN) to recover identity information and generate faces
close to the real identity. Specifically, we define a super-identity loss to
measure the identity difference between a hallucinated face and its
corresponding high-resolution face within the hypersphere identity metric
space. However, directly using this loss will lead to a Dynamic Domain
Divergence problem, which is caused by the large margin between the
high-resolution domain and the hallucination domain. To overcome this
challenge, we present a domain-integrated training approach by constructing a
robust identity metric for faces from these two domains. Extensive experimental
evaluations demonstrate that the proposed SICNN achieves superior visual
quality over the state-of-the-art methods on a challenging task to
super-resolve 12×14 faces with an 8× upscaling factor. In addition, SICNN
significantly improves the recognizability of ultra-low-resolution faces.
Comment: Published in ECCV 2018
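The super-identity loss can be pictured with a short, hedged sketch: identity features of the hallucinated and high-resolution faces are projected onto the unit hypersphere and their distance is penalized. `face_net` and the squared-distance form are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def super_identity_loss(face_net, sr_face: torch.Tensor,
                        hr_face: torch.Tensor) -> torch.Tensor:
    """Penalize the identity gap between a hallucinated (super-resolved)
    face and its high-resolution counterpart on the unit hypersphere."""
    z_sr = F.normalize(face_net(sr_face), dim=1)   # hypersphere identity feature
    with torch.no_grad():                          # HR identity is a fixed target
        z_hr = F.normalize(face_net(hr_face), dim=1)
    return ((z_sr - z_hr) ** 2).sum(dim=1).mean()
```

The dynamic domain divergence problem mentioned above arises when `face_net` sees hallucinated and high-resolution inputs from two far-apart domains, which the paper's domain-integrated training is designed to close.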
Deep Learning Architectures for Face Recognition in Video Surveillance
Face recognition (FR) systems for video surveillance (VS) applications
attempt to accurately detect the presence of target individuals over a
distributed network of cameras. In video-based FR systems, facial models of
target individuals are designed a priori during enrollment using a limited
number of reference still images or video data. These facial models are not
typically representative of the faces observed during operations, due to large
variations in illumination, pose, scale, occlusion, and blur, and to camera
interoperability. Specifically, in the still-to-video FR application, a single
high-quality reference still image, captured with a still camera under
controlled conditions, is used to generate a facial model that is later matched
against lower-quality faces captured with video cameras under uncontrolled
conditions.
Current video-based FR systems can perform well in controlled scenarios, but
their performance is not satisfactory in uncontrolled scenarios, mainly because
of the differences between the source (enrollment) and the target (operational)
domains. Most of the efforts in this area have therefore been directed toward
the design of robust video-based FR systems for unconstrained surveillance
environments. This chapter presents an overview of recent advances in the
still-to-video FR scenario through
deep convolutional neural networks (CNNs). In particular, deep learning
architectures proposed in the literature based on triplet-loss function (e.g.,
cross-correlation matching CNN, trunk-branch ensemble CNN and HaarNet) and
supervised autoencoders (e.g., canonical face representation CNN) are reviewed
and compared in terms of accuracy and computational complexity.
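For reference, the triplet-loss function underlying the reviewed architectures can be sketched as follows (standard formulation; the margin value is an assumed hyper-parameter):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Pull the anchor toward a same-identity positive and push it away
    from a different-identity negative by at least `margin`.
    Inputs are (batch, dim) embedding tensors."""
    d_ap = F.pairwise_distance(anchor, positive)   # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative)   # anchor-negative distance
    return torch.clamp_min(d_ap - d_an + margin, 0.0).mean()
```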
Attacks on State-of-the-Art Face Recognition using Attentional Adversarial Attack Generative Network
With the broad use of face recognition, its vulnerability to attack has
gradually emerged, so it is important to study how face recognition networks
can be attacked. In this paper, we focus on a novel kind of attack against
face recognition networks: misleading the network into identifying someone as
a chosen target person, rather than merely causing an inconspicuous
misclassification. To this end, we introduce an attentional adversarial attack
generative network to generate fake face images. To capture the semantic
information of the target person, this work adds a conditional variational
autoencoder and attention modules to learn instance-level correspondences
between faces. Unlike a traditional two-player GAN, this work introduces a
face recognition network as a third player in the competition between the
generator and the discriminator, which allows the attacker to impersonate the
target person better. The generated faces, which are unlikely to arouse the
notice of onlookers, can evade recognition by state-of-the-art networks, and
most of them are recognized as the target person.
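A hedged sketch of the three-player objective may clarify the setup: besides fooling the discriminator, the generator is scored by a fixed face recognition network so that the fake face embeds close to the target identity. The function names, the cosine-similarity form, and `lambda_id` are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc, face_rec, fake_faces: torch.Tensor,
                   target_embedding: torch.Tensor,
                   lambda_id: float = 10.0) -> torch.Tensor:
    """Generator objective with a face recognition network as third player."""
    adv = -torch.log(disc(fake_faces) + 1e-8).mean()   # fool the discriminator
    emb = F.normalize(face_rec(fake_faces), dim=1)     # identity embeddings of fakes
    tgt = F.normalize(target_embedding, dim=-1)        # target person's embedding
    ident = (1 - (emb * tgt).sum(dim=1)).mean()        # cosine pull toward the target
    return adv + lambda_id * ident
```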
Targeting Ultimate Accuracy: Face Recognition via Deep Embedding
Face Recognition has been studied for many decades. As opposed to traditional
hand-crafted features such as LBP and HOG, much more sophisticated features can
be learned automatically by deep learning methods in a data-driven way. In this
paper, we propose a two-stage approach that combines a multi-patch deep CNN and
deep metric learning, which extracts low dimensional but very discriminative
features for face verification and recognition. Experiments show that this
method outperforms other state-of-the-art methods on the LFW dataset, achieving
99.77% pair-wise verification accuracy and significantly better accuracy under
two other, more practical protocols. This paper also discusses the importance
of data size and the number of patches, showing a clear path to practical
high-performance face recognition systems in the real world.
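A minimal sketch of the two-stage idea, with the patch layout and linear projection as assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_patch_embedding(patch_cnns, patches,
                          projection: nn.Linear) -> torch.Tensor:
    """Stage one: one CNN per facial patch, features concatenated.
    Stage two: a metric-learned projection compresses the joint
    descriptor into a low-dimensional, discriminative embedding."""
    feats = [net(p) for net, p in zip(patch_cnns, patches)]  # per-patch features
    joint = torch.cat(feats, dim=1)                          # high-dim descriptor
    return F.normalize(projection(joint), dim=1)             # unit-norm embedding
```

The projection would be trained with a metric-learning objective (e.g., a triplet loss) so that distances in the low-dimensional space separate identities.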
Learning with Batch-wise Optimal Transport Loss for 3D Shape Recognition
Deep metric learning is essential for visual recognition. The widely used
pair-wise (or triplet) based loss objectives cannot make full use of the
semantic information in training samples or give enough attention to hard
samples during optimization. Thus, they often suffer from a slow convergence
rate and
inferior performance. In this paper, we show how to learn an importance-driven
distance metric via optimal transport programming from batches of samples. It
can automatically emphasize hard examples and lead to significant improvements
in convergence. We propose a new batch-wise optimal transport loss and
integrate it into end-to-end deep metric learning, using it to learn the
distance metric and the deep feature representation jointly for recognition.
Empirical results on visual retrieval and classification tasks with six
benchmark datasets, i.e., MNIST, CIFAR10, SHREC13, SHREC14, ModelNet10, and
ModelNet40, demonstrate the superiority of the proposed method. It can
accelerate the convergence rate significantly while achieving state-of-the-art
recognition performance. For example, in 3D shape recognition experiments, we
show that within only 5 epochs our method achieves better recognition
performance than mainstream 3D shape recognition approaches obtain after 200
epochs.
Comment: 10 pages, 4 figures. Accepted by CVPR 2019
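A generic, hedged sketch of a batch-wise optimal transport loss: pairwise embedding costs between two batches are coupled through an entropy-regularized (Sinkhorn) transport plan computed in the log domain, which naturally shifts weight toward hard pairs. The regularization `eps` and iteration count are assumptions, and this is not the paper's exact objective.

```python
import math
import torch

def batch_ot_loss(x: torch.Tensor, y: torch.Tensor,
                  eps: float = 0.1, n_iters: int = 50) -> torch.Tensor:
    """Entropy-regularized OT cost between two embedding batches."""
    cost = torch.cdist(x, y) ** 2                       # (n, m) squared distances
    n, m = cost.shape
    log_mu = cost.new_full((n,), -math.log(n))          # uniform row marginal
    log_nu = cost.new_full((m,), -math.log(m))          # uniform column marginal
    f, g = torch.zeros_like(log_mu), torch.zeros_like(log_nu)
    for _ in range(n_iters):                            # Sinkhorn updates in log space
        f = eps * (log_mu - torch.logsumexp((g.unsqueeze(0) - cost) / eps, dim=1))
        g = eps * (log_nu - torch.logsumexp((f.unsqueeze(1) - cost) / eps, dim=0))
    plan = torch.exp((f.unsqueeze(1) + g.unsqueeze(0) - cost) / eps)
    return (plan * cost).sum()                          # expected transport cost
```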
Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning
A family of loss functions built on pair-based computation has been proposed
in the literature, providing a myriad of solutions for deep metric learning.
In this paper, we provide a general weighting framework for understanding
recent pair-based loss functions. Our contributions are three-fold: (1) we
establish a General Pair Weighting (GPW) framework, which casts the sampling
problem of deep metric learning into a unified view of pair weighting through
gradient analysis, providing a powerful tool for understanding recent
pair-based loss functions; (2) we show that with GPW, various existing
pair-based methods can be compared and discussed comprehensively, with clear
differences and key limitations identified; (3) we propose a new loss called
multi-similarity loss (MS loss) under the GPW, which is implemented in two
iterative steps (i.e., mining and weighting). This allows it to fully consider
three similarities for pair weighting, providing a more principled approach for
collecting and weighting informative pairs. Finally, the proposed MS loss
obtains new state-of-the-art performance on four image retrieval benchmarks,
where it outperforms the most recent approaches, such as ABE (Kim et al.,
ECCV 2018) and HTL, by a large margin: from 60.6% to 65.7% on CUB200, and from
80.9% to 88.0% on the In-Shop Clothes Retrieval dataset at Recall@1. Code is
available at https://github.com/MalongTech/research-ms-loss.
Comment: Accepted to CVPR 2019; main method rewritten to be clearer
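The published MS loss formula is easy to state in code. Below is a minimal sketch of the weighting step (the mining step is omitted, and the hyper-parameter values are common settings rather than the paper's tuned ones):

```python
import torch

def ms_loss(sim: torch.Tensor, labels: torch.Tensor,
            alpha: float = 2.0, beta: float = 50.0,
            lam: float = 0.5) -> torch.Tensor:
    """Multi-similarity loss over a (batch, batch) cosine similarity
    matrix `sim`: per anchor, positives and negatives are softly
    weighted through log-sum-exp terms."""
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye   # positive pairs
    neg = ~pos & ~eye                                           # negative pairs

    losses = []
    for i in range(n):
        sp, sn = sim[i][pos[i]], sim[i][neg[i]]
        if sp.numel() == 0 or sn.numel() == 0:
            continue  # anchor needs at least one positive and one negative
        lp = torch.log1p(torch.exp(-alpha * (sp - lam)).sum()) / alpha
        ln = torch.log1p(torch.exp(beta * (sn - lam)).sum()) / beta
        losses.append(lp + ln)
    return torch.stack(losses).mean() if losses else sim.new_zeros(())
```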
Supervised Mixed Norm Autoencoder for Kinship Verification in Unconstrained Videos
Identifying kinship relations has garnered interest due to several
applications such as organizing and tagging the enormous number of videos
uploaded to the Internet. Existing research in kinship verification primarily
focuses on kinship prediction with image pairs. In this research, we propose a
new deep learning framework for kinship verification in unconstrained videos
using a novel Supervised Mixed Norm regularization Autoencoder (SMNAE). This
new autoencoder formulation introduces class-specific sparsity in the weight
matrix. The proposed three-stage SMNAE based kinship verification framework
utilizes the learned spatio-temporal representation in the video frames for
verifying kinship in a pair of videos. A new kinship video (KIVI) database of
more than 500 individuals with variations due to illumination, pose, occlusion,
ethnicity, and expression is collected for this research. It comprises a total
of 355 true kin video pairs with over 250,000 still frames. The effectiveness
of the proposed framework is demonstrated on the KIVI database and six existing
kinship databases. On the KIVI database, SMNAE yields video-based kinship
verification accuracy of 83.18% which is at least 3.2% better than existing
algorithms. The algorithm is also evaluated on six publicly available kinship
databases and compared with the best-reported results. It is observed that the
proposed SMNAE consistently yields the best results on all the databases.
Comment: Accepted for publication in IEEE Transactions on Image Processing
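The class-specific weight sparsity at the heart of SMNAE can be pictured with a short, hedged sketch: a plain autoencoder whose encoder weights receive a mixed (group-sparse) norm penalty over per-class groups of hidden units. The grouping scheme and penalty form are illustrative assumptions, not the paper's exact SMNAE formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedNormAE(nn.Module):
    """Autoencoder with a class-wise group-sparse (l2,1-style) penalty
    on the encoder weight matrix."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dec(torch.relu(self.enc(x)))

    def mixed_norm(self, class_groups) -> torch.Tensor:
        """Sum of l2 norms of encoder weight blocks, one block per
        class-specific group of hidden units; drives whole groups to zero."""
        return sum(self.enc.weight[g].norm(p=2) for g in class_groups)

# A training loss would combine reconstruction and the penalty, e.g.:
# loss = F.mse_loss(model(x), x) + lam * model.mixed_norm(groups)
```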
Deep Metric Learning with Angular Loss
Modern image search systems require semantic understanding of images, and
a key yet under-addressed problem is to learn a good metric for measuring the
similarity between images. While deep metric learning has yielded impressive
performance gains by extracting high level abstractions from image data, a
proper objective loss function becomes the central issue to boost the
performance. In this paper, we propose a novel angular loss, which takes the
angle relationship into account to learn a better similarity metric. Whereas
previous metric learning methods focus on optimizing the similarity
(contrastive loss) or relative similarity (triplet loss) of image pairs, our
proposed method aims at constraining the angle at the negative point of triplet
triangles. Several favorable properties are observed when compared with
conventional methods. First, scale invariance is introduced, improving the
robustness of the objective against feature variance. Second, a third-order
geometric constraint is inherently imposed, capturing additional local
structure of triplet triangles beyond what contrastive or triplet loss
captures. Third, better convergence has been demonstrated by experiments on
three publicly available datasets.
Comment: International Conference on Computer Vision 2017
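The core constraint has a compact hinge form (stated in the paper): with $x_c = (x_a + x_p)/2$, penalize $\|x_a - x_p\|^2 - 4\tan^2\alpha\,\|x_n - x_c\|^2$ when positive. A minimal sketch, with $\alpha = 45°$ as an assumed setting:

```python
import math
import torch

def angular_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, alpha_deg: float = 45.0) -> torch.Tensor:
    """Bound the angle at the negative vertex of each
    (anchor, positive, negative) triangle by alpha. Inputs: (batch, dim)."""
    tan_sq = math.tan(math.radians(alpha_deg)) ** 2
    center = (anchor + positive) / 2                  # midpoint of the a-p edge
    d_ap = ((anchor - positive) ** 2).sum(dim=1)      # ||x_a - x_p||^2
    d_nc = ((negative - center) ** 2).sum(dim=1)      # ||x_n - x_c||^2
    return torch.clamp_min(d_ap - 4 * tan_sq * d_nc, 0.0).mean()
```

Rescaling all embeddings does not change which triplets violate the constraint, which is the scale invariance noted above.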
Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale
semantic image retrieval problems in computer vision. In the light of recent
progress, convolutional neural network based hashing methods typically seek
pair-wise or triplet labels to conduct the similarity preserving learning.
However, complex semantic concepts of visual contents are hard to capture by
similar/dissimilar labels, which limits the retrieval performance. Generally,
pair-wise or triplet losses not only incur expensive training costs but also
fail to extract sufficient semantic information. In this regard, we
propose a novel deep supervised hashing model to learn more compact class-level
similarity preserving binary codes. Our deep learning based model is motivated
by deep metric learning that directly takes semantic labels as supervised
information in training and generates corresponding discriminant hashing code.
Specifically, a novel cubic constraint loss function based on the Gaussian
distribution is proposed, which preserves semantic variations while penalizing
the overlap between different classes in the embedding space. To address the
discrete optimization problem introduced by binary codes, a two-step
optimization strategy is proposed to provide efficient training and avoid the
problem of gradient vanishing. Extensive experiments on four large-scale
benchmark databases show that our model can achieve the state-of-the-art
retrieval performance. Moreover, when training samples are limited, our method
surpasses other supervised deep hashing methods by non-negligible margins.
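A hedged sketch of class-level similarity preservation in the relaxed (pre-binarization) space: codes are pulled toward their class centers, while a Gaussian kernel penalizes overlap between different class centers. This illustrates the idea only; the paper's cubic constraint loss and two-step discrete optimization differ in detail.

```python
import torch

def class_wise_loss(codes: torch.Tensor, labels: torch.Tensor,
                    centers: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """`codes`: (batch, bits) tanh-relaxed hash codes in [-1, 1];
    `centers`: (num_classes, bits) learnable class centers."""
    pull = ((codes - centers[labels]) ** 2).sum(dim=1).mean()  # intra-class compactness
    d = torch.cdist(centers, centers) ** 2                     # center-to-center distances
    overlap = torch.exp(-d / (2 * sigma ** 2))                 # Gaussian overlap penalty
    k = centers.size(0)
    mask = ~torch.eye(k, dtype=torch.bool, device=centers.device)
    return pull + overlap[mask].mean()                         # push different classes apart
```

In a two-step scheme, the relaxed codes would be trained with this loss first and then binarized with `torch.sign`, avoiding gradients through the discrete sign function.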