Triplet Distillation for Deep Face Recognition
Convolutional neural networks (CNNs) have achieved great success in face
recognition, which unfortunately comes at the cost of massive computation and
storage consumption. Many compact face recognition networks have thus been
proposed to resolve this problem. Triplet loss is effective for further
improving the performance of these compact models. However, it normally applies
a fixed margin to all samples, which neglects the informative similarity structures
between different identities. In this paper, we propose an enhanced version of
triplet loss, named triplet distillation, which exploits the capability of a
teacher model to transfer the similarity information to a small model by
adaptively varying the margin between positive and negative pairs. Experiments
on LFW, AgeDB, and CPLFW datasets show the merits of our method compared to the
original triplet loss.

Comment: 5 pages, 2 tables, accepted by the ICML 2019 ODML-CDNNR Workshop
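A minimal sketch of the adaptive-margin idea, assuming a hypothetical pair of embedding models ("teacher", "student") and using the teacher's cosine-similarity gap to set the margin; the paper's exact margin schedule may differ:

```python
import torch.nn.functional as F

def triplet_distillation_loss(student_a, student_p, student_n,
                              teacher_a, teacher_p, teacher_n,
                              m_min=0.2, m_max=0.8):
    """Triplet loss with a per-triplet margin adapted from a teacher.

    The teacher's similarity gap between the positive and negative pairs
    is mapped into [m_min, m_max]: identities the teacher finds hard to
    separate get a small margin, clearly distinct ones a large margin.
    """
    sim_ap = F.cosine_similarity(teacher_a, teacher_p)        # teacher pos sim
    sim_an = F.cosine_similarity(teacher_a, teacher_n)        # teacher neg sim
    gap = ((sim_ap - sim_an).clamp(-1.0, 1.0) + 1.0) / 2.0    # in [0, 1]
    margin = m_min + (m_max - m_min) * gap                    # adaptive margin

    # Standard triplet loss on the student, with the adaptive margin.
    d_ap = (student_a - student_p).pow(2).sum(dim=1)
    d_an = (student_a - student_n).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()
```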
Learning Metrics from Teachers: Compact Networks for Image Embedding
Metric learning networks are used to compute image embeddings, which are
widely used in many applications such as image retrieval and face recognition.
In this paper, we propose to use network distillation to efficiently compute
image embeddings with small networks. Network distillation has been
successfully applied to improve image classification, but has hardly been
explored for metric learning. To this end, we propose two new loss functions that
model the communication of a deep teacher network to a small student network.
We evaluate our system on several datasets, including CUB-200-2011, Cars-196,
and Stanford Online Products, and show that embeddings computed using small student
networks perform significantly better than those computed using standard
networks of similar size. Results on a very compact network (MobileNet-0.25),
which can be used on mobile devices, show that the proposed method can greatly
improve Recall@1 from 27.5% to 44.6%. Furthermore, we investigate
various aspects of distillation for embeddings, including hint and attention
layers, semi-supervised learning and cross quality distillation. (Code is
available at https://github.com/yulu0724/EmbeddingDistillation.)

Comment: To appear at CVPR 2019
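One plausible reading of a distillation loss for embeddings, sketched below: rather than copying the teacher's coordinates, the student matches the teacher's pairwise distance structure, so the two embedding dimensionalities may differ (the function name is illustrative, not the paper's):

```python
import torch

def relative_distillation_loss(student_emb, teacher_emb):
    """Match the teacher's pairwise distance structure within a batch.

    student_emb: (N, D_s), teacher_emb: (N, D_t); only distances between
    samples are compared, so the dimensions need not agree.
    """
    d_s = torch.cdist(student_emb, student_emb)  # (N, N) student distances
    d_t = torch.cdist(teacher_emb, teacher_emb)  # (N, N) teacher distances
    return (d_s - d_t).pow(2).mean()
```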
Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
Knowledge distillation is a potential solution for model compression. The
idea is to make a small student network imitate the target of a large teacher
network, so that the student network can be competitive with the teacher. Most
previous studies focus on model distillation in the classification task, where
they propose different architectures and initializations for the student network.
However, the classification task alone is not enough, and other related tasks
such as regression and retrieval are barely considered. To address this,
in this paper, we take face recognition as a starting point and propose model
distillation with knowledge transfer from face classification to alignment and
verification. By selecting appropriate initializations and targets in the
knowledge transfer, the distillation can be easier in non-classification tasks.
Experiments on the CelebA and CASIA-WebFace datasets demonstrate that the
student network can be competitive with the teacher in alignment and
verification, and even surpasses the teacher network under specific compression
rates. In addition, to achieve stronger knowledge transfer, we also use a
common initialization trick to improve the distillation performance of
classification. Evaluations on the CASIA-WebFace and large-scale MS-Celeb-1M
datasets show the effectiveness of this simple trick.

Comment: 10 pages, 1 figure
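For the classification side of the transfer, the targets are presumably soft teacher outputs in the style of classic knowledge distillation; a minimal sketch (temperature T is an assumed hyperparameter, not a value from the paper):

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    """Classic soft-target distillation: the student matches the teacher's
    temperature-softened class distribution instead of hard labels."""
    p_t = F.softmax(teacher_logits / T, dim=1)          # soft teacher targets
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```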
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Person re-identification (ReID) is aimed at identifying the same person
across videos captured from different cameras. Because ordinary network
architectures that extract global features struggle to capture local features,
owing to their weak attention mechanisms, researchers have proposed many
elaborately designed ReID networks; while these greatly improve accuracy,
their model size and feature extraction latency also soar. We argue that a
relatively compact ordinary network extracting globally pooled features has
the capability to extract discriminative local features, and can achieve
state-of-the-art precision provided the model's parameters are properly
learnt. In order to reduce the difficulty in learning
hard identity labels, we propose a novel knowledge distillation method:
Factorized Distillation, which factorizes both feature maps and retrieval
features of holistic ReID network to mimic representations of multiple partial
ReID models, thus transferring the knowledge from partial ReID models to the
holistic network. Experiments show that a model trained with the proposed
method can outperform the state of the art with relatively few network
parameters.

Comment: 10 pages, 5 figures
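As an illustration only (the paper's factorization is more involved), one way to make a holistic feature map mimic several partial models is to split it into horizontal parts and match each part to the corresponding partial teacher; the list of partial teacher maps and the even split are assumptions:

```python
import torch.nn.functional as F

def factorized_mimic_loss(holistic_fmap, partial_fmaps):
    """Match horizontal slices of the holistic feature map to the feature
    maps of K partial-region teacher models.

    holistic_fmap: (N, C, H, W); partial_fmaps: list of K tensors, each
    (N, C, H // K, W), one per hypothetical partial ReID teacher.
    """
    k = len(partial_fmaps)
    parts = holistic_fmap.chunk(k, dim=2)  # split along the height axis
    return sum(F.mse_loss(p, t) for p, t in zip(parts, partial_fmaps)) / k
```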
Towards Learning a Universal Non-Semantic Representation of Speech
The ultimate goal of transfer learning is to reduce labeled data requirements
by exploiting a pre-existing embedding model trained for different datasets or
tasks. The visual and language communities have established benchmarks to
compare embeddings, but the speech community has yet to do so. This paper
proposes a benchmark for comparing speech representations on non-semantic
tasks, and proposes a representation based on an unsupervised triplet-loss
objective. The proposed representation outperforms other representations on the
benchmark, and even exceeds state-of-the-art performance on a number of
transfer learning tasks. The embedding is trained on a publicly available
dataset, and it is tested on a variety of low-resource downstream tasks,
including personalization tasks and the medical domain. The benchmark, models,
and evaluation code are publicly released.
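A rough sketch of an unsupervised triplet objective for audio, assuming (as a simplification) that the two halves of a clip form the anchor/positive pair and a segment from another clip serves as the negative; the paper's actual segment sampling may differ:

```python
import torch.nn.functional as F

def temporal_triplet_loss(embed, clips, margin=0.5):
    """Self-supervised triplet loss: segments from the same clip should be
    closer in embedding space than segments from different clips.

    embed: callable mapping (N, T') waveform segments to (N, D) embeddings.
    clips: (N, T) batch of waveforms.
    """
    n, t = clips.shape
    seg = t // 2
    anchor = embed(clips[:, :seg])           # first half of each clip
    positive = embed(clips[:, seg:2 * seg])  # second half: same source
    negative = positive.roll(1, dims=0)      # segment from another clip
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()
```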
Learning Unified Embedding for Apparel Recognition
In apparel recognition, specialized models (e.g. models trained for a
particular vertical like dresses) can significantly outperform general models
(i.e. models that cover a wide range of verticals). Therefore, deep neural
network models are often trained separately for different verticals. However,
using specialized models for different verticals is not scalable and is expensive
to deploy. This paper addresses the problem of learning one unified embedding
model for multiple object verticals (e.g. all apparel classes) without
sacrificing accuracy. The problem is tackled from two aspects: training data
and training difficulty. On the training data aspect, we find that for a
single model trained with triplet loss, there is an accuracy sweet spot in
terms of how many verticals are trained together. To ease the training
difficulty, a novel learning scheme is proposed by using the output from
specialized models as learning targets so that L2 loss can be used instead of
triplet loss. This new loss makes training easier and enables more efficient
use of the feature space. The end result is a unified model which can achieve
the same retrieval accuracy as a number of separate specialized models, while
having the complexity of a single model. The effectiveness
of our approach is shown in experiments.

Comment: 8 pages
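The learning scheme described above reduces to a plain regression; a minimal sketch, where the specialized model for the image's vertical is assumed to be frozen:

```python
import torch.nn.functional as F

def unified_embedding_loss(unified_emb, specialist_emb):
    # L2 regression onto the frozen specialist's embedding, used in place
    # of triplet loss; the specialist carries the vertical-specific signal.
    return F.mse_loss(unified_emb, specialist_emb)

# e.g. loss = unified_embedding_loss(unified_model(x), dress_model(x)),
# where dress_model is the hypothetical specialist for x's vertical.
```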
MarginDistillation: distillation for margin-based softmax
The usage of convolutional neural networks (CNNs) in conjunction with a
margin-based softmax approach demonstrates state-of-the-art performance for
the face recognition problem. Recently, lightweight neural network models
trained with the margin-based softmax have been introduced for the face
identification task on edge devices. In this paper, we propose a novel
distillation method for lightweight neural network architectures that
outperforms other known methods for the face recognition task on LFW, AgeDB-30
and Megaface datasets. The idea of the proposed method is to use class centers
from the teacher network for the student network. The student network is then
trained to reproduce the angles between the class centers and the face
embeddings predicted by the teacher network.
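A minimal sketch of that idea, assuming the class centers are the (normalized) rows of the teacher's margin-based softmax weight matrix and that the two embeddings share a dimension:

```python
import torch.nn.functional as F

def margin_distillation_loss(student_emb, teacher_emb, teacher_centers, labels):
    """Train the student to reproduce the teacher's angle between each face
    embedding and its class center.

    student_emb, teacher_emb: (N, D); teacher_centers: (C, D); labels: (N,).
    """
    centers = F.normalize(teacher_centers, dim=1)[labels]           # (N, D)
    cos_t = F.cosine_similarity(F.normalize(teacher_emb, dim=1), centers)
    cos_s = F.cosine_similarity(F.normalize(student_emb, dim=1), centers)
    return (cos_s - cos_t).pow(2).mean()
```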
Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition
Recognizing visual relationships between any pair of
localized objects is pivotal for image understanding. Previous studies have
shown remarkable progress in exploiting linguistic priors or external textual
information to improve the performance. In this work, we investigate an
orthogonal perspective based on feature interactions. We show that by
encouraging deep message propagation and interactions between local object
features and global predicate features, one can achieve compelling performance
in recognizing complex relationships without using any linguistic priors. To
this end, we present two new pooling cells to encourage feature interactions:
(i) Contrastive ROI Pooling Cell, which has a unique deROI pooling that
inversely pools local object features to the corresponding area of global
predicate features. (ii) Pyramid ROI Pooling Cell, which broadcasts global
predicate features to reinforce local object features. The two cells constitute
a Spatiality-Context-Appearance Module (SCA-M), which can be further stacked
consecutively to form our final Zoom-Net. We further shed light on how one could
resolve ambiguous and noisy object and predicate annotations by
Intra-Hierarchical trees (IH-tree). Extensive experiments conducted on Visual
Genome dataset demonstrate the effectiveness of our feature-oriented approach
compared to state-of-the-art methods (Acc@1 11.42% from 8.16%) that depend on
explicit modeling of linguistic interactions. We further show that SCA-M can be
incorporated seamlessly into existing approaches to improve the performance by
a large margin. The source code will be released on
https://github.com/gjyin91/ZoomNet.

Comment: 22 pages, 9 figures, accepted by ECCV 2018; the source code will be released at https://github.com/gjyin91/ZoomNet
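A loose sketch of the broadcast direction of the Pyramid ROI Pooling Cell only (the full SCA-M module is more elaborate), assuming feature-map-coordinate ROIs and matching channel counts:

```python
from torchvision.ops import roi_align

def pyramid_roi_cell(global_predicate_fmap, local_obj_feats, rois, size=7):
    """Broadcast global predicate features into each object's region and
    fuse them with the local object features.

    global_predicate_fmap: (N, C, H, W); local_obj_feats: (R, C, size, size);
    rois: (R, 5) boxes as [batch_idx, x1, y1, x2, y2] in feature-map coords.
    """
    region_ctx = roi_align(global_predicate_fmap, rois,
                           output_size=(size, size))
    return local_obj_feats + region_ctx  # local features reinforced by context
```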
In Defense of the Triplet Loss Again: Learning Robust Person Re-Identification with Fast Approximated Triplet Loss and Label Distillation
The comparative losses (typically, triplet loss) are appealing choices for
learning person re-identification (ReID) features. However, the triplet loss is
computationally much more expensive than the (practically more popular)
classification loss, limiting its wider usage on massive datasets. Moreover,
the abundance of label noise and outliers in ReID datasets may also put the
margin-based loss in jeopardy. This work addresses the above two shortcomings
of triplet loss, extending its effectiveness to large-scale ReID datasets with
potentially noisy labels. We propose a fast-approximated triplet (FAT) loss,
which provably converts the point-wise triplet loss into its upper bound form,
consisting of a point-to-set loss term plus cluster compactness regularization.
It preserves the effectiveness of triplet loss, while its complexity is linear
in the training set size. A label distillation strategy is further
designed to learn refined soft-labels in place of the potentially noisy labels,
from only an identified subset of confident examples, through teacher-student
networks. We conduct extensive experiments on the three most popular ReID
benchmarks (Market-1501, DukeMTMC-reID, and MSMT17), and demonstrate that the
FAT loss with distilled labels leads to ReID features with remarkable accuracy,
efficiency, robustness, and direct transferability to unseen datasets.
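A sketch of the point-to-set form of the FAT loss under simplifying assumptions (per-identity centroids precomputed, nearest other-class centroid as the negative set, and the weight lam is hypothetical):

```python
import torch
import torch.nn.functional as F

def fat_loss(emb, labels, centroids, margin=0.3, lam=0.1):
    """Fast-approximated triplet: point-to-centroid terms (linear in the
    training set size) plus a cluster compactness regularizer.

    emb: (N, D); labels: (N,) identity ids; centroids: (C, D).
    """
    d = torch.cdist(emb, centroids)                     # (N, C) point-to-set
    d_pos = d.gather(1, labels.view(-1, 1)).squeeze(1)  # own-class distance
    d_masked = d.clone()
    d_masked.scatter_(1, labels.view(-1, 1), float("inf"))
    d_neg = d_masked.min(dim=1).values                  # nearest other class
    compact = d_pos.mean()                              # cluster compactness
    return F.relu(d_pos - d_neg + margin).mean() + lam * compact
```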
Improving Face Recognition from Hard Samples via Distribution Distillation Loss
Large facial variations are the main challenge in face recognition. To this
end, previous variation-specific methods make full use of task-related priors
to design specialized network losses, which typically do not generalize across
different tasks and scenarios. In contrast, the existing generic methods focus on
improving the feature discriminability to minimize the intra-class distance
while maximizing the inter-class distance; these perform well on easy samples
but fail on hard samples. To improve the performance on those hard samples for
general tasks, we propose a novel Distribution Distillation Loss to narrow the
performance gap between easy and hard samples; it is simple, effective and
generic for various types of facial variations. Specifically, we first adopt
state-of-the-art classifiers such as ArcFace to construct two similarity
distributions: a teacher distribution from easy samples and a student
distribution from hard samples. Then, we propose a novel distribution-driven loss to
constrain the student distribution to approximate the teacher distribution,
which thus leads to smaller overlap between the positive and negative pairs in
the student distribution. We have conducted extensive experiments on both
generic large-scale face benchmarks and benchmarks with diverse variations in
race, resolution and pose. The quantitative results demonstrate the superiority
of our method over strong baselines, e.g., ArcFace and CosFace.

Comment: ECCV 2020
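A rough sketch of the distribution-matching idea (the paper's exact loss may be formulated differently): a KL term pulls the hard-sample similarity histograms toward the easy-sample ones, and an order term preserves the teacher's positive-negative margin:

```python
import torch
import torch.nn.functional as F

def distribution_distillation_loss(pos_easy, neg_easy, pos_hard, neg_hard,
                                   w_order=1.0, bins=32):
    """Each argument is a 1-D tensor of cosine similarities for positive or
    negative pairs from easy ("teacher") or hard ("student") samples."""
    def hist(x):
        # Soft (differentiable) histogram of similarities over [-1, 1].
        centers = torch.linspace(-1.0, 1.0, bins, device=x.device)
        w = torch.exp(-((x.view(-1, 1) - centers) ** 2) / 0.01).sum(dim=0)
        return (w + 1e-8) / (w + 1e-8).sum()

    kl = (F.kl_div(hist(pos_hard).log(), hist(pos_easy), reduction="sum") +
          F.kl_div(hist(neg_hard).log(), hist(neg_easy), reduction="sum"))
    # Keep the hard-sample margin at least as wide as the easy-sample one.
    order = F.relu((pos_easy.mean() - neg_easy.mean()) -
                   (pos_hard.mean() - neg_hard.mean()))
    return kl + w_order * order
```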