A Discriminatively Learned CNN Embedding for Person Re-identification
We revisit two popular types of convolutional neural network (CNN) models in person
re-identification (re-ID), i.e., verification and classification models. The two
models have their respective advantages and limitations due to different loss
functions. In this paper, we shed light on how to combine the two models to
learn more discriminative pedestrian descriptors. Specifically, we propose a
new siamese network that simultaneously computes identification loss and
verification loss. Given a pair of training images, the network predicts the
identities of the two images and whether they belong to the same identity. Our
network learns a discriminative embedding and a similarity measurement at the
same time, thus making full use of the annotations. Albeit simple, the
learned embedding improves the state-of-the-art performance on two public
person re-ID benchmarks. Further, we show that our architecture can also be applied
to image retrieval.
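To make the combined objective concrete, here is a minimal sketch of a per-pair loss that adds two identification losses (one per image) to one verification loss. It assumes the network has already produced K-way identity logits for each image and 2-way same/different logits for the pair (for example from a small classifier on the squared difference of the two embeddings; that detail is an assumption here). The function names and the `verif_weight` parameter are hypothetical, and equal weighting is assumed by default.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Negative log-likelihood of the target class under softmax(logits).
    return -log_softmax(logits)[target]


def siamese_id_verif_loss(
    id_logits_a: Sequence[float],
    id_logits_b: Sequence[float],
    verif_logits: Sequence[float],
    label_a: int,
    label_b: int,
    verif_weight: float = 1.0,  # hypothetical weighting, equal weights assumed
) -> float:
    """Identification loss for both images plus a verification loss for the pair.

    id_logits_a / id_logits_b: K-way identity logits for each image.
    verif_logits: 2-way logits, index 1 = "same identity", index 0 = "different".
    """
    id_loss = cross_entropy(id_logits_a, label_a) + cross_entropy(id_logits_b, label_b)
    same = 1 if label_a == label_b else 0
    verif_loss = cross_entropy(verif_logits, same)
    return id_loss + verif_weight * verif_loss
```

Because both loss terms backpropagate into the same shared embedding, every training pair supplies identity labels and a same/different label at once, which is one way to read "making full use of the annotations".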
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
We have witnessed a rapid evolution of deep neural network architecture design
in recent years. This progress has greatly facilitated developments
in various areas such as computer vision and natural language processing.
However, along with their extraordinary performance, these state-of-the-art
models also come with expensive computational costs. Directly deploying these
models in applications with real-time requirements remains infeasible.
Recently, Hinton et al. have shown that the dark knowledge within a powerful
teacher model can significantly help the training of a smaller and faster
student network. This knowledge is highly beneficial for improving the
generalization ability of the student model. Inspired by their work, we
introduce a new type of knowledge, cross-sample similarities, for model
compression and acceleration. This knowledge can be naturally derived from a deep
metric learning model. To transfer it, we bring the "learning to rank"
technique into deep metric learning formulation. We test our proposed DarkRank
method on various metric learning tasks including pedestrian re-identification,
image retrieval and image clustering. The results are quite encouraging. Our
method improves over the baseline method by a large margin. Moreover, it is
fully compatible with other existing methods; when combined with them, the
performance can be boosted further.
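The abstract does not spell out the ranking loss, so the sketch below only illustrates the general idea of transferring cross-sample similarities with a learning-to-rank objective. It assumes similarity scores of the form `-alpha * distance**beta` (with `alpha` and `beta` as hypothetical defaults) and a listwise ListMLE-style (Plackett-Luce) loss: the teacher's ranking of candidates around a query becomes the target permutation, and the loss is the negative log-likelihood of that permutation under the student's scores. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity_scores(query, candidates, alpha: float = 3.0, beta: float = 3.0) -> List[float]:
    # Higher score means more similar; assumed form: -alpha * distance**beta.
    return [-alpha * euclidean(query, c) ** beta for c in candidates]


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def listmle_loss(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Negative log-likelihood of the teacher's ranking under a Plackett-Luce model of student scores."""
    # Candidate indices ordered from most to least similar according to the teacher.
    order = sorted(range(len(teacher_scores)), key=lambda i: teacher_scores[i], reverse=True)
    s = [student_scores[i] for i in order]
    loss = 0.0
    for k in range(len(s)):
        # -log P(item k is chosen next among the items not yet placed).
        loss += logsumexp(s[k:]) - s[k]
    return loss


def darkrank_style_loss(t_query, t_cands, s_query, s_cands) -> float:
    # Teacher and student embeddings may differ in dimension; only the ranking is transferred.
    t = similarity_scores(t_query, t_cands)
    s = similarity_scores(s_query, s_cands)
    return listmle_loss(s, t)
```

The loss is zero only in the limit where the student's scores reproduce the teacher's ordering with large margins, and it grows as the student's ordering diverges. Because only relative similarities are compared, the student does not need to match the teacher's embedding dimension.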
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
In this paper, we explore the encoding/pooling layer and loss function in the
end-to-end speaker and language recognition system. First, a unified and
interpretable end-to-end system for both speaker and language recognition is
developed. It accepts variable-length input and produces an utterance-level
result. In the end-to-end system, the encoding layer plays a role in
aggregating the variable-length input sequence into an utterance-level
representation. Besides the basic temporal average pooling, we introduce a
self-attentive pooling layer and a learnable dictionary encoding layer to obtain
the utterance-level representation. As for the loss function for open-set
speaker verification, center loss and angular softmax loss are introduced into
the end-to-end system to obtain more discriminative speaker embeddings. Experimental
results on the VoxCeleb and NIST LRE 07 datasets show that the performance of the
end-to-end learning system can be significantly improved by the proposed
encoding layers and loss functions.
Comment: Accepted for Speaker Odyssey 201
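As an illustration of how an encoding layer turns a variable-length sequence of frame-level features into one fixed-size utterance-level vector, the sketch below contrasts temporal average pooling with a single-head self-attentive pooling, and adds the standard center-loss term. The attention form `v . tanh(W h_t + b)` followed by a softmax over frames is an assumption about the layer's parameterization, and `W`, `b`, `v` and all function names are hypothetical; the learnable dictionary encoding layer and angular softmax loss are not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def temporal_average_pooling(frames: Sequence[Vector]) -> Vector:
    # Mean over time: every frame contributes equally.
    t = len(frames)
    return [sum(f[d] for f in frames) / t for d in range(len(frames[0]))]


def self_attentive_pooling(frames: Sequence[Vector], W: Sequence[Vector],
                           b: Vector, v: Vector) -> Vector:
    # Score each frame with v . tanh(W h_t + b) (hypothetical learnable W, b, v).
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi) for row, bi in zip(W, b)]
        scores.append(sum(vi * zi for vi, zi in zip(v, hidden)))
    # Softmax over time turns scores into frame weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of frames gives a fixed-size utterance-level vector.
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(len(frames[0]))]


def center_loss(embeddings: Sequence[Vector], labels: Sequence[int],
                centers: Sequence[Vector]) -> float:
    # 0.5 * sum over the batch of squared distances to each sample's class center.
    total = 0.0
    for x, y in zip(embeddings, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total
```

Both pooling functions accept any number of frames, which is what lets the system consume variable-length input. With `v` set to zeros, every frame gets the same weight and self-attentive pooling reduces to temporal average pooling. Center loss is usually added to a softmax-style classification loss with a weighting factor rather than used on its own.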