1,302 research outputs found
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
We have witnessed rapid evolution of deep neural network architecture design
in the past years. These latest progresses greatly facilitate the developments
in various areas such as computer vision and natural language processing.
However, along with the extraordinary performance, these state-of-the-art
models also bring in expensive computational cost. Directly deploying these
models into applications with real-time requirement is still infeasible.
Recently, Hinton etal. have shown that the dark knowledge within a powerful
teacher model can significantly help the training of a smaller and faster
student network. These knowledge are vastly beneficial to improve the
generalization ability of the student model. Inspired by their work, we
introduce a new type of knowledge -- cross sample similarities for model
compression and acceleration. This knowledge can be naturally derived from deep
metric learning model. To transfer them, we bring the "learning to rank"
technique into deep metric learning formulation. We test our proposed DarkRank
method on various metric learning tasks including pedestrian re-identification,
image retrieval and image clustering. The results are quite encouraging. Our
method can improve over the baseline method by a large margin. Moreover, it is
fully compatible with other existing methods. When combined, the performance
can be further boosted
Temporal Model Adaptation for Person Re-Identification
Person re-identification is an open and challenging problem in computer
vision. Majority of the efforts have been spent either to design the best
feature representation or to learn the optimal matching metric. Most approaches
have neglected the problem of adapting the selected features or the learned
model over time. To address such a problem, we propose a temporal model
adaptation scheme with human in the loop. We first introduce a
similarity-dissimilarity learning method which can be trained in an incremental
fashion by means of a stochastic alternating directions methods of multipliers
optimization procedure. Then, to achieve temporal adaptation with limited human
effort, we exploit a graph-based approach to present the user only the most
informative probe-gallery matches that should be used to update the model.
Results on three datasets have shown that our approach performs on par or even
better than state-of-the-art approaches while reducing the manual pairwise
labeling effort by about 80%
Learning to rank music tracks using triplet loss
Most music streaming services rely on automatic recommendation algorithms to
exploit their large music catalogs. These algorithms aim at retrieving a ranked
list of music tracks based on their similarity with a target music track. In
this work, we propose a method for direct recommendation based on the audio
content without explicitly tagging the music tracks. To that aim, we propose
several strategies to perform triplet mining from ranked lists. We train a
Convolutional Neural Network to learn the similarity via triplet loss. These
different strategies are compared and validated on a large-scale experiment
against an auto-tagging based approach. The results obtained highlight the
efficiency of our system, especially when associated with an Auto-pooling
layer
Learning Correspondence Structures for Person Re-identification
This paper addresses the problem of handling spatial misalignments due to
camera-view changes or human-pose variations in person re-identification. We
first introduce a boosting-based approach to learn a correspondence structure
which indicates the patch-wise matching probabilities between images from a
target camera pair. The learned correspondence structure can not only capture
the spatial correspondence pattern between cameras but also handle the
viewpoint or human-pose variation in individual images. We further introduce a
global constraint-based matching process. It integrates a global matching
constraint over the learned correspondence structure to exclude cross-view
misalignments during the image patch matching process, hence achieving a more
reliable matching score between images. Finally, we also extend our approach by
introducing a multi-structure scheme, which learns a set of local
correspondence structures to capture the spatial correspondence sub-patterns
between a camera pair, so as to handle the spatial misalignments between
individual images in a more precise way. Experimental results on various
datasets demonstrate the effectiveness of our approach.Comment: IEEE Trans. Image Processing, vol. 26, no. 5, pp. 2438-2453, 2017.
The project page for this paper is available at
http://min.sjtu.edu.cn/lwydemo/personReID.htm arXiv admin note: text overlap
with arXiv:1504.0624
- …