9,005 research outputs found
A Semi-Supervised Maximum Margin Metric Learning Approach for Small Scale Person Re-identification
In video surveillance, person re-identification is the task of searching
person images in non-overlapping cameras. Though supervised methods for person
re-identification have attained impressive performance, obtaining large scale
cross-view labeled training data is very expensive. However, unlabelled data is
available in abundance. In this paper, we propose a semi-supervised metric
learning approach that can utilize information in unlabelled data with the help
of a few labelled training samples. We also address the small sample size
problem that inherently occurs due to the few labeled training data. Our method
learns a discriminative space where within-class samples collapse to singular
points, achieving the least within-class variance, and then uses a maximum
margin criterion over a high dimensional kernel space to maximally separate the
distinct class samples. A maximum margin criterion with two levels of high
dimensional mappings to kernel space is used to obtain better cross-view
discrimination of the identities. Cross-view affinity learning with reciprocal
nearest neighbor constraints is used to mine new pseudo-classes from the
unlabelled data and update the distance metric iteratively. We attain
state-of-the-art performance on four challenging datasets with a large margin.
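As a rough illustration of the reciprocal nearest-neighbour constraint mentioned in the abstract above, the sketch below pairs samples from two camera views only when each is among the other's k nearest neighbours. The function names, the Euclidean distance, and the toy 2-D features are assumptions made for illustration; the abstract does not specify how the authors compute affinities.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query, gallery, k):
    """Indices of the k gallery items closest to the query."""
    order = sorted(range(len(gallery)), key=lambda j: euclidean(query, gallery[j]))
    return order[:k]

def reciprocal_pairs(view_a, view_b, k=1):
    """Return (i, j) pairs where b_j is a k-NN of a_i and a_i is a k-NN of b_j."""
    pairs = []
    for i, a in enumerate(view_a):
        for j in k_nearest(a, view_b, k):
            if i in k_nearest(view_b[j], view_a, k):
                pairs.append((i, j))
    return pairs

# Hypothetical unlabelled features from two camera views.
view_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
view_b = [(0.2, 0.1), (5.1, 4.8), (2.0, 2.0)]

# Each surviving pair could be treated as a new pseudo-class.
pseudo_classes = reciprocal_pairs(view_a, view_b, k=1)
print(pseudo_classes)
```

The third sample of `view_a` finds a nearest neighbour in `view_b`, but that neighbour prefers a different sample, so the pair is rejected. This one-sided filtering is what makes the reciprocal test stricter than plain nearest-neighbour matching.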
Cross-Entropy Adversarial View Adaptation for Person Re-identification
Person re-identification (re-ID) is the task of matching pedestrians under
disjoint camera views. To recognise paired snapshots, it has to cope with large
cross-view variations caused by the camera view shift. Supervised deep neural
networks are effective in producing a set of non-linear projections that can
transform cross-view images into a common feature space. However, they
typically impose a symmetric architecture, which leaves the network ill-conditioned
during optimisation. In this paper, we learn a view-invariant subspace for person
re-ID, and its corresponding similarity metric using an adversarial view
adaptation approach. The main contribution is to learn coupled asymmetric
mappings regarding view characteristics which are adversarially trained to
address the view discrepancy by optimising the cross-entropy view confusion
objective. To determine the similarity value, the network is empowered with a
similarity discriminator to promote features that are highly discriminant in
distinguishing positive and negative pairs. The other contribution is an
adaptive weighting of the most difficult samples to address the imbalance of
within/between-identity pairs. Our approach achieves notably improved
performance in comparison to state-of-the-art methods on benchmark datasets.
Comment: Appearing at IEEE Transactions on Circuits and Systems for Video
Technology
Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss
This paper proposes multiscale convolutional neural network (CNN)-based deep
metric learning for bioacoustic classification, under low training data
conditions. The proposed CNN is characterized by the utilization of four
different filter sizes at each level to analyze input feature maps. This
multiscale nature helps in describing different bioacoustic events effectively:
smaller filters help in learning the finer details of bioacoustic events,
whereas larger filters help in analyzing a larger context, leading to global
details. A dynamic triplet loss is employed in the proposed CNN architecture to
learn a transformation from the input space to the embedding space, where
classification is performed. The triplet loss helps in learning this
transformation by analyzing three examples, referred to as triplets, at a time
where intra-class distance is minimized while maximizing the inter-class
separation by a dynamically increasing margin. The number of possible triplets
increases cubically with the dataset size, making triplet loss more suitable
than the softmax cross-entropy loss in low training data conditions.
Experiments on three different publicly available datasets show that the
proposed framework performs better than existing bioacoustic classification
frameworks. Experimental results also confirm the superiority of the triplet
loss over the cross-entropy loss in low training data conditions.
Comment: Under review at JASA. Preliminary version of the paper. We are still
working on getting better performance out of the comparative method
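To make the triplet-loss description in the abstract above concrete, here is a minimal sketch of a hinge-style triplet loss whose margin grows over training, plus a count of how many triplets a labelled set can yield. The linear margin schedule, its parameters (`start`, `step`, `cap`), and the squared Euclidean distance are assumptions; the abstract only says the margin increases dynamically.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dynamic_margin(epoch, start=0.5, step=0.5, cap=2.0):
    """Hypothetical schedule: the margin grows linearly with the epoch, up to a cap."""
    return min(cap, start + step * epoch)

def triplet_loss(anchor, positive, negative, margin):
    """Penalise triplets whose negative is not at least `margin` farther than the positive."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def count_triplets(class_sizes):
    """Number of ordered (anchor, positive, negative) triplets for the given class sizes."""
    total = sum(class_sizes)
    return sum(n * (n - 1) * (total - n) for n in class_sizes)

anchor, positive, negative = (0.0, 0.0), (1.0, 0.0), (1.5, 0.0)

# The same triplet is satisfied early on and violated once the margin has grown.
early_loss = triplet_loss(anchor, positive, negative, dynamic_margin(epoch=0))
late_loss = triplet_loss(anchor, positive, negative, dynamic_margin(epoch=2))
print(early_loss, late_loss)

# Doubling each class from 10 to 20 samples multiplies the triplet count roughly eightfold.
print(count_triplets([10, 10]), count_triplets([20, 20]))
```

The count illustrates the cubic growth the abstract refers to: a small labelled set still produces a large number of training triplets.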
Learning Large Euclidean Margin for Sketch-based Image Retrieval
This paper addresses the problem of Sketch-Based Image Retrieval (SBIR), for
which bridging the gap between the data representations of sketch images and
photo images is considered the key. Previous works mostly focus on learning
a feature space to minimize intra-class distances for both sketches and photos.
In contrast, we propose a novel loss function, named Euclidean Margin Softmax
(EMS), that not only minimizes intra-class distances but also maximizes
inter-class distances simultaneously. It enables us to learn a feature space
with high discriminability, leading to highly accurate retrieval. In addition,
this loss function is applied to a conditional network architecture, which
could incorporate the prior knowledge of whether a sample is a sketch or a
photo. We show that the conditional information can be conveniently
incorporated into the recently proposed Squeeze and Excitation (SE) module, leading
to a conditional SE (CSE) module. Extensive experiments are conducted on two
widely used SBIR benchmark datasets. Our approach, although very simple,
achieves a new state of the art on both datasets, surpassing existing methods by
a large margin.
Comment: 13 pages, 6 figures
Iterated Support Vector Machines for Distance Metric Learning
Distance metric learning aims to learn from the given training data a valid
distance metric, with which the similarity between data samples can be more
effectively evaluated for classification. Metric learning is often formulated
as a convex or nonconvex optimization problem, while many existing metric
learning algorithms become inefficient for large scale problems. In this paper,
we formulate metric learning as a kernel classification problem, and solve it
by iterated training of support vector machines (SVM). The new formulation is
easy to implement, efficient in training, and tractable for large-scale
problems. Two novel metric learning models, namely Positive-semidefinite
Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained
Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the
global optimality of their solutions. Experimental results on UCI dataset
classification, handwritten digit recognition, face verification and person
re-identification demonstrate that the proposed metric learning methods achieve
higher classification accuracy than state-of-the-art methods and they are
significantly more efficient in training.
Comment: 14 pages, 10 figures
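For readers unfamiliar with what a "valid distance metric" means in the abstract above, the following sketch shows the standard Mahalanobis-style form that most metric learning methods share. Writing the metric matrix as M = LᵀL guarantees positive semidefiniteness by construction. This is a generic illustration, not the PCML or NCML formulation, and the matrix `L` is a hand-picked assumption rather than a learned value.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def mahalanobis_sq(x, y, L):
    """Squared distance (x - y)^T L^T L (x - y), computed as ||L (x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    projected = mat_vec(L, diff)
    return sum(p * p for p in projected)

identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical learned transform that stretches the first feature dimension.
L = [[2.0, 0.0], [0.0, 1.0]]

x, y = (1.0, 1.0), (0.0, 0.0)
plain = mahalanobis_sq(x, y, identity)   # ordinary squared Euclidean distance
learned = mahalanobis_sq(x, y, L)        # first dimension now counts four times as much
print(plain, learned)
```

Because the distance is a squared norm of `L (x - y)`, it can never be negative, which is the property a positive-semidefinite constraint is meant to enforce.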
Transfer Metric Learning: Algorithms, Applications and Outlooks
Distance metric learning (DML) aims to find an appropriate way to reveal the
underlying data relationship. It is critical in many machine learning, pattern
recognition and data mining algorithms, and usually requires a large amount of
label information (such as class labels or pair/triplet constraints) to achieve
satisfactory performance. However, the label information may be insufficient in
real-world applications due to the high-labeling cost, and DML may fail in this
case. Transfer metric learning (TML) is able to mitigate this issue for DML in
the domain of interest (target domain) by leveraging knowledge/information from
other related domains (source domains). Although it has achieved a certain level of
development, TML has limited success in various aspects such as selective
transfer, theoretical understanding, handling complex data, big data and
extreme cases. In this survey, we present a systematic review of the TML
literature. In particular, we group TML into different categories according to
different settings and metric transfer strategies, such as direct metric
approximation, subspace approximation, distance approximation, and distribution
approximation. A summarization and insightful discussion of the various TML
approaches and their applications will be presented. Finally, we indicate some
challenges and provide possible future directions.
Comment: 14 pages, 5 figures
Weakly Supervised Person Re-ID: Differentiable Graphical Learning and A New Benchmark
Person re-identification (Re-ID) benefits greatly from the accurate
annotations of existing datasets (e.g., CUHK03 [1] and Market-1501 [2]), which
are quite expensive because each image in these datasets has to be assigned
a proper label. In this work, we ease the annotation of Re-ID by replacing
the accurate annotation with inaccurate annotation, i.e., we group the images
into bags in terms of time and assign a bag-level label for each bag. This
greatly reduces the annotation effort and leads to the creation of a
large-scale Re-ID benchmark called SYSU-30k. The new benchmark contains 30k
individuals, which is about 20 times larger than CUHK03 (1.3k individuals)
and Market-1501 (1.5k individuals), and 30 times larger than ImageNet (1k
categories). It sums up to 29,606,918 images. Learning a Re-ID model with
bag-level annotation is called the weakly supervised Re-ID problem. To solve
this problem, we introduce a differentiable graphical model to capture the
dependencies from all images in a bag and generate a reliable pseudo label for
each person image. The pseudo label is further used to supervise the learning
of the Re-ID model. When compared with the fully supervised Re-ID models, our
method achieves state-of-the-art performance on SYSU-30k and other datasets.
The code, dataset, and pretrained model will be available at
https://github.com/wanggrun/SYSU-30k.
Comment: Accepted by TNNLS 202
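The bag-level annotation idea in the abstract above can be pictured with a small sketch that groups images into bags by capture time, so that an annotator labels each bag once instead of labelling every image. The record layout `(name, timestamp)`, the fixed window length, and the function name are hypothetical; the abstract does not describe how the authors define bag boundaries.

```python
from collections import defaultdict

def group_into_bags(images, window_seconds):
    """Group (name, timestamp) records into bags keyed by time-window index."""
    bags = defaultdict(list)
    for name, timestamp in images:
        bags[int(timestamp // window_seconds)].append(name)
    return dict(bags)

# Hypothetical image records with capture times in seconds.
images = [
    ("a.jpg", 0.0),
    ("b.jpg", 3.0),
    ("c.jpg", 12.0),
    ("d.jpg", 14.5),
    ("e.jpg", 31.0),
]

bags = group_into_bags(images, window_seconds=10)
print(bags)
```

Five images collapse into three bags here, so only three labels would be needed. The price is that the label of any single image inside a bag is uncertain, which is the problem the pseudo-labelling step is meant to address.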
Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning
Cross-domain visual data matching is one of the fundamental problems in many
real-world vision tasks, e.g., matching persons across ID photos and
surveillance videos. Conventional approaches to this problem usually involve
two steps: i) projecting samples from different domains into a common space,
and ii) computing (dis-)similarity in this space based on a certain distance.
In this paper, we present a novel pairwise similarity measure that advances
existing models by i) expanding traditional linear projections into affine
transformations and ii) fusing affine Mahalanobis distance and Cosine
similarity by a data-driven combination. Moreover, we unify our similarity
measure with feature representation learning via deep convolutional neural
networks. Specifically, we incorporate the similarity measure matrix into the
deep architecture, enabling an end-to-end way of model optimization. We
extensively evaluate our generalized similarity model in several challenging
cross-domain matching tasks: person re-identification under different views and
face verification over different modalities (i.e., faces from still images and
videos, older and younger faces, and sketch and photo portraits). The
experimental results demonstrate superior performance of our model over other
state-of-the-art methods.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI), 201
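The two ingredients named in the abstract above, domain-specific affine transformations and a fusion of a distance with cosine similarity, can be sketched as follows. This is a simplified stand-in and not the authors' generalized similarity measure: the fixed mixing weight `alpha`, the plain squared Euclidean distance after projection, and all parameter values are assumptions, whereas the abstract describes a data-driven combination.

```python
import math

def affine(W, b, x):
    """Apply the affine map W x + b (W is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def fused_similarity(x, y, Wx, bx, Wy, by, alpha=0.5):
    """Higher is more similar: weighted cosine term minus weighted squared distance."""
    u, v = affine(Wx, bx, x), affine(Wy, by, y)
    dist_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return alpha * cosine(u, v) - (1.0 - alpha) * dist_sq

# Hypothetical per-domain parameters; identity maps with zero offsets for clarity.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

same = fused_similarity((1.0, 0.0), (1.0, 0.0), W, b, W, b)
different = fused_similarity((1.0, 0.0), (0.0, 1.0), W, b, W, b)
print(same, different)
```

Using separate `(Wx, bx)` and `(Wy, by)` lets each domain have its own projection, which is the point of moving from a shared linear projection to per-domain affine ones.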
Unsupervised Person Re-identification: Clustering and Fine-tuning
The superiority of deeply learned pedestrian representations has been
reported in very recent literature of person re-identification (re-ID). In this
paper, we consider the more pragmatic issue of learning a deep feature with no
or only a few labels. We propose a progressive unsupervised learning (PUL)
method to transfer pretrained deep representations to unseen domains. Our
method is easy to implement and can be viewed as an effective baseline for
unsupervised re-ID feature learning. Specifically, PUL iterates between 1)
pedestrian clustering and 2) fine-tuning of the convolutional neural network
(CNN) to improve the original model trained on the irrelevant labeled dataset.
Since the clustering results can be very noisy, we add a selection operation
between the clustering and fine-tuning. At the beginning, when the model is
weak, the CNN is fine-tuned on a small number of reliable examples located
near cluster centroids in the feature space. As the model becomes stronger
in subsequent iterations, more images are adaptively selected as CNN
training samples. Progressively, pedestrian clustering and the CNN model are
improved simultaneously until algorithm convergence. This process is naturally
formulated as self-paced learning. We then point out promising directions that
may lead to further improvement. Extensive experiments on three large-scale
re-ID datasets demonstrate that PUL outputs discriminative features that
improve the re-ID accuracy.
Comment: Add more results, parameter analysis and comparison
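The selection step described in the abstract above, keeping only samples that lie near a cluster centroid, can be sketched as below. The Euclidean distance, the fixed `threshold` parameter, and the idea of relaxing it by hand are assumptions for illustration; the abstract states only that reliable samples near centroids are chosen and that more are selected as the model improves.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_reliable(features, centroids, threshold):
    """Return (sample_index, cluster_index) for samples within `threshold` of their nearest centroid."""
    selected = []
    for idx, feature in enumerate(features):
        dists = [euclidean(feature, c) for c in centroids]
        cluster = min(range(len(centroids)), key=dists.__getitem__)
        if dists[cluster] <= threshold:
            selected.append((idx, cluster))
    return selected

# Hypothetical 2-D features and centroids from a clustering step.
features = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (10.0, 10.0), (10.2, 10.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

strict = select_reliable(features, centroids, threshold=0.5)   # early, cautious round
relaxed = select_reliable(features, centroids, threshold=5.0)  # later, more inclusive round
print(strict)
print(relaxed)
```

In a full loop, the selected samples and their cluster indices would serve as pseudo-labels for fine-tuning, after which features are re-extracted and clustered again.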
Learning to Cluster Faces on an Affinity Graph
Face recognition has seen remarkable progress in recent years, and its
performance has reached a very high level. Taking it to the next level requires
substantially larger data, which would involve prohibitive annotation cost.
Hence, exploiting unlabeled data becomes an appealing alternative. Recent works
have shown that clustering unlabeled faces is a promising approach, often
leading to notable performance gains. Yet, how to effectively cluster,
especially on a large-scale (i.e. million-level or above) dataset, remains an
open question. A key challenge lies in the complex variations of cluster
patterns, which make it difficult for conventional clustering methods to meet
the needed accuracy. This work explores a novel approach, namely, learning to
cluster instead of relying on hand-crafted criteria. Specifically, we propose a
framework based on a graph convolutional network, which combines a detection and
a segmentation module to pinpoint face clusters. Experiments show that our
method yields significantly more accurate face clusters, which, as a result,
also lead to further performance gains in face recognition.
Comment: 8 pages, 8 figures, CVPR 201
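As background for the graph convolutional network mentioned in the abstract above, here is a minimal sketch of one generic graph-convolution step on an affinity graph: each node averages its own features with those of its neighbours, applies a weight matrix, and passes the result through a ReLU. This is a textbook-style layer under assumed toy inputs, not the detection or segmentation module the authors propose.

```python
def gcn_layer(adj, feats, weight):
    """One mean-aggregation graph convolution with self-loops, followed by ReLU."""
    n = len(adj)
    in_dim, out_dim = len(weight), len(weight[0])
    out = []
    for i in range(n):
        # Neighbourhood of node i, including the node itself.
        neigh = [j for j in range(n) if adj[i][j] or j == i]
        agg = [sum(feats[j][d] for j in neigh) / len(neigh) for d in range(in_dim)]
        row = [max(0.0, sum(agg[d] * weight[d][k] for d in range(in_dim)))
               for k in range(out_dim)]
        out.append(row)
    return out

# Hypothetical affinity graph: faces 0 and 1 are linked, face 2 is isolated.
adj = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
feats = [[1.0], [3.0], [10.0]]
weight = [[1.0]]

smoothed = gcn_layer(adj, feats, weight)
print(smoothed)
```

Linked faces are pulled toward a shared representation while the isolated face keeps its own, which hints at why propagation over an affinity graph can help separate clusters.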