236 research outputs found
Conditional Similarity Networks
What makes images similar? To measure the similarity between images, they are
typically embedded in a feature-vector space, in which their distances preserve
the relative dissimilarity. However, when learning such similarity embeddings,
the simplifying assumption is commonly made that images are compared under only
one measure of similarity. A main reason for this is that contradicting
notions of similarities cannot be captured in a single space. To address this
shortcoming, we propose Conditional Similarity Networks (CSNs) that learn
embeddings differentiated into semantically distinct subspaces that capture the
different notions of similarities. CSNs jointly learn a disentangled embedding
where features for different similarities are encoded in separate dimensions as
well as masks that select and reweight relevant dimensions to induce a subspace
that encodes a specific similarity notion. We show that our approach learns
interpretable image representations with visually relevant semantic subspaces.
Further, when evaluating on triplet questions drawn from multiple similarity notions,
our model even exceeds the accuracy obtained by training an individual
specialized network for each notion separately.
Comment: CVPR 2017
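The core mechanism lends itself to a compact sketch: a shared embedding is multiplied element-wise by a learned, non-negative mask selected by the similarity condition, and triplets are compared in the resulting subspace. The sketch below assumes PyTorch; the names (ConditionalSimilarityNet, n_conditions) and the uniform mask initialization are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch of the Conditional Similarity Network idea (assumed reading
# of the abstract, not the authors' reference code).
import torch
import torch.nn as nn

class ConditionalSimilarityNet(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int, n_conditions: int):
        super().__init__()
        self.backbone = backbone  # any CNN mapping images -> (B, embed_dim) features
        # One learnable mask per similarity notion; ReLU at use time keeps it non-negative
        self.masks = nn.Embedding(n_conditions, embed_dim)
        nn.init.uniform_(self.masks.weight, 0.9, 1.1)

    def forward(self, images: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        x = self.backbone(images)              # joint disentangled embedding
        m = torch.relu(self.masks(condition))  # select/reweight dims for this notion
        return x * m                           # masked embedding in the induced subspace

def triplet_loss(a, p, n, margin=0.2):
    # Anchor/positive should be closer than anchor/negative in the masked subspace.
    d_ap = (a - p).pow(2).sum(dim=1)
    d_an = (a - n).pow(2).sum(dim=1)
    return torch.clamp(d_ap - d_an + margin, min=0).mean()
```

Because the mask zeroes out irrelevant dimensions, the same backbone serves every similarity notion; only the cheap per-condition mask differs, which is what allows a single network to match separately trained specialists.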
Detecting Oriented Text in Natural Images by Linking Segments
Most state-of-the-art text detection methods are specific to horizontal Latin
text and are not fast enough for real-time applications. We introduce Segment
Linking (SegLink), an oriented text detection method. The main idea is to
decompose text into two locally detectable elements, namely segments and links.
A segment is an oriented box covering a part of a word or text line; a link
connects two adjacent segments, indicating that they belong to the same word or
text line. Both elements are detected densely at multiple scales by an
end-to-end trained, fully-convolutional neural network. Final detections are
produced by combining segments connected by links. Compared with previous
methods, SegLink improves along the dimensions of accuracy, speed, and ease of
training. It achieves an f-measure of 75.0% on the standard ICDAR 2015
Incidental (Challenge 4) benchmark, outperforming the previous best by a large
margin. It runs at over 20 FPS on 512x512 images. Moreover, without
modification, SegLink is able to detect long lines of non-Latin text, such as
Chinese.
Comment: To appear in CVPR 2017
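The combining step amounts to finding connected components over the link graph. Below is a minimal sketch, assuming link predictions have already been thresholded into (i, j) index pairs; merging each group's geometry into one final oriented box is omitted, and the function name is hypothetical.

```python
# Hedged sketch of SegLink's combination step: union-find over predicted links.
def combine_segments(num_segments, links):
    """links: iterable of (i, j) segment-index pairs predicted as linked."""
    parent = list(range(num_segments))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in links:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # union: the two segments join one word/text line

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())  # each group becomes one final detection
```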
Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop
Existing fine-grained visual categorization methods often suffer from three
challenges: lack of training data, large number of fine-grained categories, and
high intra-class vs. low inter-class variance. In this work, we propose a generic
iterative framework for fine-grained categorization and dataset bootstrapping
that handles these three challenges. Using deep metric learning with humans in
the loop, we learn a low dimensional feature embedding with anchor points on
manifolds for each category. These anchor points capture intra-class variances
and remain discriminative between classes. In each round, images with high
confidence scores from our model are sent to humans for labeling. By comparing
with exemplar images, labelers mark each candidate image as either a "true
positive" or a "false positive". True positives are added into our current
dataset and false positives are regarded as "hard negatives" for our metric
learning model. Then the model is retrained with an expanded dataset and hard
negatives for the next round. To demonstrate the effectiveness of the proposed
framework, we bootstrap a fine-grained flower dataset with 620 categories from
Instagram images. The proposed deep metric learning scheme is evaluated on both
our dataset and the CUB-200-2011 Birds dataset. Experimental evaluations show
significant performance gain using dataset bootstrapping and demonstrate
state-of-the-art results achieved by the proposed deep metric learning methods.
Comment: 10 pages, 9 figures, CVPR 2016
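A single bootstrapping round can be summarized in a few lines. The sketch below is an assumed reading of the loop described in the abstract: label_fn stands in for the human labelers, and confident_candidates / retrain are hypothetical helpers, not the paper's API.

```python
# Illustrative sketch of one human-in-the-loop bootstrapping round.
def bootstrap_round(model, dataset, hard_negatives, unlabeled_pool, label_fn):
    # 1. The model proposes high-confidence candidates from the unlabeled pool
    #    (confident_candidates is a hypothetical helper).
    candidates = model.confident_candidates(unlabeled_pool)

    # 2. Humans compare each candidate against exemplar images of its category.
    for image, category in candidates:
        if label_fn(image, category):        # marked "true positive"
            dataset.add(image, category)     # grow the dataset
        else:                                # marked "false positive"
            hard_negatives.add(image, category)  # becomes a hard negative

    # 3. Retrain the metric-learning model on the expanded data for the next round.
    model.retrain(dataset, hard_negatives)
    return model, dataset, hard_negatives
```

The key design point is that false positives are not discarded: they are precisely the examples the current embedding confuses, so feeding them back as hard negatives sharpens the decision boundaries between visually similar categories.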