DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds
Learning local descriptors is an important problem in computer vision. While there are many techniques for learning local patch descriptors for 2D images, efforts have recently been made to learn local descriptors for 3D points. Recent progress on this problem in 3D leverages the strong feature-representation capability of image-based convolutional neural networks by utilizing RGB-D or multi-view representations. In this paper, however, we propose to learn 3D local descriptors by directly processing unstructured 3D point clouds, without needing any intermediate representation. The method consists of a deep network that learns a permutation-invariant representation of 3D points. To learn the local descriptors, we use a multi-margin contrastive loss that discriminates between similar and dissimilar points on a surface while also leveraging the extent of dissimilarity among the negative samples at training time. In a comprehensive evaluation against strong baselines, we show that the proposed method outperforms state-of-the-art methods for matching points in 3D point clouds. Further, we demonstrate the effectiveness of the proposed method on various applications, achieving state-of-the-art results.
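To make the loss concrete, the following is a minimal PyTorch sketch of a multi-margin contrastive loss, assuming a surface-dissimilarity signal normalized to [0, 1] for negative pairs; the function name, the linear margin schedule, and the normalization are illustrative assumptions, not the authors' exact formulation.

import torch

def multi_margin_contrastive(desc_a, desc_b, is_match, dissimilarity,
                             base_margin=1.0):
    # desc_a, desc_b: (N, D) descriptor batches; is_match: (N,) in {0, 1};
    # dissimilarity: (N,) surface dissimilarity, assumed normalized to [0, 1].
    d = torch.norm(desc_a - desc_b, dim=1)         # descriptor distance
    pos_loss = is_match * d.pow(2)                 # pull matching points together
    # Margin grows with how dissimilar the negative pair is (illustrative schedule).
    margins = base_margin * (1.0 + dissimilarity)
    neg_loss = (1 - is_match) * torch.clamp(margins - d, min=0).pow(2)
    return (pos_loss + neg_loss).mean()

# Toy usage with random descriptors and labels.
a = torch.randn(8, 32, requires_grad=True)
b = torch.randn(8, 32)
y = torch.randint(0, 2, (8,)).float()
s = torch.rand(8)
multi_margin_contrastive(a, b, y, s).backward()

The point of the per-pair margin is that a negative lying far away on the surface must be pushed further apart in descriptor space than a near-miss.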
DeepHash: Getting Regularization, Depth and Fine-Tuning Right
This work focuses on representing very high-dimensional global image
descriptors using very compact 64-1024 bit binary hashes for instance
retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to
making DeepHash work at extremely low bitrates are three important
considerations -- regularization, depth and fine-tuning -- each requiring
solutions specific to the hashing problem. In-depth evaluation shows that our
scheme consistently outperforms state-of-the-art methods across all data sets
for both Fisher Vectors and Deep Convolutional Neural Network features, by up
to 20 percent over other schemes. The retrieval performance with 256-bit hashes
is close to that of the uncompressed floating-point features -- a remarkable
512-fold compression.
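As an illustration of the general scheme (a sketch under stated assumptions, not the DeepHash architecture): a deep encoder maps the high-dimensional descriptor to the desired number of pre-binarization activations, and a regularizer pushes activations toward +/-1 so the final sign thresholding loses little information. Layer sizes here are arbitrary.

import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    def __init__(self, in_dim=4096, bits=256):   # dimensions are illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, bits), nn.Tanh(),    # squash toward [-1, 1]
        )

    def forward(self, x):
        return self.net(x)                       # continuous codes for training

    def hash(self, x):
        return self.forward(x) > 0               # binary code at retrieval time

def binarization_penalty(codes):
    # Encourage |activation| close to 1 so thresholding is nearly lossless.
    return (codes.abs() - 1.0).pow(2).mean()

enc = HashEncoder()
feats = torch.randn(4, 4096)                     # e.g. Fisher Vector or CNN features
reg = binarization_penalty(enc(feats))           # add to the training objective
bits = enc.hash(feats)                           # (4, 256) boolean hash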
Image similarity using Deep CNN and Curriculum Learning
Image similarity involves fetching similar-looking images given a reference
image. Our solution, called SimNet, is a deep Siamese network that is trained
on pairs of positive and negative images using a novel online pair-mining
strategy inspired by curriculum learning. We also created a multi-scale CNN,
where the final image embedding is a joint representation of top-layer as well
as lower-layer embeddings. We go on to show that this multi-scale Siamese
network is better at capturing fine-grained image similarities than traditional CNNs.
Comment: 9 pages, 6 figures, GHCI 17 conference
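A hedged sketch of curriculum-style online pair mining follows; the linear schedule and the starting fraction are illustrative choices, not SimNet's published strategy. The idea is to admit only easy pairs early in training and to progressively allow harder ones, ranked by their current loss.

import torch

def mine_pairs(pair_losses, epoch, total_epochs, keep_frac_start=0.3):
    # pair_losses: (N,) per-pair losses for the current batch.
    # Fraction of pairs admitted grows linearly with training progress
    # (an assumed schedule; any monotone curriculum would fit the idea).
    frac = keep_frac_start + (1.0 - keep_frac_start) * epoch / max(total_epochs - 1, 1)
    k = max(1, int(frac * pair_losses.numel()))
    order = torch.argsort(pair_losses)           # easiest (lowest-loss) pairs first
    return order[:k]                             # indices to back-propagate

losses = torch.rand(64)
idx_early = mine_pairs(losses, epoch=0, total_epochs=30)    # mostly easy pairs
idx_late = mine_pairs(losses, epoch=29, total_epochs=30)    # nearly all pairs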
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in
many computer vision tasks. However, this achievement is preceded by extensive
manual annotation in order to perform either training from scratch or
fine-tuning for the target task. In this work, we propose to fine-tune CNNs for
image retrieval from a large collection of unordered images in a fully
automated manner. We employ state-of-the-art retrieval and
Structure-from-Motion (SfM) methods to obtain 3D models, which are used to
guide the selection of the training data for CNN fine-tuning. We show that both
hard-positive and hard-negative examples enhance the final performance in
particular-object retrieval with compact codes.
Comment: ECCV 2016
Fine-tuning CNN Image Retrieval with No Human Annotation
Image descriptors based on activations of Convolutional Neural Networks
(CNNs) have become dominant in image retrieval due to their discriminative
power, compactness of representation, and search efficiency. Training of CNNs,
either from scratch or fine-tuning, requires a large amount of annotated data,
where a high quality of annotation is often crucial. In this work, we propose
to fine-tune CNNs for image retrieval on a large collection of unordered images
in a fully automated manner. Reconstructed 3D models obtained by the
state-of-the-art retrieval and structure-from-motion methods guide the
selection of the training data. We show that both hard-positive and
hard-negative examples, selected by exploiting the geometry and the camera
positions available from the 3D models, enhance the performance of
particular-object retrieval. CNN descriptor whitening discriminatively learned
from the same training data outperforms commonly used PCA whitening. We propose
a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and
average pooling and show that it boosts retrieval performance. Applying the
proposed method to the VGG network achieves state-of-the-art performance on the
standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
Comment: TPAMI 2018. arXiv admin note: substantial text overlap with arXiv:1604.02426
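The GeM layer named in the abstract computes, per feature channel, the generalized mean f = ((1/|X|) * sum_{x in X} x^p)^(1/p) over the spatial locations X, with a learnable exponent p: p = 1 recovers average pooling, and p -> infinity approaches max pooling. A compact PyTorch sketch along these lines (the initialization p = 3 and the epsilon clamp are conventional choices, not details given in the abstract):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    def __init__(self, p=3.0, eps=1e-6):         # p = 3 is a common starting value
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))   # learned jointly with the network
        self.eps = eps

    def forward(self, x):                        # x: (B, C, H, W) conv activations
        x = x.clamp(min=self.eps).pow(self.p)    # clamp keeps the power well-defined
        x = F.avg_pool2d(x, kernel_size=(x.size(-2), x.size(-1)))
        return x.pow(1.0 / self.p).flatten(1)    # (B, C) global descriptor

feat = torch.relu(torch.randn(2, 512, 7, 7))     # e.g. a last VGG conv feature map
desc = GeM()(feat)                               # (2, 512)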
From handcrafted to deep local features
This paper presents an overview of the evolution of local features from
handcrafted to deep-learning-based methods, followed by a discussion of several
benchmarks and papers evaluating such local features. Our investigation is
motivated by 3D reconstruction problems, where the precise location of the
features is important. As we describe these methods, we highlight and explain
the challenges of feature extraction and potential ways to overcome them. We
first present handcrafted methods, followed by methods based on classical
machine learning, and finally we discuss methods based on deep learning. This
largely chronologically ordered presentation will help the reader to fully
understand the topic of image and region description in order to make the best
use of it in modern computer vision applications. In particular, understanding
handcrafted methods and their motivation can help in understanding modern
approaches and how machine learning is used to improve their results. We also
provide references to most of the relevant literature and code.
Comment: Preprint
AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond
Deep learning has achieved substantial success in a series of computer vision
tasks. Intelligent video analysis, which can be broadly applied to video
surveillance in various smart city applications, can also be driven by such
powerful deep learning engines. However, practically deploying deep neural
network models for large-scale video analysis still poses unprecedented
challenges for large-scale video data management. Deep feature coding, instead
of video coding, provides a practical solution for handling large-scale video
surveillance data. To enable interoperability in the context of deep feature
coding, standardization is urgent and important. However, due to the explosion
of deep learning algorithms and the particularity of feature coding, numerous
problems remain in the standardization process. This paper envisions a future
deep-feature-coding standard for AI-oriented large-scale video management, and
discusses existing techniques, standards, and possible solutions for these
open problems.
Comment: 8 pages, 8 figures, 5 tables
Scalable Change Retrieval Using Deep 3D Neural Codes
We present a novel scalable framework for image change detection (ICD) from
an on-board 3D imagery system. We argue that existing ICD systems are
constrained by the time required to align a given query image with individual
reference image coordinates. We utilize an invariant coordinate system (ICS) to
replace the time-consuming image alignment with an offline pre-processing
procedure. Our key contribution is an extension of the traditional
image-comparison-based ICD task to the setup of an image retrieval (IR) task.
We replace each component of the 3D ICD system, i.e., (1) image modeling, (2)
image alignment, and (3) image differencing, with significantly more efficient
variants from the bag-of-words (BoW) IR paradigm. Further, we train a deep 3D
feature extractor in an unsupervised manner, using a Siamese network and
automatically collected training data. We conducted experiments on a
challenging cross-season ICD task using a publicly available dataset and
thereby validate the efficacy of the proposed approach.
Comment: 5 pages, 1 figure, technical report
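As a generic illustration of the BoW IR machinery the abstract invokes (vocabulary size, weighting, and all names here are stand-ins, not the paper's pipeline): local features are quantized against a visual-word vocabulary, and images are ranked by cosine similarity of TF-IDF-weighted word histograms instead of being aligned pixel-wise.

import numpy as np

def bow_histogram(local_feats, vocab):
    # local_feats: (M, D) local descriptors; vocab: (K, D) visual-word centroids.
    words = np.argmin(((local_feats[:, None] - vocab[None]) ** 2).sum(-1), axis=1)
    return np.bincount(words, minlength=len(vocab)).astype(float)

def tfidf_scores(query_hist, db_hists, idf):
    # Rank reference images by cosine similarity of TF-IDF-weighted histograms.
    q = query_hist * idf
    db = db_hists * idf
    q /= np.linalg.norm(q) + 1e-12
    db /= np.linalg.norm(db, axis=1, keepdims=True) + 1e-12
    return db @ q

vocab = np.random.randn(50, 64)                  # toy vocabulary (stand-in)
q_hist = bow_histogram(np.random.randn(200, 64), vocab)
db_hists = np.stack([bow_histogram(np.random.randn(200, 64), vocab)
                     for _ in range(10)])
idf = np.log((1 + len(db_hists)) / (1 + (db_hists > 0).sum(0)))
scores = tfidf_scores(q_hist, db_hists, idf)     # one score per reference image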
Large age-gap face verification by feature injection in deep networks
This paper introduces a new method for face verification across large age
gaps, along with a dataset containing variations of age in the wild: the Large
Age-Gap (LAG) dataset, with images ranging from child/young to adult/old. The
proposed method exploits a deep convolutional neural network (DCNN) pre-trained
for the face recognition task on a large dataset and then fine-tuned for the
large age-gap face verification task. Fine-tuning is performed in a Siamese
architecture using a contrastive loss function. A feature injection layer is
introduced to boost verification accuracy, showing the ability of the DCNN to
learn a similarity metric that leverages external features. Experimental
results on the LAG dataset show that our method outperforms the
state-of-the-art face verification methods considered.
Comment: Submitted
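A hedged sketch of the feature-injection idea, with layer placement and dimensions as assumptions rather than the paper's exact design: external features are concatenated with the DCNN representation before the final embedding on which the contrastive loss operates.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InjectionHead(nn.Module):
    def __init__(self, cnn_dim=4096, ext_dim=128, out_dim=512):  # assumed sizes
        super().__init__()
        self.fc = nn.Linear(cnn_dim + ext_dim, out_dim)

    def forward(self, cnn_feat, ext_feat):
        # Inject external features alongside the learned representation.
        fused = torch.cat([cnn_feat, ext_feat], dim=1)
        return F.normalize(self.fc(fused), dim=1)

head = InjectionHead()
emb = head(torch.randn(4, 4096), torch.randn(4, 128))  # (4, 512) embedding
# Trained in a Siamese setup with a contrastive loss over match/non-match pairs.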
Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
We present a new local descriptor for 3D shapes, directly applicable to a
wide range of shape analysis problems such as point correspondences, semantic
segmentation, affordance prediction, and shape-to-scan matching. The descriptor
is produced by a convolutional network that is trained to embed geometrically
and semantically similar points close to one another in descriptor space. The
network processes surface neighborhoods around points on a shape that are
captured at multiple scales by a succession of progressively zoomed out views,
taken from carefully selected camera positions. We leverage two extremely large
sources of data to train our network. First, since our network processes
rendered views in the form of 2D images, we repurpose architectures pre-trained
on massive image datasets. Second, we automatically generate a synthetic dense
point correspondence dataset by non-rigid alignment of corresponding shape
parts in a large collection of segmented 3D models. As a result of these design
choices, our network effectively encodes multi-scale local context and
fine-grained surface detail. Our network can be trained to produce either
category-specific descriptors or more generic descriptors by learning from
multiple shape categories. Once trained, at test time, the network extracts
local descriptors for shapes without requiring any part segmentation as input.
Our method can produce effective local descriptors even for shapes whose
category is unknown or different from the ones used during training. We
demonstrate through several experiments that our learned local descriptors are
more discriminative than state-of-the-art alternatives, and are effective in a
variety of shape analysis applications.
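As a structural illustration only (the backbone below is a toy stand-in for the pre-trained image CNN the authors repurpose): each rendered view of a point's neighborhood passes through a shared 2D CNN, and the per-view features are max-pooled across views into a single local descriptor.

import torch
import torch.nn as nn

class MultiViewPointDescriptor(nn.Module):
    def __init__(self, desc_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(           # placeholder, not the real backbone
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, desc_dim),
        )

    def forward(self, views):                    # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        per_view = self.backbone(views.flatten(0, 1)).view(b, v, -1)
        return per_view.max(dim=1).values        # aggregate across views

views = torch.randn(2, 4, 3, 64, 64)             # 4 progressively zoomed-out views
desc = MultiViewPointDescriptor()(views)         # (2, 128) local descriptor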