Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. First, well-established, efficient methods are chosen to turn the images into edge maps. Second, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval, and its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks.
Comment: ECCV 2018
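The metric-learning objective behind such an approach can be illustrated with a toy triplet loss over edge-map embeddings. This is a plain NumPy stand-in: the paper trains a CNN end to end, and the margin value below is an arbitrary choice, not one from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project descriptors onto the unit sphere, as is common in metric learning.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Contrastive objective: pull embeddings of matching edge maps together,
    push non-matching ones at least `margin` further apart."""
    a, p, n = map(l2_normalize, (anchor, positive, negative))
    d_ap = np.linalg.norm(a - p, axis=-1)   # anchor-positive distance
    d_an = np.linalg.norm(a - n, axis=-1)   # anchor-negative distance
    return np.maximum(0.0, d_ap - d_an + margin)
```

In training, the anchor and positive would be edge maps of the same landmark rendered from SfM-matched views, and the gradient of this loss would be backpropagated through the embedding network.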
Invariant properties of a locally salient dither pattern with a spatial-chromatic histogram
Compacted Dither Pattern Code (CDPC) is a recently proposed feature that has been successful in visual depiction of irregular shapes. The locally salient dither pattern feature is an attempt to extend CDPC to visual depiction of both regular and irregular shapes. This paper presents an analysis of the rotational and scale invariance of the locally salient dither pattern feature with a two-dimensional spatial-chromatic histogram, which broadens the applicability of the visual feature. Experiments were conducted to demonstrate the rotational and scale invariance of the feature by combining a linear Support Vector Machine (SVM) classifier with the new feature. The experimental results revealed that the locally salient dither pattern feature with the spatial-chromatic histogram is rotationally and scale invariant.
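To illustrate how a spatial-chromatic histogram can be made rotation- and scale-invariant, here is a toy NumPy sketch that bins salient points jointly by centroid-relative radius and hue. The binning scheme is an illustrative stand-in, not the CDPC construction itself.

```python
import numpy as np

def spatial_chromatic_hist(xy, hue, r_bins=4, h_bins=8):
    """Toy 2D spatial-chromatic histogram over (radius, hue).

    Binning by distance from the point-set centroid (normalised by the
    mean radius) rather than by absolute position makes the histogram
    invariant to rotation and uniform scaling of the point set."""
    c = xy.mean(axis=0)
    r = np.linalg.norm(xy - c, axis=1)
    r = r / (r.mean() + 1e-8)                  # scale invariance
    r_idx = np.minimum((r / 2.0 * r_bins).astype(int), r_bins - 1)
    h_idx = np.minimum((hue * h_bins).astype(int), h_bins - 1)  # hue in [0, 1)
    hist = np.zeros((r_bins, h_bins))
    np.add.at(hist, (r_idx, h_idx), 1.0)       # unbuffered scatter-add
    return hist / hist.sum()
```

Because both the radii and the hues are unchanged by rotating or uniformly scaling the point set, the resulting histogram is identical for transformed copies of the same shape.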
Deep Learning Representation using Autoencoder for 3D Shape Retrieval
We study the problem of building a deep learning representation for 3D shapes. Deep learning has been shown to be very effective in a variety of visual applications, such as image classification and object detection. However, it has not been successfully applied to 3D shape recognition, because 3D shapes have complex structure in 3D space and only a limited number of 3D shapes are available for feature learning. To address these problems, we project 3D shapes into 2D space and use an autoencoder for feature learning on the 2D images. High-accuracy 3D shape retrieval is obtained by aggregating the features learned on the 2D images. In addition, we show that the proposed deep learning feature is complementary to conventional local image descriptors. By combining the global deep learning representation and the local descriptor representation, our method obtains state-of-the-art performance on 3D shape retrieval benchmarks.
Comment: 6 pages, 7 figures, 2014ICSPA
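The aggregation step, pooling per-view codes into one shape-level descriptor, can be sketched as follows. The `encode_view` function is a stand-in (a fixed random linear projection), not the trained autoencoder encoder from the paper.

```python
import numpy as np

def encode_view(img):
    """Stand-in for the learned encoder: a fixed random linear projection
    of the flattened 2D projection image. (Assumption: the paper instead
    uses the bottleneck code of an autoencoder trained on these images.)"""
    flat = img.ravel()
    rng = np.random.default_rng(42)            # fixed weights across calls
    W = rng.standard_normal((16, flat.size))
    return W @ flat

def shape_descriptor(views):
    """Aggregate per-view codes into one shape descriptor.
    Mean pooling makes the descriptor independent of view ordering."""
    codes = np.stack([encode_view(v) for v in views])
    return codes.mean(axis=0)
```

Because the pooling is symmetric in its inputs, the same 3D shape yields the same descriptor regardless of the order in which its 2D projections are enumerated.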
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web have led to flourishing research activity in image search and retrieval. By ignoring visual content as a ranking cue, methods that apply text-search techniques to visual retrieval may suffer from inconsistency between the text words and the visual content. Content-based image retrieval (CBIR), which uses a representation of the visual content to identify relevant images, has attracted sustained attention over the past two decades. The problem is challenging due to the intention gap and the semantic gap. Numerous techniques have been developed for content-based image retrieval in the last decade. The purpose of this paper is to categorize and evaluate the algorithms proposed between 2003 and 2016. We conclude with several promising directions for future research.
Comment: 22 pages
Content-Based Image Retrieval Based on Late Fusion of Binary and Local Descriptors
One of the challenges in Content-Based Image Retrieval (CBIR) is to reduce the semantic gap between low-level features and high-level semantic concepts. In CBIR, images are represented in a feature space, and retrieval performance depends on the type of feature representation selected. Late fusion, also known as visual-words integration, is applied to enhance the performance of image retrieval. Recent advances in image retrieval have shifted the focus of research towards binary descriptors, which are reported to be computationally efficient. In this paper, we investigate the late fusion of Fast Retina Keypoint (FREAK) and Scale Invariant Feature Transform (SIFT) descriptors. This combination of a binary and a local descriptor is selected because, among binary descriptors, FREAK has shown good results in classification-based problems, while SIFT is robust to translation, scaling, rotation, and small distortions. The late fusion of FREAK and SIFT integrates the strengths of both feature descriptors for effective image retrieval. Experimental results and comparisons show that the proposed late fusion enhances the performance of image retrieval.
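A minimal sketch of the late-fusion idea, assuming each image is already represented by separate bag-of-visual-words histograms for its FREAK and SIFT features. The vocabulary sizes, the L1 normalisation, and the Euclidean ranking below are illustrative choices, not details taken from the paper.

```python
import numpy as np

def l1_normalize(h, eps=1e-8):
    return h / (h.sum() + eps)

def late_fusion(freak_hist, sift_hist):
    """Late fusion ('visual-words integration'): normalise each descriptor's
    bag-of-visual-words histogram separately, then concatenate, so that
    neither vocabulary dominates the joint representation."""
    return np.concatenate([l1_normalize(freak_hist), l1_normalize(sift_hist)])

def rank(query, database):
    """Rank database images by Euclidean distance to the fused query vector."""
    d = np.linalg.norm(database - query, axis=1)
    return np.argsort(d)
```

The fusion happens "late" in the sense that the two descriptors are quantised and histogrammed independently, and only their final image-level representations are combined.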
Diffusion framework for geometric and photometric data fusion in non-rigid shape analysis
In this paper, we explore the use of the diffusion geometry framework for the fusion of geometric and photometric information in local and global shape descriptors. Our construction is based on the definition of a diffusion process on the shape manifold embedded into a high-dimensional space in which the embedding coordinates represent the photometric information. Experimental results show that such data fusion is useful in coping with challenges of shape analysis where purely geometric and purely photometric methods fail.
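The construction can be caricatured in a few lines of NumPy: embed each point with joint geometric and photometric coordinates, build a Gaussian affinity graph in that joint space, and read a per-point descriptor off the heat kernel of its normalised Laplacian. The weight `beta` and bandwidth `sigma` below are illustrative parameters, not values from the paper.

```python
import numpy as np

def heat_kernel_descriptor(geom, photo, t=0.1, beta=1.0, sigma=0.5):
    """Toy diffusion descriptor on a point cloud.

    Each point is embedded as [geometry, beta * photometry]; a Gaussian
    affinity graph is built in the joint space, and the heat kernel
    exp(-t * L) of its normalised Laplacian is computed. The diagonal of
    the heat kernel serves as a per-point, HKS-like descriptor."""
    X = np.hstack([geom, beta * photo])                  # joint embedding
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    W = np.exp(-D2 / (2 * sigma ** 2))                   # Gaussian affinities
    d = W.sum(1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))     # normalised Laplacian
    lam, U = np.linalg.eigh(L)
    Ht = (U * np.exp(-t * lam)) @ U.T                    # heat kernel exp(-tL)
    return np.diag(Ht)
```

Setting `beta = 0` recovers a purely geometric diffusion; increasing it lets photometric differences slow diffusion across texture boundaries, which is the intuition behind the fusion.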
Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
We present a new local descriptor for 3D shapes, directly applicable to a
wide range of shape analysis problems such as point correspondences, semantic
segmentation, affordance prediction, and shape-to-scan matching. The descriptor
is produced by a convolutional network that is trained to embed geometrically
and semantically similar points close to one another in descriptor space. The
network processes surface neighborhoods around points on a shape that are
captured at multiple scales by a succession of progressively zoomed out views,
taken from carefully selected camera positions. We leverage two extremely large
sources of data to train our network. First, since our network processes
rendered views in the form of 2D images, we repurpose architectures pre-trained
on massive image datasets. Second, we automatically generate a synthetic dense
point correspondence dataset by non-rigid alignment of corresponding shape
parts in a large collection of segmented 3D models. As a result of these design
choices, our network effectively encodes multi-scale local context and
fine-grained surface detail. Our network can be trained to produce either
category-specific descriptors or more generic descriptors by learning from
multiple shape categories. Once trained, at test time, the network extracts
local descriptors for shapes without requiring any part segmentation as input.
Our method can produce effective local descriptors even for shapes whose category is unknown or differs from those used during training. We demonstrate through several experiments that our learned local descriptors are more discriminative than state-of-the-art alternatives and are effective in a variety of shape analysis applications.
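The multi-scale view idea, capturing a point's neighbourhood at progressively zoomed-out scales, can be sketched in 2D. The patch sizes and the per-view "code" below are toy stand-ins for the paper's rendered views and CNN embedding.

```python
import numpy as np

def multi_scale_patches(img, cx, cy, sizes=(4, 8, 16)):
    """Collect progressively zoomed-out square neighbourhoods around a
    point, a 2D stand-in for the paper's multi-scale rendered views.
    Each patch is subsampled to the smallest size so all scales share
    one resolution."""
    base = sizes[0]
    out = []
    for s in sizes:
        half = s // 2
        patch = img[cy - half:cy + half, cx - half:cx + half]
        step = s // base
        out.append(patch[::step, ::step])
    return np.stack(out)                      # (n_scales, base, base)

def point_descriptor(img, cx, cy):
    """Concatenate per-view codes: the joint vector mixes fine-grained
    detail (small scale) with surrounding context (large scale).
    The per-view mean is a toy embedding, not a trained network."""
    views = multi_scale_patches(img, cx, cy)
    return views.reshape(len(views), -1).mean(axis=1)
```

In the actual method each view would pass through an image-pretrained CNN, and the embedding would be trained on the synthetic dense correspondences described above.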
GIFT: A Real-time and Scalable 3D Shape Search Engine
Projective analysis is an important approach to 3D shape retrieval, since human visual perception of 3D shapes relies on various 2D observations from different viewpoints. Although multiple informative and discriminative views are utilized, most projection-based retrieval systems suffer from heavy computational cost and thus cannot satisfy the basic scalability requirement of search engines. In this paper, we present a real-time 3D shape search engine based on the projective images of 3D shapes. The real-time property of our search engine results from the following aspects: (1) efficient projection and view feature extraction using GPU acceleration; (2) a first inverted file, referred to as F-IF, utilized to speed up multi-view matching; (3) a second inverted file (S-IF), which captures the local distribution of 3D shapes in the feature manifold, adopted for efficient context-based re-ranking. As a result, each query can be answered within one second despite the necessary I/O overhead. We name the proposed 3D shape search engine, which combines GPU acceleration and an Inverted File Twice, GIFT. Besides its high efficiency, GIFT also significantly outperforms state-of-the-art methods in retrieval accuracy on various shape benchmarks and competitions.
Comment: accepted by CVPR 2016; achieved first place in the SHREC 2016 competition: Large-Scale 3D Shape Retrieval under the perturbed case
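The role of the first inverted file can be sketched with a toy word-to-shape index. Quantisation of view features into visual words is assumed to have happened already, and the vote count below is a simplification of the engine's actual multi-view matching score.

```python
from collections import defaultdict

def build_inverted_file(shape_words):
    """Toy inverted file in the spirit of GIFT's F-IF: map each quantised
    view feature ('visual word') to the set of shapes containing it, so a
    query only touches shapes sharing at least one word with it."""
    inv = defaultdict(set)
    for shape_id, words in shape_words.items():
        for w in words:
            inv[w].add(shape_id)
    return inv

def query(inv, query_words):
    """Score candidate shapes by the number of query words they share,
    avoiding an exhaustive scan of the whole database."""
    votes = defaultdict(int)
    for w in query_words:
        for shape_id in inv.get(w, ()):
            votes[shape_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```

This is what makes the lookup sub-linear in database size: shapes sharing no visual word with the query are never visited, which is the property a scalable search engine needs.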
Efficient Multimedia Similarity Measurement Using Similar Elements
Online social networking and large-scale multimedia systems are developing rapidly, which has not only brought great convenience to our daily lives but has also generated, collected, and stored large-scale multimedia data. This trend has placed higher requirements and greater challenges on massive multimedia data retrieval. In this paper, we investigate the problem of image similarity measurement, which is used in many applications. We first propose a definition of image similarity measurement and the related notions. Based on this, we present a novel basic method of similarity measurement named SMIN. To improve calculation performance, we propose a novel indexing structure called the SMI Temp Index (SMII for short). In addition, we establish an off-line index of potentially similar visual words to solve the problem that the index cannot be reused. Experimental evaluations on two real image datasets demonstrate that our solution outperforms state-of-the-art methods.
Comment: 17 pages. arXiv admin note: text overlap with arXiv:1808.0961
Discriminative Map Retrieval Using View-Dependent Map Descriptor
Map retrieval, the problem of similarity search over a large collection of 2D
pointset maps previously built by mobile robots, is crucial for autonomous
navigation in indoor and outdoor environments. Bag-of-words (BoW) methods
constitute a popular approach to map retrieval; however, these methods have
extremely limited descriptive ability because they ignore the spatial layout
information of the local features. The main contribution of this paper is an
extension of the bag-of-words map retrieval method to enable the use of spatial
information from local features. Our strategy is to explicitly model a unique
viewpoint of an input local map; the pose of the local feature is defined with
respect to this unique viewpoint, and can be viewed as an additional invariant
feature for discriminative map retrieval. Specifically, we wish to determine a
unique viewpoint that is invariant to moving objects, clutter, occlusions, and
actual viewpoints. Hence, we perform scene parsing to analyze the scene
structure, and consider the "center" of the scene structure to be the unique
viewpoint. Our scene parsing is based on a Manhattan world grammar that imposes
a quasi-Manhattan world constraint to enable the robust detection of a scene
structure that is invariant to clutter and moving objects. Experimental results using the publicly available radish dataset validate the efficacy of the proposed approach.
Comment: Technical Report, 8 pages, 9 figures
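The core idea, augmenting bag-of-words with feature pose relative to a unique viewpoint, can be sketched as follows. The paper derives the viewpoint from Manhattan-world scene parsing; this toy uses the feature centroid as a stand-in, and the joint (word, radius) binning is an illustrative simplification.

```python
import numpy as np

def view_dependent_descriptor(words, xy, n_words=16, r_bins=3):
    """Toy BoW map descriptor augmented with spatial layout.

    Feature positions are expressed relative to a 'unique viewpoint'
    (here the feature centroid, standing in for the scene-structure
    center of the paper). Joint (visual word, radius) binning retains
    spatial-layout information that plain BoW discards, while staying
    invariant to translation of the map."""
    c = xy.mean(axis=0)
    r = np.linalg.norm(xy - c, axis=1)
    r = r / (r.max() + 1e-8)
    r_idx = np.minimum((r * r_bins).astype(int), r_bins - 1)
    hist = np.zeros((n_words, r_bins))
    np.add.at(hist, (words, r_idx), 1.0)
    return hist.ravel() / len(words)
```

Collapsing the radius axis recovers the ordinary BoW histogram, so the descriptor strictly adds spatial information on top of the baseline it extends.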