29 research outputs found
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch
In this work we introduce a cross-modal image retrieval system that allows
both text and sketch as input modalities for the query. A cross-modal deep
network architecture is formulated to jointly model the sketch and text input
modalities as well as the image output modality, learning a common
embedding between text and images and between sketches and images. In addition,
an attention model is used to selectively focus the attention on the different
objects of the image, allowing for retrieval with multiple objects in the
query. Experiments show that the proposed method performs best in both
single- and multiple-object image retrieval on standard datasets.
Comment: Accepted at ICPR 201
COLOR FEATURE WITH SPATIAL INFORMATION EXTRACTION METHODS FOR CBIR: A REVIEW
In the last two decades, Content-Based Image Retrieval (CBIR) has been a topic of interest for researchers. It depends on analysis of the image's visual content, which can be done by extracting color, texture, and shape features. Feature extraction is therefore one of the important steps in a CBIR system for representing the image completely. The color feature is the most widely used and most reliable among the visual features of an image. This paper reviews different methods used to extract color features while taking the spatial information of the image into consideration, namely Local Color Histogram, Color Correlogram, Row Sum and Column Sum, and Color Coherence Vectors.
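As an illustration of the first reviewed family, Local Color Histogram, the sketch below divides an image into a spatial grid and concatenates per-cell, per-channel histograms, so color statistics retain coarse spatial information. Grid size, bin count, and normalization here are assumptions for the example, not parameters taken from the review.

```python
import numpy as np

def local_color_histogram(image, grid=(2, 2), bins=8):
    """Per-cell color histograms over a spatial grid.

    image: HxWx3 uint8 array; grid: (rows, cols) of spatial cells.
    Returns a 1-D descriptor of length rows * cols * 3 * bins, with
    each per-channel histogram L1-normalized."""
    h, w, _ = image.shape
    rows, cols = grid
    descriptor = []
    for r in range(rows):
        for c in range(cols):
            cell = image[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            for ch in range(3):  # one histogram per color channel
                hist, _ = np.histogram(cell[..., ch], bins=bins, range=(0, 256))
                descriptor.append(hist / max(hist.sum(), 1))
    return np.concatenate(descriptor)

# Toy image: left half pure red, right half pure blue
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2, 0] = 255
img[:, 2:, 2] = 255
d = local_color_histogram(img, grid=(1, 2), bins=2)
```

Unlike a single global histogram, this descriptor distinguishes the red-left/blue-right image from its mirrored counterpart, which is exactly the spatial information the reviewed methods aim to preserve.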
Compositional Sketch Search
We present an algorithm for searching image collections using free-hand
sketches that describe the appearance and relative positions of multiple
objects. Sketch-based image retrieval (SBIR) methods predominantly match
queries containing a single, dominant object invariant to its position within
an image. Our work exploits drawings as a concise and intuitive representation
for specifying entire scene compositions. We train a convolutional neural
network (CNN) to encode masked visual features from sketched objects, pooling
these into a spatial descriptor encoding the spatial relationships and
appearances of objects in the composition. Training the CNN backbone as a
Siamese network under triplet loss yields a metric search embedding for
measuring compositional similarity which may be efficiently leveraged for
visual search by applying product quantization.
Comment: ICIP 2021 camera-ready version
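The triplet loss used to train the Siamese CNN backbone can be sketched numerically as follows. This is the standard hinge formulation with squared Euclidean distances and an assumed margin of 0.2, not the authors' exact training code.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive and push it at least
    # `margin` farther from the negative (squared L2 distances).
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # nearby: small positive distance
n = np.array([1.0, 1.0])   # far away: the hinge clamps the loss to zero
loss = triplet_loss(a, p, n)
```

When the margin is already satisfied the loss is zero; swapping the positive and negative produces a large violation, which is the gradient signal that shapes the metric search embedding.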
Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval
Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal
retrieval task for searching natural images given free-hand sketches under the
zero-shot scenario. Most existing methods solve this problem by simultaneously
projecting visual features and semantic supervision into a low-dimensional
common space for efficient retrieval. However, such low-dimensional projection
destroys the completeness of the semantic knowledge in the original semantic
space, so useful knowledge cannot be transferred well when learning semantics
from different modalities. Moreover, domain information and semantic
information are entangled in the visual features, which is not conducive to
cross-modal matching, since it hinders reducing the domain gap between sketch
and image. In this paper, we propose a Progressive Domain-independent Feature
Decomposition (PDFD) network for ZS-SBIR. Specifically, with the supervision of
original semantic knowledge, PDFD decomposes visual features into domain
features and semantic features, and then the semantic features are projected
into the common space as retrieval features for ZS-SBIR. This progressive
projection strategy maintains strong semantic supervision. Besides, to
guarantee that the retrieval features capture clean and complete semantic
information, a cross-reconstruction loss is introduced to encourage that any
combination of retrieval features and domain features can reconstruct the
visual features.
Extensive experiments demonstrate the superiority of our PDFD over
state-of-the-art competitors.
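A minimal numeric sketch of the cross-reconstruction idea, under the simplifying assumption of an additive toy decoder and made-up features (PDFD learns the decoder instead): swapping semantic features between modalities must still reconstruct the visual features, which forces the semantic parts of sketch and image to agree.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical decomposed features for one sketch/image pair (toy values)
sem_sketch, sem_image = rng.normal(size=dim), rng.normal(size=dim)
dom_sketch, dom_image = rng.normal(size=dim), rng.normal(size=dim)
vis_sketch = sem_sketch + dom_sketch   # toy "visual feature"
vis_image = sem_image + dom_image

def reconstruct(semantic, domain):
    # Toy additive decoder; a real model learns this mapping.
    return semantic + domain

def cross_reconstruction_loss(sem_s, sem_i, dom_s, dom_i, vis_s, vis_i):
    # Every (semantic, domain) combination must rebuild the visual
    # feature of the modality the domain feature came from.
    pairs = [(sem_s, dom_s, vis_s),
             (sem_i, dom_s, vis_s),   # cross combination
             (sem_s, dom_i, vis_i),   # cross combination
             (sem_i, dom_i, vis_i)]
    return sum(np.sum((reconstruct(s, d) - v) ** 2) for s, d, v in pairs)

mismatched = cross_reconstruction_loss(sem_sketch, sem_image,
                                       dom_sketch, dom_image,
                                       vis_sketch, vis_image)
aligned = cross_reconstruction_loss(sem_sketch, sem_sketch,
                                    dom_sketch, dom_image,
                                    sem_sketch + dom_sketch,
                                    sem_sketch + dom_image)
```

With independent random semantics the cross terms leave a residual error, while identical semantics drive the loss to zero: the loss is minimized exactly when the semantic features are domain-independent.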
Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval
Free-hand sketch-based image retrieval (SBIR) is a specific cross-view
retrieval task, in which queries are abstract and ambiguous sketches while the
retrieval database is formed with natural images. Work in this area mainly
focuses on extracting representative and shared features for sketches and
natural images. However, these features can neither cope well with the geometric
distortion between sketches and images nor remain feasible for large-scale SBIR,
due to the heavy continuous-valued distance computation. In this paper, we speed up
SBIR by introducing a novel binary coding method, named \textbf{Deep Sketch
Hashing} (DSH), where a semi-heterogeneous deep architecture is proposed and
incorporated into an end-to-end binary coding framework. Specifically, three
convolutional neural networks are utilized to encode free-hand sketches,
natural images and, especially, the auxiliary sketch-tokens which are adopted
as bridges to mitigate the sketch-image geometric distortion. The learned DSH
codes can effectively capture the cross-view similarities as well as the
intrinsic semantic correlations between different categories. To the best of
our knowledge, DSH is the first hashing work specifically designed for
category-level SBIR with an end-to-end deep architecture. The proposed DSH is
comprehensively evaluated on two large-scale datasets of TU-Berlin Extension
and Sketchy, and the experiments consistently show DSH's superior SBIR
accuracies over several state-of-the-art methods, while achieving significantly
reduced retrieval time and memory footprint.
Comment: This paper will appear as a spotlight paper in CVPR201
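The payoff of binary codes such as DSH's is that retrieval reduces to Hamming distance, computable with an XOR and a popcount. Below is a minimal NumPy sketch; the packed uint8 codes and code length are assumptions for the example, and DSH's learning procedure is not reproduced here.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.

    Codes are packed into uint8 arrays (8 bits per byte), so the
    distance is a popcount of XOR-ed bytes -- the cheap comparison
    that lets hashing methods scale to large galleries."""
    xor = np.bitwise_xor(db_codes, query_code)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)
    return np.argsort(dists, kind="stable"), dists

query = np.array([0b10110010], dtype=np.uint8)
db = np.array([[0b10110010],    # identical code: distance 0
               [0b10110011],    # one bit flipped: distance 1
               [0b01001101]],   # all eight bits flipped: distance 8
              dtype=np.uint8)
order, dists = hamming_rank(query, db)
```

Each comparison touches only a few bytes regardless of the original feature dimensionality, which is where the reduced retrieval time and memory footprint come from.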
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well-established, efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval, and its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results on multiple benchmarks.
Comment: ECCV 201
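The first stage, turning images into edge maps with well-established methods, can be approximated with a plain Sobel filter. The snippet below is a minimal NumPy version for illustration, not necessarily the edge detector the authors chose.

```python
import numpy as np

def sobel_edge_map(gray):
    """Approximate gradient magnitude with 3x3 Sobel filters.

    Reducing both photos and sketches to edge maps removes color and
    texture, leaving the shape information the matching network is
    trained on. Naive loops keep the example dependency-free."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

# Vertical step edge: the response concentrates on the boundary columns
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = sobel_edge_map(img)
```

Flat regions on either side of the step produce zero response while the boundary columns respond strongly, so a photo and a sketch of the same shape end up in much closer input domains before the network sees them.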