1,666 research outputs found
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to a surge of research activity in image search and retrieval.
When visual content is ignored as a ranking clue, methods that rely on text
search techniques for visual retrieval may suffer from inconsistency between
the text words and the visual content. Content-based image retrieval (CBIR),
which makes use of the representation of visual content to identify relevant
images, has attracted sustained attention over the past two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate the
algorithms proposed between 2003 and 2016. We conclude with several
promising directions for future research.
Comment: 22 pages
De-Hashing: Server-Side Context-Aware Feature Reconstruction for Mobile Visual Search
Due to the prevalence of mobile devices, mobile search has become more
convenient than desktop search. Unlike traditional desktop search, mobile
visual search must take the limited resources of mobile devices into account
(e.g., bandwidth, computing power, and memory consumption).
The state-of-the-art approaches show that the bag-of-words (BoW) model is
robust for image and video retrieval; however, the large vocabulary tree may
not fit into the memory of a mobile device. We observe that recent works mainly
focus on designing compact feature representations on mobile devices for
bandwidth-limited networks (e.g., 3G) and directly perform feature matching on
remote servers (cloud). However, the compact (binary) representation might fail
to retrieve target objects (images, videos). Based on the hashed binary codes,
we propose a de-hashing process that reconstructs BoW by leveraging the
computing power of remote servers. To mitigate the information loss from binary
codes, we further utilize contextual information (e.g., GPS) to reconstruct a
context-aware BoW for better retrieval results. Experimental results show that
the proposed method achieves retrieval accuracy competitive with BoW while
transmitting only a few bits from mobile devices.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
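The client/server split described above can be sketched generically. This is a hedged illustration, not the paper's actual de-hashing algorithm: the random-hyperplane hash and the `VOCAB_SIZE`/`NUM_BITS` values are hypothetical stand-ins, and the server here recovers a BoW simply by nearest-Hamming lookup in its own database.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 1024   # hypothetical BoW vocabulary size
NUM_BITS = 64       # bits actually sent over the network

# Random hyperplanes shared by client and server (same seed on both sides).
hyperplanes = rng.standard_normal((NUM_BITS, VOCAB_SIZE))

def client_hash(bow_hist):
    """Mobile side: compress a BoW histogram into NUM_BITS sign bits."""
    return (hyperplanes @ bow_hist >= 0).astype(np.uint8)

def server_reconstruct(bits, database_bows):
    """Server side: approximate the original BoW by the database histogram
    whose hash lies closest to the received bits in Hamming distance."""
    db_bits = (database_bows @ hyperplanes.T >= 0).astype(np.uint8)
    hamming = np.count_nonzero(db_bits != bits, axis=1)
    return database_bows[np.argmin(hamming)]
```

In the paper's setting, the server would additionally filter candidates by contextual cues such as GPS before reconstruction.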
Effective Image Retrieval via Multilinear Multi-index Fusion
Multi-index fusion has demonstrated impressive performance in retrieval tasks
by integrating different visual representations in a unified framework.
However, previous works mainly consider propagating similarities via the
neighborhood structure, ignoring the high-order information among different visual
representations. In this paper, we propose a new multi-index fusion scheme for
image retrieval. By formulating this procedure as a multilinear
optimization problem, the complementary information hidden in different indexes
can be explored more thoroughly. Specifically, we first build our multiple indexes
from various visual representations. Then a so-called index-specific functional
matrix, which aims to propagate similarities, is introduced for updating the
original index. The functional matrices are then optimized in a unified tensor
space to achieve a refinement, such that relevant images are pushed closer
together. The optimization problem can be solved efficiently by the augmented
Lagrangian method with a theoretical convergence guarantee. Unlike the
traditional multi-index fusion scheme, our approach embeds the multi-index
subspace structure into the new indexes with a sparsity constraint, so it
incurs little additional memory consumption in the online query stage.
Experimental evaluation on three benchmark datasets shows that the proposed
approach achieves state-of-the-art performance, i.e., an N-S score of 3.94 on
UKBench, and mAP of 94.1\% on Holidays and 62.39\% on Market-1501.
Comment: 12 pages
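For contrast with the tensor-based scheme above, the conventional late-fusion baseline that multi-index methods improve upon can be as simple as reciprocal rank fusion over the rank lists returned by each index. This is a generic sketch of that baseline, not the paper's method:

```python
def reciprocal_rank_fusion(rank_lists, k=60):
    """Fuse several ranked lists of image ids into a single ranking.
    Each input list orders ids from most to least similar; k damps the
    influence of any single list (60 is a common default)."""
    scores = {}
    for ranking in rank_lists:
        for rank, image_id in enumerate(ranking):
            scores[image_id] = scores.get(image_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion like this ignores exactly the cross-index structure that the multilinear formulation above tries to exploit.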
Seeing the Big Picture: Deep Embedding with Contextual Evidences
In the Bag-of-Words (BoW) model based image retrieval task, the precision of
visual matching plays a critical role in improving retrieval performance.
Conventionally, only the local cues of a keypoint are employed. However, such
a strategy does not consider the contextual evidence around a keypoint, which
leads to a prevalence of false matches. To address this problem, this paper
defines "true match" as a pair of keypoints which are similar on three levels,
i.e., local, regional, and global. Then, a principled probabilistic framework
is established, which is capable of implicitly integrating discriminative cues
from all these feature levels.
Specifically, the Convolutional Neural Network (CNN) is employed to extract
features from regional and global patches, leading to the so-called "Deep
Embedding" framework. CNNs have been shown to produce excellent performance on
a dozen computer vision tasks such as image classification and detection, but
little work has applied them to BoW-based image retrieval. In this paper, we
first show that proper pre-processing techniques are necessary for effective
use of CNN features. Then, to fit them into our model, a novel indexing
structure called "Deep Indexing" is introduced, which dramatically reduces
memory usage.
Extensive experiments on three benchmark datasets demonstrate that the
proposed Deep Embedding method greatly improves retrieval accuracy when CNN
features are integrated. We show that our method is efficient in terms of both
memory and time cost, and compares favorably with state-of-the-art methods.
Comment: 10 pages, 13 figures, 7 tables, submitted to ACM Multimedia 201
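The three-level notion of a "true match" can be sketched as a simple cascade of similarity tests. The cosine measure and the threshold values here are illustrative assumptions, not the paper's probabilistic framework:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_true_match(local_a, local_b, regional_a, regional_b,
                  global_a, global_b, thresholds=(0.8, 0.6, 0.4)):
    """Accept a keypoint pair only if it is similar on all three levels:
    local descriptor, regional patch, and global image."""
    sims = (cosine(local_a, local_b),
            cosine(regional_a, regional_b),
            cosine(global_a, global_b))
    return all(s >= t for s, t in zip(sims, thresholds))
```

The paper integrates the three levels probabilistically rather than with hard thresholds, but the cascade conveys why contextual evidence suppresses false matches.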
Sketch-based Manga Retrieval using Manga109 Dataset
Manga (Japanese comics) are popular worldwide. However, current e-manga
archives offer very limited search support: keyword-based search by title or
author, or tag-based categorization. To make the manga search
experience more intuitive, efficient, and enjoyable, we propose a content-based
manga retrieval system. First, we propose a manga-specific image-describing
framework. It consists of efficient margin labeling, edge orientation histogram
feature description, and approximate nearest-neighbor search using product
quantization. Second, we propose a sketch-based interface as a natural way to
interact with manga content. The interface provides sketch-based querying,
relevance feedback, and query retouch. For evaluation, we built a novel dataset
of manga images, Manga109, which consists of 109 comic books totaling 21,142 pages
drawn by professional manga artists. To the best of our knowledge, Manga109 is
currently the biggest dataset of manga images available for research. We
conducted a comparative study, a localization evaluation, and a large-scale
qualitative study. From the experiments, we verified that: (1) the retrieval
accuracy of the proposed method is higher than those of previous methods; (2)
the proposed method can localize an object instance with reasonable runtime and
accuracy; and (3) sketch querying is useful for manga search.
Comment: 13 pages
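The edge orientation histogram step of the describing framework can be sketched in a few lines. This is a simplified global version under assumed conventions; the paper computes such features per labeled region and indexes them with product quantization:

```python
import numpy as np

def edge_orientation_histogram(image, num_bins=8):
    """Magnitude-weighted histogram of gradient orientations over a
    grayscale image given as a 2-D float array, L1-normalized."""
    gy, gx = np.gradient(image.astype(float))   # gradients along rows, cols
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)            # angles in [-pi, pi]
    edges = np.linspace(-np.pi, np.pi, num_bins + 1)
    hist, _ = np.histogram(orientation, bins=edges, weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Such histograms are cheap to compute from line drawings, which is one reason they suit sketch queries against manga pages.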
Scalable Image Retrieval by Sparse Product Quantization
Fast approximate nearest neighbor (ANN) search for high-dimensional feature
indexing and retrieval is the crux of large-scale image retrieval. A
recent promising technique is Product Quantization, which attempts to index
high-dimensional image features by decomposing the feature space into a
Cartesian product of low-dimensional subspaces and quantizing each of them
separately. Despite the promising results reported, its quantization approach
follows the typical hard assignment of traditional quantization methods, which
may result in large quantization errors and thus inferior search performance.
Unlike the existing approaches, in this paper we propose a novel approach
called Sparse Product Quantization (SPQ) that encodes high-dimensional
feature vectors into sparse representations. We optimize the sparse
representations of the feature vectors by minimizing their quantization
errors, so that the resulting representations are essentially close to the
original data in practice. Experiments show that the proposed SPQ technique
not only compresses data effectively but also serves as an effective encoding
scheme. We obtain
state-of-the-art results for ANN search on four public image datasets and the
promising results of content-based image retrieval further validate the
efficacy of our proposed method.
Comment: 12 pages
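Plain product quantization, the hard-assignment baseline that SPQ relaxes, can be sketched as follows. The toy random codebooks and dimensions are illustrative stand-ins for the k-means-trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 16, 4, 8        # feature dim, subspaces, centroids per subspace
SUB = D // M              # dimensionality of each subspace

# Toy codebooks; a real system learns these with k-means per subspace.
codebooks = rng.standard_normal((M, K, SUB))

def pq_encode(x):
    """Hard-assign each subvector to its nearest codebook centroid."""
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * SUB:(m + 1) * SUB]
        codes[m] = np.argmin(np.linalg.norm(codebooks[m] - sub, axis=1))
    return codes

def pq_decode(codes):
    """Approximate the vector by concatenating the chosen centroids."""
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])
```

Roughly, SPQ replaces the single-centroid hard assignment in `pq_encode` with a sparse combination of centroids, which is what reduces the quantization error the abstract describes.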
CNN-VWII: An Efficient Approach for Large-Scale Video Retrieval by Image Queries
This paper aims to solve the problem of large-scale video retrieval with a
query image. First, we define the problem of the top-k image-to-video query.
Then, we combine the merits of convolutional neural networks (CNNs) and the
Bag of Visual Words (BoVW) model to design a model for extracting and
representing video frame information. To meet the requirements of large-scale
video retrieval, we propose a visual weighted inverted index (VWII) and a
related algorithm to improve the efficiency and accuracy of the retrieval
process. Comprehensive experiments show that our proposed technique
achieves substantial improvements (up to an order of magnitude speed up) over
the state-of-the-art techniques with similar accuracy.
Comment: submitted to Pattern Recognition Letters
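A plain inverted index over visual words conveys the core lookup idea. This sketch uses simple vote counting; the proposed VWII additionally attaches visual weights to the postings, which is omitted here:

```python
from collections import defaultdict

def build_inverted_index(frame_words):
    """frame_words maps frame_id -> iterable of visual word ids; the
    index maps each visual word back to the frames containing it."""
    index = defaultdict(set)
    for frame_id, words in frame_words.items():
        for word in words:
            index[word].add(frame_id)
    return index

def query(index, query_words):
    """Rank frames by how many visual words they share with the query."""
    votes = defaultdict(int)
    for word in query_words:
        for frame_id in index.get(word, ()):
            votes[frame_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```

Because only frames sharing at least one visual word with the query are ever touched, lookup cost scales with posting-list lengths rather than with the size of the video collection.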
Pairwise Rotation Hashing for High-dimensional Features
Binary hashing is widely used for effective approximate nearest neighbor
search. Even though various binary hashing methods have been proposed, very few
methods are feasible for extremely high-dimensional features often used in
visual tasks today. We propose a novel highly sparse linear hashing method
based on pairwise rotations. For n-dimensional features, the encoding cost of
the proposed algorithm is substantially lower than that of the existing
state-of-the-art method. The proposed method is
also remarkably faster in the learning phase. Along with the efficiency, the
retrieval accuracy is comparable to, or slightly better than, the
state-of-the-art. The pairwise rotations used in our method are derived from
an analytical study of the trade-off between quantization error and the
entropy of binary codes. Although these hashing criteria are widely used in
previous research, their analytical behavior is rarely studied. All building
blocks of our algorithm are based on the analytical solution, and it thus
provides a fairly simple and efficient procedure.
Comment: 16 pages, 8 figures, written in Mar 201
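A pairwise (Givens) rotation touches only two coordinates at a time, which is what keeps the hash projection highly sparse. A minimal sketch of the encoding step follows; the rotation angles here are arbitrary, whereas the paper derives them analytically:

```python
import numpy as np

def apply_pairwise_rotations(x, rotations):
    """Apply a sequence of Givens rotations to x; each rotation
    (i, j, theta) mixes only coordinates i and j."""
    y = x.astype(float).copy()
    for i, j, theta in rotations:
        c, s = np.cos(theta), np.sin(theta)
        yi, yj = y[i], y[j]
        y[i] = c * yi - s * yj
        y[j] = s * yi + c * yj
    return y

def binarize(y):
    """Sign-threshold the rotated features into a binary code."""
    return (y >= 0).astype(np.uint8)
```

Each rotation costs four multiplications regardless of the feature dimension, so a short sequence of rotations is far cheaper than multiplying by a dense projection matrix.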
Understanding the Gist of Images - Ranking of Concepts for Multimedia Indexing
Nowadays, when multimedia data is continuously generated, stored, and
distributed, multimedia indexing, whose purpose is to group similar data,
becomes more important than ever. Understanding the gist (=message) of
multimedia instances is framed in related work as a ranking of concepts from a
knowledge base, i.e., Wikipedia. We cast the task of multimedia indexing as a
gist understanding problem. Our pipeline benefits from external knowledge and
two subsequent learning-to-rank (l2r) settings. The first l2r produces a
ranking of concepts representing the respective multimedia instance. The
second l2r produces a mapping between the concept representation of an
instance and the targeted class topic(s) for the multimedia indexing task. The
evaluation on an established large corpus (MIRFlickr25k, with 25,000 images)
shows that the multimedia indexing task benefits from understanding the gist,
achieving a MAP of 61.42. Thus, the presented end-to-end setting outperforms
DBM and competes with Hashing-based methods.
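The first learning-to-rank stage can be caricatured as a pointwise linear scorer over concept features. This is a deliberately simplified stand-in; the actual pipeline uses trained l2r models and external knowledge, and the feature layout here is hypothetical:

```python
def rank_concepts(concept_features, weights):
    """Score each candidate concept with a linear model and return the
    concepts ordered from most to least relevant to the instance.
    concept_features: dict concept -> list of feature values."""
    def score(concept):
        return sum(w * f for w, f in zip(weights, concept_features[concept]))
    return sorted(concept_features, key=score, reverse=True)
```

The second l2r stage would then map this ranked concept representation onto the target class topics.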
Matchable Image Retrieval by Learning from Surface Reconstruction
Convolutional Neural Networks (CNNs) have achieved superior performance on
object image retrieval, while Bag-of-Words (BoW) models with handcrafted local
features still dominate the retrieval of overlapping images in 3D
reconstruction. In this paper, we narrow this gap by presenting an
efficient CNN-based method to retrieve images with overlaps, which we refer to
as the matchable image retrieval problem. Unlike previous methods that
generate training data based on sparse reconstruction, we create a large-scale
image database with rich 3D geometry and exploit information from surface
reconstruction to obtain fine-grained training data. We propose a batched
triplet-based loss function combined with mesh re-projection to effectively
learn the CNN representation. The proposed method significantly accelerates the
image retrieval process in 3D reconstruction and outperforms the
state-of-the-art CNN-based and BoW methods for matchable image retrieval. The
code and data are available at https://github.com/hlzz/mirror.
Comment: accepted by ACCV 201
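Without the mesh re-projection component, the batched triplet objective reduces to the standard hinge form: pull overlapping (matchable) pairs together and push non-overlapping pairs at least a margin apart. A minimal NumPy sketch, with the margin value chosen arbitrarily:

```python
import numpy as np

def batched_triplet_loss(anchors, positives, negatives, margin=0.5):
    """Mean hinge triplet loss over a batch of embedding rows:
    anchors/positives overlap in 3D, anchors/negatives do not."""
    d_pos = np.linalg.norm(anchors - positives, axis=1)
    d_neg = np.linalg.norm(anchors - negatives, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

In the paper's setting, the positive/negative labels come from surface-reconstruction overlap rather than from class labels, which is what makes the training data fine-grained.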