2,894 research outputs found
A Survey on Learning to Hash
Nearest neighbor search is a problem of finding the data points from the
database such that the distances from them to the query point are the smallest.
Learning to hash is one of the major solutions to this problem and has been
widely studied recently. In this paper, we present a comprehensive survey of
the learning to hash algorithms, categorize them according to the manners of
preserving the similarities into: pairwise similarity preserving, multiwise
similarity preserving, implicit similarity preserving, as well as quantization,
and discuss their relations. We separate quantization from pairwise similarity
preserving as the objective function is very different though quantization, as
we show, can be derived from preserving the pairwise similarities. In addition,
we present the evaluation protocols, and the general performance analysis, and
point out that the quantization algorithms perform superiorly in terms of
search accuracy, search time cost, and space cost. Finally, we introduce a few
emerging topics.Comment: To appear in IEEE Transactions On Pattern Analysis and Machine
Intelligence (TPAMI
Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval
Unsupervised hashing can desirably support scalable content-based image
retrieval (SCBIR) for its appealing advantages of semantic label independence,
memory and search efficiency. However, the learned hash codes are embedded with
limited discriminative semantics due to the intrinsic limitation of image
representation. To address the problem, in this paper, we propose a novel
hashing approach, dubbed as \emph{Discrete Semantic Transfer Hashing} (DSTH).
The key idea is to \emph{directly} augment the semantics of discrete image hash
codes by exploring auxiliary contextual modalities. To this end, a unified
hashing framework is formulated to simultaneously preserve visual similarities
of images and perform semantic transfer from contextual modalities. Further, to
guarantee direct semantic transfer and avoid information loss, we explicitly
impose the discrete constraint, bit--uncorrelation constraint and bit-balance
constraint on hash codes. A novel and effective discrete optimization method
based on augmented Lagrangian multiplier is developed to iteratively solve the
optimization problem. The whole learning process has linear computation
complexity and desirable scalability. Experiments on three benchmark datasets
demonstrate the superiority of DSTH compared with several state-of-the-art
approaches
Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Supervised cross-modal hashing has gained increasing research interest on
large-scale retrieval task owning to its satisfactory performance and
efficiency. However, it still has some challenging issues to be further
studied: 1) most of them fail to well preserve the semantic correlations in
hash codes because of the large heterogenous gap; 2) most of them relax the
discrete constraint on hash codes, leading to large quantization error and
consequent low performance; 3) most of them suffer from relatively high memory
cost and computational complexity during training procedure, which makes them
unscalable. In this paper, to address above issues, we propose a supervised
cross-modal hashing method based on matrix factorization dubbed Efficient
Discrete Supervised Hashing (EDSH). Specifically, collective matrix
factorization on heterogenous features and semantic embedding with class labels
are seamlessly integrated to learn hash codes. Therefore, the feature based
similarities and semantic correlations can be both preserved in hash codes,
which makes the learned hash codes more discriminative. Then an efficient
discrete optimal algorithm is proposed to handle the scalable issue. Instead of
learning hash codes bit-by-bit, hash codes matrix can be obtained directly
which is more efficient. Extensive experimental results on three public
real-world datasets demonstrate that EDSH produces a superior performance in
both accuracy and scalability over some existing cross-modal hashing methods
Efficient Multimedia Similarity Measurement Using Similar Elements
Online social networking techniques and large-scale multimedia systems are
developing rapidly, which not only has brought great convenience to our daily
life, but generated, collected, and stored large-scale multimedia data. This
trend has put forward higher requirements and greater challenges on massive
multimedia data retrieval. In this paper, we investigate the problem of image
similarity measurement which is used to lots of applications. At first we
propose the definition of similarity measurement of images and the related
notions. Based on it we present a novel basic method of similarity measurement
named SMIN. To improve the performance of calculation, we propose a novel
indexing structure called SMI Temp Index (SMII for short). Besides, we
establish an index of potential similar visual words off-line to solve to
problem that the index cannot be reused. Experimental evaluations on two real
image datasets demonstrate that our solution outperforms state-of-the-art
method.Comment: 17 pages. arXiv admin note: text overlap with arXiv:1808.0961
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
With the ignorance of visual content as a ranking clue, methods with text
search techniques for visual retrieval may suffer inconsistency between the
text words and visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention in recent two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.Comment: 22 page
Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning
Cross-domain visual data matching is one of the fundamental problems in many
real-world vision tasks, e.g., matching persons across ID photos and
surveillance videos. Conventional approaches to this problem usually involves
two steps: i) projecting samples from different domains into a common space,
and ii) computing (dis-)similarity in this space based on a certain distance.
In this paper, we present a novel pairwise similarity measure that advances
existing models by i) expanding traditional linear projections into affine
transformations and ii) fusing affine Mahalanobis distance and Cosine
similarity by a data-driven combination. Moreover, we unify our similarity
measure with feature representation learning via deep convolutional neural
networks. Specifically, we incorporate the similarity measure matrix into the
deep architecture, enabling an end-to-end way of model optimization. We
extensively evaluate our generalized similarity model in several challenging
cross-domain matching tasks: person re-identification under different views and
face verification over different modalities (i.e., faces from still images and
videos, older and younger faces, and sketch and photo portraits). The
experimental results demonstrate superior performance of our model over other
state-of-the-art methods.Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI), 201
An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval
Due to the rapid development of mobile Internet techniques, cloud computation
and popularity of online social networking and location-based services, massive
amount of multimedia data with geographical information is generated and
uploaded to the Internet. In this paper, we propose a novel type of cross-modal
multimedia retrieval called geo-multimedia cross-modal retrieval which aims to
search out a set of geo-multimedia objects based on geographical distance
proximity and semantic similarity between different modalities. Previous
studies for cross-modal retrieval and spatial keyword search cannot address
this problem effectively because they do not consider multimedia data with
geo-tags and do not focus on this type of query. In order to address this
problem efficiently, we present the definition of NN geo-multimedia
cross-modal query at the first time and introduce relevant conceptions such as
cross-modal semantic representation space. To bridge the semantic gap between
different modalities, we propose a method named cross-modal semantic matching
which contains two important component, i.e., CorrProj and LogsTran, which aims
to construct a common semantic representation space for cross-modal semantic
similarity measurement. Besides, we designed a framework based on deep learning
techniques to implement common semantic representation space construction. In
addition, a novel hybrid indexing structure named GMR-Tree combining
geo-multimedia data and R-Tree is presented and a efficient NN search
algorithm called GMCMS is designed. Comprehensive experimental evaluation on
real and synthetic dataset clearly demonstrates that our solution outperforms
the-state-of-the-art methods.Comment: 27 page
Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Person re-identification (\textit{re-id}) refers to matching pedestrians
across disjoint yet non-overlapping camera views. The most effective way to
match these pedestrians undertaking significant visual variations is to seek
reliably invariant features that can describe the person of interest
faithfully. Most of existing methods are presented in a supervised manner to
produce discriminative features by relying on labeled paired images in
correspondence. However, annotating pair-wise images is prohibitively expensive
in labors, and thus not practical in large-scale networked cameras. Moreover,
seeking comparable representations across camera views demands a flexible model
to address the complex distributions of images. In this work, we study the
co-occurrence statistic patterns between pairs of images, and propose to
crossing Generative Adversarial Network (Cross-GAN) for learning a joint
distribution for cross-image representations in a unsupervised manner. Given a
pair of person images, the proposed model consists of the variational
auto-encoder to encode the pair into respective latent variables, a proposed
cross-view alignment to reduce the view disparity, and an adversarial layer to
seek the joint distribution of latent representations. The learned latent
representations are well-aligned to reflect the co-occurrence patterns of
paired images. We empirically evaluate the proposed model against challenging
datasets, and our results show the importance of joint invariant features in
improving matching rates of person re-id with comparison to semi/unsupervised
state-of-the-arts.Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by
other author
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper gives futuristic challenges disscussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly study 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV
Efficient Region of Visual Interests Search for Geo-multimedia Data
With the proliferation of online social networking services and mobile smart
devices equipped with mobile communications module and position sensor module,
massive amount of multimedia data has been collected, stored and shared. This
trend has put forward higher request on massive multimedia data retrieval. In
this paper, we investigate a novel spatial query named region of visual
interests query (RoVIQ), which aims to search users containing geographical
information and visual words. Three baseline methods are presented to introduce
how to exploit existing techniques to address this problem. Then we propose the
definition of this query and related notions at the first time. To improve the
performance of query, we propose a novel spatial indexing structure called
quadtree based inverted visual index which is a combination of quadtree,
inverted index and visual words. Based on it, we design a efficient search
algorithm named region of visual interests search to support RoVIQ.
Experimental evaluations on real geo-image datasets demonstrate that our
solution outperforms state-of-the-art method.Comment: 22 page
- …