8,909 research outputs found
Supporting Exploratory Queries in Database Centric Web Applications
Users of database-centric Web applications, especially in the e-commerce
domain, often resort to exploratory ``trial-and-error'' queries since the
underlying data space is huge and unfamiliar, and there are several
alternatives for search attributes in this space. For example, scouting for
cheap airfares typically involves posing multiple queries, varying flight
times, dates, and airport locations. Exploratory queries are problematic from
the perspective of both the user and the server. For the database server, it
results in a drastic reduction in effective throughput since much of the
processing is duplicated in each successive query. For the client, it results
in a marked increase in response times, especially when accessing the service
through wireless channels.
In this paper, we investigate the design of automated techniques to minimize
the need for repetitive exploratory queries. Specifically, we present SAUNA, a
server-side query relaxation algorithm that, given the user's initial range
query and a desired cardinality for the answer set, produces a relaxed query
that is expected to contain the required number of answers. The algorithm
incorporates a range-query-specific distance metric that is weighted to produce
relaxed queries of a desired shape (e.g. aspect ratio preserving), and utilizes
multi-dimensional histograms for query size estimation. A detailed performance
evaluation of SAUNA over a variety of multi-dimensional data sets indicates
that its relaxed queries can significantly reduce the costs associated with
exploratory query processing.Comment: Version After chaning the authors, earlier by mistake only first two
Authors were take
Hashing with Mutual Information
Binary vector embeddings enable fast nearest neighbor retrieval in large
databases of high-dimensional objects, and play an important role in many
practical applications, such as image and video retrieval. We study the problem
of learning binary vector embeddings under a supervised setting, also known as
hashing. We propose a novel supervised hashing method based on optimizing an
information-theoretic quantity: mutual information. We show that optimizing
mutual information can reduce ambiguity in the induced neighborhood structure
in the learned Hamming space, which is essential in obtaining high retrieval
performance. To this end, we optimize mutual information in deep neural
networks with minibatch stochastic gradient descent, with a formulation that
maximally and efficiently utilizes available supervision. Experiments on four
image retrieval benchmarks, including ImageNet, confirm the effectiveness of
our method in learning high-quality binary embeddings for nearest neighbor
retrieval
Learning to Hash for Indexing Big Data - A Survey
The explosive growth in big data has attracted much attention in designing
efficient indexing and search methods recently. In many critical applications
such as large-scale search and pattern matching, finding the nearest neighbors
to a query is a fundamental research problem. However, the straightforward
solution using exhaustive comparison is infeasible due to the prohibitive
computational complexity and memory requirement. In response, Approximate
Nearest Neighbor (ANN) search based on hashing techniques has become popular
due to its promising performance in both efficiency and accuracy. Prior
randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore
data-independent hash functions with random projections or permutations.
Although having elegant theoretic guarantees on the search quality in certain
metric spaces, performance of randomized hashing has been shown insufficient in
many real-world applications. As a remedy, new approaches incorporating
data-driven learning methods in development of advanced hash functions have
emerged. Such learning to hash methods exploit information such as data
distributions or class labels when optimizing the hash codes or functions.
Importantly, the learned hash codes are able to preserve the proximity of
neighboring data in the original feature spaces in the hash code spaces. The
goal of this paper is to provide readers with systematic understanding of
insights, pros and cons of the emerging techniques. We provide a comprehensive
survey of the learning to hash framework and representative techniques of
various types, including unsupervised, semi-supervised, and supervised. In
addition, we also summarize recent hashing approaches utilizing the deep
learning models. Finally, we discuss the future direction and trends of
research in this area
A Survey on Learning to Hash
Nearest neighbor search is a problem of finding the data points from the
database such that the distances from them to the query point are the smallest.
Learning to hash is one of the major solutions to this problem and has been
widely studied recently. In this paper, we present a comprehensive survey of
the learning to hash algorithms, categorize them according to the manners of
preserving the similarities into: pairwise similarity preserving, multiwise
similarity preserving, implicit similarity preserving, as well as quantization,
and discuss their relations. We separate quantization from pairwise similarity
preserving as the objective function is very different though quantization, as
we show, can be derived from preserving the pairwise similarities. In addition,
we present the evaluation protocols, and the general performance analysis, and
point out that the quantization algorithms perform superiorly in terms of
search accuracy, search time cost, and space cost. Finally, we introduce a few
emerging topics.Comment: To appear in IEEE Transactions On Pattern Analysis and Machine
Intelligence (TPAMI
Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval
Unsupervised hashing can desirably support scalable content-based image
retrieval (SCBIR) for its appealing advantages of semantic label independence,
memory and search efficiency. However, the learned hash codes are embedded with
limited discriminative semantics due to the intrinsic limitation of image
representation. To address the problem, in this paper, we propose a novel
hashing approach, dubbed as \emph{Discrete Semantic Transfer Hashing} (DSTH).
The key idea is to \emph{directly} augment the semantics of discrete image hash
codes by exploring auxiliary contextual modalities. To this end, a unified
hashing framework is formulated to simultaneously preserve visual similarities
of images and perform semantic transfer from contextual modalities. Further, to
guarantee direct semantic transfer and avoid information loss, we explicitly
impose the discrete constraint, bit--uncorrelation constraint and bit-balance
constraint on hash codes. A novel and effective discrete optimization method
based on augmented Lagrangian multiplier is developed to iteratively solve the
optimization problem. The whole learning process has linear computation
complexity and desirable scalability. Experiments on three benchmark datasets
demonstrate the superiority of DSTH compared with several state-of-the-art
approaches
Binary Generative Adversarial Networks for Image Retrieval
The most striking successes in image retrieval using deep hashing have mostly
involved discriminative models, which require labels. In this paper, we use
binary generative adversarial networks (BGAN) to embed images to binary codes
in an unsupervised way. By restricting the input noise variable of generative
adversarial networks (GAN) to be binary and conditioned on the features of each
input image, BGAN can simultaneously learn a binary representation per image,
and generate an image plausibly similar to the original one. In the proposed
framework, we address two main problems: 1) how to directly generate binary
codes without relaxation? 2) how to equip the binary representation with the
ability of accurate image retrieval? We resolve these problems by proposing new
sign-activation strategy and a loss function steering the learning process,
which consists of new models for adversarial loss, a content loss, and a
neighborhood structure loss. Experimental results on standard datasets
(CIFAR-10, NUSWIDE, and Flickr) demonstrate that our BGAN significantly
outperforms existing hashing methods by up to 107\% in terms of~mAP (See Table
tab.res.map.comp) Our anonymous code is available at:
https://github.com/htconquer/BGAN.Comment: arXiv admin note: text overlap with arXiv:1702.00758 by other author
Deep Discrete Supervised Hashing
Hashing has been widely used for large-scale search due to its low storage
cost and fast query speed. By using supervised information, supervised hashing
can significantly outperform unsupervised hashing. Recently, discrete
supervised hashing and deep hashing are two representative progresses in
supervised hashing. On one hand, hashing is essentially a discrete optimization
problem. Hence, utilizing supervised information to directly guide discrete
(binary) coding procedure can avoid sub-optimal solution and improve the
accuracy. On the other hand, deep hashing, which integrates deep feature
learning and hash-code learning into an end-to-end architecture, can enhance
the feedback between feature learning and hash-code learning. The key in
discrete supervised hashing is to adopt supervised information to directly
guide the discrete coding procedure in hashing. The key in deep hashing is to
adopt the supervised information to directly guide the deep feature learning
procedure. However, there have not existed works which can use the supervised
information to directly guide both discrete coding procedure and deep feature
learning procedure in the same framework. In this paper, we propose a novel
deep hashing method, called deep discrete supervised hashing (DDSH), to address
this problem. DDSH is the first deep hashing method which can utilize
supervised information to directly guide both discrete coding procedure and
deep feature learning procedure, and thus enhance the feedback between these
two important procedures. Experiments on three real datasets show that DDSH can
outperform other state-of-the-art baselines, including both discrete hashing
and deep hashing baselines, for image retrieval
Fusion-supervised Deep Cross-modal Hashing
Deep hashing has recently received attention in cross-modal retrieval for its
impressive advantages. However, existing hashing methods for cross-modal
retrieval cannot fully capture the heterogeneous multi-modal correlation and
exploit the semantic information. In this paper, we propose a novel
\emph{Fusion-supervised Deep Cross-modal Hashing} (FDCH) approach. Firstly,
FDCH learns unified binary codes through a fusion hash network with paired
samples as input, which effectively enhances the modeling of the correlation of
heterogeneous multi-modal data. Then, these high-quality unified hash codes
further supervise the training of the modality-specific hash networks for
encoding out-of-sample queries. Meanwhile, both pair-wise similarity
information and classification information are embedded in the hash networks
under one stream framework, which simultaneously preserves cross-modal
similarity and keeps semantic consistency. Experimental results on two
benchmark datasets demonstrate the state-of-the-art performance of FDCH
Correlation Hashing Network for Efficient Cross-Modal Retrieval
Hashing is widely applied to approximate nearest neighbor search for
large-scale multimodal retrieval with storage and computation efficiency.
Cross-modal hashing improves the quality of hash coding by exploiting semantic
correlations across different modalities. Existing cross-modal hashing methods
first transform data into low-dimensional feature vectors, and then generate
binary codes by another separate quantization step. However, suboptimal hash
codes may be generated since the quantization error is not explicitly minimized
and the feature representation is not jointly optimized with the binary codes.
This paper presents a Correlation Hashing Network (CHN) approach to cross-modal
hashing, which jointly learns good data representation tailored to hash coding
and formally controls the quantization error. The proposed CHN is a hybrid deep
architecture that constitutes a convolutional neural network for learning good
image representations, a multilayer perception for learning good text
representations, two hashing layers for generating compact binary codes, and a
structured max-margin loss that integrates all things together to enable
learning similarity-preserving and high-quality hash codes. Extensive empirical
study shows that CHN yields state of the art cross-modal retrieval performance
on standard benchmarks.Comment: 7 page
Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale
semantic image retrieval problems in computer vision. In the light of recent
progress, convolutional neural network based hashing methods typically seek
pair-wise or triplet labels to conduct the similarity preserving learning.
However, complex semantic concepts of visual contents are hard to capture by
similar/dissimilar labels, which limits the retrieval performance. Generally,
pair-wise or triplet losses not only suffer from expensive training costs but
also lack in extracting sufficient semantic information. In this regard, we
propose a novel deep supervised hashing model to learn more compact class-level
similarity preserving binary codes. Our deep learning based model is motivated
by deep metric learning that directly takes semantic labels as supervised
information in training and generates corresponding discriminant hashing code.
Specifically, a novel cubic constraint loss function based on Gaussian
distribution is proposed, which preserves semantic variations while penalizes
the overlap part of different classes in the embedding space. To address the
discrete optimization problem introduced by binary codes, a two-step
optimization strategy is proposed to provide efficient training and avoid the
problem of gradient vanishing. Extensive experiments on four large-scale
benchmark databases show that our model can achieve the state-of-the-art
retrieval performance. Moreover, when training samples are limited, our method
surpasses other supervised deep hashing methods with non-negligible margins
- …