Multimodal diff-hash
Many applications require comparing multimodal data with different structure
and dimensionality that cannot be compared directly. Recently, there has been
increasing interest in methods for learning and efficiently representing such
multimodal similarity. In this paper, we present a simple algorithm for
multimodal similarity-preserving hashing that maps multimodal data into the
Hamming space while preserving both the intra- and inter-modal similarities. We
show that our method significantly outperforms the state-of-the-art method in
the field.
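To make the setup concrete, here is a minimal sketch, assuming random linear projections in place of the paper's learned hash functions, of mapping two modalities of different dimensionality into a common Hamming space and comparing codes by Hamming distance:

```python
import numpy as np

def hash_codes(X, W):
    """Map feature vectors X (n x d) to k-bit codes via the sign of a linear projection."""
    return np.sign(X @ W)  # entries in {-1, +1}; -1 reads as bit 0

def hamming_dist(c1, c2):
    """Hamming distance between +-1 codes of equal length."""
    return np.sum(c1 != c2, axis=-1)

# Toy data: two modalities with different dimensionality (64-d vs 32-d).
rng = np.random.default_rng(0)
X_img, X_txt = rng.normal(size=(100, 64)), rng.normal(size=(100, 32))
k = 16  # common code length
W_img, W_txt = rng.normal(size=(64, k)), rng.normal(size=(32, k))  # untrained stand-ins

C_img, C_txt = hash_codes(X_img, W_img), hash_codes(X_txt, W_txt)
# Inter-modal distances between paired items; training would pull these down
# for similar pairs and push them up for dissimilar ones.
print(hamming_dist(C_img, C_txt).mean())
```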
Random Forests Can Hash
Hash codes are an efficient data representation for coping with the
ever-growing amounts of data. We introduce a random forest semantic hashing
scheme with information-theoretic code aggregation, showing for the first time
how the random forest, a technique that, together with deep learning, has shown
spectacular results in classification, can also be extended to large-scale
retrieval. A traditional random forest fails to enforce the consistency of
hashes generated from each tree for same-class data, i.e., to preserve the
underlying similarity, and it also lacks a principled way to aggregate codes
across trees. We start with a simple hashing scheme, where independently
trained random trees in a forest act as hashing functions. We then propose a
subspace model as the splitting function, and show that it
enforces the hash consistency in a tree for data from the same class. We also
introduce an information-theoretic approach for aggregating codes of individual
trees into a single hash code, producing a near-optimal unique hash for each
class. Experiments on large-scale public datasets are presented, showing that
the proposed approach significantly outperforms state-of-the-art hashing
methods for retrieval tasks.
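As a toy illustration of the simple starting scheme only (not the subspace splitting function or the information-theoretic aggregation), the sketch below treats each independently trained tree as a hashing function contributing one code symbol; the dataset and forest sizes are invented for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for a large-scale retrieval dataset.
X, y = make_classification(n_samples=500, n_features=32, n_classes=4,
                           n_informative=8, random_state=0)

# Independently trained trees act as hashing functions: each sample gets
# one code symbol per tree (here, the tree's predicted class).
forest = RandomForestClassifier(n_estimators=16, max_depth=6,
                                random_state=0).fit(X, y)
codes = np.stack([tree.predict(X) for tree in forest.estimators_], axis=1)

# Retrieval compares codes symbol-wise (Hamming distance); note that nothing
# here forces trees to agree on same-class data, which is exactly the
# consistency problem described above.
print(codes.shape)  # (500, 16): one symbol per tree
```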
Shared Predictive Cross-Modal Deep Quantization
With explosive growth of data volume and ever-increasing diversity of data
modalities, cross-modal similarity search, which conducts nearest neighbor
search across different modalities, has been attracting increasing interest.
This paper presents a deep compact code learning solution for efficient
cross-modal similarity search. Many recent studies have proven that
quantization-based approaches perform generally better than hashing-based
approaches on single-modal similarity search. In this paper, we propose a deep
quantization approach, one of the early attempts to leverage deep neural
networks for quantization-based cross-modal similarity search. Our approach,
dubbed shared predictive deep quantization (SPDQ), explicitly formulates a
shared subspace across different modalities and two private subspaces for
individual modalities. Representations in the shared and private subspaces are
learned simultaneously by embedding them into a reproducing kernel Hilbert
space, where the mean embeddings of different modality distributions can be
compared explicitly. In addition, in the shared subspace, a quantizer is
learned to produce semantics-preserving compact codes with the help of label
alignment. Thanks to this novel network
architecture in cooperation with supervised quantization training, SPDQ can
preserve intramodal and intermodal similarities as much as possible and greatly
reduce quantization error. Experiments on two popular benchmarks corroborate
that our approach outperforms state-of-the-art methods.
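The mean-embedding comparison in an RKHS can be illustrated with a standard (biased) maximum mean discrepancy estimator; this sketch shows the generic ingredient, not SPDQ's actual training objective, and the toy representations are invented for the example:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(A, B, gamma=0.5):
    """Squared maximum mean discrepancy: the RKHS distance between the
    mean embeddings of the two samples (biased estimator)."""
    return (rbf_kernel(A, A, gamma).mean()
            + rbf_kernel(B, B, gamma).mean()
            - 2 * rbf_kernel(A, B, gamma).mean())

rng = np.random.default_rng(0)
Z_img = rng.normal(size=(64, 16))           # toy shared-subspace image codes
Z_txt = rng.normal(loc=0.1, size=(64, 16))  # toy shared-subspace text codes
print(mmd2(Z_img, Z_txt))  # training would drive this toward zero
```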
A New Evaluation Protocol and Benchmarking Results for Extendable Cross-media Retrieval
This paper proposes a new evaluation protocol for cross-media retrieval that
better fits real-world applications. Both image-text and text-image
retrieval modes are considered. Traditionally, class labels in the training and
testing sets are identical. That is, it is usually assumed that the query falls
into some pre-defined classes. However, in practice, the content of a query
image/text may vary extensively, and the retrieval system does not necessarily
know in advance the class label of a query. Considering this inconsistency
between real-world applications and laboratory assumptions, we argue that the
existing protocol, which assumes identical train/test classes, can be
modified and improved.
This work is dedicated to addressing this problem by considering the protocol
under an extendable scenario, i.e., the training and testing classes do not
overlap. We provide extensive benchmarking results obtained by the existing
protocol and the proposed new protocol on several commonly used datasets. We
demonstrate a noticeable performance drop when the testing classes are unseen
during training. Additionally, a trivial solution, i.e., directly using the
predicted class label for cross-media retrieval, is tested. We show that the
trivial solution is very competitive in traditional non-extendable retrieval,
but becomes less so under the new settings. The train/test split, evaluation
code, and benchmarking results are publicly available on our website.
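A minimal sketch of the extendable scenario, using a hypothetical `extendable_split` helper: classes, not samples, are partitioned, so the label sets seen at train and test time are disjoint:

```python
import numpy as np

def extendable_split(labels, train_ratio=0.5, seed=0):
    """Split by class so that training and testing classes do not overlap,
    mirroring the extendable evaluation scenario described above."""
    rng = np.random.default_rng(seed)
    classes = rng.permutation(np.unique(labels))
    train_cls = set(classes[:int(len(classes) * train_ratio)])
    in_train = np.array([l in train_cls for l in labels])
    return np.where(in_train)[0], np.where(~in_train)[0]

labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
tr, te = extendable_split(labels)
print(labels[tr], labels[te])  # the two class sets are disjoint
```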
Heat kernel coupling for multiple graph analysis
In this paper, we introduce heat kernel coupling (HKC) as a method of
constructing multimodal spectral geometry on weighted graphs of different
sizes without a vertex-wise bijective correspondence. We show that Laplacian
averaging can be derived as a limit case of HKC, and demonstrate its
applications on several problems from the manifold learning and pattern
recognition domains.
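As background for the core ingredient, the sketch below computes the heat kernel exp(-tL) of a single weighted graph via the eigendecomposition of its unnormalized Laplacian; the coupling of two graphs is beyond this toy example:

```python
import numpy as np

def heat_kernel(W, t):
    """Heat kernel exp(-t L) of a weighted graph with adjacency W,
    using the unnormalized Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)           # L is symmetric
    return U @ np.diag(np.exp(-t * lam)) @ U.T

# Toy 4-vertex weighted graph (symmetric adjacency matrix).
W = np.array([[0., 1., 0., 0.],
              [1., 0., 2., 0.],
              [0., 2., 0., 1.],
              [0., 0., 1., 0.]])
H = heat_kernel(W, t=0.5)
# H[i, j] measures how much heat diffuses from vertex i to j in time t;
# HKC couples two graphs by matching such diffusion behavior across them.
print(np.round(H, 3))
```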
Supervised Matrix Factorization for Cross-Modality Hashing
Matrix factorization has been recently utilized for the task of multi-modal
hashing for cross-modality visual search, where basis functions are learned to
map data from different modalities to the same Hamming embedding. In this
paper, we propose a novel cross-modality hashing algorithm termed Supervised
Matrix Factorization Hashing (SMFH) which tackles the multi-modal hashing
problem with a collective non-negative matrix factorization across the different
modalities. In particular, SMFH employs a well-designed binary code learning
algorithm to preserve the similarities among multi-modal original features
through a graph regularization. At the same time, semantic labels, when
available, are incorporated into the learning procedure. We conjecture that
all of these help preserve the most relevant information during the binary
quantization process, and hence improve the retrieval accuracy. We
demonstrate the superior performance of SMFH on three cross-modality visual
search benchmarks, i.e., the PASCAL-Sentence, Wiki, and NUS-WIDE, with
quantitative comparison to various state-of-the-art methods.
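A simplified sketch of collective matrix factorization with a shared latent matrix, omitting SMFH's graph regularization and label supervision; the naive alternating updates and median-threshold binarization are assumptions made for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = np.abs(rng.normal(size=(128, 200)))  # toy image features (d1 x n)
X2 = np.abs(rng.normal(size=(64, 200)))   # toy text features  (d2 x n)
k = 16                                    # code length

# Collective factorization: both modalities share one latent matrix V,
# X_m ~ U_m V, fit here with a few naive alternating least-squares steps.
V = rng.normal(size=(k, 200))
for _ in range(20):
    U1 = X1 @ np.linalg.pinv(V)
    U2 = X2 @ np.linalg.pinv(V)
    V = np.linalg.pinv(np.vstack([U1, U2])) @ np.vstack([X1, X2])

# Binarize the shared latent codes (median thresholding per dimension).
codes = (V > np.median(V, axis=1, keepdims=True)).astype(int)
print(codes.shape)  # (16, 200): one 16-bit code per sample
```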
Understanding Locally Competitive Networks
Recently proposed neural network activation functions such as rectified
linear, maxout, and local winner-take-all have allowed for faster and more
effective training of deep neural architectures on large and complex datasets.
The common trait among these functions is that they implement local competition
between small groups of computational units within a layer, so that only part
of the network is activated for any given input pattern. In this paper, we
attempt to visualize and understand this self-modularization, and suggest a
unified explanation for the beneficial properties of such networks. We also
show how our insights can be directly useful for efficiently performing
retrieval over large datasets using neural networks.
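To make the local-competition idea concrete, here are minimal versions of two of the named activations; both group units within a layer and let only the per-group maximum respond:

```python
import numpy as np

def maxout(z, group_size):
    """Maxout: each group of `group_size` units is reduced to its maximum,
    so the layer's output width shrinks by that factor."""
    n, d = z.shape
    return z.reshape(n, d // group_size, group_size).max(axis=2)

def local_winner_take_all(z, group_size):
    """LWTA: the layer keeps its width, but every non-winning unit inside
    each competing group is zeroed for this input."""
    n, d = z.shape
    g = z.reshape(n, d // group_size, group_size)
    mask = (g == g.max(axis=2, keepdims=True))
    return (g * mask).reshape(n, d)

z = np.random.default_rng(0).normal(size=(2, 8))  # toy pre-activations
print(maxout(z, 4).shape)                 # (2, 2)
print(local_winner_take_all(z, 4).shape)  # (2, 8), mostly zeros
```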
Correlation Hashing Network for Efficient Cross-Modal Retrieval
Hashing is widely applied to approximate nearest neighbor search for
large-scale multimodal retrieval with storage and computation efficiency.
Cross-modal hashing improves the quality of hash coding by exploiting semantic
correlations across different modalities. Existing cross-modal hashing methods
first transform data into low-dimensional feature vectors, and then generate
binary codes by another separate quantization step. However, suboptimal hash
codes may be generated since the quantization error is not explicitly minimized
and the feature representation is not jointly optimized with the binary codes.
This paper presents a Correlation Hashing Network (CHN) approach to cross-modal
hashing, which jointly learns good data representation tailored to hash coding
and formally controls the quantization error. The proposed CHN is a hybrid deep
architecture that constitutes a convolutional neural network for learning good
image representations, a multilayer perceptron for learning good text
representations, two hashing layers for generating compact binary codes, and a
structured max-margin loss that ties these components together to enable
learning similarity-preserving, high-quality hash codes. An extensive empirical
study shows that CHN yields state-of-the-art cross-modal retrieval performance
on standard benchmarks.
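The idea of jointly rewarding similarity preservation and explicitly penalizing quantization error can be sketched as below; the loss form, names, and weights are illustrative assumptions, not CHN's exact structured max-margin formulation:

```python
import numpy as np

def joint_hash_loss(h_img, h_txt, sim, margin=1.0, lam=0.1):
    """Max-margin-style similarity term on continuous codes plus an explicit
    quantization penalty (distance of the codes from {-1, +1}).
    h_img, h_txt: (n, k) continuous codes in (-1, 1), e.g. tanh outputs.
    sim: (n,) labels, +1 for similar pairs and -1 for dissimilar ones."""
    s = (h_img * h_txt).sum(axis=1) / h_img.shape[1]   # normalized inner product
    margin_loss = np.maximum(0.0, margin - sim * s).mean()
    q_err = ((np.abs(h_img) - 1) ** 2).mean() + ((np.abs(h_txt) - 1) ** 2).mean()
    return margin_loss + lam * q_err

rng = np.random.default_rng(0)
h_i = np.tanh(rng.normal(size=(32, 16)))   # toy image codes
h_t = np.tanh(rng.normal(size=(32, 16)))   # toy text codes
sim = rng.choice([-1.0, 1.0], size=32)
print(joint_hash_loss(h_i, h_t, sim))
```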
Set-to-Set Hashing with Applications in Visual Recognition
Visual data, such as an image or a sequence of video frames, is often
naturally represented as a point set. In this paper, we consider the
fundamental problem of finding a nearest set from a collection of sets, to a
query set. This problem has obvious applications in large-scale visual
retrieval and recognition, and also in applied fields beyond computer vision.
One challenge stands out in solving the problem: set representation and the
measure of similarity. In particular, the query set and the sets in the
dataset collection can have varying cardinalities, and the training collection
is large enough that a linear scan is impractical. We propose a simple representation
scheme that encodes both statistical and structural information of the sets.
The derived representations are integrated in a kernel framework for flexible
similarity measurement. To process query sets, we adopt a learning-to-hash
pipeline that turns the kernel representations into hash bits based on simple
learners, using multiple kernel learning. Experiments on two visual retrieval
datasets show unambiguously that our set-to-set hashing framework outperforms
prior methods that do not account for the set-to-set search setting.
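One simple way to realize a fixed-length statistical set representation with a kernel similarity, assuming mean-plus-covariance signatures rather than the paper's exact encoding scheme:

```python
import numpy as np

def set_signature(points):
    """Encode a variable-cardinality point set (m x d) as a fixed-length
    vector of first- and second-order statistics: the mean concatenated
    with the flattened covariance."""
    mu = points.mean(axis=0)
    cov = np.cov(points, rowvar=False, bias=True)
    return np.concatenate([mu, cov.ravel()])

def rbf_set_kernel(s1, s2, gamma=0.1):
    """Kernel similarity between two set signatures; a learning-to-hash
    stage would turn such kernel representations into bits."""
    return np.exp(-gamma * np.sum((s1 - s2) ** 2))

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))   # a set of 30 five-dimensional points
B = rng.normal(size=(17, 5))   # a set with a different cardinality
sig_A, sig_B = set_signature(A), set_signature(B)
print(sig_A.shape, rbf_set_kernel(sig_A, sig_B))  # (30,) = 5 + 5*5
```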
HashGAN: Attention-aware Deep Adversarial Hashing for Cross-Modal Retrieval
With the rapid growth of multi-modal data, hashing methods for cross-modal
retrieval have received considerable attention. Deep-networks-based cross-modal
hashing methods are appealing as they can integrate feature learning and hash
coding into end-to-end trainable frameworks. However, it is still challenging
to find content similarities between different modalities of data due to the
heterogeneity gap. To further address this problem, we propose an adversarial
hashing network with attention mechanism to enhance the measurement of content
similarities by selectively focusing on informative parts of multi-modal data.
The proposed new adversarial network, HashGAN, consists of three building
blocks: 1) the feature learning module to obtain feature representations, 2)
the generative attention module to generate an attention mask, which is used to
obtain the attended (foreground) and the unattended (background) feature
representations, and 3) the discriminative hash coding module to learn hash
functions that preserve the similarities between different modalities. In our
framework, the generative module and the discriminative module are trained in
an adversarial way: the generator learns to prevent the discriminator from
preserving the similarities of multi-modal data w.r.t. the background feature
representations, while the discriminator aims to preserve the similarities of
multi-modal data w.r.t. both the foreground and the background feature
representations. Extensive evaluations on several benchmark datasets
demonstrate that the proposed HashGAN brings substantial improvements over
other state-of-the-art cross-modal hashing methods.
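A toy sketch of the opposed objectives, with random features, masks, and hashing projections invented for the example: the discriminator's loss rewards similarity preservation on both the foreground and background streams, while the generator's loss penalizes preservation on the background stream:

```python
import numpy as np

def sim_loss(ca, cb, sim):
    """Squared error between normalized code inner products and +-1
    similarity labels: low when the codes preserve the similarities."""
    s = (ca * cb).sum(axis=1) / ca.shape[1]
    return ((s - sim) ** 2).mean()

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
f_img, f_txt = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))  # toy features
m_img, m_txt = sigmoid(rng.normal(size=(32, 64))), sigmoid(rng.normal(size=(32, 64)))
sim = rng.choice([-1.0, 1.0], size=32)        # pairwise similarity labels
W_i, W_t = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))  # hashing layers

# Attended (foreground) and unattended (background) codes per modality.
fg_i, bg_i = np.sign((f_img * m_img) @ W_i), np.sign((f_img * (1 - m_img)) @ W_i)
fg_t, bg_t = np.sign((f_txt * m_txt) @ W_t), np.sign((f_txt * (1 - m_txt)) @ W_t)

d_loss = sim_loss(fg_i, fg_t, sim) + sim_loss(bg_i, bg_t, sim)  # discriminator
g_loss = -sim_loss(bg_i, bg_t, sim)                             # generator
print(d_loss, g_loss)
```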