Unsupervised Multi-modal Hashing for Cross-modal retrieval
With the advantage of low storage cost and high efficiency, hashing learning
has received much attention in the domain of Big Data. In this paper, we
propose a novel unsupervised hashing method that directly preserves the
manifold structure of the data. To this end, both the semantic correlation in
the textual space and the locally geometric structure in the visual space are
explored simultaneously in our framework. Besides, an ℓ2,1-norm constraint is
imposed on the projection
matrices to learn the discriminative hash function for each modality. Extensive
experiments are performed to evaluate the proposed method on the three publicly
available datasets and the experimental results show that our method can
achieve superior performance over the state-of-the-art methods.
Comment: 4 pages, 4 figures
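For readers unfamiliar with the ℓ2,1-norm constraint mentioned above: it sums the ℓ2 norms of a matrix's rows, so penalizing it drives entire rows of a projection matrix to zero, which is what makes the learned hash functions feature-selective. A minimal numpy sketch (illustrative, not the authors' code):

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: the sum of the l2 norms of the rows of W.
    Used as a regularizer, it encourages whole rows of the
    projection matrix to vanish (joint feature selection)."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(W))  # row norms are 5, 0, 1 -> 6.0
```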
Graph based manifold regularized deep neural networks for automatic speech recognition
Deep neural networks (DNNs) have been successfully applied to a wide variety
of acoustic modeling tasks in recent years. These include the applications of
DNNs either in a discriminative feature extraction or in a hybrid acoustic
modeling scenario. Despite the rapid progress in this area, a number of
challenges remain in training DNNs. This paper presents an effective way of
training DNNs using a manifold learning based regularization framework. In this
framework, the parameters of the network are optimized to preserve underlying
manifold based relationships between speech feature vectors while minimizing a
measure of loss between network outputs and targets. This is achieved by
incorporating manifold based locality constraints in the objective criterion of
DNNs. Empirical evidence is provided to demonstrate that training a network
with manifold constraints preserves structural compactness in the hidden layers
of the network. Manifold regularization is applied to train bottleneck DNNs for
feature extraction in hidden Markov model (HMM) based speech recognition. The
experiments in this work are conducted on the Aurora-2 spoken digits and the
Aurora-4 read news large vocabulary continuous speech recognition tasks. The
performance is measured in terms of word error rate (WER) on these tasks. It is
shown that the manifold regularized DNNs result in up to 37% reduction in WER
relative to standard DNNs.
Comment: 12 pages including citations, 2 figures
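The "manifold based locality constraints" described above are commonly realized as a graph-Laplacian penalty on hidden representations: pairs of inputs connected in an affinity graph are pushed to have nearby hidden activations. A minimal sketch under that assumption (notation is ours, not the paper's):

```python
import numpy as np

def manifold_penalty(H, W):
    """Locality penalty sum_ij W[i, j] * ||H[i] - H[j]||^2, computed via the
    graph Laplacian L = D - W using the identity
    sum_ij W[i, j] * ||H[i] - H[j]||^2 = 2 * tr(H^T L H).
    H holds one hidden-layer representation per row; W is the affinity
    graph over the corresponding input feature vectors."""
    L = np.diag(W.sum(axis=1)) - W
    return 2.0 * np.trace(H.T @ L @ H)
```

This term is simply added to the usual output loss, so minimizing the total objective trades prediction accuracy against structural compactness of the hidden layers.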
SADIH: Semantic-Aware DIscrete Hashing
Due to its low storage cost and fast query speed, hashing has been recognized
to accomplish similarity search in large-scale multimedia retrieval
applications. Particularly supervised hashing has recently received
considerable research attention by leveraging the label information to preserve
the pairwise similarities of data points in the Hamming space. However, there
still remain two crucial bottlenecks: 1) the learning process of the full
pairwise similarity preservation is computationally unaffordable and unscalable
to deal with big data; 2) the available category information of the data is
not well explored for learning discriminative hash functions. To overcome these
challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH)
framework, which aims to directly embed the transformed semantic information
into the asymmetric similarity approximation and discriminative hashing
function learning. Specifically, a semantic-aware latent embedding is
introduced to asymmetrically preserve the full pairwise similarities while
skillfully handling the cumbersome n × n pairwise similarity matrix.
Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the
data structures in the discriminative latent semantic space and perform data
reconstruction. Moreover, an efficient alternating optimization algorithm is
proposed to solve the resulting discrete optimization problem. Extensive
experimental results on multiple large-scale datasets demonstrate that our
SADIH can clearly outperform the state-of-the-art baselines with the additional
benefit of lower computational costs.
Comment: Accepted by The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
Learning to Hash for Indexing Big Data - A Survey
The explosive growth in big data has attracted much attention in designing
efficient indexing and search methods recently. In many critical applications
such as large-scale search and pattern matching, finding the nearest neighbors
to a query is a fundamental research problem. However, the straightforward
solution using exhaustive comparison is infeasible due to the prohibitive
computational complexity and memory requirement. In response, Approximate
Nearest Neighbor (ANN) search based on hashing techniques has become popular
due to its promising performance in both efficiency and accuracy. Prior
randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore
data-independent hash functions with random projections or permutations.
Although they have elegant theoretical guarantees on search quality in certain
metric spaces, the performance of randomized hashing has proven insufficient in
many real-world applications. As a remedy, new approaches incorporating
data-driven learning methods in development of advanced hash functions have
emerged. Such learning to hash methods exploit information such as data
distributions or class labels when optimizing the hash codes or functions.
Importantly, the learned hash codes are able to preserve the proximity of
neighboring data in the original feature space within the hash code space. The
goal of this paper is to provide readers with a systematic understanding of
insights, pros and cons of the emerging techniques. We provide a comprehensive
survey of the learning to hash framework and representative techniques of
various types, including unsupervised, semi-supervised, and supervised. In
addition, we also summarize recent hashing approaches utilizing the deep
learning models. Finally, we discuss the future direction and trends of
research in this area.
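The data-independent baseline the survey contrasts against, random-projection LSH, is short enough to sketch in full: each bit is the sign of a random linear projection, and search ranks candidates by Hamming distance (an illustrative toy, not any specific library's implementation):

```python
import numpy as np

def lsh_hash(X, W):
    """Data-independent LSH: sign of random projections -> binary codes.
    Each column of W is one random hyperplane; each row of the result
    is one item's hash code."""
    return (X @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int((a != b).sum())

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))   # 4-d features -> 16-bit codes
X = rng.standard_normal((3, 4))    # three database items
codes = lsh_hash(X, W)
# Nearby points tend to receive similar codes; learning-to-hash methods
# replace the random W with one fit to data distributions or labels.
```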
First-Take-All: Temporal Order-Preserving Hashing for 3D Action Videos
With the prevalence of commodity depth cameras, the new paradigm of user
interfaces based on 3D motion capture and recognition has dramatically
changed the way humans interact with computers. Human action
recognition, as one of the key components in these devices, plays an important
role to guarantee the quality of user experience. Although the model-driven
methods have achieved huge success, they cannot provide a scalable solution for
efficiently storing, retrieving and recognizing actions in the large-scale
applications. These models are also vulnerable to temporal translation and
warping, as well as to variations in motion scales and execution rates. To
address these challenges, we propose to treat the 3D human action recognition
as a video-level hashing problem and propose a novel First-Take-All (FTA)
Hashing algorithm capable of hashing the entire video into hash codes of fixed
length. We demonstrate that this FTA algorithm produces a compact
representation of the video invariant to the above-mentioned variations,
through which action recognition can be solved by an efficient nearest neighbor
search by the Hamming distance between the FTA hash codes. Experiments on the
public 3D human action datasets show that the FTA algorithm can reach a
recognition accuracy higher than 80%, with about 15 bits per frame considering
there are 65 frames per video over the datasets.
Comment: 9 pages, 11 figures
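The abstract does not spell out how an order-preserving bit is built. One plausible simplification, kept purely illustrative (the projections, peak criterion, and bit rule here are our assumptions, not the paper's construction): project the frame features onto two random directions and record which projection peaks first in time, so the bit depends only on temporal order.

```python
import numpy as np

def fta_bit(video_feats, proj_a, proj_b):
    """One hypothetical 'first-take-all'-style bit: compare when two
    random projections of the frame features reach their maxima.
    video_feats has one frame feature per row. Because only the
    *order* of the peaks matters, the bit is unchanged by uniform
    time shifts or rescaling of the sequence."""
    ta = int(np.argmax(video_feats @ proj_a))  # frame where proj_a peaks
    tb = int(np.argmax(video_feats @ proj_b))  # frame where proj_b peaks
    return 1 if ta < tb else 0
```

Stacking many such bits from independent random projection pairs yields a fixed-length code for a variable-length video, searchable by Hamming distance as described above.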
Discriminative Supervised Hashing for Cross-Modal similarity Search
With the advantage of low storage cost and high retrieval efficiency, hashing
techniques have recently been an emerging topic in cross-modal similarity
search. As multiple modal data reflect similar semantic content, many
studies aim at learning unified binary codes. However, the hashing features
learned by these methods are not sufficiently discriminative, which results in
lower accuracy and robustness. We propose a novel hashing learning framework
which jointly performs classifier learning, subspace learning and matrix
factorization to preserve class-specific semantic content, termed
Discriminative Supervised Hashing (DSH), to learn discriminative unified
binary codes for multi-modal data. Besides, to reduce the loss of information
and preserve the non-linear structure of the data, DSH non-linearly projects
different modalities into the common space in which the similarity among
heterogeneous data points can be measured. Extensive experiments conducted on
the three publicly available datasets demonstrate that the framework proposed
in this paper outperforms several state-of-the-art methods.
Comment: 7 pages, 3 figures, 4 tables; the paper is under consideration at
Image and Vision Computing
Deep LDA Hashing
The conventional supervised hashing methods based on classification do not
entirely meet the requirements of hashing technique, but Linear Discriminant
Analysis (LDA) does. In this paper, we propose to perform a revised LDA
objective over deep networks to learn efficient hashing codes in a truly
end-to-end fashion. However, naively optimizing the deep network w.r.t. the
LDA objective requires a complicated eigenvalue decomposition within each
mini-batch of every epoch. In this work, the revised LDA objective
is transformed into a simple least square problem, which naturally overcomes
the intractable problems and can be easily solved by the off-the-shelf
optimizer. Such deep extension can also overcome the weakness of LDA Hashing in
the limited linear projection and feature learning. Extensive experiments are
conducted on three benchmark datasets. The proposed Deep LDA Hashing shows a
nearly 70-point improvement over its conventional counterpart on the CIFAR-10 dataset.
It also beats several state-of-the-art methods on various metrics.
Comment: 10 pages, 3 figures
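The least-squares reformulation itself is not detailed in the abstract. A classical device with the same flavor, regressing features onto class-indicator targets instead of eigendecomposing scatter matrices, can be sketched as follows (all names hypothetical; this is a generic LDA-as-regression surrogate, not the paper's revised objective):

```python
import numpy as np

def lda_as_least_squares(X, Y, eps=1e-6):
    """Ridge-regularized least squares onto one-hot class targets.
    Solving (X^T X + eps*I) W = X^T Y gives a projection whose columns
    lie in the span of discriminant directions, with no per-mini-batch
    eigendecomposition, which is why such surrogates suit SGD training."""
    return np.linalg.solve(X.T @ X + eps * np.eye(X.shape[1]), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))          # 100 samples, 8-d features
Y = np.eye(3)[rng.integers(0, 3, 100)]     # one-hot labels, 3 classes
W = lda_as_least_squares(X, Y)             # 8 x 3 projection matrix
```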
Set-to-Set Hashing with Applications in Visual Recognition
Visual data, such as an image or a sequence of video frames, is often
naturally represented as a point set. In this paper, we consider the
fundamental problem of finding a nearest set from a collection of sets, to a
query set. This problem has obvious applications in large-scale visual
retrieval and recognition, and also in applied fields beyond computer vision.
One challenge stands out in solving the problem: set representation and the
measure of similarity. In particular, the query set and the sets in the dataset
collection can have varying cardinalities. The training collection is large
enough such that linear scan is impractical. We propose a simple representation
scheme that encodes both statistical and structural information of the sets.
The derived representations are integrated in a kernel framework for flexible
similarity measurement. To process a query set, we adopt a learning-to-hash
pipeline that turns the kernel representations into hash bits based on simple
learners, using multiple kernel learning. Experiments on two visual retrieval
datasets show unambiguously that our set-to-set hashing framework outperforms
prior methods that do not account for the set-to-set search setting.
Comment: 9 pages
Query-Adaptive Hash Code Ranking for Large-Scale Multi-View Visual Search
Hash based nearest neighbor search has become attractive in many
applications. However, the quantization in hashing usually degrades the
discriminative power when using Hamming distance ranking. Besides, for
large-scale visual search, existing hashing methods cannot directly support
efficient search over data from multiple sources, even though the literature
has shown that adaptively incorporating complementary information from diverse
sources or views can significantly boost the search performance. To address these
problems, this paper proposes a novel and generic approach to building multiple
hash tables with multiple views and generating fine-grained ranking results at
bitwise and tablewise levels. For each hash table, a query-adaptive bitwise
weighting is introduced to alleviate the quantization loss by simultaneously
exploiting the quality of hash functions and their complement for nearest
neighbor search. From the tablewise aspect, multiple hash tables are built for
different data views as a joint index, over which a query-specific rank fusion
is proposed to rerank all results from the bitwise ranking by diffusing in a
graph. Comprehensive experiments on image search over three well-known
benchmarks show that the proposed method achieves up to 17.11% and 20.28%
performance gains on single and multiple table search over state-of-the-art
methods.
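The query-adaptive bitwise weighting described above generalizes Hamming ranking: each bit receives a per-query weight, so mismatches on reliable bits cost more than mismatches on noisy ones, breaking the coarse ties that plain Hamming distance produces. A minimal sketch (weights here are hand-picked for illustration; the paper derives them from hash-function quality):

```python
import numpy as np

def weighted_hamming(q, codes, w):
    """Weighted Hamming distance between a query code q and a matrix of
    database codes. Plain Hamming distance is the special case w = 1
    for every bit."""
    return ((q != codes) * w).sum(axis=1)

q = np.array([1, 0, 1, 1], dtype=np.uint8)
codes = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 1]], dtype=np.uint8)
w = np.array([2.0, 1.0, 1.0, 0.5])   # per-query bit reliabilities
print(weighted_hamming(q, codes, w))  # -> [0.5 2. ]
```

Note that both database codes differ from the query in exactly one bit, so plain Hamming distance ties them, while the weighted variant produces a fine-grained ranking.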
Deep Ordinal Hashing with Spatial Attention
Hashing has attracted increasing research attention in recent years due to
its high efficiency of computation and storage in image retrieval. Recent works
have demonstrated the superiority of simultaneous feature representations and
hash functions learning with deep neural networks. However, most existing deep
hashing methods directly learn the hash functions by encoding the global
semantic information, while ignoring the local spatial information of images.
The loss of local spatial structure creates a performance bottleneck for hash
functions, thereby limiting their application for accurate similarity
retrieval. In this work, we propose a novel Deep Ordinal Hashing (DOH) method,
which learns ordinal representations by leveraging the ranking structure of
feature space from both local and global views. In particular, to effectively
build the ranking structure, we propose to learn the rank correlation space by
exploiting the local spatial information from a Fully Convolutional Network
(FCN) and the global semantic information from a Convolutional Neural Network (CNN)
simultaneously. More specifically, an effective spatial attention model is
designed to capture the local spatial information by selectively learning
well-specified locations closely related to target objects. In this hashing
framework, the local spatial and global semantic nature of images is captured
in an end-to-end ranking-to-hashing manner. Experimental results conducted on
three widely-used datasets demonstrate that the proposed DOH method
significantly outperforms the state-of-the-art hashing methods.
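The core mechanism of a spatial attention model, selecting locations before pooling, can be made concrete with a generic stand-in (this is a common softmax-attention pooling pattern, not the paper's architecture; the saliency score here is just summed channel activation):

```python
import numpy as np

def attention_pool(fmap):
    """Attention-weighted spatial pooling of a (H, W, C) feature map.
    A softmax over per-location saliency scores yields an attention map
    that sums to 1; the pooled descriptor emphasizes high-scoring
    locations instead of averaging all positions uniformly."""
    scores = fmap.sum(axis=-1)            # (H, W) saliency scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # softmax attention map
    return (fmap * a[..., None]).sum(axis=(0, 1))   # (C,) pooled feature
```

With uniform activations the attention map degenerates to average pooling; a strong local response shifts the pooled descriptor toward that location, which is the behavior exploited to focus on target objects.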