3,579 research outputs found
SADIH: Semantic-Aware DIscrete Hashing
Due to its low storage cost and fast query speed, hashing has been recognized
to accomplish similarity search in large-scale multimedia retrieval
applications. Particularly supervised hashing has recently received
considerable research attention by leveraging the label information to preserve
the pairwise similarities of data points in the Hamming space. However, there
still remain two crucial bottlenecks: 1) the learning process of the full
pairwise similarity preservation is computationally unaffordable and unscalable
to deal with big data; 2) the available category information of data are not
well-explored to learn discriminative hash functions. To overcome these
challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH)
framework, which aims to directly embed the transformed semantic information
into the asymmetric similarity approximation and discriminative hashing
function learning. Specifically, a semantic-aware latent embedding is
introduced to asymmetrically preserve the full pairwise similarities while
skillfully handle the cumbersome n times n pairwise similarity matrix.
Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the
data structures in the discriminative latent semantic space and perform data
reconstruction. Moreover, an efficient alternating optimization algorithm is
proposed to solve the resulting discrete optimization problem. Extensive
experimental results on multiple large-scale datasets demonstrate that our
SADIH can clearly outperform the state-of-the-art baselines with the additional
benefit of lower computational costs.Comment: Accepted by The Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19
Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
Mobile landmark search (MLS) recently receives increasing attention for its
great practical values. However, it still remains unsolved due to two important
challenges. One is high bandwidth consumption of query transmission, and the
other is the huge visual variations of query images sent from mobile devices.
In this paper, we propose a novel hashing scheme, named as canonical view based
discrete multi-modal hashing (CV-DMH), to handle these problems via a novel
three-stage learning procedure. First, a submodular function is designed to
measure visual representativeness and redundancy of a view set. With it,
canonical views, which capture key visual appearances of landmark with limited
redundancy, are efficiently discovered with an iterative mining strategy.
Second, multi-modal sparse coding is applied to transform visual features from
multiple modalities into an intermediate representation. It can robustly and
adaptively characterize visual contents of varied landmark images with certain
canonical views. Finally, compact binary codes are learned on intermediate
representation within a tailored discrete binary embedding model which
preserves visual relations of images measured with canonical views and removes
the involved noises. In this part, we develop a new augmented Lagrangian
multiplier (ALM) based optimization method to directly solve the discrete
binary codes. We can not only explicitly deal with the discrete constraint, but
also consider the bit-uncorrelated constraint and balance constraint together.
Experiments on real world landmark datasets demonstrate the superior
performance of CV-DMH over several state-of-the-art methods
Video retrieval based on deep convolutional neural network
Recently, with the enormous growth of online videos, fast video retrieval
research has received increasing attention. As an extension of image hashing
techniques, traditional video hashing methods mainly depend on hand-crafted
features and transform the real-valued features into binary hash codes. As
videos provide far more diverse and complex visual information than images,
extracting features from videos is much more challenging than that from images.
Therefore, high-level semantic features to represent videos are needed rather
than low-level hand-crafted methods. In this paper, a deep convolutional neural
network is proposed to extract high-level semantic features and a binary hash
function is then integrated into this framework to achieve an end-to-end
optimization. Particularly, our approach also combines triplet loss function
which preserves the relative similarity and difference of videos and
classification loss function as the optimization objective. Experiments have
been performed on two public datasets and the results demonstrate the
superiority of our proposed method compared with other state-of-the-art video
retrieval methods
- …