Attribute-Guided Network for Cross-Modal Zero-Shot Hashing
Zero-Shot Hashing aims at learning a hashing model that is trained only by
instances from seen categories but can generalize well to those of unseen
categories. Typically, it is achieved by utilizing a semantic embedding space
to transfer knowledge from the seen domain to the unseen domain. Existing efforts
mainly focus on the single-modal retrieval task, especially Image-Based Image
Retrieval (IBIR). However, as a highlighted research topic in the field of
hashing, cross-modal retrieval is more common in real-world applications. To
address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a
novel Attribute-Guided Network (AgNet), which can perform not only IBIR, but
also Text-Based Image Retrieval (TBIR). In particular, AgNet aligns different
modal data into a semantically rich attribute space, which bridges the gap
caused by modality heterogeneity and the zero-shot setting. We also design an
effective strategy that exploits the attribute to guide the generation of hash
codes for image and text within the same network. Extensive experimental
results on three benchmark datasets (AwA, SUN, and ImageNet) demonstrate the
superiority of AgNet on both cross-modal and single-modal zero-shot image
retrieval tasks.
Comment: 9 pages, 8 figures
Learning to Hash for Indexing Big Data - A Survey
The explosive growth in big data has attracted much attention in designing
efficient indexing and search methods recently. In many critical applications
such as large-scale search and pattern matching, finding the nearest neighbors
to a query is a fundamental research problem. However, the straightforward
solution using exhaustive comparison is infeasible due to the prohibitive
computational complexity and memory requirement. In response, Approximate
Nearest Neighbor (ANN) search based on hashing techniques has become popular
due to its promising performance in both efficiency and accuracy. Prior
randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore
data-independent hash functions with random projections or permutations.
Although they offer elegant theoretical guarantees on search quality in certain
metric spaces, randomized hashing methods have proven insufficient in
many real-world applications. As a remedy, new approaches incorporating
data-driven learning methods in development of advanced hash functions have
emerged. Such learning to hash methods exploit information such as data
distributions or class labels when optimizing the hash codes or functions.
Importantly, the learned hash codes are able to preserve the proximity of
neighboring data in the original feature spaces in the hash code spaces. The
goal of this paper is to provide readers with a systematic understanding of the
insights, pros, and cons of the emerging techniques. We provide a comprehensive
survey of the learning to hash framework and representative techniques of
various types, including unsupervised, semi-supervised, and supervised. In
addition, we summarize recent hashing approaches that use deep
learning models. Finally, we discuss future directions and trends of
research in this area.
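The data-independent random-projection hashing that the survey abstract contrasts with learned hashing can be sketched as follows. This is only a minimal illustration of sign-of-random-projection LSH, not code from the survey; the function names, the 16-bit code length, and the sample vectors are all assumptions chosen for the example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the projection is non-negative, else 0."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(num_bits=16, dim=3)
query = [1.0, 0.2, -0.5]
scaled = [2.0 * x for x in query]   # same direction, different length
opposite = [-x for x in query]      # exactly opposite direction

code_query = lsh_code(query, planes)
code_scaled = lsh_code(scaled, planes)
code_opposite = lsh_code(opposite, planes)
```

Because each bit depends only on the sign of a projection, vectors pointing the same way receive identical codes and opposite vectors differ in every bit. The hyperplanes are drawn without looking at the data, which is the limitation that learning-to-hash methods aim to address.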
From Visual Attributes to Adjectives through Decompositional Distributional Semantics
As automated image analysis progresses, there is increasing interest in
richer linguistic annotation of pictures, with attributes of objects (e.g.,
furry, brown...) attracting most attention. By building on the recent
"zero-shot learning" approach, and paying attention to the linguistic nature of
attributes as noun modifiers, and specifically adjectives, we show that it is
possible to tag images with attribute-denoting adjectives even when no training
data containing the relevant annotation are available. Our approach relies on
two key observations. First, objects can be seen as bundles of attributes,
typically expressed as adjectival modifiers (a dog is something furry, brown,
etc.), and thus a function trained to map visual representations of objects to
nominal labels can implicitly learn to map attributes to adjectives. Second,
objects and attributes come together in pictures (the same thing is a dog and
it is brown). We can thus achieve better attribute (and object) label retrieval
by treating images as "visual phrases", and decomposing their linguistic
representation into an attribute-denoting adjective and an object-denoting
noun. Our approach performs comparably to a method exploiting manual attribute
annotation, it outperforms various competitive alternatives in both attribute
and object annotation, and it automatically constructs attribute-centric
representations that significantly improve performance in supervised object
recognition.
Comment: accepted at Transactions of the Association for Computational
Linguistics (TACL), 3/201
Weakly Supervised Video Moment Retrieval From Text Queries
A few recent methods address text-to-video moment retrieval using natural
language queries, but they require full supervision during
training. However, acquiring a large number of training videos with temporal
boundary annotations for each text description is extremely time-consuming and
often not scalable. In order to cope with this issue, in this work, we
introduce the problem of learning from weak labels for the task of text to
video moment retrieval. The supervision is weak because, during
training, we only have access to the video-text pairs rather than the temporal
extent of the video to which different text descriptions relate. We propose a
joint visual-semantic embedding based framework that learns the notion of
relevant segments from video using only video-level sentence descriptions.
Specifically, our main idea is to utilize latent alignment between video frames
and sentence descriptions using Text-Guided Attention (TGA). TGA is then used
during the test phase to retrieve relevant moments. Experiments on two
benchmark datasets demonstrate that our method achieves comparable performance
to state-of-the-art fully supervised approaches.
Comment: Revised Table 1 on page 6. A small bug related to rounding resulted
in a slightly improved score in the previous version. Our conclusion remains
the same after the update.
Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale
semantic image retrieval problems in computer vision. In the light of recent
progress, convolutional neural network based hashing methods typically seek
pair-wise or triplet labels to conduct the similarity preserving learning.
However, complex semantic concepts of visual contents are hard to capture by
similar/dissimilar labels, which limits the retrieval performance. Generally,
pair-wise or triplet losses not only suffer from expensive training costs but
also fail to extract sufficient semantic information. In this regard, we
propose a novel deep supervised hashing model to learn more compact class-level
similarity preserving binary codes. Our deep learning based model is motivated
by deep metric learning that directly takes semantic labels as supervised
information in training and generates corresponding discriminant hashing code.
Specifically, a novel cubic constraint loss function based on Gaussian
distribution is proposed, which preserves semantic variations while penalizing
the overlapping parts of different classes in the embedding space. To address the
discrete optimization problem introduced by binary codes, a two-step
optimization strategy is proposed to provide efficient training and avoid the
problem of gradient vanishing. Extensive experiments on four large-scale
benchmark databases show that our model can achieve the state-of-the-art
retrieval performance. Moreover, when training samples are limited, our method
surpasses other supervised deep hashing methods by non-negligible margins.
Cross-modal Subspace Learning via Kernel Correlation Maximization and Discriminative Structure Preserving
Measuring similarity between heterogeneous data is still an open problem. Many
research works have been developed to learn a common subspace where the
similarity between different modalities can be calculated directly. However,
most existing works focus on learning a latent subspace without preserving the
semantic structural information well. Thus, these approaches cannot achieve the
desired results. In this paper, we propose a novel framework, termed
Cross-modal subspace learning via Kernel correlation maximization and
Discriminative structure-preserving (CKD), to solve this problem in two
aspects. Firstly, we construct a shared semantic graph so that the data of each
modality preserve their semantic neighbor relationships. Secondly, we introduce
the Hilbert-Schmidt Independence Criterion (HSIC) to ensure the consistency
between feature-similarity and semantic-similarity of samples. Our model not
only considers the inter-modality correlation by maximizing the kernel
correlation but also preserves the semantically structural information within
each modality. Extensive experiments are performed to evaluate the proposed
framework on three public datasets. The experimental results demonstrate
that the proposed CKD is competitive compared with the classic subspace
learning methods.
Comment: The paper is under consideration at Multimedia Tools and Applications
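The Hilbert-Schmidt Independence Criterion named in the CKD abstract has a standard biased empirical estimator, tr(KHLH) / (n - 1)^2, where K and L are kernel matrices over the two sets of samples and H is the centering matrix. The sketch below computes that generic estimator with a linear kernel. It is not the CKD objective, whose exact formulation the abstract does not give; the kernel choice, function names, and toy data are assumptions.

```python
def linear_kernel(samples):
    """Gram matrix of inner products (kernel choice is an assumption)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in samples] for u in samples]


def center(gram):
    """Double-center a Gram matrix, equivalent to H @ K @ H."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [gram[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    k_centered = center(linear_kernel(xs))
    l_centered = center(linear_kernel(ys))
    trace = sum(
        k_centered[i][j] * l_centered[i][j] for i in range(n) for j in range(n)
    )
    return trace / (n - 1) ** 2


features = [[0.0], [1.0], [2.0], [3.0]]
dependent = [[0.0], [2.0], [4.0], [6.0]]    # a linear function of features
constant = [[1.0], [1.0], [1.0], [1.0]]     # carries no information

hsic_dependent = hsic(features, dependent)
hsic_constant = hsic(features, constant)
```

A larger value indicates stronger statistical dependence between the two views, which is why maximizing such a quantity can encourage feature similarity and semantic similarity to agree.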
Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks
This paper presents a simple yet effective supervised deep hash approach that
constructs binary hash codes from labeled data for large-scale image search. We
assume that the semantic labels are governed by several latent attributes with
each attribute on or off, and classification relies on these attributes. Based
on this assumption, our approach, dubbed supervised semantics-preserving deep
hashing (SSDH), constructs hash functions as a latent layer in a deep network
and the binary codes are learned by minimizing an objective function defined
over classification error and other desirable hash code properties. With this
design, SSDH has a nice characteristic that classification and retrieval are
unified in a single learning model. Moreover, SSDH performs joint learning of
image representations, hash codes, and classification in a point-wise manner,
and thus is scalable to large-scale datasets. SSDH is simple and can be
realized by a slight enhancement of an existing deep architecture for
classification; yet it is effective and outperforms other hashing approaches on
several benchmarks and large datasets. Compared with state-of-the-art
approaches, SSDH achieves higher retrieval accuracy, while the classification
performance is not sacrificed.
Comment: To appear in IEEE Trans. Pattern Analysis and Machine Intelligence
Hashing with Mutual Information
Binary vector embeddings enable fast nearest neighbor retrieval in large
databases of high-dimensional objects, and play an important role in many
practical applications, such as image and video retrieval. We study the problem
of learning binary vector embeddings under a supervised setting, also known as
hashing. We propose a novel supervised hashing method based on optimizing an
information-theoretic quantity: mutual information. We show that optimizing
mutual information can reduce ambiguity in the induced neighborhood structure
in the learned Hamming space, which is essential in obtaining high retrieval
performance. To this end, we optimize mutual information in deep neural
networks with minibatch stochastic gradient descent, with a formulation that
maximally and efficiently utilizes available supervision. Experiments on four
image retrieval benchmarks, including ImageNet, confirm the effectiveness of
our method in learning high-quality binary embeddings for nearest neighbor
retrieval.
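The fast nearest neighbor retrieval over binary embeddings described in this abstract can be illustrated with a small Hamming-ranking sketch. It shows only the lookup step, not the mutual-information training objective; the 8-bit codes, the function names, and the toy database are assumptions made for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two codes packed into integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: list) -> list:
    """Return database indices ordered by Hamming distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Hypothetical 8-bit codes standing in for learned binary embeddings.
database = [0b10110010, 0b10110011, 0b01001101, 0b10010010]
query = 0b10110010

ranking = rank_by_hamming(query, database)
distances = [hamming_distance(query, code) for code in database]
```

Each comparison is one XOR followed by a bit count, which is what makes Hamming-space search cheap in both storage and query time compared with distances over real-valued features.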
Deep Discrete Supervised Hashing
Hashing has been widely used for large-scale search due to its low storage
cost and fast query speed. By using supervised information, supervised hashing
can significantly outperform unsupervised hashing. Recently, discrete
supervised hashing and deep hashing have been two representative advances in
supervised hashing. On one hand, hashing is essentially a discrete optimization
problem. Hence, utilizing supervised information to directly guide discrete
(binary) coding procedure can avoid sub-optimal solution and improve the
accuracy. On the other hand, deep hashing, which integrates deep feature
learning and hash-code learning into an end-to-end architecture, can enhance
the feedback between feature learning and hash-code learning. The key in
discrete supervised hashing is to adopt supervised information to directly
guide the discrete coding procedure in hashing. The key in deep hashing is to
adopt the supervised information to directly guide the deep feature learning
procedure. However, no existing work uses the supervised
information to directly guide both the discrete coding procedure and the deep
feature learning procedure in the same framework. In this paper, we propose a novel
deep hashing method, called deep discrete supervised hashing (DDSH), to address
this problem. DDSH is the first deep hashing method which can utilize
supervised information to directly guide both discrete coding procedure and
deep feature learning procedure, and thus enhance the feedback between these
two important procedures. Experiments on three real datasets show that DDSH can
outperform other state-of-the-art baselines, including both discrete hashing
and deep hashing baselines, for image retrieval.
Deep Ordinal Hashing with Spatial Attention
Hashing has attracted increasing research attention in recent years due to
its high efficiency of computation and storage in image retrieval. Recent works
have demonstrated the superiority of simultaneous feature representations and
hash functions learning with deep neural networks. However, most existing deep
hashing methods directly learn the hash functions by encoding the global
semantic information, while ignoring the local spatial information of images.
The loss of local spatial structure becomes a performance bottleneck for hash
functions, limiting their application to accurate similarity
retrieval. In this work, we propose a novel Deep Ordinal Hashing (DOH) method,
which learns ordinal representations by leveraging the ranking structure of
feature space from both local and global views. In particular, to effectively
build the ranking structure, we propose to learn the rank correlation space by
exploiting the local spatial information from Fully Convolutional Network (FCN)
and the global semantic information from the Convolutional Neural Network (CNN)
simultaneously. More specifically, an effective spatial attention model is
designed to capture the local spatial information by selectively learning
well-specified locations closely related to target objects. In this hashing
framework, the local spatial and global semantic nature of images are captured
in an end-to-end ranking-to-hashing manner. Experimental results on
three widely used datasets demonstrate that the proposed DOH method
significantly outperforms the state-of-the-art hashing methods.