Zero-Shot Sketch-Image Hashing
Recent studies show that large-scale sketch-based image retrieval (SBIR) can
be efficiently tackled by cross-modal binary representation learning methods,
where Hamming distance matching significantly speeds up the process of
similarity search. Provided that training and test data are subject to a fixed
set of pre-defined categories, state-of-the-art SBIR and cross-modal hashing
methods obtain acceptable retrieval performance. However, most of the existing methods
fail when the categories of query sketches have never been seen during
training. In this paper, we formulate the above problem as a novel but realistic
zero-shot SBIR hashing task. We elaborate on the challenges of this special task
and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. An
end-to-end three-network architecture is built, in which two networks serve as
binary encoders. The third network mitigates the sketch-image heterogeneity and
enhances the semantic relations among data by utilizing the Kronecker fusion
layer and graph convolution, respectively. As an important part of ZSIH, we
formulate a generative hashing scheme that reconstructs semantic knowledge
representations for zero-shot retrieval. To the best of our knowledge, ZSIH is
the first zero-shot hashing work suitable for SBIR and cross-modal search.
Comprehensive experiments are conducted on two extended datasets, i.e., Sketchy
and TU-Berlin, with a novel zero-shot train-test split. The proposed model
remarkably outperforms related works.
Comment: Accepted as spotlight at CVPR 2018
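The speed-up from Hamming-distance matching that the abstract mentions can be sketched in a few lines. Everything below (code length, database size, data) is a hypothetical toy, not taken from the paper:

```python
import numpy as np

# Binary codes are packed into bytes, so Hamming distance reduces to a
# popcount of an XOR -- no floating-point arithmetic is involved.
rng = np.random.default_rng(0)

def hamming_distances(query, database):
    """Hamming distance between one packed code and a database of codes."""
    xor = np.bitwise_xor(database, query)          # differing bits, (N, B) uint8
    return np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per row, (N,)

n_bits = 64                                        # toy code length
db = rng.integers(0, 256, size=(1000, n_bits // 8), dtype=np.uint8)
q = db[42].copy()
q[0] ^= 0b00000001                                 # flip one bit of item 42

d = hamming_distances(q, db)
print(int(d[42]))       # 1 -- the near-duplicate differs by exactly one bit
print(int(d.argmin()))  # 42 -- nearest neighbour (with overwhelming probability)
```

Real systems use hardware popcount over packed 64-bit words rather than `unpackbits`, but the retrieval logic is the same.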
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).
Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieval
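The analytic step a CCA layer builds on can be sketched as follows: whiten each view, then take an SVD of the cross-covariance. This is a generic regularized CCA computation, not the authors' implementation (their code is at the linked repository); the regularization constant and toy data are assumptions:

```python
import numpy as np

def cca_projections(X, Y, reg=1e-4):
    """Projection matrices maximising correlation between two centred views."""
    n = X.shape[0]
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # regularised covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):                               # C^(-1/2) via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)        # SVD of whitened cross-covariance
    return Kx @ U, Ky @ Vt.T, s                    # projections A, B; canonical corrs

# Toy data: both views share one latent signal z in their first dimension.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([-z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
A, B, corrs = cca_projections(X, Y)
print(corrs[0])   # first canonical correlation is close to 1
```

The paper's contribution is wrapping this analytic computation in a differentiable layer so it can be combined with ranking losses; the snippet shows only the projection step itself.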
Deep Learning for Free-Hand Sketch: A Survey
Free-hand sketches are highly illustrative, and have been widely used by
humans to depict objects or stories from ancient times to the present. The
recent prevalence of touchscreen devices has made sketch creation a much easier
task than ever and consequently made sketch-oriented applications increasingly
popular. The progress of deep learning has immensely benefited free-hand sketch
research and applications. This paper presents a comprehensive survey of the
deep learning techniques oriented at free-hand sketch data, and the
applications that they enable. The main contents of this survey include: (i) A
discussion of the intrinsic traits and unique challenges of free-hand sketch,
to highlight the essential differences between sketch data and other data
modalities, e.g., natural photos. (ii) A review of the developments of
free-hand sketch research in the deep learning era, by surveying existing
datasets, research topics, and the state-of-the-art methods through a detailed
taxonomy and experimental evaluation. (iii) Promotion of future work via a
discussion of bottlenecks, open problems, and potential research directions for
the community.
Comment: This paper is accepted by IEEE TPAMI
Delving Deep into the Sketch and Photo Relation
Sketches drawn by humans can play a similar role to photos in conveying shape, posture, and fine-grained information, and this fact has stimulated a line of cross-domain research relating sketch and photo, including sketch-based photo synthesis and retrieval. In this thesis, we aim to further investigate the relationship between sketch and photo. More specifically, we study certain under-explored traits of this relationship, and propose novel applications to reinforce the understanding of the sketch-photo relation.

Our exploration starts with the problem of sketch-based photo synthesis, where the unique trait of non-rigid alignment between sketch and photo is overlooked in existing research. We then carry on our investigation from a new angle, studying whether a sketch can facilitate photo classifier generation. Building upon this, we continue to explore how sketch and photo are linked at a more fine-grained level by tackling sketch-based photo segmenter prediction. Furthermore, we address the data scarcity issue identified in nearly all sketch-photo-related applications by examining their inherent semantic correlation, using sketch-based image retrieval (SBIR) as a test-bed. In general, we make four main contributions to research on the relationship between sketch and photo.

Firstly, to mitigate the effect of deformation in sketch-based photo synthesis, we introduce a spatial transformer network into our image-to-image regression framework, which subtly handles the non-rigid alignment between sketches and photos. Qualitative and quantitative experiments consistently reveal the superior quality of our synthesised photos over those generated by existing approaches.

Secondly, sketch-based photo classifier generation is achieved with a novel model regression network, which maps a sketch to the parameters of a photo classification model. It is shown that our model regression network is able to generalise across categories, so that photo classifiers for novel classes not involved in training are just a sketch away. Comprehensive experiments illustrate the promising performance of the generated binary and multi-class photo classifiers, and demonstrate that sketches can also be employed to enhance the granularity of existing photo classifiers.

Thirdly, to achieve sketch-based photo segmentation, we propose a photo segmentation model generation algorithm that predicts the weights of a deep photo segmentation network according to the input sketch. The results confirm that a single sketch is the only prerequisite for unseen-category photo segmentation, and that segmentation performance can be further improved by using a sketch that is aligned in shape and position with the object to be segmented.

Finally, we present an unsupervised representation learning framework for SBIR, the purpose of which is to remove the barrier imposed by the scarcity of data annotation. Prototype- and memory-bank-reinforced joint distribution optimal transport is integrated into this framework, so that the mapping between sketches and photos can be detected automatically and a semantically meaningful yet domain-agnostic feature space can be learned. Extensive experiments and feature visualisation validate the efficacy of the proposed algorithm.
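The model-regression idea (mapping a sketch embedding to the parameters of a photo classifier) can be illustrated with a deliberately tiny synthetic example. The linear regressor, the dimensions, and all data below are hypothetical stand-ins, not the thesis's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
d_sketch, d_photo = 8, 16

# Hypothetical "trained" model-regression network, here just a linear map:
# it turns a sketch embedding into the weight vector of a binary photo classifier.
W_reg = rng.normal(size=(d_sketch, d_photo))     # regressor parameters
sketch = rng.normal(size=d_sketch)               # embedding of a query sketch
w_clf = sketch @ W_reg                           # predicted classifier weights

# Synthetic photo features: positives cluster on the +w_clf side of the
# decision boundary, negatives on the -w_clf side.
unit = w_clf / np.linalg.norm(w_clf)
pos = unit + 0.1 * rng.normal(size=(5, d_photo))
neg = -unit + 0.1 * rng.normal(size=(5, d_photo))

scores_pos = pos @ w_clf                         # classifier scores, no retraining
scores_neg = neg @ w_clf
print(bool(scores_pos.min() > scores_neg.max())) # True -- weights separate the sets
```

The point of the construction in the thesis is that `w_clf` is produced from a single sketch at test time, so classifiers for unseen categories need no photo training data; the toy above only demonstrates the weight-prediction plumbing.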
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval that learns hash functions in the absence of paired training samples
through a cycle consistency loss. Our approach employs an adversarial training
scheme to learn a pair of hash functions enabling translation between
modalities while assuming an underlying semantic relationship. To endow the
hash codes of each input-output pair with semantics, a cycle consistency loss
is further imposed on top of the adversarial training to strengthen the
correlations between inputs and their corresponding outputs. Our approach
learns hash functions generatively, such that the learned hash codes maximally
correlate each input-output correspondence while also being able to regenerate
the inputs, minimising the information loss. The learning-to-hash embedding is
thus performed by jointly optimising the parameters of the hash functions
across modalities as well as the associated generative models. Extensive
experiments on a variety of large-scale cross-modal datasets demonstrate that
our proposed method achieves better retrieval results than the state of the art.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other authors
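A cycle-consistency term of the kind described above can be sketched generically: translator G maps modality X to Y, F maps back, and the loss penalises round trips that fail to reconstruct the input. The toy translators below are hypothetical stand-ins for the learned functions:

```python
import numpy as np

rng = np.random.default_rng(3)

def cycle_loss(x, y, G, F):
    """L1 cycle-consistency in both directions, CycleGAN-style."""
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

# Toy invertible "translators": G doubles, F halves, so the cycle is exact.
G = lambda v: 2.0 * v
F = lambda v: 0.5 * v
x = rng.normal(size=(4, 8))   # batch from modality X
y = rng.normal(size=(4, 8))   # batch from modality Y

print(cycle_loss(x, y, G, F))           # 0.0 -- a perfect cycle incurs no penalty

# A lossy inverse translator is penalised in proportion to its drift.
F_bad = lambda v: 0.5 * v + 1.0
print(cycle_loss(x, y, G, F_bad) > 0.0) # True
```

In the paper this penalty is combined with the adversarial objective so that no paired samples are needed: the cycle constraint alone ties each input to its translation.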
ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval
The huge domain gap between sketches and photos and the highly abstract
nature of sketch representations pose challenges for sketch-based image
retrieval (SBIR). Zero-shot sketch-based image retrieval (ZS-SBIR) is more
generic and practical but poses an even greater challenge because of the
additional knowledge gap between seen and unseen categories. To mitigate both
gaps simultaneously, we propose an Approaching-and-Centralizing Network
(termed "ACNet") to jointly optimize sketch-to-photo synthesis and image
retrieval. The retrieval module guides the synthesis module to generate large
numbers of diverse photo-like images that gradually approach the photo domain,
and thus better serve the retrieval module in learning domain-agnostic
representations and category-agnostic common knowledge that generalize to
unseen categories. These diverse images, generated with retrieval guidance,
can effectively alleviate the overfitting caused by concrete category-specific
training samples with high gradients. We also find that the proxy-based
NormSoftmax loss is effective in the zero-shot setting because its
centralizing effect stabilizes our joint training and promotes generalization
to unseen categories. Our approach is simple yet effective: it achieves
state-of-the-art performance on two widely used ZS-SBIR datasets and
surpasses previous methods by a large margin.
Comment: the paper is under consideration at IEEE Transactions on Circuits and
Systems for Video Technology
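The proxy-based NormSoftmax loss the abstract refers to can be sketched as follows: embeddings and one learnable proxy per class are L2-normalised, scaled cosine similarities feed a standard cross-entropy, and the gradient pulls each embedding toward its class proxy (the "centralizing" effect). The temperature value and all data below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def norm_softmax_loss(z, proxies, labels, temperature=0.05):
    """Cross-entropy over temperature-scaled cosine similarities to class proxies."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)           # unit embeddings
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)  # unit proxies
    logits = z @ p.T / temperature                             # (N, C) scaled cosines
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(4)
proxies = rng.normal(size=(3, 16))     # one proxy per class, 3 classes
labels = np.array([0, 1, 2])

# Embeddings sitting near their own class proxy incur a small loss;
# embeddings sitting near a *wrong* class proxy incur a large one.
near = proxies[labels] + 0.01 * rng.normal(size=(3, 16))
far = proxies[[1, 2, 0]] + 0.01 * rng.normal(size=(3, 16))
print(norm_softmax_loss(near, proxies, labels)
      < norm_softmax_loss(far, proxies, labels))   # True
```

Because every embedding is compared against fixed per-class proxies rather than other samples in the batch, the loss gives a stable, centralizing training signal, which is the property the paper exploits for the zero-shot setting.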
Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record.
Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision that retrieves natural images relevant to sketch queries from categories that might not have been seen in the training phase. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each branch maintains a cycle consistency that requires supervision only at the category level, avoiding the need for expensive aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature-selection auto-encoder that selects discriminative side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
European Union Horizon 2020