Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval
Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal
retrieval task for searching natural images given free-hand sketches under the
zero-shot scenario. Most existing methods solve this problem by simultaneously
projecting visual features and semantic supervision into a low-dimensional
common space for efficient retrieval. However, such low-dimensional projection
destroys the completeness of semantic knowledge in the original semantic space, so
it cannot transfer useful knowledge well when learning semantics from
different modalities. Moreover, the domain information and semantic information
are entangled in visual features, which is not conducive to cross-modal
matching since it hinders the reduction of the domain gap between sketch and
image. In this paper, we propose a Progressive Domain-independent Feature
Decomposition (PDFD) network for ZS-SBIR. Specifically, with the supervision of
original semantic knowledge, PDFD decomposes visual features into domain
features and semantic ones, and then the semantic features are projected into
common space as retrieval features for ZS-SBIR. The progressive projection
strategy maintains strong semantic supervision. Besides, to ensure that the
retrieval features capture clean and complete semantic information, a
cross-reconstruction loss is introduced to encourage any combination of
retrieval features and domain features to reconstruct the visual features.
Extensive experiments demonstrate the superiority of our PDFD over
state-of-the-art competitors.
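As a rough illustration of the cross-reconstruction idea in the PDFD abstract, the sketch below pairs each retrieval feature with each domain feature and penalizes the reconstruction error against the visual feature of the modality that supplied the domain feature. The pairing rule, the function names, the dict layout, the mean-squared error, and the additive toy decoder are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def cross_reconstruction_loss(decode, sketch, image):
    """Average reconstruction error over all retrieval/domain pairings.

    `sketch` and `image` are dicts with keys "retrieval", "domain" and
    "visual" (a hypothetical layout). Assumes both samples share one
    category, so either retrieval feature should explain either visual
    feature once combined with the matching domain feature.
    """
    total = 0.0
    for target in (sketch, image):        # modality supplying the domain feature
        for source in (sketch, image):    # modality supplying the retrieval feature
            recon = decode(source["retrieval"], target["domain"])
            total += mse(recon, target["visual"])
    return total / 4.0


# Toy decoder (assumption): visual feature = retrieval feature + domain feature.
def toy_decode(retrieval, domain):
    return [r + d for r, d in zip(retrieval, domain)]


sketch = {"retrieval": [1.0, 2.0], "domain": [0.5, -0.5], "visual": [1.5, 1.5]}
image = {"retrieval": [1.0, 2.0], "domain": [-1.0, 1.0], "visual": [0.0, 3.0]}
print(cross_reconstruction_loss(toy_decode, sketch, image))
```

In this toy setup the loss vanishes only when the two retrieval features agree, which is one way to read the abstract's requirement that retrieval features carry clean semantic information free of domain information.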
ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval
The huge domain gap between sketches and photos and the highly abstract
sketch representations pose challenges for sketch-based image retrieval
(SBIR). The zero-shot sketch-based image retrieval
(ZS-SBIR) is more generic and practical but poses an even greater
challenge because of the additional knowledge gap between the seen and unseen
categories. To simultaneously mitigate both gaps, we propose an
Approaching-and-Centralizing Network (termed
"ACNet") to jointly optimize sketch-to-photo synthesis and the image
retrieval. The retrieval module guides the synthesis module to generate large
amounts of diverse photo-like images which gradually approach the photo domain,
and thus better serve the retrieval module than ever to learn domain-agnostic
representations and category-agnostic common knowledge for generalizing to
unseen categories. These diverse images generated with retrieval guidance can
effectively alleviate the overfitting problem troubling concrete
category-specific training samples with high gradients. We also find that the
proxy-based NormSoftmax loss is effective in the zero-shot setting
because its centralizing effect can stabilize our joint training and promote
the generalization ability to unseen categories. Our approach is simple yet
effective, which achieves state-of-the-art performance on two widely used
ZS-SBIR datasets and surpasses previous methods by a large margin.
Comment: the paper is under consideration at IEEE Transactions on Circuits and
Systems for Video Technology
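For readers unfamiliar with the proxy-based NormSoftmax loss mentioned in the ACNet abstract, a minimal sketch follows, assuming the common formulation: the embedding and the per-class proxies are L2-normalized, their cosine similarities are divided by a temperature, and cross-entropy is applied. The function names, the temperature value, and the toy proxies are illustrative assumptions, not details taken from the paper.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def norm_softmax_loss(embedding, proxies, label, temperature=0.05):
    """Cross-entropy over temperature-scaled cosine similarities to class proxies."""
    e = l2_normalize(embedding)
    logits = []
    for proxy in proxies:
        p = l2_normalize(proxy)
        cosine = sum(a * b for a, b in zip(e, p))
        logits.append(cosine / temperature)
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Two hypothetical class proxies in a 2-D embedding space.
proxies = [[1.0, 0.0], [0.0, 1.0]]
embedding = [3.0, 0.1]  # points almost exactly at proxy 0

print(norm_softmax_loss(embedding, proxies, label=0))  # small loss
print(norm_softmax_loss(embedding, proxies, label=1))  # large loss
```

Because both sides are normalized, only the direction of an embedding matters, so samples of a class are pulled toward a single proxy direction; this is presumably the "centralizing effect" the abstract refers to.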
Fine-Grained Image Analysis with Deep Learning: A Survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem
in computer vision and pattern recognition, and underpins a diverse set of
real-world applications. The task of FGIA targets analyzing visual objects from
subordinate categories, e.g., species of birds or models of cars. The small
inter-class and large intra-class variation inherent to fine-grained image
analysis makes it a challenging problem. Capitalizing on advances in deep
learning, in recent years we have witnessed remarkable progress in deep
learning powered FGIA. In this paper we present a systematic survey of these
advances, where we attempt to re-define and broaden the field of FGIA by
consolidating two fundamental fine-grained research areas -- fine-grained image
recognition and fine-grained image retrieval. In addition, we also review other
key issues of FGIA, such as publicly available benchmark datasets and related
domain-specific applications. We conclude by highlighting several research
directions and open problems which need further exploration from the community.
Comment: Accepted by IEEE TPAMI
Deep Learning for Free-Hand Sketch: A Survey
Free-hand sketches are highly illustrative, and have been widely used by
humans to depict objects or stories from ancient times to the present. The
recent prevalence of touchscreen devices has made sketch creation a much easier
task than ever and consequently made sketch-oriented applications increasingly
popular. The progress of deep learning has immensely benefited free-hand sketch
research and applications. This paper presents a comprehensive survey of the
deep learning techniques oriented at free-hand sketch data, and the
applications that they enable. The main contents of this survey include: (i) A
discussion of the intrinsic traits and unique challenges of free-hand sketch,
to highlight the essential differences between sketch data and other data
modalities, e.g., natural photos. (ii) A review of the developments of
free-hand sketch research in the deep learning era, by surveying existing
datasets, research topics, and the state-of-the-art methods through a detailed
taxonomy and experimental evaluation. (iii) Promotion of future work via a
discussion of bottlenecks, open problems, and potential research directions for
the community.
Comment: This paper is accepted by IEEE TPAMI
Delving Deep into the Sketch and Photo Relation
"Sketches drawn by humans can play a similar role to photos in terms of conveying shape, posture as well as fine-grained information, and this fact has stimulated one line of cross-domain research that is related to sketch and photo, including sketch-based photo synthesis and retrieval. In this thesis, we aim to further investigate the relationship between sketch and photo. More specifically, we study certain under- explored traits in this relationship, and propose novel applications to reinforce the understanding of sketch and photo relation.Our exploration starts with the problem of sketch-based photo synthesis, where the unique trait of non-rigid alignment between sketch and photo is overlooked in existing research. We then carry on with our investigation from a new angle to study whether sketch can facilitate photo classifier generation. Building upon this, we continue to explore how sketch and photo are linked together on a more fine-grained level by tackling with the sketch-based photo segmenter prediction. Furthermore, we address the data scarcity issue identified in nearly all sketch-photo-related applications by examining their inherent correlation in the semantic aspect using sketch-based image retrieval (SBIR) as a test-bed. In general, we make four main contributions to the research on relationship between sketch and photo.Firstly, to mitigate the effect of deformation in sketch-based photo synthesis, we introduce the spatial transformer network to our image-image regression framework, which subtly deals with non-rigid alignment between the sketches and photos. The qualitative and quantitative experiments consistently reveal the superior quality of our synthesised photos over those generated by existing approaches.Secondly, sketch-based photo classifier generation is achieved with a novel model regression network, which maps the sketch to the parameters of photo classification model. 
It is shown that our model regression network is able to generalise across categories and photo classifiers for novel classes not involved in training are just a sketch away. Comprehensive experiments illustrate the promising performance of the generated binary and multi-class photo classifiers, and demonstrate that sketches can also be employed to enhance the granularity of existing photo classifiers.Thirdly, to achieve the goal of sketch-based photo segmentation, we propose a photo segmentation model generation algorithm that predicts the weights of a deep photo segmentation network according to the input sketch. The results confirm that one single sketch is the only prerequisite for unseen category photo segmentation, and the segmentation performance can be further improved by utilising sketch that is aligned with the object to be segmented in shape and position.Finally, we present an unsupervised representation learning framework for SBIR, the purpose of which is to eliminate the barrier imposed by data annotation scarcity. Prototype and memory bank reinforced joint distribution optimal transport is integrated into the unsupervised representation learning framework, so that the mapping between the sketches and photos could be automatically detected to learn a semantically meaningful yet domain-agnostic feature space. Extensive experiments and feature visualisation validate the efficacy of our proposed algorithm.
Deep Learning for Single Image Super-Resolution: A Brief Review
Single image super-resolution (SISR) is a notoriously challenging ill-posed
problem, which aims to obtain a high-resolution (HR) output from one of its
low-resolution (LR) versions. To solve the SISR problem, powerful deep
learning algorithms have recently been employed and have achieved state-of-the-art
performance. In this survey, we review representative deep learning-based SISR
methods, and group them into two categories according to their major
contributions to two essential aspects of SISR: the exploration of efficient
neural network architectures for SISR, and the development of effective
optimization objectives for deep SISR learning. For each category, a baseline
is first established and several critical limitations of the baseline are
summarized. Then representative works on overcoming these limitations are
presented based on their original contents as well as our critical
understandings and analyses, and relevant comparisons are conducted from a
variety of perspectives. Finally we conclude this review with some vital
current challenges and future trends in SISR leveraging deep learning
algorithms.
Comment: Accepted by IEEE Transactions on Multimedia (TMM)
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
published or submitted for publication