Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision, enabling the retrieval of natural images relevant to sketch queries from categories that might not have been seen in the training phase. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at the category level, avoiding the need for costly aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
Semantically tied paired cycle consistency for any-shot sketch-based image retrieval
Low-shot sketch-based image retrieval is an emerging task in computer vision, enabling the retrieval of natural images relevant to hand-drawn sketch queries that are rarely seen during the training phase. Related prior works either require aligned sketch-image pairs that are costly to obtain, or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks, where we introduce the few-shot setting for SBIR. For solving these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC) for any-shot SBIR, where each branch of the generative adversarial network maps the visual information from sketch and image to a common semantic space via adversarial training. Each of these branches maintains cycle consistency that only requires supervision at the category level, avoiding the need for aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is class-specific. Furthermore, we propose to combine textual and hierarchical side information via an auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in any-shot SBIR performance over the state of the art on extended versions of the challenging Sketchy, TU-Berlin and QuickDraw datasets.
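The generator-side objective of SEM-PCYC is compact enough to sketch in code. Below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: two branches map sketch and image features to a shared semantic space, a discriminator drives the adversarial term, inverse mappings enforce cycle consistency without paired data, and a classifier keeps the semantic codes class-specific. All module names, dimensions, and loss weights are assumptions.

```python
# Minimal sketch (not the authors' code) of the SEM-PCYC generator losses:
# adversarial mapping to a semantic space, cycle consistency back to the
# visual space, and a classification criterion on the semantic codes.
import torch
import torch.nn as nn

VIS_DIM, SEM_DIM, N_CLASSES = 512, 300, 100  # hypothetical dimensions

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(), nn.Linear(512, d_out))

G_sk, G_im = mlp(VIS_DIM, SEM_DIM), mlp(VIS_DIM, SEM_DIM)  # visual -> semantic
F_sk, F_im = mlp(SEM_DIM, VIS_DIM), mlp(SEM_DIM, VIS_DIM)  # inverse, for cycles
D = nn.Sequential(nn.Linear(SEM_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
C = nn.Linear(SEM_DIM, N_CLASSES)  # classifier on generated semantic codes

bce, ce, l1 = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss(), nn.L1Loss()

def generator_loss(x_sk, x_im, labels):
    """Generator-side objective; supervision is only at the category level."""
    s_sk, s_im = G_sk(x_sk), G_im(x_im)
    d_sk, d_im = D(s_sk), D(s_im)
    # Adversarial term: generated codes should look like real side information.
    adv = bce(d_sk, torch.ones_like(d_sk)) + bce(d_im, torch.ones_like(d_im))
    # Cycle consistency: reconstruct the visual features; no aligned pairs needed.
    cyc = l1(F_sk(s_sk), x_sk) + l1(F_im(s_im), x_im)
    # Classification criterion keeps the visual-to-semantic mapping discriminative.
    cls = ce(C(s_sk), labels) + ce(C(s_im), labels)
    return adv + 10.0 * cyc + cls  # the cycle weight is a hypothetical choice

# Toy batch: unaligned sketch/image features sharing only class labels.
x_sk, x_im = torch.randn(8, VIS_DIM), torch.randn(8, VIS_DIM)
labels = torch.randint(0, N_CLASSES, (8,))
generator_loss(x_sk, x_im, labels).backward()
```

Because the cycle and classification terms need only category labels and class-level side information, the two branches never require a sketch and a photo of the same instance.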
Deep Learning for Free-Hand Sketch: A Survey
Free-hand sketches are highly illustrative, and have been widely used by
humans to depict objects or stories from ancient times to the present. The
recent prevalence of touchscreen devices has made sketch creation a much easier
task than ever and consequently made sketch-oriented applications increasingly
popular. The progress of deep learning has immensely benefited free-hand sketch
research and applications. This paper presents a comprehensive survey of the
deep learning techniques oriented at free-hand sketch data, and the
applications that they enable. The main contents of this survey include: (i) A
discussion of the intrinsic traits and unique challenges of free-hand sketch,
to highlight the essential differences between sketch data and other data
modalities, e.g., natural photos. (ii) A review of the developments of
free-hand sketch research in the deep learning era, by surveying existing
datasets, research topics, and the state-of-the-art methods through a detailed
taxonomy and experimental evaluation. (iii) Promotion of future work via a
discussion of bottlenecks, open problems, and potential research directions for
the community. Comment: This paper is accepted by IEEE TPAMI.
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
In the rapidly advancing field of multi-modal machine learning (MMML), the
convergence of multiple data modalities has the potential to reshape various
applications. This paper presents a comprehensive overview of the current
state, advancements, and challenges of MMML within the sphere of engineering
design. The review begins with a deep dive into five fundamental concepts of
MMML: multi-modal information representation, fusion, alignment, translation,
and co-learning. Following this, we explore the cutting-edge applications of
MMML, placing a particular emphasis on tasks pertinent to engineering design,
such as cross-modal synthesis, multi-modal prediction, and cross-modal
information retrieval. Through this comprehensive overview, we highlight the
inherent challenges in adopting MMML in engineering design, and proffer
potential directions for future research. To spur the continued evolution of
MMML in engineering design, we advocate for concentrated efforts to construct
extensive multi-modal design datasets, develop effective data-driven MMML
techniques tailored to design applications, and enhance the scalability and
interpretability of MMML models. As the next generation of intelligent design
tools, MMML models hold great promise to reshape how products are designed.
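To make the fusion concept concrete, here is a minimal, hypothetical PyTorch sketch of feature-level fusion: each modality gets its own encoder, the embeddings are concatenated, and a shared head makes the prediction. The dimensions, module names, and the concatenation strategy are illustrative assumptions; attention-based or gated fusion are common alternatives in the MMML literature.

```python
# Hypothetical illustration of multi-modal fusion: modality-specific
# encoders, concatenation-based fusion, and a shared prediction head.
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, hidden=256, n_out=10):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU())
        # Feature-level fusion by concatenation of the two embeddings.
        self.head = nn.Linear(2 * hidden, n_out)

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.img_enc(img_feat), self.txt_enc(txt_feat)], dim=-1)
        return self.head(z)

model = FusionModel()
pred = model(torch.randn(4, 512), torch.randn(4, 300))  # toy batch
print(pred.shape)  # torch.Size([4, 10])
```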
Open Cross-Domain Visual Search
This paper addresses cross-domain visual search, where visual queries
retrieve category samples from a different domain. For example, we may want to
sketch an airplane and retrieve photographs of airplanes. Despite considerable
progress, the search occurs in a closed setting between two pre-defined
domains. In this paper, we make the step towards an open setting where multiple
visual domains are available. This notably translates into a search between any
pair of domains, from a combination of domains or within multiple domains. We
introduce a simple -- yet effective -- approach. We formulate the search as a
mapping from every visual domain to a common semantic space, where categories
are represented by hyperspherical prototypes. Open cross-domain visual search
is then performed by searching in the common semantic space, regardless of
which domains are used as source or target. Domains are combined in the common
space to search from or within multiple domains simultaneously. A separate
training of every domain-specific mapping function enables an efficient scaling
to any number of domains without affecting the search performance. We
empirically illustrate our capability to perform open cross-domain visual
search in three different scenarios. Our approach is competitive with respect
to existing closed settings, where we obtain state-of-the-art results on
several benchmarks for three sketch-based search tasks. Comment: Accepted at Computer Vision and Image Understanding (CVIU).
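The search mechanism lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration: independently trained, domain-specific mappers project features onto a shared unit hypersphere where categories sit at prototype directions, and retrieval is cosine-similarity search regardless of which domains supply the query or the gallery. The sizes and the random prototypes are placeholders; the paper positions prototypes with a separation criterion not reproduced here.

```python
# Hypothetical sketch of open cross-domain search with hyperspherical
# class prototypes and per-domain mappings to a shared semantic sphere.
import torch
import torch.nn.functional as F

SEM_DIM, N_CLASSES = 128, 50  # hypothetical sizes
# Class prototypes on the unit sphere (random here, for illustration).
prototypes = F.normalize(torch.randn(N_CLASSES, SEM_DIM), dim=-1)

def embed(features, mapper):
    """Map domain-specific features onto the unit hypersphere."""
    return F.normalize(mapper(features), dim=-1)

def search(query_emb, gallery_emb, k=5):
    """Cosine-similarity retrieval; works for any query/gallery domain pair."""
    sims = query_emb @ gallery_emb.t()
    return sims.topk(k, dim=-1).indices

# Independently trained, domain-specific mappers (stand-ins).
sketch_mapper = torch.nn.Linear(512, SEM_DIM)
photo_mapper = torch.nn.Linear(512, SEM_DIM)

q = embed(torch.randn(2, 512), sketch_mapper)    # sketch queries
g = embed(torch.randn(100, 512), photo_mapper)   # photo gallery
print(search(q, g).shape)  # torch.Size([2, 5])

# Searching within multiple domains: simply concatenate the galleries.
g_multi = torch.cat([g, embed(torch.randn(40, 512), sketch_mapper)], dim=0)
print(search(q, g_multi).shape)
# Category prediction in the open setting: nearest prototype on the sphere.
pred = (q @ prototypes.t()).argmax(dim=-1)
```

Because each mapper is trained separately against the fixed prototypes, adding a new domain only requires training one new mapping function.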
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. This comprehensive problem-oriented review
of advances in transfer learning has not only revealed the challenges of
transfer learning for visual recognition, but has also exposed the problems
(e.g. eight of the seventeen) that have been scarcely studied. The survey thus
offers researchers an up-to-date technical review, and gives machine learning
practitioners a systematic approach and a reference for categorising a real
problem and looking up a possible solution accordingly.
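As a toy illustration of such a problem-oriented taxonomy, the sketch below keys a cross-dataset scenario by a few data and label attributes and maps each combination to a coarse problem family. The attribute names and the three families are invented for illustration; the paper's actual seventeen categories are not reproduced here.

```python
# Hypothetical illustration of attribute-based problem categorisation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    same_feature_space: bool  # do source and target share a feature space?
    same_label_space: bool    # do they share a label space?
    target_labels: str        # "full", "few", or "none"

def categorise(s: Scenario) -> str:
    """Map an attribute combination to an (illustrative) problem family."""
    if s.same_feature_space and s.same_label_space:
        return "homogeneous transfer (e.g. domain adaptation)"
    if not s.same_label_space and s.target_labels == "none":
        return "zero-shot-style transfer across label spaces"
    return "heterogeneous transfer"

print(categorise(Scenario(True, True, "none")))   # unsupervised-DA-like
print(categorise(Scenario(True, False, "none")))  # zero-shot-like
```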
Content-Based Search for Deep Generative Models
The growing proliferation of customized and pretrained generative models has
made it infeasible for a user to be fully cognizant of every model in
existence. To address this need, we introduce the task of content-based model
search: given a query and a large set of generative models, finding the models
that best match the query. As each generative model produces a distribution of
images, we formulate the search task as an optimization problem to select the
model with the highest probability of generating content similar to the query.
We introduce a formulation to approximate this probability given the query from
different modalities, e.g., image, sketch, and text. Furthermore, we propose a
contrastive learning framework for model retrieval, which learns to adapt
features for various query modalities. We demonstrate that our method
outperforms several baselines on Generative Model Zoo, a new benchmark we
create for the model retrieval task. Comment: Our project page is hosted at
https://generative-intelligence-lab.github.io/modelverse
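The scoring idea admits a compact Monte Carlo sketch: sample from each model, embed the samples and the query in a shared feature space, and rank models by average similarity. Everything below (the encoder, the toy "models", the sample size) is a hypothetical stand-in, not the authors' implementation.

```python
# Hypothetical Monte Carlo sketch of content-based generative model search.
import torch
import torch.nn.functional as F

def score_model(sample_fn, query_feat, encoder, n=64):
    """Mean cosine similarity between the query and n samples from one model."""
    with torch.no_grad():
        samples = sample_fn(n)                         # (n, C, H, W) images
        feats = F.normalize(encoder(samples), dim=-1)
        q = F.normalize(query_feat, dim=-1)
        return (feats @ q).mean().item()

def search_models(models, query_feat, encoder):
    """Rank a zoo of generative models by their score for one query."""
    scores = {name: score_model(fn, query_feat, encoder)
              for name, fn in models.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy stand-ins: "models" that emit shifted random images, a linear encoder.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 128))
models = {f"gan_{i}": (lambda n, i=i: torch.randn(n, 3, 8, 8) + i)
          for i in range(3)}
query_feat = encoder(torch.randn(1, 3, 8, 8)).squeeze(0)

print(search_models(models, query_feat, encoder))
```

The paper's contrastive framework would additionally adapt the encoder so that image, sketch, and text queries land in a comparable feature space; the fixed encoder above sidesteps that step for brevity.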