Deep Learning for Free-Hand Sketch: A Survey
Free-hand sketches are highly illustrative, and have been widely used by
humans to depict objects or stories from ancient times to the present. The
recent prevalence of touchscreen devices has made sketch creation a much easier
task than ever and consequently made sketch-oriented applications increasingly
popular. The progress of deep learning has immensely benefited free-hand sketch
research and applications. This paper presents a comprehensive survey of the
deep learning techniques oriented at free-hand sketch data, and the
applications that they enable. The main contents of this survey include: (i) A
discussion of the intrinsic traits and unique challenges of free-hand sketch,
to highlight the essential differences between sketch data and other data
modalities, e.g., natural photos. (ii) A review of the developments of
free-hand sketch research in the deep learning era, by surveying existing
datasets, research topics, and the state-of-the-art methods through a detailed
taxonomy and experimental evaluation. (iii) Promotion of future work via a
discussion of bottlenecks, open problems, and potential research directions for
the community.

Comment: This paper is accepted by IEEE TPAMI.
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation
Significant progress has recently been made in creative applications of large
pre-trained models for downstream tasks in 3D vision, such as text-to-shape
generation. This motivates our investigation of how these pre-trained models
can be used effectively to generate 3D shapes from sketches, which has largely
remained an open challenge due to the limited sketch-shape paired datasets and
the varying level of abstraction in the sketches. We discover that conditioning
a 3D generative model on the features (obtained from a frozen large pre-trained
vision model) of synthetic renderings during training enables us to effectively
generate 3D shapes from sketches at inference time. This suggests that the
large pre-trained vision model features carry semantic signals that are
resilient to domain shifts, i.e., they allow us to train on RGB renderings
alone while generalizing to sketches at inference time. We conduct a comprehensive set of
experiments investigating different design factors and demonstrate the
effectiveness of our straightforward approach for generation of multiple 3D
shapes per input sketch, regardless of their level of abstraction, without
requiring any paired datasets during training.
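The conditioning trick described in the abstract can be illustrated with a minimal NumPy sketch: a frozen encoder (a stand-in for the large pre-trained vision model) embeds both renderings and sketches in the same feature space, so a generator trained only on rendering features can be driven by sketch features at inference time. All weights, dimensions, and function names below are hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen large pre-trained vision encoder (e.g. CLIP/DINO);
# in Sketch-A-Shape the real model's weights are likewise never updated.
W_frozen = rng.standard_normal((64, 512)) / np.sqrt(64)

def frozen_encoder(image_vec):
    """Map a flattened image to a feature vector; weights stay fixed."""
    return np.tanh(image_vec @ W_frozen)

# Hypothetical conditional shape generator: maps conditioning features to
# shape-latent codes (a plain linear layer here, purely for illustration).
W_gen = rng.standard_normal((512, 128)) / np.sqrt(512)

def generate_shape_latent(cond_features):
    return cond_features @ W_gen

# Training time: condition on features of synthetic RGB renderings.
rendering = rng.standard_normal(64)
z_train = generate_shape_latent(frozen_encoder(rendering))

# Inference time: feed a sketch through the SAME frozen encoder; because the
# pre-trained features are resilient to domain shift, the generator that only
# ever saw rendering features can still be driven by sketch features.
sketch = rng.standard_normal(64)
z_sketch = generate_shape_latent(frozen_encoder(sketch))

print(z_train.shape, z_sketch.shape)  # both (128,)
```

The key design point is that the generator never sees sketches during training; domain robustness is inherited entirely from the frozen encoder.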
Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
Self-supervised learning has gained prominence due to its efficacy at
learning powerful representations from unlabelled data that achieve excellent
performance on many challenging downstream tasks. However, supervision-free
pre-text tasks are challenging to design and are usually modality-specific.
Although there is a rich literature of self-supervised methods for either
spatial (such as images) or temporal data (sound or text) modalities, a common
pre-text task that benefits both modalities is largely missing. In this paper,
we are interested in defining a self-supervised pre-text task for sketches and
handwriting data. This data is uniquely characterised by its existence in dual
modalities of rasterized images and vector coordinate sequences. We address and
exploit this dual representation by proposing two novel cross-modal translation
pre-text tasks for self-supervised feature learning: Vectorization and
Rasterization. Vectorization learns to map image space to vector coordinates
and rasterization maps vector coordinates to image space. We show that our
learned encoder modules benefit both raster-based and vector-based downstream
approaches to analysing hand-drawn data. Empirical evidence shows that our
novel pre-text tasks surpass existing single and multi-modal self-supervision
methods.

Comment: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021
Code : https://github.com/AyanKumarBhunia/Self-Supervised-Learning-for-Sketc
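The two cross-modal translation pre-text tasks can be sketched in a few lines of NumPy: a deterministic rasterizer supplies ground truth in one direction, and two toy linear maps stand in for the learned vectorization (image → coordinates) and rasterization (coordinates → image) networks. The models and dimensions here are illustrative stand-ins, not the paper's deep encoder/decoders.

```python
import numpy as np

def rasterize(points, size=16):
    """Render a sequence of (x, y) coordinates in [0, 1) onto a binary grid.
    This ground-truth direction is what a learned rasterization decoder
    would be trained to reproduce from vector input."""
    img = np.zeros((size, size))
    for x, y in points:
        img[int(y * size), int(x * size)] = 1.0
    return img

# A toy stroke: a diagonal sequence of 8 points.
stroke = np.stack([np.linspace(0.1, 0.8, 8)] * 2, axis=1)  # shape (8, 2)
target_img = rasterize(stroke)

# Hypothetical learned modules (random linear maps, purely to show the
# shapes of the two pre-text tasks).
rng = np.random.default_rng(0)
W_vec = rng.standard_normal((16 * 16, 8 * 2)) * 0.01  # image -> coordinates
W_ras = rng.standard_normal((8 * 2, 16 * 16)) * 0.01  # coordinates -> image

pred_coords = (target_img.ravel() @ W_vec).reshape(8, 2)  # vectorization
pred_img = (stroke.ravel() @ W_ras).reshape(16, 16)       # rasterization

# Self-supervised losses: each modality supervises the other; no human
# labels are needed, only the paired dual representation of the same stroke.
loss_vec = np.mean((pred_coords - stroke) ** 2)
loss_ras = np.mean((pred_img - target_img) ** 2)
print(pred_coords.shape, pred_img.shape)  # (8, 2) (16, 16)
```

The point of the construction is that the raster and vector forms of the same drawing supervise each other, which is why a single pre-text task can benefit both modalities.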
Delving Deep into the Sketch and Photo Relation
Sketches drawn by humans can play a similar role to photos in conveying shape, posture, and fine-grained information, a fact that has stimulated a line of cross-domain research relating sketch and photo, including sketch-based photo synthesis and retrieval. In this thesis, we aim to further investigate the relationship between sketch and photo. More specifically, we study certain under-explored traits of this relationship and propose novel applications to reinforce the understanding of the sketch and photo relation.

Our exploration starts with the problem of sketch-based photo synthesis, where the unique trait of non-rigid alignment between sketch and photo is overlooked in existing research. We then investigate, from a new angle, whether sketches can facilitate photo classifier generation. Building upon this, we explore how sketch and photo are linked at a more fine-grained level by tackling sketch-based photo segmenter prediction. Furthermore, we address the data scarcity issue identified in nearly all sketch-photo applications by examining their inherent semantic correlation, using sketch-based image retrieval (SBIR) as a test-bed. In total, we make four main contributions to research on the relationship between sketch and photo.

Firstly, to mitigate the effect of deformation in sketch-based photo synthesis, we introduce a spatial transformer network into our image-to-image regression framework, which subtly handles the non-rigid alignment between sketches and photos. Qualitative and quantitative experiments consistently reveal the superior quality of our synthesised photos over those generated by existing approaches.

Secondly, sketch-based photo classifier generation is achieved with a novel model regression network, which maps a sketch to the parameters of a photo classification model. We show that our model regression network generalises across categories, so that photo classifiers for novel classes not involved in training are just a sketch away. Comprehensive experiments illustrate the promising performance of the generated binary and multi-class photo classifiers, and demonstrate that sketches can also be employed to enhance the granularity of existing photo classifiers.

Thirdly, to achieve sketch-based photo segmentation, we propose a photo segmentation model generation algorithm that predicts the weights of a deep photo segmentation network from the input sketch. The results confirm that a single sketch is the only prerequisite for segmenting photos of unseen categories, and that segmentation performance improves further when the sketch is aligned in shape and position with the object to be segmented.

Finally, we present an unsupervised representation learning framework for SBIR, whose purpose is to eliminate the barrier imposed by the scarcity of annotated data. Joint distribution optimal transport, reinforced with prototypes and a memory bank, is integrated into the framework, so that the mapping between sketches and photos can be detected automatically and a semantically meaningful yet domain-agnostic feature space learned. Extensive experiments and feature visualisation validate the efficacy of the proposed algorithm.
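The model-regression idea in the second contribution, mapping a sketch directly to classifier parameters, is essentially a hypernetwork. A minimal NumPy sketch under assumed dimensions follows; the linear hypernetwork, the feature sizes, and all function names are hypothetical illustrations, not the thesis's actual networks.

```python
import numpy as np

rng = np.random.default_rng(1)

FEAT_DIM = 128   # photo feature dimension (assumed)
SKETCH_DIM = 64  # sketch embedding dimension (assumed)

# Hypothetical model-regression network: a linear hypernetwork mapping a
# sketch embedding to the weights (+ bias) of a binary photo classifier.
W_hyper = rng.standard_normal((SKETCH_DIM, FEAT_DIM + 1)) * 0.05

def classifier_from_sketch(sketch_emb):
    """Regress classifier parameters from a single sketch embedding."""
    theta = sketch_emb @ W_hyper
    w, b = theta[:FEAT_DIM], theta[FEAT_DIM]
    return w, b

def classify(photo_feat, w, b):
    """Apply the regressed binary classifier (sigmoid score in (0, 1))."""
    return 1.0 / (1.0 + np.exp(-(photo_feat @ w + b)))

# One sketch yields a classifier for its (possibly unseen) category:
# "photo classifiers for novel classes are just a sketch away".
sketch_emb = rng.standard_normal(SKETCH_DIM)
w, b = classifier_from_sketch(sketch_emb)
score = classify(rng.standard_normal(FEAT_DIM), w, b)
print(w.shape)  # (128,)
```

The same pattern, predicting another network's weights from a sketch, also underlies the third contribution's segmentation model generation.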
SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction
Reconstructing 3D human shapes from 2D images has received increasing
attention recently due to its fundamental support for many high-level 3D
applications. Compared with natural images, freehand sketches are much more
flexible to depict various shapes, providing a high potential and valuable way
for 3D human reconstruction. However, such a task is highly challenging. The
sparse, abstract characteristics of sketches add severe difficulties, such as
arbitrariness, inaccuracy, and a lack of image detail, to the already
severely ill-posed problem of 2D-to-3D reconstruction. Although current methods have
achieved great success in reconstructing 3D human bodies from a single-view
image, they do not work well on freehand sketches. In this paper, we propose a
novel sketch-driven multi-faceted decoder network termed SketchBodyNet to
address this task. Specifically, the network consists of a backbone and three
separate attention decoder branches, where a multi-head self-attention module
is exploited in each decoder to obtain enhanced features, followed by a
multi-layer perceptron. The multi-faceted decoders aim to predict the camera,
shape, and pose parameters, respectively, which are then associated with the
SMPL model to reconstruct the corresponding 3D human mesh. In learning,
existing 3D meshes are projected via the camera parameters into 2D synthetic
sketches with joints, which are combined with the freehand sketches to optimize
the model. To verify our method, we collect a large-scale dataset of about 26k
freehand sketches and their corresponding 3D meshes containing various poses of
human bodies from 14 different angles. Extensive experimental results
demonstrate that our SketchBodyNet achieves superior performance in
reconstructing 3D human meshes from freehand sketches.

Comment: 9 pages, to appear in Pacific Graphics 202
SENS: Sketch-based Implicit Neural Shape Modeling
We present SENS, a novel method for generating and editing 3D models from
hand-drawn sketches, including those of an abstract nature. Our method allows
users to quickly and easily sketch a shape, and then maps the sketch into the
latent space of a part-aware neural implicit shape architecture. SENS
analyzes the sketch and encodes its parts into ViT patch encodings, then
feeds them into a transformer decoder that converts them into shape
embeddings suitable for editing 3D neural implicit shapes. SENS not only provides intuitive
sketch-based generation and editing, but also excels in capturing the intent of
the user's sketch to generate a variety of novel and expressive 3D shapes, even
from abstract sketches. We demonstrate the effectiveness of our model compared
to the state-of-the-art using objective metric evaluation criteria and a
decisive user study, both indicating strong performance on sketches with a
medium level of abstraction. Furthermore, we showcase its intuitive
sketch-based shape editing capabilities.

Comment: 18 pages, 18 figures
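The SENS pipeline (sketch → ViT patch encodings → transformer decoder → shape embeddings) can be illustrated with a minimal NumPy sketch: non-overlapping patches are linearly embedded, and learned part queries attend over them in a single cross-attention step standing in for the full transformer decoder. All dimensions, the number of parts, and the linear maps are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, patch=8):
    """Split a sketch image into flattened non-overlapping patches
    (ViT-style tokenization)."""
    H, W = img.shape
    return np.array([img[i:i + patch, j:j + patch].ravel()
                     for i in range(0, H, patch)
                     for j in range(0, W, patch)])

D = 32  # embedding dimension (assumed)
sketch = rng.standard_normal((32, 32))   # toy 32x32 sketch image
patches = patchify(sketch)               # (16, 64) patch vectors
W_embed = rng.standard_normal((64, D)) / 8.0
patch_tokens = patches @ W_embed         # ViT patch encodings, (16, D)

# Minimal cross-attention decoder step: learned part queries attend over
# the patch tokens to produce per-part shape embeddings (the paper stacks
# full transformer decoder layers; this is one attention step).
num_parts = 4
queries = rng.standard_normal((num_parts, D))
scores = queries @ patch_tokens.T / np.sqrt(D)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
part_embeddings = attn @ patch_tokens    # (num_parts, D)
print(part_embeddings.shape)             # (4, 32)
```

The per-part embeddings are what make the downstream representation part-aware: each one would map into the latent space of the implicit shape architecture, enabling part-level editing.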