Aplicació rica d'internet per a la consulta amb text i imatge a la CCMA (Rich internet application for text and image queries at the CCMA)
Award for the best final-year project in Telecommunications Engineering in Telematic Services, granted by Accenture (academic year 2009-2010). Award-winnin
Free-hand Sketch Understanding and Analysis
PhD
With the proliferation of touch screens, sketch input has become popular in many software
products. This has stimulated a new wave of free-hand sketch research,
covering topics such as sketch recognition, sketch-based image retrieval, sketch synthesis
and sketch segmentation. Compared with earlier sketch research, recent work
generally employs more complicated sketches in much larger quantities, thanks
to advances in hardware. This thesis presents new work on free-hand
sketches, with novel contributions on the aforementioned topics.
On sketch recognition, Eitz et al. [32] were the first to explore the problem, proposing the large-scale
TU-Berlin sketch dataset [32] that made sketch recognition possible. Following their work, we
analyze the dataset further and find that visual cue sparsity and internal structural complexity
are the two biggest challenges for sketch recognition. Accordingly, we propose multiple
kernel learning [45] to fuse multiple visual cues and a star graph representation [12] to encode the
structure of the sketches. With these schemes, we improve recognition accuracy
significantly (from 56% to 65.81%). An experimental study on sketch attributes is also performed
to further boost sketch recognition performance and to enable novel retrieval-by-attribute
applications.
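The cue fusion mentioned above can be illustrated with a minimal sketch. It assumes each visual cue has already been turned into a kernel (Gram) matrix over the same set of sketches, and it combines them as a normalised weighted sum. The function name `combine_kernels` and the fixed weights are hypothetical; in multiple kernel learning proper the weights are learned jointly with the classifier, which this sketch does not attempt.

```python
def combine_kernels(kernels, weights):
    """Fuse per-cue kernel matrices into one kernel by a convex combination.

    kernels: list of square matrices (lists of lists), one per visual cue.
    weights: non-negative cue weights (fixed here; an assumption, since
             multiple kernel learning would learn them from data).
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    normalised = [w / total for w in weights]  # weights now sum to 1
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(normalised, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two toy cue kernels over two sketches, fused with equal weight.
shape_kernel = [[1.0, 0.0], [0.0, 1.0]]
texture_kernel = [[1.0, 1.0], [1.0, 1.0]]
fused = combine_kernels([shape_kernel, texture_kernel], [1.0, 1.0])
```

A non-negative combination of valid kernels is itself a valid kernel, which is why the fused matrix can be passed to any kernel classifier.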
For sketch-based image retrieval, we start by carefully examining existing work. After
surveying the field, we argue that studying a sketch’s
ability to distinguish intra-category object variations is the most promising direction,
and we define this as the fine-grained sketch-based image retrieval problem. A deformable
part-based model, which captures object part details and object deformations, is proposed
to tackle this new problem, and graph matching is employed to compute the similarity between
deformable part-based models by matching the parts of different models. To evaluate this new
problem, we combine the TU-Berlin sketch dataset and the PASCAL VOC photo dataset [36] to
form a new, challenging cross-domain dataset with pairwise sketch-photo similarity ratings, and
our proposed method has shown promising results on this new dataset.
Regarding sketch synthesis, we focus on generating real free-hand style sketches for
general categories, as the closest previous work [8] only showed efficacy on a single
category: human faces. The difficulties that prevent sketch synthesis from extending to other categories
include cluttered edges and diverse object variations due to deformation. To address these
difficulties, we propose a deformable stroke model that casts sketch synthesis as a detection
process, directly targeting the cluttered background and the object variations. To ease
the training of such a model, we further propose a perceptual grouping algorithm that
uses the relationship between stroke length and stroke semantics, stroke temporal order and Gestalt principles
[58] to perform part-level sketch segmentation. The perceptual grouping automatically provides semantic
part-level supervision for training the deformable stroke model, and an iterative
learning scheme is introduced to gradually refine both the supervision and the model. With
the learned deformable stroke models, sketches with a distinct free-hand style can be generated for
many categories.
Delving Deep into Fine-Grained Sketch-Based Image Retrieval.
PhD Thesis
To see is to sketch. Since prehistoric times, people have used sketch-like petroglyphs as an effective
communicative tool that predates the appearance of language tens of thousands of years ago.
This is even more true today: with the ubiquitous proliferation of touchscreen devices,
sketching is possibly the only rendering mechanism readily available for all to express visual
intentions. The intriguing free-hand property of human sketches, however, becomes a major
obstacle in practice: humans are not faithful artists, and the sketches they draw are iconic
abstractions of mental images that can quickly fall off the visual manifold of natural objects.
Matching such sketches discriminatively with their corresponding photos is known as fine-grained
sketch-based image retrieval (FG-SBIR), a problem that has drawn increasing interest due to its
potential commercial adoption. This thesis delves deep into FG-SBIR by analysing
the intrinsic, unique traits of human sketches and leveraging that understanding
to strengthen sketch-photo matching under deep learning. More specifically, this thesis
investigates and develops four methods for FG-SBIR, as follows:
Chapter 3 describes a discriminative-generative hybrid method to better bridge the domain
gap between photo and sketch. Existing FG-SBIR models learn a deep joint embedding space
with discriminative losses only, pulling matching pairs of photos and sketches close and pushing
mismatched pairs apart, and thus align the two domains only indirectly. To address this, we introduce a
generative task of cross-domain image synthesis. Concretely, when an input photo is embedded
in the joint space, the embedding vector is used as input to a generative model to synthesise the
corresponding sketch. This task forces the learned embedding space to preserve all the domain-invariant
information that is useful for cross-domain reconstruction, thus explicitly reducing the
domain gap, in contrast to existing models. This approach achieves the first near-human
performance on the largest FG-SBIR dataset to date, Sketchy.
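The combination of a discriminative and a generative objective described for Chapter 3 can be sketched as follows. This is an illustration under stated assumptions, not the thesis's formulation: the abstract does not name the losses, so a margin-based triplet loss stands in for the discriminative term and a squared-error reconstruction stands in for the generative term. The names `triplet_loss`, `hybrid_loss`, `margin` and `recon_weight` are hypothetical, and embeddings are plain lists of floats.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(sketch, pos_photo, neg_photo, margin=0.5):
    """Discriminative term (assumed form): pull the matching photo closer
    to the sketch than the mismatched photo by at least `margin`."""
    return max(
        0.0,
        margin
        + squared_distance(sketch, pos_photo)
        - squared_distance(sketch, neg_photo),
    )


def hybrid_loss(sketch, pos_photo, neg_photo, synthesised, target_sketch,
                margin=0.5, recon_weight=1.0):
    """Discriminative term plus a generative reconstruction term.

    `synthesised` stands for the sketch generated from the photo embedding
    and `target_sketch` for the real sketch, both flattened to vectors.
    """
    discriminative = triplet_loss(sketch, pos_photo, neg_photo, margin)
    generative = squared_distance(synthesised, target_sketch)
    return discriminative + recon_weight * generative


# A well-separated triplet contributes no discriminative loss,
# so only the reconstruction error remains.
loss = hybrid_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0],
                   synthesised=[1.0, 1.0], target_sketch=[1.0, 0.0])
```

The point of the second term is that the embedding is penalised not only for ranking errors but also for discarding information needed to reconstruct the sketch.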
Chapter 4 presents a new way of modelling human sketches and shows how such modelling can
be integrated into the existing FG-SBIR paradigm with promising performance. Instead of modelling
the forward sketching pass, we attempt to invert it. We model this inversion by translating
iconic free-hand sketches into contours that more closely resemble geometrically realistic projections
of object boundaries, and by separately factorising out the salient added details. This factorised re-representation
enables more effective sketch-photo matching. Specifically, we
propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding
consistency constraint. A deep four-way Siamese model is then formulated to exploit
the synthesised contours by extracting distinct, complementary detail features for FG-SBIR.
Chapter 5 extends the practical applicability of FG-SBIR to work well beyond its training
categories. Existing models, while successful, require instance-level pairing within each coarse-grained
category as annotated training data, leaving their ability to deal with out-of-sample data
unknown. We identify cross-category generalisation for FG-SBIR as a domain generalisation
problem and propose the first solution. Our key contribution is a novel unsupervised learning
approach to model a universal manifold of prototypical visual sketch traits. This manifold can
then be used to parameterise the learning of a sketch/photo representation. Model adaptation to
novel categories then becomes automatic via embedding the novel sketch in the manifold and
updating the representation and retrieval function accordingly.
Chapter 6 challenges the ImageNet pre-training that has long been considered crucial by the
FG-SBIR community due to the lack of large sketch-photo paired datasets for FG-SBIR training,
and proposes a self-supervised alternative for representation pre-training. Specifically, we
consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two
key facets of jigsaw task design that are required for effective performance. The first is formulating
the puzzle in a mixed-modality fashion. Second, we show that framing the optimisation
as permutation matrix inference via Sinkhorn iterations is more effective than the existing classifier
instantiation of the jigsaw idea. We show for the first time that ImageNet classification is unnecessary
as a pre-training strategy for FG-SBIR and confirm the efficacy of our jigsaw approach.
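The Sinkhorn iterations mentioned for Chapter 6 can be illustrated with a short, standard-library sketch. It shows only the normalisation step: a square matrix of patch-to-position scores is exponentiated and then alternately row- and column-normalised, which drives it towards a doubly stochastic matrix, a continuous relaxation of a permutation matrix. The function name `sinkhorn`, the `temperature` parameter and the toy scores are assumptions for illustration; the network that would produce the scores is not modelled here.

```python
import math


def sinkhorn(scores, n_iters=20, temperature=1.0):
    """Alternately normalise rows and columns of exp(scores / temperature).

    scores: square matrix (list of lists); entry [i][j] is a hypothetical
            score for placing shuffled patch i at position j.
    Returns an approximately doubly stochastic matrix.
    """
    n = len(scores)
    m = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


toy_scores = [
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.4, 1.2],
]
soft_permutation = sinkhorn(toy_scores)
```

Lowering `temperature` sharpens the result towards a hard permutation, at the cost of slower convergence of the alternating normalisation.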