6 research outputs found

    Aplicació rica d'internet per a la consulta amb text i imatge a la CCMA

    Award for the best final-year project in Telecommunications Engineering (Telematic Services), granted by Accenture (academic year 2009-2010). Award-winnin

    Free-hand Sketch Understanding and Analysis

    PhD thesis. With the proliferation of touch screens, sketch input has become popular in many software products. This has stimulated a new wave of free-hand sketch research, covering topics such as sketch recognition, sketch-based image retrieval, sketch synthesis and sketch segmentation. Compared with previous work, recent studies generally use more complicated sketches in much larger quantities, thanks to advances in hardware. This thesis presents new work on free-hand sketches, offering novel ideas on each of these topics.
    On sketch recognition, Eitz et al. [32] were the first explorers, proposing the large-scale TU-Berlin sketch dataset [32] that made sketch recognition possible. Following their work, we analyze the dataset further and find that visual cue sparsity and internal structural complexity are the two biggest challenges for sketch recognition. Accordingly, we propose multiple kernel learning [45] to fuse multiple visual cues and a star graph representation [12] to encode the structure of the sketches. With these schemes, we achieve a significant improvement in recognition accuracy (from 56% to 65.81%). An experimental study on sketch attributes is performed to further boost recognition performance and to enable novel retrieval-by-attribute applications.
    For sketch-based image retrieval, we start by carefully examining existing work. Looking at the big picture, we argue that studying a sketch's ability to distinguish intra-category object variations is the most promising direction, and we define this as the fine-grained sketch-based image retrieval problem. A deformable part-based model, which captures object part details and object deformations, is proposed to tackle this new problem, and graph matching is employed to compute the similarity between deformable part-based models by matching their parts. To evaluate this new problem, we combine the TU-Berlin sketch dataset and the PASCAL VOC photo dataset [36] to form a new, challenging cross-domain dataset with pairwise sketch-photo similarity ratings, and our proposed method shows promising results on this dataset.
    Regarding sketch synthesis, we focus on generating real free-hand-style sketches for general categories, as the closest previous work [8] only showed efficacy on a single category: human faces. The difficulties that prevent sketch synthesis from reaching other categories include cluttered edges and diverse object variations due to deformation. To address these difficulties, we propose a deformable stroke model that casts sketch synthesis as a detection process, which directly targets the cluttered background and the object variations. To ease the training of such a model, a perceptual grouping algorithm is further proposed that uses the relationship between stroke length and stroke semantics, stroke temporal order and Gestalt principles [58] to perform part-level sketch segmentation. The perceptual grouping automatically provides semantic part-level supervision for training the deformable stroke model, and an iterative learning scheme is introduced to gradually refine both the supervision and the model. With the learned deformable stroke models, sketches with a distinct free-hand style can be generated for many categories.
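The abstract above mentions multiple kernel learning as a way to fuse several visual cues. As a rough illustration only, and not the thesis's implementation, the fusion step can be sketched as a convex combination of one Gram matrix per cue. The cue names (`shape_feats`, `texture_feats`), the choice of kernels and the fixed weights are all assumptions made for this example; in actual multiple kernel learning the weights are learned together with the classifier.

```python
import math

def linear_kernel(x, y):
    """Dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram(kernel, samples):
    """Gram matrix of one cue: kernel evaluated on every pair of samples."""
    return [[kernel(x, y) for y in samples] for x in samples]

def fuse_kernels(grams, weights):
    """Convex combination of per-cue Gram matrices.

    The weights are fixed by hand here (an assumption); multiple kernel
    learning would optimise them instead.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n)]
        for i in range(n)
    ]

# Two hypothetical cues describing the same two sketches.
shape_feats = [[1.0, 0.0], [0.0, 1.0]]
texture_feats = [[1.0, 2.0], [2.0, 1.0]]

fused = fuse_kernels(
    [gram(rbf_kernel, shape_feats), gram(linear_kernel, texture_feats)],
    [0.6, 0.4],
)
```

The fused matrix can then be passed to any kernel classifier that accepts a precomputed Gram matrix.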

    Delving Deep into Fine-Grained Sketch-Based Image Retrieval.

    PhD thesis. To see is to sketch. Since prehistoric times, people have used sketch-like petroglyphs as an effective communicative tool which predates the appearance of language tens of thousands of years ago. This is even more true today: with the ubiquitous proliferation of touchscreen devices, sketching is possibly the only rendering mechanism readily available for all to express visual intentions. The intriguing free-hand property of human sketches, however, becomes a major obstacle in practice: humans are not faithful artists, and the sketches they draw are iconic abstractions of mental images that can quickly fall off the visual manifold of natural objects. The problem of discriminatively matching such sketches with their corresponding photos is known as fine-grained sketch-based image retrieval (FG-SBIR) and has drawn increasing interest due to its potential commercial adoption. This thesis delves deep into FG-SBIR by analysing the intrinsic, unique traits of human sketches and leveraging that understanding to strengthen their matching with photos under deep learning. More specifically, this thesis investigates and develops four methods for FG-SBIR, as follows.
    Chapter 3 describes a discriminative-generative hybrid method to better bridge the domain gap between photo and sketch. Existing FG-SBIR models learn a deep joint embedding space with discriminative losses only, pulling matching pairs of photos and sketches close and pushing mismatched pairs away, and thus align the two domains only indirectly. To address this, we introduce a generative task of cross-domain image synthesis. Concretely, when an input photo is embedded in the joint space, the embedding vector is used as input to a generative model that synthesises the corresponding sketch. This task forces the learned embedding space to preserve all the domain-invariant information that is useful for cross-domain reconstruction, thus explicitly reducing the domain gap, in contrast to existing models. Such an approach achieves the first near-human performance on the largest FG-SBIR dataset to date, Sketchy.
    Chapter 4 presents a new way of modelling human sketches and shows how such modelling can be integrated into the existing FG-SBIR paradigm with promising performance. Instead of modelling the forward sketching pass, we attempt to invert it. We model this inversion by translating iconic free-hand sketches into contours that more closely resemble geometrically realistic projections of object boundaries, and by separately factorising out the salient added details. This factorised re-representation enables more effective sketch-photo matching. Specifically, we propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding consistency constraint. A deep four-way Siamese model is then formulated to exploit the synthesised contours by extracting distinct, complementary detail features for FG-SBIR.
    Chapter 5 extends the practical applicability of FG-SBIR to work well beyond its training categories. Existing models, while successful, require instance-level pairing within each coarse-grained category as annotated training data, leaving their ability to deal with out-of-sample data unknown. We identify cross-category generalisation for FG-SBIR as a domain generalisation problem and propose the first solution. Our key contribution is a novel unsupervised learning approach to model a universal manifold of prototypical visual sketch traits. This manifold can then be used to parameterise the learning of a sketch/photo representation. Model adaptation to novel categories then becomes automatic, by embedding the novel sketch in the manifold and updating the representation and retrieval function accordingly.
    Chapter 6 challenges the ImageNet pre-training that has long been considered crucial by the FG-SBIR community, owing to the lack of large sketch-photo paired datasets for FG-SBIR training, and proposes a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design that are required for effective performance. The first is formulating the puzzle in a mixed-modality fashion. Second, we show that framing the optimisation as permutation matrix inference via Sinkhorn iterations is more effective than the existing classifier instantiation of the jigsaw idea. We show for the first time that ImageNet classification is unnecessary as a pre-training strategy for FG-SBIR and confirm the efficacy of our jigsaw approach.
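The Chapter 6 summary refers to permutation matrix inference via Sinkhorn iterations. The snippet below is a minimal, generic sketch of Sinkhorn normalisation in plain Python, included only to illustrate the idea; it is not the thesis's model. The function names, the temperature parameter and the toy score matrix are assumptions. In a jigsaw setting, the scores would come from a network that rates how well each shuffled patch fits each position.

```python
import math

def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Turn a square score matrix into an approximately doubly stochastic one.

    Rows and columns of exp(scores / temperature) are normalised in
    alternation, so the result is a soft relaxation of a permutation matrix.
    """
    n = len(scores)
    # Subtract the global maximum before exponentiating, for numerical stability.
    top = max(max(row) for row in scores)
    mat = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        row_sums = [sum(row) for row in mat]
        mat = [[mat[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Column step: make every column sum to 1.
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return mat

def hard_assignment(soft_perm):
    """Read off the most likely position for each row (simple row-wise argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft_perm]

# Toy scores: entry [i][j] rates how well shuffled patch i fits position j.
toy_scores = [
    [0.0, 2.0, 0.5],
    [1.5, 0.0, 0.2],
    [0.3, 0.1, 2.5],
]
soft_perm = sinkhorn(toy_scores)
positions = hard_assignment(soft_perm)
```

Because every step is a differentiable normalisation, a relaxation like this can be trained end to end, whereas a classifier over permutations has to enumerate a fixed set of candidate orderings. The row-wise argmax used here is a simplification; a strict one-to-one assignment would need a matching step such as the Hungarian algorithm.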