3,617 research outputs found

    Free-hand Sketch Understanding and Analysis

    Get PDF
    With the proliferation of touch screens, sketching input has become popular in many software products. This phenomenon has stimulated a new round of growth in free-hand sketch research, covering topics such as sketch recognition, sketch-based image retrieval, sketch synthesis and sketch segmentation. Compared with earlier sketch work, the newly proposed studies generally employ more complicated sketches, and in much larger quantities, thanks to advances in hardware. This thesis therefore presents new work on free-hand sketches, contributing novel ideas on the aforementioned topics.

    On sketch recognition, Eitz et al. [32] were the first explorers, proposing the large-scale TU-Berlin sketch dataset [32] that made sketch recognition possible. Following their work, we continue to analyze the dataset and find that visual cue sparsity and internal structural complexity are the two biggest challenges for sketch recognition. Accordingly, we propose multiple kernel learning [45] to fuse multiple visual cues and a star graph representation [12] to encode the structure of the sketches. With the new schemes, we achieve a significant improvement in recognition accuracy (from 56% to 65.81%). An experimental study on sketch attributes is performed to further boost sketch recognition performance and to enable novel retrieval-by-attribute applications.

    For sketch-based image retrieval, we start by carefully examining the existing work. Looking at the big picture of sketch-based image retrieval, we highlight that studying a sketch's ability to distinguish intra-category object variations is the most promising direction to pursue, and we define it as the fine-grained sketch-based image retrieval problem. A deformable part-based model, which captures object part details and object deformations, is proposed to tackle this new problem, and graph matching is employed to compute the similarity between deformable part-based models by matching the parts of different models. To evaluate the new problem, we combine the TU-Berlin sketch dataset and the PASCAL VOC photo dataset [36] to form a new, challenging cross-domain dataset with pairwise sketch-photo similarity ratings, and our proposed method shows promising results on this dataset.

    Regarding sketch synthesis, we focus on generating real free-hand-style sketches for general categories, as the closest previous work [8] only showed efficacy on a single category: human faces. The difficulties that prevent sketch synthesis from reaching other categories include cluttered edges and diverse object variations due to deformation. To address these difficulties, we propose a deformable stroke model that casts sketch synthesis as a detection process, directly targeting the cluttered background and the object variations. To ease the training of such a model, a perceptual grouping algorithm is further proposed that exploits the relationship between stroke length and stroke semantics, stroke temporal order, and Gestalt principles [58] to perform part-level sketch segmentation. The perceptual grouping automatically provides semantic part-level supervision for training the deformable stroke model, and an iterative learning scheme is introduced to gradually refine both the supervision and the model. With the learned deformable stroke models, sketches with a distinct free-hand style can be generated for many categories.
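    The multi-cue fusion step above can be pictured with a short, hedged sketch. This is not the thesis code: the fixed candidate weightings and the SVM hyper-parameters are illustrative assumptions, and only the idea of combining precomputed per-cue Gram matrices and validating the weighting is taken from the abstract.

```python
# Hedged sketch: fuse several visual cues for sketch recognition by a convex
# combination of precomputed kernels (one Gram matrix per cue), then train an
# SVM on the fused kernel. Weights and hyper-parameters are illustrative.
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, weights):
    """Convex combination of (n, n) Gram matrices, one per visual cue."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def held_out_accuracy(K, y, train_idx, val_idx, C=10.0):
    """Train an SVM on the fused kernel and score it on a held-out split."""
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(K[np.ix_(train_idx, train_idx)], y[train_idx])
    return clf.score(K[np.ix_(val_idx, train_idx)], y[val_idx])

def pick_weights(kernels, y, train_idx, val_idx, candidates):
    """Choose the cue weighting with the best held-out accuracy."""
    scores = [held_out_accuracy(combine_kernels(kernels, w), y, train_idx, val_idx)
              for w in candidates]
    return candidates[int(np.argmax(scores))]
```

    The thesis uses multiple kernel learning [45] to learn the cue weights jointly with the classifier; searching over a small set of fixed weightings, as above, is only the simplest stand-in for that step.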

    Learning Active Basis Models by EM-Type Algorithms

    Full text link
    The EM algorithm is a convenient tool for maximum likelihood model fitting when the data are incomplete or when there are latent variables or hidden states. In this review article we explain that the EM algorithm is a natural computational scheme for learning image templates of object categories when the learning is not fully supervised. We represent an image template by an active basis model, which is a linear composition of a selected set of localized, elongated and oriented wavelet elements that are allowed to slightly perturb their locations and orientations to account for the deformations of object shapes. The model can be easily learned when the objects in the training images share the same pose and appear at the same location and scale; this is often called supervised learning. When the objects may appear at different unknown locations, orientations and scales in the training images, we have to incorporate those unknown locations, orientations and scales as latent variables in the image generation process and learn the template by EM-type algorithms. The E-step imputes the unknown locations, orientations and scales based on the currently learned template. This step can be considered self-supervision: it uses the current template to recognize the objects in the training images. The M-step then relearns the template from the imputed locations, orientations and scales, which is essentially the same as supervised learning. The EM learning process thus iterates between recognition and supervised learning. We illustrate this scheme with several experiments.
    Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/09-STS281
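    The recognition / supervised-learning alternation described above can be summarized with a deliberately simplified, hedged sketch. It is not the active basis model itself: the template here is a plain averaged intensity patch rather than a composition of perturbable wavelet elements, and only the unknown location is imputed, with orientation and scale ignored.

```python
# Simplified EM-type template learning with the object location as the only
# latent variable. E-step: impute each object's location by matching the
# current template ("recognition"). M-step: re-average the aligned crops
# ("supervised learning"). The real active basis model instead learns a set
# of perturbable Gabor wavelet elements and also handles orientation/scale.
import numpy as np

def learn_template(images, th, tw, n_iters=10):
    """images: list of 2-D grayscale arrays, each at least th x tw."""
    # Initialize from central crops (an arbitrary choice for this sketch).
    crops = [im[(im.shape[0] - th) // 2:(im.shape[0] - th) // 2 + th,
                (im.shape[1] - tw) // 2:(im.shape[1] - tw) // 2 + tw]
             for im in images]
    template = np.mean(crops, axis=0)
    for _ in range(n_iters):
        aligned = []
        for im in images:
            # E-step ("self-supervision"): exhaustive search for the window
            # that best correlates with the current template.
            best_win, best_score = None, -np.inf
            for y in range(im.shape[0] - th + 1):
                for x in range(im.shape[1] - tw + 1):
                    win = im[y:y + th, x:x + tw]
                    score = float(np.sum(win * template))
                    if score > best_score:
                        best_win, best_score = win, score
            aligned.append(best_win)
        # M-step: relearn the template from the imputed locations.
        template = np.mean(aligned, axis=0)
    return template
```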

    Deformable Part Models are Convolutional Neural Networks

    Full text link
    Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a novel synthesis of the two ideas. Our construction involves unrolling the DPM inference algorithm and mapping each step to an equivalent (and at times novel) CNN layer. From this perspective, it becomes natural to replace the standard image features used in DPMs with a learned feature extractor. We call the resulting model DeepPyramid DPM and experimentally validate it on PASCAL VOC. DeepPyramid DPM significantly outperforms DPMs based on histogram of oriented gradients (HOG) features and slightly outperforms a comparable version of the recently introduced R-CNN detection system, while running an order of magnitude faster.
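    The construction can be illustrated with a hedged sketch of its central observation: scoring a DPM part is a max-pooling-like operation over displacements with a quadratic deformation penalty (a generalized distance transform), and the detection score sums the pooled part responses with the root response. Filter responses are assumed to be given here; a real DPM also runs over a feature pyramid and places parts at twice the root resolution, which this sketch omits.

```python
# Hedged sketch of DPM scoring viewed as CNN-style layers:
# convolution (assumed already done, giving score maps) -> deformation pooling
# (max over displacements minus a quadratic cost) -> sum into the root score.
import numpy as np

def shift(a, dy, dx, fill=-np.inf):
    """shift(a)[y, x] = a[y + dy, x + dx]; out-of-range entries get `fill`."""
    H, W = a.shape
    out = np.full((H, W), fill)
    y0, y1 = max(0, -dy), min(H, H - dy)
    x0, x1 = max(0, -dx), min(W, W - dx)
    if y1 > y0 and x1 > x0:
        out[y0:y1, x0:x1] = a[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    return out

def deformation_pool(part_scores, radius=4, def_cost=0.1):
    """pooled[y, x] = max over |dy|, |dx| <= radius of
       part_scores[y + dy, x + dx] - def_cost * (dy^2 + dx^2)."""
    pooled = np.full(part_scores.shape, -np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = shift(part_scores, dy, dx) - def_cost * (dy * dy + dx * dx)
            pooled = np.maximum(pooled, cand)
    return pooled

def dpm_score(root_scores, part_score_maps):
    """Detection score map: root response plus the pooled part responses."""
    score = root_scores.astype(float).copy()
    for part in part_score_maps:
        score += deformation_pool(part)
    return score
```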

    Hierarchical Object Parsing from Structured Noisy Point Clouds

    Full text link
    Object parsing and segmentation from point clouds are challenging tasks because the relevant data is available only as thin structures along object boundaries or other features, and is corrupted by large amounts of noise. To handle this kind of data, flexible shape models are desired that can accurately follow the object boundaries. Popular models such as Active Shape and Active Appearance models lack the necessary flexibility for this task, while recent approaches such as the Recursive Compositional Models make model simplifications in order to obtain computational guarantees. This paper investigates a hierarchical Bayesian model of shape and appearance in a generative setting. The input data is explained by an object parsing layer, which is a deformation of a hidden PCA shape model with a Gaussian prior. The paper also introduces a novel efficient inference algorithm that uses informed data-driven proposals to initialize local searches for the hidden variables. Applied to the problem of object parsing from structured point clouds such as edge detection images, the proposed approach obtains state-of-the-art parsing errors on two standard datasets without using any intensity information.
    Comment: 13 pages, 16 figures
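    A minimal sketch, under assumed conventions, of the hidden PCA shape model with a Gaussian prior mentioned above: a shape is the mean plus a linear combination of principal modes, with coefficients drawn from (or penalized by) a zero-mean Gaussian whose variances are the PCA eigenvalues. The object parsing layer that deforms this shape to explain the noisy points, and the data-driven proposals, are not shown.

```python
# Hedged sketch of a PCA shape model with a Gaussian prior on its coefficients.
# Conventions (stacked 2-D landmarks, diagonal prior) are assumptions made for
# illustration, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_shape(mean_shape, modes, eigenvalues):
    """mean_shape: (2N,) stacked landmark coords; modes: (2N, K); eigenvalues: (K,)."""
    b = rng.normal(0.0, np.sqrt(eigenvalues))   # coefficients ~ N(0, diag(eigenvalues))
    return mean_shape + modes @ b               # shape instance in landmark space

def log_prior(b, eigenvalues):
    """Log-density of the coefficients (up to a constant), useful as a penalty
       when searching over the hidden shape variables."""
    return -0.5 * float(np.sum(b ** 2 / eigenvalues))
```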

    Free-hand sketch synthesis with deformable stroke models

    Get PDF
    We present a generative model which can automatically summarize the stroke composition of free-hand sketches of a given category. When our model is fit to a collection of sketches with similar poses, it discovers and learns the structure and appearance of a set of coherent parts, with each part represented by a group of strokes. It captures both the consistent aspects (topology) and the diverse aspects (structure and appearance variations) of each sketch category. Key to the success of our model are important insights learned from a comprehensive study performed on human stroke data. By fitting this model to images, we are able to synthesize visually similar and pleasant free-hand sketches.
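    As a toy illustration of "each part represented by a group of strokes", the snippet below clusters strokes into candidate parts by the proximity of their centroids. This is only a hedged stand-in: the paper's model learns the parts jointly with their structure and appearance from human stroke data rather than from centroid distances alone.

```python
# Toy stand-in: group strokes (polylines) into candidate parts by clustering
# their centroids. The number of parts is assumed to be given.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def group_strokes(strokes, n_parts):
    """strokes: list of (n_i, 2) point arrays; returns one part label per stroke."""
    centroids = np.array([s.mean(axis=0) for s in strokes])
    Z = linkage(centroids, method="average")
    return fcluster(Z, t=n_parts, criterion="maxclust")
```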