6 research outputs found

    Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection

    Full text link
    Top-down saliency models produce a probability map that peaks at target locations specified by a task/goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the best saliency map suitable for the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features having high combined saliency are used to train a linear SVM classifier to estimate feature saliency. This is integrated with combined saliency and further refined through a multi-scale superpixel-averaging of saliency map. We evaluate the performance of the proposed weakly supervised topdown saliency and achieve comparable performance with fully supervised approaches. Experiments are carried out on seven challenging datasets and quantitative results are compared with 40 closely related approaches across 4 different applications.Comment: 14 pages, 7 figure

    Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

    Full text link

    Discriminative Dictionary Learning With Spatial Constraints

    Get PDF
    In this thesis, we investigate the use of dictionary learning for discriminative tasks on natural images. Our contributions can be summarized as follows: • We introduce discriminative deviation based learning to achieve principled handling of the reconstruction-discrimination tradeoff that is inherent to discriminative dictionary learning. • Since natural images obey a strong smoothness prior, we show how spatial smoothness constraints can be incorporated into the learning formulation by embedding dictionary learning into Conditional Random Field (CRF) learning. We demonstrate that such smoothness constraints can lead to state-of-the-art performance for pixel-classification tasks. • Finally, we lay down the foundations of super-latent learning. By treating sparse codes on a CRF as latent variables, dictionary learning can also be performed via the Latent (Structural) SVM formulation for jointly learning a classifier over the sparse codes. The dictionary is treated as a super-latent variable that generates the latent variables

    LARGE SCALE VISUAL RECOGNITION OF CLOTHING, PEOPLE AND STYLES

    Get PDF
    Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, body shape and pose. In this dissertation, we propose new computational vision approaches that learn to represent and recognize clothing items in images. First, we present an effective method for parsing clothing in fashion photographs, where we label the regions of an image with their clothing categories. We then extend our approach to tackle the clothing parsing problem using a data-driven methodology: for a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Along with our novel large fashion dataset, we also present intriguing initial results on using clothing estimates to improve human pose identification. Second, we examine questions related to fashion styles and identifying the clothing elements associated with each style. We first design an online competitive style rating game called Hipster Wars to crowd source reliable human judgments of clothing styles. We use this game to collect a new dataset of clothing outfits with associated style ratings for different clothing styles. Next, we build visual style descriptors and train models that are able to classify clothing styles and identify the clothing elements are most discriminative in every style. Finally, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same exact garment in an online shop. This is an extremely challenging task due to visual differences between street photos that are taken of people wearing clothing in everyday uncontrolled settings, and online shop photos, which are captured by professionals in highly controlled settings. We introduce a novel large dataset for this application, collected from the web, and present a deep learning based similarity network that can compare clothing items across visual domains.Doctor of Philosoph

    Accurate Object Recognition with Shape Masks

    No full text
    International audienceIn this paper we propose an object recognition approach that is based on shape masks--generalizations of segmentation masks. As shape masks carry information about the extent (outline) of objects, they provide a convenient tool to exploit the geometry of objects. We apply our ideas to two common object class recognition tasks--classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup shape masks can be seen as weak geometrical constraints over bag-of-features. Those constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks allow to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments reveal that shape masks can improve recognition accuracy of state-of-the-art methods while returning richer recognition answers at the same time. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset
    corecore