6 research outputs found

    Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

    In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D, as well as a new loss function defined over the 3D space during training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches on both 3D and 2D object recognition tasks.
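
    The abstract above trains with a structured loss defined over 3D space. As a rough illustration only (not the paper's code; the box format, function names, and the axis-aligned simplification are all assumptions), a 3D-overlap-based loss of the kind a structural latent SVM could use might look like this:

    # Hypothetical sketch: a 3D-overlap structured loss. Box format and
    # names are assumptions, not the paper's implementation.
    import numpy as np

    def box3d_iou(a, b):
        """Intersection-over-union of two axis-aligned 3D boxes.

        Each box is (xmin, ymin, zmin, xmax, ymax, zmax).
        """
        lo = np.maximum(a[:3], b[:3])   # lower corner of the intersection
        hi = np.minimum(a[3:], b[3:])   # upper corner of the intersection
        inter = np.prod(np.clip(hi - lo, 0.0, None))
        vol = lambda box: np.prod(box[3:] - box[:3])
        union = vol(a) + vol(b) - inter
        return inter / union if union > 0 else 0.0

    def structured_loss_3d(y_true, y_pred):
        """Loss over 3D space: penalizes predictions with low 3D overlap."""
        return 1.0 - box3d_iou(np.asarray(y_true, float), np.asarray(y_pred, float))

    # Example: a partially overlapping prediction incurs an intermediate loss.
    gt   = (0, 0, 0, 2, 2, 2)
    pred = (1, 0, 0, 3, 2, 2)
    print(structured_loss_3d(gt, pred))  # 1 - (4 / 12) = 0.667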

    Dense Semantic Image Segmentation with Objects and Attributes

    The concepts of objects and attributes are both important for describing images precisely, since verbal descriptions often contain both adjectives and nouns (e.g. ‘I see a shiny red chair’). In this paper, we formulate the problem of joint visual attribute and object class image segmentation as a dense multi-labelling problem, where each pixel in an image can be associated with both an object class and a set of visual attribute labels. In order to learn the label correlations, we adopt a boosting-based piecewise training approach with respect to the visual appearance and co-occurrence cues. We use a filtering-based mean-field approximation approach for efficient joint inference. Further, we develop a hierarchical model to incorporate region-level object and attribute information. Experiments on the aPASCAL, CORE and attribute-augmented NYU indoor scenes datasets show that the proposed approach is able to achieve state-of-the-art results.
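
    The filtering-based mean-field inference mentioned above can be illustrated with a simplified sketch. This is not the authors' implementation: a plain Gaussian blur stands in for the lattice-based filtering used in dense-CRF inference, and all shapes and names are assumptions.

    # Illustrative sketch of one mean-field update for a dense labelling CRF.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def meanfield_step(unary, Q, compat, sigma=3.0):
        """One mean-field update.

        unary:  (H, W, L) negative log unary potentials
        Q:      (H, W, L) current per-pixel label marginals
        compat: (L, L) label compatibility costs (e.g. Potts: 1 - identity)
        """
        # Message passing: filter each label's marginal map over the image.
        msgs = np.stack([gaussian_filter(Q[..., l], sigma)
                         for l in range(Q.shape[-1])], axis=-1)
        # Compatibility transform mixes messages across labels.
        pairwise = msgs @ compat.T
        # Local update and renormalization.
        logits = -unary - pairwise
        logits -= logits.max(axis=-1, keepdims=True)
        Q_new = np.exp(logits)
        return Q_new / Q_new.sum(axis=-1, keepdims=True)

    # Example: two labels on a small grid, five inference iterations.
    H, W, L = 16, 16, 2
    unary = np.random.rand(H, W, L)
    Q = np.exp(-unary); Q /= Q.sum(-1, keepdims=True)
    compat = 1.0 - np.eye(L)   # Potts model
    for _ in range(5):
        Q = meanfield_step(unary, Q, compat)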

    Large Scale Visual Recognition of Clothing, People and Styles

    Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, body shape and pose. In this dissertation, we propose new computational vision approaches that learn to represent and recognize clothing items in images. First, we present an effective method for parsing clothing in fashion photographs, where we label the regions of an image with their clothing categories. We then extend our approach to tackle the clothing parsing problem using a data-driven methodology: for a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Along with our novel large fashion dataset, we also present intriguing initial results on using clothing estimates to improve human pose identification. Second, we examine questions related to fashion styles and identifying the clothing elements associated with each style. We first design an online competitive style rating game called Hipster Wars to crowdsource reliable human judgments of clothing styles. We use this game to collect a new dataset of clothing outfits with associated style ratings for different clothing styles. Next, we build visual style descriptors and train models that are able to classify clothing styles and identify the clothing elements that are most discriminative for each style. Finally, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same exact garment in an online shop. This is an extremely challenging task due to visual differences between street photos, which are taken of people wearing clothing in everyday uncontrolled settings, and online shop photos, which are captured by professionals in highly controlled settings. We introduce a novel large dataset for this application, collected from the web, and present a deep learning based similarity network that can compare clothing items across visual domains.
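
    As a hypothetical sketch of the cross-domain similarity idea described above (a shared embedding network plus a contrastive objective), the following PyTorch fragment shows one common formulation; the architecture, margin, and names are placeholders, not the dissertation's model.

    # Minimal sketch: shared embedding net + contrastive loss for
    # street-to-shop matching. All design choices here are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EmbeddingNet(nn.Module):
        """Maps a street or shop photo to a comparable embedding vector."""
        def __init__(self, dim=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(64, dim)

        def forward(self, x):
            h = self.features(x).flatten(1)
            return F.normalize(self.fc(h), dim=1)  # unit-length embeddings

    def contrastive_loss(za, zb, same, margin=0.5):
        """Pull same-garment pairs together, push different pairs apart."""
        d = (za - zb).pow(2).sum(1).sqrt()
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    # Example: a batch of street/shop pairs with exact-match labels.
    net = EmbeddingNet()
    street, shop = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
    same = torch.randint(0, 2, (8,)).float()
    loss = contrastive_loss(net(street), net(shop), same)
    loss.backward()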

    Unified Models for Recovering Semantics and Geometry from Scenes.

    Understanding the contents of an image, or scene understanding, is an important yet very challenging problem in computer vision. In the last few years, substantially different approaches have been adopted for understanding 'things' (object categories that have a well defined shape, such as people and cars), 'stuff' (object categories that have an amorphous spatial extent, such as grass and sky), and the 'geometry' of scenes. In this thesis, we propose coherent models for the simultaneous recognition of 'things', 'stuff', and 'geometry'. The key contributions are i) to model their individual properties as well as relative properties, and ii) to propose a coherent framework that efficiently solves complicated tasks for scene understanding. We demonstrate that each task can be improved by also solving the other tasks in a joint fashion. The proposed models are capable of handling different types of inputs such as RGB, RGB-D, or hierarchically organized images. We have carried out extensive quantitative and qualitative experimental analysis to demonstrate the effectiveness of our theoretical findings and showed that our approaches yield competitive performance with respect to state-of-the-art methods.
    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/107277/1/bsookim_1.pd
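
    A toy sketch of the joint-modeling idea above: a single energy scores semantic and geometric labels together, so each task can inform the other. The potentials, labels, and coupling values below are illustrative assumptions, not the thesis's model.

    # Toy joint energy over semantic and geometry labels; values are made up.
    import numpy as np

    def joint_energy(sem, geom, u_sem, u_geom, coupling):
        """Energy of a joint labeling.

        sem, geom: (N,) per-pixel semantic / geometry label indices
        u_sem:     (N, S) semantic unary costs
        u_geom:    (N, G) geometry unary costs
        coupling:  (S, G) cost of pairing semantic label s with geometry g
                   (e.g. 'sky' on a vertical surface should be expensive)
        """
        n = np.arange(len(sem))
        return u_sem[n, sem].sum() + u_geom[n, geom].sum() + coupling[sem, geom].sum()

    # Example: the joint optimum can differ from solving each task alone.
    u_sem  = np.array([[0.1, 0.9]])      # pixel prefers semantic label 0 ('sky')
    u_geom = np.array([[0.6, 0.4]])      # slight preference for 'vertical'
    coupling = np.array([[0.0, 5.0],     # sky + horizontal: cheap
                         [5.0, 0.0]])    # sky + vertical: very costly
    best = min(((s, g) for s in range(2) for g in range(2)),
               key=lambda sg: joint_energy(np.array([sg[0]]), np.array([sg[1]]),
                                           u_sem, u_geom, coupling))
    print(best)  # (0, 0): the coupling overrides the geometry-only preference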