1,654 research outputs found

    GRASS: Generative Recursive Autoencoders for Shape Structures

    We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes. We demonstrate that without supervision, our network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry. Comment: Corresponding author: Kai Xu.
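
    The fixed-dimensional code mentioned in the abstract can be illustrated with a minimal recursive-encoder sketch. This is not the GRASS implementation: the weights are random and untrained, the part hierarchy is given as a binary tree rather than inferred, and the names `PART_DIM`, `CODE_DIM`, `make_layer`, `apply_layer`, and `encode` are hypothetical. It only shows why trees of different sizes can map to codes of the same length.

```python
import math
import random

PART_DIM = 3  # assumed size of a part descriptor (hypothetical)
CODE_DIM = 4  # assumed size of the fixed-length code (hypothetical)


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, x):
    """Apply a linear map followed by tanh, one output per weight row."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def encode(node, leaf_w, merge_w):
    """Encode a binary part tree bottom-up into a CODE_DIM vector.

    A node is either a list of PART_DIM floats (a leaf part) or a
    tuple (left, right) of child nodes.
    """
    if isinstance(node, tuple):
        left = encode(node[0], leaf_w, merge_w)
        right = encode(node[1], leaf_w, merge_w)
        # Concatenate the two child codes and compress back to CODE_DIM.
        return apply_layer(merge_w, left + right)
    return apply_layer(leaf_w, node)


rng = random.Random(0)
leaf_w = make_layer(PART_DIM, CODE_DIM, rng)
merge_w = make_layer(2 * CODE_DIM, CODE_DIM, rng)

small = ([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
large = ((small, [0.7, 0.8, 0.9]), ([0.2, 0.1, 0.0], small))

code_small = encode(small, leaf_w, merge_w)
code_large = encode(large, leaf_w, merge_w)
print(len(code_small), len(code_large))
```

    Because the same merge layer is reused at every internal node, the output length depends only on `CODE_DIM`, not on the number of parts.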

    Topology-preserving perceptual segmentation using the Combinatorial Pyramid

    Scene understanding and other high-level visual tasks usually rely on segmenting the captured images to obtain a more efficient mid-level representation. Although this segmentation stage usually considers topological constraints for the set of obtained regions (e.g., their internal connectivity), the importance of preserving the topological relationships among regions is typically not taken into account. In contrast to other similar approaches, this paper presents a bottom-up approach for perceptual segmentation of natural images which preserves the topology of the image. The segmentation algorithm consists of two consecutive stages: first, the input image is partitioned into a set of blobs of uniform colour (pre-segmentation stage); then, using a more complex distance which integrates edge and region descriptors, these blobs are hierarchically merged (perceptual grouping). Both stages are addressed using the Combinatorial Pyramid, a hierarchical structure which can correctly encode relationships among image regions at upper levels. The performance of the proposed approach has been initially evaluated with respect to ground-truth segmentation data using the Berkeley Segmentation Dataset and Benchmark. Although additional descriptors must be added to deal with small regions and textured surfaces, experimental results reveal that the proposed perceptual grouping provides satisfactory scores.
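
    The merging stage described above can be sketched as greedy merging on a region adjacency graph. This is a simplification under stated assumptions: it uses a plain graph rather than the Combinatorial Pyramid, and the absolute difference of mean intensities stands in for the paper's distance combining edge and region descriptors. The function name `merge_regions` and its parameters are hypothetical.

```python
def merge_regions(means, sizes, edges, threshold):
    """Greedily merge the most similar adjacent regions.

    means: dict mapping region id -> mean intensity
    sizes: dict mapping region id -> pixel count
    edges: set of frozenset({a, b}) adjacency pairs
    threshold: stop once the closest adjacent pair differs by more than this
    """
    means, sizes, edges = dict(means), dict(sizes), set(edges)

    def dist(edge):
        a, b = edge
        return abs(means[a] - means[b])

    while edges:
        pair = min(edges, key=dist)
        if dist(pair) > threshold:
            break
        a, b = sorted(pair)
        # Merge region b into region a with a size-weighted mean.
        total = sizes[a] + sizes[b]
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        del means[b], sizes[b]
        # Redirect edges that touched b to a and drop the collapsed edge.
        rewired = set()
        for edge in edges:
            new_edge = frozenset(a if r == b else r for r in edge)
            if len(new_edge) == 2:
                rewired.add(new_edge)
        edges = rewired
    return means, sizes, edges


# Four blobs in a chain: 0-1-2-3.
blob_means = {0: 10.0, 1: 12.0, 2: 50.0, 3: 53.0}
blob_sizes = {0: 4, 1: 4, 2: 4, 3: 4}
blob_edges = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}

merged_means, merged_sizes, merged_edges = merge_regions(
    blob_means, blob_sizes, blob_edges, threshold=5.0
)
print(merged_means, merged_edges)
```

    A simple graph like this keeps track of which regions are adjacent but loses information such as multiple shared boundaries or containment, which is the kind of relationship the abstract says the Combinatorial Pyramid is able to encode.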

    Visual routines and attention

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (leaves 90-93). By Satyajit Rao.

    Using contour information and segmentation for object registration, modeling and retrieval

    This thesis considers different aspects of the utilization of contour information and syntactic and semantic image segmentation for object registration, modeling and retrieval in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic word recognition in handwritten historical manuscripts and shape registration. The thesis also explores the feasibility of contour-based syntactic features for improving the correspondence between the output of bottom-up segmentation and the semantic objects present in the scene, and discusses the feasibility of different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models, or semi-automatic segmentation in selected application scenarios. There are three contributions in this thesis. The first contribution considers structure analysis based on the shape and spatial configuration of image regions (so-called syntactic visual features) and their utilization for automatic image segmentation. The second contribution is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis, providing the basis for the third contribution, which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach have been analyzed through extensive, rigorous experimentation using test collections that are as large as possible.

    The PSEIKI Report—Version 2. Evidence Accumulation and Flow of Control in a Hierarchical Spatial Reasoning System

    A fundamental goal of computer vision is the development of systems capable of carrying out scene interpretation while taking into account all the available knowledge. In this report, we focus on how the interpretation task may be aided by expected-scene information which, in most cases, would not be in registration with the perceived scene. We describe PSEIKI, a framework for expectation-driven interpretation of image data. PSEIKI builds abstraction hierarchies in image data using, as cues, the abstraction hierarchies supplied in a scene expectation map. Hypothesized abstractions in the image data are geometrically compared with the known abstractions in the expected scene; the metrics used for these comparisons translate into belief values. The Dempster-Shafer formalism is used to accumulate beliefs for the synthesized abstractions in the image data. For accumulating belief values, a computationally efficient variation of Dempster’s rule of combination is developed to enable the system to deal with the overwhelming amount of information present in most images. This variation of Dempster’s rule allows the reasoning process to be embedded into the abstraction hierarchy by allowing for the propagation of belief values between elements at different levels of abstraction. The system has been implemented as a 2-panel, 5-level blackboard in OPS83. This report also discusses the control aspects of the blackboard, achieved via a distributed monitor using the OPS83 demons and a scheduler. Various knowledge sources for forming groupings in the image data and for labeling such groupings with abstractions from the scene expectation map are also discussed.
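
    For readers unfamiliar with the belief accumulation step, the following is a minimal sketch of the standard Dempster rule of combination. It is not the computationally efficient variation developed in the report, whose details are not given in the abstract. The function name `combine` and the example hypotheses ("road", "building") are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with the standard Dempster rule.

    Each mass function is a dict mapping a frozenset of hypotheses to a
    mass; the masses in each dict are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {h: mass / norm for h, mass in combined.items()}


ROAD = frozenset({"road"})
BUILDING = frozenset({"building"})
EITHER = ROAD | BUILDING  # mass left on the whole frame expresses ignorance

evidence_1 = {ROAD: 0.6, EITHER: 0.4}
evidence_2 = {BUILDING: 0.3, EITHER: 0.7}

belief = combine(evidence_1, evidence_2)
print(belief)
```

    In this example the conflicting mass is 0.6 × 0.3 = 0.18, so the surviving masses are divided by 0.82.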

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? To answer these questions properly and efficiently, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models through sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling.