474 research outputs found

    Analysis and Manipulation of Repetitive Structures of Varying Shape

    Get PDF
    Self-similarity and repetitions are ubiquitous in man-made and natural objects. Such structural regularities often relate to form, function, aesthetics, and design considerations. Discovering structural redundancies along with their dominant variations from 3D geometry not only allows us to better understand the underlying objects, but is also beneficial for several geometry processing tasks including compact representation, shape completion, and intuitive shape manipulation. To identify these repetitions, we present a novel detection algorithm based on analyzing a graph of surface features. We combine general feature detection schemes with a RANSAC-based randomized subgraph searching algorithm in order to reliably detect recurring patterns of locally unique structures. A subsequent segmentation step based on a simultaneous region growing is applied to verify that the actual data supports the patterns detected in the feature graphs. We introduce our graph based detection algorithm on the example of rigid repetitive structure detection. Then we extend the approach to allow more general deformations between the detected parts. We introduce subspace symmetries whereby we characterize similarity by requiring the set of repeating structures to form a low dimensional shape space. We discover these structures based on detecting linearly correlated correspondences among graphs of invariant features. The found symmetries along with the modeled variations are useful for a variety of applications including non-local and non-rigid denoising. Employing subspace symmetries for shape editing, we introduce a morphable part model for smart shape manipulation. The input geometry is converted to an assembly of deformable parts with appropriate boundary conditions. Our method uses self-similarities from a single model or corresponding parts of shape collections as training input and allows the user also to reassemble the identified parts in new configurations, thus exploiting both the discrete and continuous learned variations while ensuring appropriate boundary conditions across part boundaries. We obtain an interactive yet intuitive shape deformation framework producing realistic deformations on classes of objects that are difficult to edit using repetition-unaware deformation techniques

    How was your day? Online visual workspace summaries using incremental clustering in topic space

    Get PDF
    Someday mobile robots will operate continually. Day after day, they will be in receipt of a never ending stream of images. In anticipation of this, this paper is about having a mobile robot generate apt and compact summaries of its life experience. We consider a robot moving around its environment both revisiting and exploring, accruing images as it goes. We describe how we can choose a subset of images to summarise the robot's cumulative visual experience. Moreover we show how to do this such that the time cost of generating an summary is largely independent of the total number of images processed. No one day is harder to summarise than any other.Micro Autonomous Consortium Systems and Technology (United States. Army Research Laboratory (Grant W911NF-08-2-0004))United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-09-1-1051)United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-09-1-1031

    VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS

    Get PDF
    This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interests in the fields of computer vision and data mining. Structural Semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, locations and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations. New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models

    Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos

    Full text link
    This work proposes a self-supervised learning system for segmenting rigid objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot. A key feature of the self-supervised training process is a graph-matching algorithm that operates on the over-segmentation output of the point cloud that is reconstructed from each video. The graph matching, along with point cloud registration, is able to find reoccurring object patterns across videos and combine them into 3D object pseudo labels, even under occlusions or different viewing angles. Projected 2D object masks from 3D pseudo labels are used to train a pixel-wise feature extractor through contrastive learning. During online inference, a clustering method uses the learned features to cluster foreground pixels into object segments. Experiments highlight the method's effectiveness on both real and synthetic video datasets, which include cluttered scenes of tabletop objects. The proposed method outperforms existing unsupervised methods for object segmentation by a large margin

    Computational models for image contour grouping

    Get PDF
    Contours are one dimensional curves which may correspond to meaningful entities such as object boundaries. Accurate contour detection will simplify many vision tasks such as object detection and image recognition. Due to the large variety of image content and contour topology, contours are often detected as edge fragments at first, followed by a second step known as {u0300}{u0300}contour grouping'' to connect them. Due to ambiguities in local image patches, contour grouping is essential for constructing globally coherent contour representation. This thesis aims to group contours so that they are consistent with human perception. We draw inspirations from Gestalt principles, which describe perceptual grouping ability of human vision system. In particular, our work is most relevant to the principles of closure, similarity, and past experiences. The first part of our contribution is a new computational model for contour closure. Most of existing contour grouping methods have focused on pixel-wise detection accuracy and ignored the psychological evidences for topological correctness. This chapter proposes a higher-order CRF model to achieve contour closure in the contour domain. We also propose an efficient inference method which is guaranteed to find integer solutions. Tested on the BSDS benchmark, our method achieves a superior contour grouping performance, comparable precision-recall curves, and more visually pleasant results. Our work makes progresses towards a better computational model of human perceptual grouping. The second part is an energy minimization framework for salient contour detection problem. Region cues such as color/texture homogeneity, and contour cues such as local contrast, are both useful for this task. In order to capture both kinds of cues in a joint energy function, topological consistency between both region and contour labels must be satisfied. Our technique makes use of the topological concept of winding numbers. By using a fast method for winding number computation, we find that a small number of linear constraints are sufficient for label consistency. Our method is instantiated by ratio-based energy functions. Due to cue integration, our method obtains improved results. User interaction can also be incorporated to further improve the results. The third part of our contribution is an efficient category-level image contour detector. The objective is to detect contours which most likely belong to a prescribed category. Our method, which is based on three levels of shape representation and non-parametric Bayesian learning, shows flexibility in learning from either human labeled edge images or unlabelled raw images. In both cases, our experiments obtain better contour detection results than competing methods. In addition, our training process is robust even with a considerable size of training samples. In contrast, state-of-the-art methods require more training samples, and often human interventions are required for new category training. Last but not least, in Chapter 7 we also show how to leverage contour information for symmetry detection. Our method is simple yet effective for detecting the symmetric axes of bilaterally symmetric objects in unsegmented natural scene images. Compared with methods based on feature points, our model can often produce better results for the images containing limited texture

    Shape Representations Using Nested Descriptors

    Get PDF
    The problem of shape representation is a core problem in computer vision. It can be argued that shape representation is the most central representational problem for computer vision, since unlike texture or color, shape alone can be used for perceptual tasks such as image matching, object detection and object categorization. This dissertation introduces a new shape representation called the nested descriptor. A nested descriptor represents shape both globally and locally by pooling salient scaled and oriented complex gradients in a large nested support set. We show that this nesting property introduces a nested correlation structure that enables a new local distance function called the nesting distance, which provides a provably robust similarity function for image matching. Furthermore, the nesting property suggests an elegant flower like normalization strategy called a log-spiral difference. We show that this normalization enables a compact binary representation and is equivalent to a form a bottom up saliency. This suggests that the nested descriptor representational power is due to representing salient edges, which makes a fundamental connection between the saliency and local feature descriptor literature. In this dissertation, we introduce three examples of shape representation using nested descriptors: nested shape descriptors for imagery, nested motion descriptors for video and nested pooling for activities. We show evaluation results for these representations that demonstrate state-of-the-art performance for image matching, wide baseline stereo and activity recognition tasks
    • …
    corecore