8 research outputs found

    Hierarchical Scene Annotation

    We present a computer-assisted annotation system, together with a labeled dataset and benchmark suite, for evaluating an algorithm’s ability to recover hierarchical scene structure. We evolve segmentation groundtruth from the two-dimensional image partition into a tree model that captures both occlusion and object-part relationships among possibly overlapping regions. Our tree model extends the segmentation problem to encompass object detection, object-part containment, and figure-ground ordering. We mitigate the cost of providing richer groundtruth labeling through a new web-based annotation tool with an intuitive graphical interface for rearranging the region hierarchy. Using precomputed superpixels, our tool also guides creation of user-specified regions with pixel-perfect boundaries. Widespread adoption of this human-machine combination should make the inaccuracies of bounding box labeling a relic of the past. Evaluating the state-of-the-art in fully automatic image segmentation reveals that it produces accurate two-dimensional partitions, but does not respect groundtruth object-part structure. Our dataset and benchmark are the first to quantify these inadequacies. We illuminate recovery of rich scene structure as an important new goal for segmentation.
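As an illustration only, the tree model described in this abstract (object-part containment plus figure-ground ordering among possibly overlapping regions) could be represented roughly as below. The `Region` class, the `depth_rank` field (smaller means closer to the viewer), and the helpers `occludes` and `box` are hypothetical names chosen for this sketch; the abstract does not specify the actual representation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Tuple

Pixel = Tuple[int, int]


@dataclass
class Region:
    """One node of a hypothetical region tree."""
    name: str
    pixels: FrozenSet[Pixel]
    depth_rank: int = 0  # assumption: smaller rank = closer to the viewer
    parts: List["Region"] = field(default_factory=list)

    def add_part(self, part: "Region") -> "Region":
        # Object-part containment: a part must lie inside its parent.
        if not part.pixels <= self.pixels:
            raise ValueError(f"{part.name} is not contained in {self.name}")
        self.parts.append(part)
        return part

    def walk(self) -> Iterator["Region"]:
        # Depth-first traversal of the hierarchy.
        yield self
        for part in self.parts:
            yield from part.walk()


def occludes(front: Region, back: Region) -> bool:
    # Figure-ground ordering: regions overlap and `front` is closer.
    return bool(front.pixels & back.pixels) and front.depth_rank < back.depth_rank


def box(r0: int, r1: int, c0: int, c1: int) -> FrozenSet[Pixel]:
    # Rectangular pixel set, rows r0..r1-1 and columns c0..c1-1.
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


scene = Region("scene", box(0, 10, 0, 10), depth_rank=99)
person = scene.add_part(Region("person", box(2, 8, 2, 6), depth_rank=1))
person.add_part(Region("head", box(2, 4, 3, 5), depth_rank=1))
car = scene.add_part(Region("car", box(4, 9, 4, 10), depth_rank=2))

names = [region.name for region in scene.walk()]
print(names, occludes(person, car))
```

Unlike a flat two-dimensional partition, sibling regions here may share pixels, which is what lets the ordering between `person` and `car` be expressed at all.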

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    Multiscale combinatorial grouping for image segmentation and object proposal generation

    We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG). For this purpose, we first develop a fast normalized cuts algorithm. We then propose a high-performance hierarchical segmenter that makes effective use of multiscale information. Finally, we propose a grouping strategy that combines our multiscale regions into highly accurate object proposals by efficiently exploring their combinatorial space. We also present Single-scale Combinatorial Grouping (SCG), a faster version of MCG that produces competitive proposals in under five seconds per image. We conduct an extensive empirical validation on the BSDS500, SegVOC12, SBD, and COCO datasets, showing that MCG produces state-of-the-art contours, hierarchical regions, and object proposals. Peer reviewed. Postprint (author's final draft).

    Semantic amodal video segmentation using a synthetic dataset

    In this work, we provide tools for annotating both object category and shot transitions for a new semantic modal instance-level object segmentation dataset. This new dataset provides ample opportunities to train models for instance-level segmentation, both modal and amodal. We also present results for instance-level segmentation using ResNet-based DeepLab, a state-of-the-art semantic image segmentation model, and develop a new semantic amodal instance-level video segmentation model based on DeepLab for this dataset. Our model for amodal segmentation operates on a per-frame basis and is guided by the modal mask, estimated from the current and previous frames, that delineates the object of interest. We demonstrate the efficacy of the proposed model on the new dataset.

    Multiscale Centerline Detection

    Finding the centerline and estimating the radius of linear structures is a critical first step in many applications, ranging from road delineation in 2D aerial images to modeling blood vessels, lung bronchi, and dendritic arbors in 3D biomedical image stacks. Existing techniques rely either on filters designed to respond to ideal cylindrical structures or on classification techniques. The former tend to become unreliable when the linear structures are very irregular, while the latter often have difficulty distinguishing centerline locations from neighboring ones, thus losing accuracy. We solve this problem by reformulating centerline detection as a regression problem. We first train regressors to return the distances to the closest centerline in scale-space, and we apply them to the input images or volumes. The centerlines and the corresponding scales then correspond to the regressors' local maxima, which can be easily identified. We show that our method outperforms state-of-the-art techniques on various 2D and 3D datasets. Moreover, our approach is very generic and also performs well on contour detection. We show an improvement over recent contour detection algorithms on the BSDS500 dataset.
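The regression formulation in this abstract can be illustrated with a small, single-scale sketch. No regressor is trained here: the code only builds a distance-based target that peaks on the centerline and then recovers the centerline as local maxima of that map. The target function `max(0, 1 - D/d_max)`, the parameter names `d_max` and `threshold`, and all function names are assumptions made for this example, not the paper's definitions, and the scale dimension is omitted.

```python
import numpy as np


def distance_to_centerline(mask):
    """Brute-force Euclidean distance from each pixel to the nearest centerline pixel."""
    rows, cols = np.indices(mask.shape)
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    centers = np.argwhere(mask).astype(float)
    diffs = pixels[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return dists.reshape(mask.shape)


def regression_target(mask, d_max=3.0):
    """Map that equals 1 on the centerline and decays linearly to 0 at distance d_max."""
    return np.clip(1.0 - distance_to_centerline(mask) / d_max, 0.0, None)


def local_maxima(score, threshold=0.5):
    """Non-maximum suppression over the 8-neighborhood (ties are kept)."""
    h, w = score.shape
    padded = np.pad(score, 1, mode="constant", constant_values=-np.inf)
    is_max = score > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_max &= score >= neighbor
    return is_max


# Toy image: a horizontal centerline through the middle row.
mask = np.zeros((7, 7), dtype=bool)
mask[3, :] = True

target = regression_target(mask)  # stands in for a trained regressor's output
peaks = local_maxima(target)
print(np.array_equal(peaks, mask))
```

A classifier's output would tend to be high on several adjacent rows at once; the distance-based target has a single ridge, so the suppression step returns exactly the middle row in this toy case.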

    Multiscale Centerline Extraction Based on Regression and Projection onto the Set of Elongated Structures

    Automatically extracting linear structures from images is a fundamental low-level vision problem with numerous applications in different domains. Centerline detection and radius estimation are the first crucial steps in most Computer Vision pipelines aiming to reconstruct linear structures. Existing techniques rely either on hand-crafted filters, designed to respond to ideal profiles of the linear structure, or on classification-based approaches, which automatically learn to detect centerline points from data. Hand-crafted methods are the most accurate when the content of the image fulfills the ideal model they rely on. However, they lose accuracy in the presence of noise or when the linear structures are irregular and deviate from the ideal case. Machine learning techniques can alleviate this problem. However, they are mainly based on a classification framework. In this thesis, we show that classification is not the best formalism to solve the centerline detection problem. In fact, since the appearance of a centerline point is very similar to that of the points immediately next to it, the output of a classifier trained to detect centerlines presents low localization accuracy and double responses on the body of the linear structure. To solve this problem, we propose a regression-based formulation for centerline detection. We rely on the distance transform of the centerlines to automatically learn a function whose local maxima correspond to centerline points. The output of our method can be used to directly estimate the location of the centerline, by a simple non-maximum suppression operation, or it can be used as input to a tracing pipeline to reconstruct the graph of the linear structure. In both cases, our method gives more accurate results than state-of-the-art techniques on challenging 2D and 3D datasets. Our method relies on features extracted by means of convolutional filters.
    In order to process large amounts of data efficiently, we introduce a general filter bank approximation scheme. In particular, we show that a generic filter bank can be approximated by a linear combination of a smaller set of separable filters. Thanks to this method, we can greatly reduce the computation time of the convolutions, without loss of accuracy. Our approach is general, and we demonstrate its effectiveness by applying it to different Computer Vision problems, such as linear structure detection and image classification with Convolutional Neural Networks. We further improve our regression-based method for centerline detection by taking advantage of contextual image information. We adopt a multiscale iterative regression approach to efficiently include a large image context in our algorithm. Compared to previous approaches, we use context both in the spatial domain and in the radial one. In this way, our method is also able to return an accurate estimation of the radii of the linear structures. The idea of using regression can also be beneficial for solving other related Computer Vision problems. For example, we show an improvement compared to previous works when applying it to boundary and membrane detection. Finally, we focus on the particular geometric properties of the linear structures. We observe that most methods for detecting them treat each pixel independently and do not model the strong relation that exists between neighboring pixels. As a consequence, their output is geometrically inconsistent. In this thesis, we address this problem by considering the projection of the score map returned by our regressor onto the set of all geometrically admissible ground truth images. We propose an efficient patch-wise approximation scheme to compute the projection. Moreover, we provide conditions under which the projection is exact. We demonstrate the advantage of our method by applying it to four different problems.
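The filter approximation idea mentioned in this abstract (replacing generic 2D filters with linear combinations of separable ones) can be illustrated on a single filter. The sketch below uses a truncated SVD of one kernel, which is a simpler stand-in and not the thesis's scheme: the thesis describes a smaller set of separable filters shared across a whole bank, whereas here each rank-1 term belongs to one kernel only. All function names are hypothetical.

```python
import numpy as np


def separable_terms(kernel, rank):
    """Split a 2D kernel into `rank` separable (column, row) pairs via SVD."""
    u, s, vt = np.linalg.svd(kernel, full_matrices=False)
    return [(np.sqrt(s[i]) * u[:, i], np.sqrt(s[i]) * vt[i, :]) for i in range(rank)]


def correlate2d_valid(image, kernel):
    """Plain 2D cross-correlation, 'valid' region only."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out


def correlate_separable(image, col, row):
    """Apply a separable filter as a horizontal 1D pass followed by a vertical one."""
    horizontal = correlate2d_valid(image, row[None, :])
    return correlate2d_valid(horizontal, col[:, None])


rng = np.random.default_rng(0)
kernel = rng.standard_normal((5, 5))
image = rng.standard_normal((32, 32))

reference = correlate2d_valid(image, kernel)
errors = []
for rank in range(1, 6):
    approx = sum(correlate_separable(image, col, row)
                 for col, row in separable_terms(kernel, rank))
    errors.append(float(np.abs(approx - reference).max()))
print(errors)  # maximum absolute error for ranks 1 to 5
```

Each separable pass over a k-by-k filter costs on the order of 2k multiplications per pixel instead of k squared, which is where the speed-up comes from when only a few terms are needed. With all five terms the result matches the full 2D filtering up to floating-point error.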