
    Seeing Behind Things: Extending Semantic Segmentation to Occluded Regions

    Semantic segmentation and instance-level segmentation have made substantial progress in recent years due to the emergence of deep neural networks (DNNs). A number of deep architectures based on Convolutional Neural Networks (CNNs) have been proposed that surpass traditional machine learning approaches for segmentation by a large margin. These architectures predict the directly observable semantic category of each pixel, usually by optimizing a cross-entropy loss. In this work we push the limit of semantic segmentation towards predicting semantic labels of directly visible as well as occluded objects or object parts, where the network's input is a single depth image. We group the semantic categories into one background group and multiple foreground object groups, and we propose a modification of the standard cross-entropy loss to cope with this setting. In our experiments we demonstrate that a CNN trained by minimizing the proposed loss is able to predict semantic categories for visible and occluded object parts without requiring an increase in network size (compared to a standard segmentation task). The results are validated on a newly generated dataset (augmented from SUNCG).
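
    The abstract does not spell out the exact loss formulation, so the following is only a minimal sketch of one plausible reading: each object group gets its own cross-entropy term over the channels assigned to that group, so a pixel can carry a label for every group it belongs to, visible or occluded. The function name, tensor layout, and grouping scheme are all assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def grouped_cross_entropy(logits, targets, groups):
    """Hypothetical grouped cross-entropy sketch (not the paper's exact loss).

    logits  : (B, C, H, W) raw scores, one channel per semantic category
    targets : (B, G, H, W) long tensor; for each object group g, the entry is an
              index into that group's channel list (one index may mean "absent")
    groups  : list of G lists of channel indices, one list per object group
    """
    loss = 0.0
    for g, channel_ids in enumerate(groups):
        # Softmax cross-entropy restricted to the categories of this group,
        # so every group contributes a label per pixel (visible or occluded).
        group_logits = logits[:, channel_ids, :, :]          # (B, |group|, H, W)
        loss = loss + F.cross_entropy(group_logits, targets[:, g, :, :])
    return loss / len(groups)
```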

    SLIC Superpixels Compared to State-of-the-art Superpixel Methods

    Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
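
    SLIC is well documented, so a compact re-implementation can illustrate the core idea: seed cluster centres on a regular grid, then run localized k-means in the joint (L, a, b, y, x) space, restricting each centre's search to a 2S x 2S window and weighting spatial distance by the compactness parameter m. This sketch omits the gradient-based seed perturbation and the connectivity-enforcement post-processing described in the paper.

```python
import numpy as np
from skimage import color  # for RGB -> CIELAB conversion

def slic_sketch(rgb, k=200, m=10.0, n_iter=10):
    """Simplified SLIC-style superpixels (no seed perturbation, no connectivity fix)."""
    lab = color.rgb2lab(rgb)
    h, w = lab.shape[:2]
    S = max(1, int(np.sqrt(h * w / k)))                        # grid interval

    # Seed cluster centres (l, a, b, y, x) on a regular grid.
    ys, xs = np.arange(S // 2, h, S), np.arange(S // 2, w, S)
    centers = np.array([[*lab[y, x], y, x] for y in ys for x in xs], float)

    yy, xx = np.mgrid[0:h, 0:w]
    labels = -np.ones((h, w), int)

    for _ in range(n_iter):
        dists = np.full((h, w), np.inf)
        for ci, (l, a, b, cy, cx) in enumerate(centers):
            # Assignment step, restricted to a 2S x 2S window around the centre.
            y0, y1 = int(max(cy - S, 0)), int(min(cy + S, h))
            x0, x1 = int(max(cx - S, 0)), int(min(cx + S, w))
            patch = lab[y0:y1, x0:x1]
            dc = np.sqrt(((patch - [l, a, b]) ** 2).sum(-1))             # colour distance
            ds = np.sqrt((yy[y0:y1, x0:x1] - cy) ** 2 +
                         (xx[y0:y1, x0:x1] - cx) ** 2)                   # spatial distance
            d = np.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2)                # combined distance
            mask = d < dists[y0:y1, x0:x1]
            dists[y0:y1, x0:x1][mask] = d[mask]
            labels[y0:y1, x0:x1][mask] = ci
        # Update step: move each centre to the mean of its assigned pixels.
        for ci in range(len(centers)):
            sel = labels == ci
            if sel.any():
                centers[ci] = [*lab[sel].mean(0), yy[sel].mean(), xx[sel].mean()]
    return labels
```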

    Efficient inference for fully-connected CRFs with stationarity

    The Conditional Random Field (CRF) is a popular tool for object-based image segmentation. CRFs used in practice typically have edges only between adjacent image pixels. To represent object relationship statistics beyond adjacent pixels, prior work either represents only weak spatial information using the segmented regions, or encodes only global object co-occurrences. In this paper, we propose a unified model that augments the pixel-wise CRF to capture object spatial relationships. To this end, we use a fully connected CRF, which has an edge for each pair of pixels. The edge potentials are defined to capture the spatial information and preserve the object boundaries at the same time. Traditional inference methods, such as belief propagation and graph cuts, are impractical in such a case, where billions of edges are defined. Under the single assumption that the spatial relationships among different objects depend only on their relative positions (i.e., they are spatially stationary), we develop an efficient inference algorithm that converges in a few seconds on a standard-resolution image, where belief propagation takes more than one hour for a single iteration.
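
    The key observation in the abstract is that spatial stationarity turns the sum over all pairwise interactions into a convolution. As an illustration only (the paper's actual inference algorithm is not reproduced here), the sketch below performs one mean-field-style update for a fully connected CRF whose pairwise potential depends only on relative displacement, using FFTs so the aggregation costs O(HW log HW) instead of O((HW)^2). The function name, array layout, and circular-boundary handling are assumptions.

```python
import numpy as np

def stationary_crf_meanfield_step(unary, q, kernel, compat):
    """One illustrative mean-field update exploiting spatial stationarity.

    unary  : (L, H, W) unary energies (negative log unary potentials)
    q      : (L, H, W) current per-pixel, per-label marginals
    kernel : (H, W)    stationary spatial kernel, centred, indexed by displacement
    compat : (L, L)    label compatibility matrix
    """
    L, H, W = q.shape
    # Stationary potential => message aggregation is a convolution; do it via FFT.
    # (Circular convolution here, so image borders wrap; a real implementation
    # would pad the arrays first.)
    K = np.fft.rfft2(np.fft.ifftshift(kernel), s=(H, W))
    filtered = np.fft.irfft2(np.fft.rfft2(q, s=(H, W)) * K, s=(H, W))   # (L, H, W)

    # Mix the filtered marginals across labels with the compatibility matrix.
    pairwise = np.tensordot(compat, filtered, axes=1)                   # (L, H, W)

    # Standard mean-field update: normalised exponential of the negative energies.
    logits = -(unary + pairwise)
    logits -= logits.max(axis=0, keepdims=True)
    q_new = np.exp(logits)
    return q_new / q_new.sum(axis=0, keepdims=True)
```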

    Propagated Image Segmentation Using Edge-Weighted Centroidal Voronoi Tessellation Based Methods

    Propagated image segmentation is the problem of utilizing the existing segmentation of an image for obtaining a new segmentation of either a neighboring image in a sequence, or the same image at a different scale. We refer to these two cases as inter-image propagation and intra-image propagation, respectively. Inter-image propagation is particularly important to material science, where efficient and accurate segmentation of a sequence of 2D serial-sectioned images of 3D material samples is an essential step to understand the underlying micro-structure and related physical properties. For natural images with objects at different scales, intra-image propagation, where segmentations are propagated from the finest scale to coarser scales, is able to better capture object boundaries than single-shot segmentations at a fixed image scale. In this work, we first propose an inter-image propagation method named Edge-Weighted Centroidal Voronoi Tessellation with Propagation of Consistency Constraint (CCEWCVT) to effectively segment material images. CCEWCVT segments an image sequence by repeatedly propagating a 2D segmentation from one slice to another, and in each step of this propagation we apply the proposed consistency constraint in the pixel clustering process such that stable structures identified from the previous slice are well preserved. We further propose a non-rigid transformation based association method to find the correspondence of propagated stable structures in the next slice when the inter-image distance becomes large. We justify the effectiveness of the proposed CCEWCVT method on 3D material image sequences, and we compare its performance against several state-of-the-art 2D, 3D, and propagated segmentation methods. Then, for intra-image propagation, we propose a superpixel construction method named Hierarchical Edge-Weighted Centroidal Voronoi Tessellation (HEWCVT) to accurately capture object boundaries in natural images. We model the problem as a multilevel clustering process: superpixels in one level are clustered to obtain larger superpixels in the next level. The clustering energy involves both color similarities and the proposed boundary smoothness of superpixels. We further extend HEWCVT to obtain supervoxels on 3D images or videos. Both quantitative and qualitative evaluation results on several standard datasets show that the proposed HEWCVT method achieves superior or comparable performance to other state-of-the-art methods.
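
    The CVT-based energies themselves are not given in this abstract, so the snippet below only sketches the hierarchical idea in much-simplified form: treat the current superpixels as graph nodes and greedily merge adjacent pairs whose cost, a trade-off between colour similarity and a boundary-smoothness proxy, is lowest, producing the next (coarser) level. The data layout, cost function, and stopping rule are all assumptions, not the HEWCVT formulation.

```python
import numpy as np

def merge_one_level(mean_color, area, adjacency, boundary_len, lam=1.0, target=100):
    """Greedily merge adjacent superpixels into a coarser level (illustrative only).

    mean_color   : {id: (3,) array}  mean colour of each superpixel (ids are ints)
    area         : {id: int}         pixel count of each superpixel
    adjacency    : {id: set of ids}  neighbourhood graph
    boundary_len : {frozenset({i, j}): float}  shared boundary length
    Returns a mapping from original superpixel id to its merged id.
    """
    labels = {i: i for i in mean_color}

    def cost(i, j):
        colour = np.linalg.norm(mean_color[i] - mean_color[j])
        # Long shared boundaries between similar regions are cheap to merge.
        smooth = 1.0 / (1.0 + boundary_len.get(frozenset((i, j)), 1.0))
        return colour + lam * smooth

    while len(mean_color) > target:
        # Cheapest adjacent pair (linear scan; a heap would be faster).
        i, j = min(((a, b) for a in adjacency for b in adjacency[a] if a < b),
                   key=lambda p: cost(*p))
        # Merge j into i: area-weighted colour, union of neighbourhoods.
        w_i, w_j = area[i], area[j]
        mean_color[i] = (w_i * mean_color[i] + w_j * mean_color[j]) / (w_i + w_j)
        area[i] = w_i + w_j
        adjacency[i] |= adjacency.pop(j) - {i, j}
        for k in list(adjacency):
            if j in adjacency[k]:
                adjacency[k].discard(j)
                if k != i:
                    adjacency[k].add(i)
                    key = frozenset((i, k))
                    boundary_len[key] = (boundary_len.get(key, 0.0) +
                                         boundary_len.pop(frozenset((j, k)), 0.0))
        del mean_color[j], area[j]
        labels = {p: (i if q == j else q) for p, q in labels.items()}
    return labels
```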

    Nonparametric Scene Parsing via Label Transfer
