677 research outputs found

    Deep Contrast Learning for Salient Object Detection

    Get PDF
    Salient object detection has recently witnessed substantial progress due to powerful features extracted using deep convolutional neural networks (CNNs). However, existing CNN-based methods operate at the patch level instead of the pixel level. Resulting saliency maps are typically blurry, especially near the boundary of salient objects. Furthermore, image patches are treated as independent samples even when they are overlapping, giving rise to significant redundancy in computation and storage. In this CVPR 2016 paper, we propose an end-to-end deep contrast network to overcome the aforementioned limitations. Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream. The first stream directly produces a saliency map with pixel-level accuracy from an input image. The second stream extracts segment-wise features very efficiently, and better models saliency discontinuities along object boundaries. Finally, a fully connected CRF model can be optionally incorporated to improve spatial coherence and contour localization in the fused result from these two streams. Experimental results demonstrate that our deep model significantly improves the state of the art.Comment: To appear in CVPR 201

    Learning Regularization Weight for CRF Optimization

    Get PDF
    In recent years, convolutional neural networks (CNNs) are leading the way in many computer vision problems. Since the development of fully convolutional networks, CNNs have been widely employed for low-level pixel-labeling problems, and successfully pushed the performance to a new level. Although CNNs are able to extract highly discriminative features, they typically assign a class label to each image pixel individually. This leads to various spatial inconsistencies. Therefore, CNNs are commonly combined with graphical models, such as conditional random fields (CRFs), to impose spatial coherence. CRFs were invented precisely for the task of imposing spatial coherence among image pixels. The coherence regularization weight serves an important role of controlling the regularization strength in the CRF optimization, and has a great influence on the quality of the final result. Traditionally this weight value is set to a fixed number for all images. In this thesis, we propose a novel approach to learn the coherence regularization weight for each individual image using a CNN, and then apply this per-image-learned weight in the CNN+CRF system. We first construct a dataset where the optimal regularization weight for the CRF optimization has been pre-computed for each image. We adopt convolutional regression networks with standard architecture for learning, and tailor the input according to our problem. We test the effectiveness of our approach on the task of salient object segmentation where a graph-cut based CRF optimizer can generate globally optimal solution. We show that consistent performance improvements can be achieved by using the regularization weight learned on per-image basis as opposed to a fixed regularization weight for all images in the dataset

    PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection

    Full text link
    Contexts play an important role in the saliency detection task. However, given a context region, not all contextual information is helpful for the final task. In this paper, we propose a novel pixel-wise contextual attention network, i.e., the PiCANet, to learn to selectively attend to informative context locations for each pixel. Specifically, for each pixel, it can generate an attention map in which each attention weight corresponds to the contextual relevance at each context location. An attended contextual feature can then be constructed by selectively aggregating the contextual information. We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively. Both models are fully differentiable and can be embedded into CNNs for joint training. We also incorporate the proposed models with the U-Net architecture to detect salient objects. Extensive experiments show that the proposed PiCANets can consistently improve saliency detection performance. The global and local PiCANets facilitate learning global contrast and homogeneousness, respectively. As a result, our saliency model can detect salient objects more accurately and uniformly, thus performing favorably against the state-of-the-art methods

    Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation

    Full text link
    We propose an approach to discover class-specific pixels for the weakly-supervised semantic segmentation task. We show that properly combining saliency and attention maps allows us to obtain reliable cues capable of significantly boosting the performance. First, we propose a simple yet powerful hierarchical approach to discover the class-agnostic salient regions, obtained using a salient object detector, which otherwise would be ignored. Second, we use fully convolutional attention maps to reliably localize the class-specific regions in a given image. We combine these two cues to discover class-specific pixels which are then used as an approximate ground truth for training a CNN. While solving the weakly supervised semantic segmentation task, we ensure that the image-level classification task is also solved in order to enforce the CNN to assign at least one pixel to each object present in the image. Experimentally, on the PASCAL VOC12 val and test sets, we obtain the mIoU of 60.8% and 61.9%, achieving the performance gains of 5.1% and 5.2% compared to the published state-of-the-art results. The code is made publicly available

    Superpixel Convolutional Networks using Bilateral Inceptions

    Full text link
    In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new 'bilateral inception' module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module are learned end-to-end using standard backpropagation techniques. The bilateral inception module addresses two issues that arise with general CNN segmentation architectures. First, this module propagates information between (super) pixels while respecting image edges, thus using the structured information of the problem for improved results. Second, the layer recovers a full resolution segmentation result from the lower resolution solution of a CNN. In the experiments, we modify several existing CNN architectures by inserting our inception module between the last CNN (1x1 convolution) layers. Empirical results on three different datasets show reliable improvements not only in comparison to the baseline networks, but also in comparison to several dense-pixel prediction techniques such as CRFs, while being competitive in time.Comment: European Conference on Computer Vision (ECCV), 201
    corecore