1,180 research outputs found

    Efficient SDP Inference for Fully-connected CRFs Based on Low-rank Decomposition

    Full text link
    Conditional Random Fields (CRF) have been widely used in a variety of computer vision tasks. Conventional CRFs typically define edges on neighboring image pixels, resulting in a sparse graph such that efficient inference can be performed. However, these CRFs fail to model long-range contextual relationships. Fully-connected CRFs have thus been proposed. While there are efficient approximate inference methods for such CRFs, usually they are sensitive to initialization and make strong assumptions. In this work, we develop an efficient, yet general algorithm for inference on fully-connected CRFs. The algorithm is based on a scalable SDP algorithm and the low- rank approximation of the similarity/kernel matrix. The core of the proposed algorithm is a tailored quasi-Newton method that takes advantage of the low-rank matrix approximation when solving the specialized SDP dual problem. Experiments demonstrate that our method can be applied on fully-connected CRFs that cannot be solved previously, such as pixel-level image co-segmentation.Comment: 15 pages. A conference version of this work appears in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 201

    Discriminative Training of Deep Fully-connected Continuous CRF with Task-specific Loss

    Full text link
    Recent works on deep conditional random fields (CRF) have set new records on many vision tasks involving structured predictions. Here we propose a fully-connected deep continuous CRF model for both discrete and continuous labelling problems. We exemplify the usefulness of the proposed model on multi-class semantic labelling (discrete) and the robust depth estimation (continuous) problems. In our framework, we model both the unary and the pairwise potential functions as deep convolutional neural networks (CNN), which are jointly learned in an end-to-end fashion. The proposed method possesses the main advantage of continuously-valued CRF, which is a closed-form solution for the Maximum a posteriori (MAP) inference. To better adapt to different tasks, instead of using the commonly employed maximum likelihood CRF parameter learning protocol, we propose task-specific loss functions for learning the CRF parameters. It enables direct optimization of the quality of the MAP estimates during the course of learning. Specifically, we optimize the multi-class classification loss for the semantic labelling task and the Turkey's biweight loss for the robust depth estimation problem. Experimental results on the semantic labelling and robust depth estimation tasks demonstrate that the proposed method compare favorably against both baseline and state-of-the-art methods. In particular, we show that although the proposed deep CRF model is continuously valued, with the equipment of task-specific loss, it achieves impressive results even on discrete labelling tasks

    Exploring Context with Deep Structured models for Semantic Segmentation

    Full text link
    State-of-the-art semantic image segmentation methods are mostly based on training deep convolutional neural networks (CNNs). In this work, we proffer to improve semantic segmentation with the use of contextual information. In particular, we explore `patch-patch' context and `patch-background' context in deep CNNs. We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied in order to avoid repeated expensive CRF inference during the course of back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image inputs and sliding pyramid pooling is very effective for improving performance. We perform comprehensive evaluation of the proposed method. We achieve new state-of-the-art performance on a number of challenging semantic segmentation datasets including NYUDv2NYUDv2, PASCALPASCAL-VOC2012VOC2012, CityscapesCityscapes, PASCALPASCAL-ContextContext, SUNSUN-RGBDRGBD, SIFTSIFT-flowflow, and KITTIKITTI datasets. Particularly, we report an intersection-over-union score of 77.877.8 on the PASCALPASCAL-VOC2012VOC2012 dataset.Comment: 16 pages. Accepted to IEEE T. Pattern Analysis & Machine Intelligence, 2017. Extended version of arXiv:1504.0101

    Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

    Full text link
    Bilateral filters have wide spread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we will generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation allows to learn high dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate the use in applications where single filter applications are desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for the use of high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the usage of general forms of filters

    Worst-case Optimal Submodular Extensions for Marginal Estimation

    Get PDF
    Submodular extensions of an energy function can be used to efficiently compute approximate marginals via variational inference. The accuracy of the marginals depends crucially on the quality of the submodular extension. To identify the best possible extension, we show an equivalence between the submodular extensions of the energy and the objective functions of linear programming (LP) relaxations for the corresponding MAP estimation problem. This allows us to (i) establish the worst-case optimality of the submodular extension for Potts model used in the literature; (ii) identify the worst-case optimal submodular extension for the more general class of metric labeling; and (iii) efficiently compute the marginals for the widely used dense CRF model with the help of a recently proposed Gaussian filtering method. Using synthetic and real data, we show that our approach provides comparable upper bounds on the log-partition function to those obtained using tree-reweighted message passing (TRW) in cases where the latter is computationally feasible. Importantly, unlike TRW, our approach provides the first practical algorithm to compute an upper bound on the dense CRF model.Comment: Accepted to AISTATS 201

    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    Get PDF
    In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.Comment: Accepted by TPAM
    • …
    corecore