7,931 research outputs found

    Segmentation-Aware Convolutional Networks Using Local Attention Masks

    We introduce an approach to integrate segmentation information within a convolutional neural network (CNN). This counteracts the tendency of CNNs to smooth information across regions and increases their spatial precision. To obtain segmentation information, we set up a CNN to provide an embedding space where region co-membership can be estimated based on Euclidean distance. We use these embeddings to compute a local attention mask relative to every neuron position. We incorporate such masks in CNNs and replace the convolution operation with a "segmentation-aware" variant that allows a neuron to selectively attend to inputs coming from its own region. We call the resulting network a segmentation-aware CNN because it adapts its filters at each image point according to local segmentation cues. We demonstrate the merit of our method on two widely different dense prediction tasks that involve classification (semantic segmentation) and regression (optical flow). Our results show that in semantic segmentation we can match the performance of DenseCRFs while being faster and simpler, and in optical flow we obtain clearly sharper responses than networks that do not use local attention masks. In both cases, segmentation-aware convolution yields systematic improvements over strong baselines. Source code for this work is available online at http://cs.cmu.edu/~aharley/segaware
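The masked-convolution idea can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal numpy illustration that assumes a soft exponential attention mask over embedding distances and a non-negative smoothing kernel (so the mask-weighted normalisation is well defined). The function name and the parameter `lam` are made up for illustration.

```python
import numpy as np

def segmentation_aware_conv(image, embeddings, weights, lam=1.0):
    """Sketch: convolution where each neighbour's contribution is scaled by
    an attention mask derived from embedding distances to the centre pixel.
    image: (H, W), embeddings: (H, W, D), weights: (k, k) non-negative kernel."""
    H, W = image.shape
    k = weights.shape[0]          # odd kernel size, e.g. 3
    r = k // 2
    out = np.zeros((H, W))
    for y in range(r, H - r):
        for x in range(r, W - r):
            acc, norm = 0.0, 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # attention mask: close to 1 when the neighbour's embedding
                    # matches the centre pixel's (likely same region)
                    d = np.linalg.norm(embeddings[y, x] - embeddings[y + dy, x + dx])
                    m = np.exp(-d / lam)
                    acc += m * weights[dy + r, dx + r] * image[y + dy, x + dx]
                    norm += m * weights[dy + r, dx + r]
            out[y, x] = acc / norm    # normalised masked convolution
    return out
```

With an averaging kernel, smoothing then stays within a region: neighbours whose embeddings differ from the centre pixel's contribute almost nothing, so responses stay sharp across region boundaries.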

    Semantic Video CNNs through Representation Warping

    In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that the proposed approach incurs only a small extra computational cost while improving performance when video streams are available. We achieve new state-of-the-art results on the CamVid and Cityscapes benchmark datasets and show consistent improvements over different baseline networks. Our code and models will be available at http://segmentation.is.tue.mpg.de. Comment: ICCV 2017
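The core warping step can be sketched as follows. This is a hedged illustration, not the NetWarp module itself: it assumes backward warping with bilinear interpolation, features as an (H, W, C) array, and flow as an (H, W, 2) array of (dx, dy) displacements; the function name is hypothetical.

```python
import numpy as np

def warp_features(feat_prev, flow):
    """Sketch of flow-based representation warping: sample the previous
    frame's feature map at positions displaced by the optical flow, using
    bilinear interpolation. feat_prev: (H, W, C), flow: (H, W, 2)."""
    H, W, C = feat_prev.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # position in the previous frame that maps to (y, x) in the current one
    src_x = np.clip(xs + flow[..., 0], 0, W - 1)
    src_y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(src_x).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(src_y).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx = (src_x - x0)[..., None]; wy = (src_y - y0)[..., None]
    # bilinear blend of the four neighbouring feature vectors
    top = feat_prev[y0, x0] * (1 - wx) + feat_prev[y0, x1] * wx
    bot = feat_prev[y1, x0] * (1 - wx) + feat_prev[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Because every operation is differentiable in the feature values, a module of this shape can sit inside a network and be trained end-to-end, which is the property the abstract relies on.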

    Lucid Data Dreaming for Video Object Segmentation

    Convolutional networks reach top quality in pixel-level video object segmentation but require a large amount of training data (1k~100k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x~1000x less annotated data than competing methods. Our approach is suitable for both single and multiple object segmentation. Instead of using large training sets hoping to generalize across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. In-domain per-video training data allows us to train high quality appearance- and motion-based models, as well as tune the post-processing stage. This approach allows us to reach competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the video object segmentation task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and general "objectness" knowledge are required for the video object segmentation task. Comment: Accepted in International Journal of Computer Vision (IJCV)
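The flavour of the synthesis step can be sketched as follows, though the actual lucid-dreaming pipeline is far richer (it re-composes foreground and background and simulates deformation and illumination changes). Here, assuming only global translations, the single annotated frame and its mask are displaced jointly; all names are made up for illustration.

```python
import numpy as np

def synthesize_training_set(frame, mask, n=10, max_shift=5, seed=0):
    """Sketch of in-domain data synthesis: from one annotated frame, generate
    plausible "future" frames by applying the same random displacement to the
    image and its segmentation mask, keeping the two consistent."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        f = np.roll(frame, (dy, dx), axis=(0, 1))  # shifted image
        m = np.roll(mask, (dy, dx), axis=(0, 1))   # identically shifted mask
        pairs.append((f, m))
    return pairs
```

The key point of the strategy survives even in this toy form: every synthesized pair comes from the target video itself, so the training distribution stays close to the test domain.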

    A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes

    Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates deteriorate performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments. Funding: National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)

    Discontinuity preserving image registration for breathing induced sliding organ motion

    Image registration is a powerful tool in medical image analysis and facilitates the clinical routine in several respects. It has become an indispensable device for many medical applications, including image-guided therapy systems. The basic goal of image registration is to spatially align two images that show a similar region of interest. More specifically, a displacement field, or equivalently a transformation, is estimated that relates the positions of the pixels or feature points in one image to the corresponding positions in the other. The resulting alignment of the images assists the doctor in comparing and diagnosing them. There are different kinds of image registration methods: those which estimate a rigid or, more generally, an affine transformation between the images, and those which capture more complex motion by estimating a non-rigid transformation. Many well-established non-rigid registration methods exist, but those able to preserve discontinuities in the displacement field are rather rare. Such discontinuities appear in particular at organ boundaries during breathing-induced organ motion. In this thesis, we use the idea of combining motion segmentation with registration to tackle the problem of preserving discontinuities in the resulting displacement field. We introduce a binary function to represent the motion segmentation, and the proposed discontinuity-preserving non-rigid registration method is then formulated in a variational framework. Thus, an energy functional is defined whose minimisation with respect to the displacement field and the motion segmentation leads to the desired result. In theory, one can prove that a global minimiser of the energy functional with respect to the motion segmentation can be found if the displacement field is given. The overall minimisation problem, however, is non-convex, and a suitable optimisation strategy has to be considered.
Furthermore, depending on whether we use the pure L1-norm or an approximation of it in the formulation of the energy functional, we use different numerical methods to solve the minimisation problem. More specifically, when using an approximation of the L1-norm, the minimisation of the energy functional with respect to the displacement field is performed with the fixed-point iteration scheme of Brox et al., and the minimisation with respect to the motion segmentation with the dual algorithm of Chambolle. On the other hand, when we use the pure L1-norm in the energy functional, the primal-dual algorithm of Chambolle and Pock is used for both the minimisation with respect to the displacement field and the minimisation with respect to the motion segmentation. This approach is clearly faster than the one using the approximation of the L1-norm and is also theoretically more appealing. Finally, to support the registration method during the minimisation process, in a later approach we additionally incorporate the positions of certain landmarks into the formulation of the energy functional that uses the pure L1-norm. As before, the primal-dual algorithm of Chambolle and Pock is then used for both minimisations. All the proposed discontinuity-preserving non-rigid registration methods delivered promising results in experiments with synthetic images and real MR images of breathing-induced liver motion.
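Under assumed notation, a discontinuity-preserving registration energy of the kind described can be sketched schematically (this is not necessarily the exact functional of the thesis): $u$ is the displacement field, $c \in \{0,1\}$ the binary motion segmentation, $I_1, I_2$ the images, and $\alpha, \beta, \gamma$ weighting parameters.

```latex
E(u, c) = \int_\Omega \lvert I_1(x + u(x)) - I_2(x) \rvert \, dx          % L1 data term
        + \alpha \int_\Omega \bigl(1 - c(x)\bigr)\,\lvert \nabla u(x) \rvert \, dx  % regulariser, switched off where c = 1 (motion discontinuity)
        + \beta  \int_\Omega \lvert \nabla c(x) \rvert \, dx              % perimeter penalty on the segmentation
        + \gamma \int_\Omega c(x) \, dx                                   % area penalty against the trivial c \equiv 1
```

Note that for a fixed displacement field $u$, the terms involving $c$ are of total-variation type, which is consistent with the abstract's claim that a global minimiser in the segmentation variable can be found when $u$ is given.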