Segmentation-Aware Convolutional Networks Using Local Attention Masks
We introduce an approach to integrate segmentation information within a
convolutional neural network (CNN). This counteracts the tendency of CNNs to
smooth information across regions and increases their spatial precision. To
obtain segmentation information, we set up a CNN to provide an embedding space
where region co-membership can be estimated based on Euclidean distance. We use
these embeddings to compute a local attention mask relative to every neuron
position. We incorporate such masks in CNNs and replace the convolution
operation with a "segmentation-aware" variant that allows a neuron to
selectively attend to inputs coming from its own region. We call the resulting
network a segmentation-aware CNN because it adapts its filters at each image
point according to local segmentation cues. We demonstrate the merit of our
method on two widely different dense prediction tasks that involve
classification (semantic segmentation) and regression (optical flow). Our
results show that in semantic segmentation we can match the performance of
DenseCRFs while being faster and simpler, and in optical flow we obtain clearly
sharper responses than networks that do not use local attention masks. In both
cases, segmentation-aware convolution yields systematic improvements over
strong baselines. Source code for this work is available online at
http://cs.cmu.edu/~aharley/segaware
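To make the mechanism concrete, here is a minimal NumPy sketch of a segmentation-aware convolution on a single-channel input. It assumes the mask weight between a pixel and a neighbour is exp(-lam * ||e_i - e_j||), using the Euclidean distance between their embeddings, and that the masked filter response is normalised by the total mask weight. The parameter `lam`, the function names and the edge padding are illustrative assumptions, not taken from the released code.

```python
import numpy as np


def local_attention_masks(emb, k=3, lam=1.0):
    """Per-pixel k x k attention masks from embeddings emb of shape (H, W, D).

    masks[y, x, dy, dx] = exp(-lam * ||emb[y, x] - emb[y + dy - r, x + dx - r]||),
    with r = k // 2 and edge padding at the image border (assumed choices).
    """
    H, W, _ = emb.shape
    r = k // 2
    padded = np.pad(emb, ((r, r), (r, r), (0, 0)), mode="edge")
    masks = np.empty((H, W, k, k))
    for dy in range(k):
        for dx in range(k):
            neighbour = padded[dy:dy + H, dx:dx + W]
            # Euclidean distance in embedding space approximates region co-membership.
            dist = np.linalg.norm(emb - neighbour, axis=-1)
            masks[:, :, dy, dx] = np.exp(-lam * dist)
    return masks


def seg_aware_conv(x, emb, weight, lam=1.0):
    """Segmentation-aware filtering of a single-channel image x (H, W) with a k x k filter.

    Implemented as cross-correlation, as in CNN layers. Each filter tap is scaled by
    the attention mask and the sum is normalised by the total mask weight, so inputs
    from other regions contribute almost nothing.
    """
    k = weight.shape[0]
    r = k // 2
    H, W = x.shape
    masks = local_attention_masks(emb, k, lam)
    padded = np.pad(x, r, mode="edge")
    out = np.zeros((H, W))
    norm = np.zeros((H, W))
    for dy in range(k):
        for dx in range(k):
            m = masks[:, :, dy, dx]
            out += weight[dy, dx] * m * padded[dy:dy + H, dx:dx + W]
            norm += m
    return out / norm
```

With a box filter `np.ones((3, 3))` and embeddings that separate two regions strongly, the output keeps a step edge intact, whereas the same filter with uniform embeddings (all masks equal to 1) averages across the boundary.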
Semantic Video CNNs through Representation Warping
In this work, we propose a technique to convert CNN models for semantic
segmentation of static images into CNNs for video data. We describe a warping
method that can be used to augment existing architectures with very little
extra computational cost. This module is called NetWarp and we demonstrate its
use for a range of network architectures. The main design principle is to use
optical flow of adjacent frames for warping internal network representations
across time. A key insight of this work is that fast optical flow methods can
be combined with many different CNN architectures for improved performance and
end-to-end training. Experiments validate that the proposed approach incurs
only a little extra computational cost while improving performance when video
streams are available. We achieve new state-of-the-art results on the CamVid
and Cityscapes benchmark datasets and show consistent improvements over
different baseline networks. Our code and models will be available at
http://segmentation.is.tue.mpg.de
Comment: ICCV 201
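The core operation, warping an internal representation from frame t-1 to frame t with optical flow, can be sketched in NumPy as bilinear backward warping followed by a per-channel weighted combination. This is a minimal sketch under stated assumptions: the flow is given in pixels as (horizontal, vertical) components that map each pixel of frame t to its position in frame t-1, sampling positions are clamped to the feature map, and `w_curr`/`w_prev` are hypothetical stand-ins for learned per-channel weights. None of these names come from the NetWarp code.

```python
import numpy as np


def warp_features(feat, flow):
    """Backward-warp feat (C, H, W) with flow (2, H, W) using bilinear interpolation.

    out[:, y, x] samples feat at (y + flow[1, y, x], x + flow[0, y, x]);
    positions outside the map are clamped to the border.
    """
    _, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = sx - x0  # fractional offsets, shape (H, W)
    wy = sy - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def combine_with_previous(curr, prev, flow, w_curr, w_prev):
    """Blend current features with flow-warped previous features, one weight per channel."""
    warped = warp_features(prev, flow)
    return w_curr[:, None, None] * curr + w_prev[:, None, None] * warped
```

Because the warp is built from differentiable arithmetic on the feature values, gradients can flow back into both frames' representations, which is what makes end-to-end training of such a module plausible.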
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k to 100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x to 1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach makes it possible to reach competitive results
even when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task.
Comment: Accepted in International Journal of Computer Vision (IJCV)
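The following NumPy sketch illustrates the basic idea of synthesising in-domain training pairs from a single annotated frame: cut out the annotated object, move it, and paste it onto a background. It is deliberately simplified and is not the paper's pipeline, which is richer than this. We assume an object-free `background` image is already available (for example from inpainting) and apply only random integer translations; `dream_frame` and `dream_dataset` are hypothetical names. Note that `np.roll` wraps around the image border, so shifts are assumed small relative to the object's distance from the border.

```python
import numpy as np


def dream_frame(image, mask, background, shift):
    """Synthesise one (frame, mask) pair by translating the annotated object.

    image, background: (H, W, 3) arrays; mask: (H, W) boolean object mask;
    shift: (dy, dx) integer translation. np.roll wraps around the border.
    """
    dy, dx = shift
    new_mask = np.roll(mask, (dy, dx), axis=(0, 1))
    moved = np.roll(image, (dy, dx), axis=(0, 1))
    # Object pixels come from the moved image, everything else from the background.
    frame = np.where(new_mask[..., None], moved, background)
    return frame, new_mask


def dream_dataset(image, mask, background, n, max_shift, seed=0):
    """Generate n synthetic training pairs with random shifts in [-max_shift, max_shift]."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n):
        dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
        pairs.append(dream_frame(image, mask, background, (dy, dx)))
    return pairs
```

Each synthetic pair comes with an exact mask for free, which is why a single annotated frame can be expanded into a per-video training set.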
A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes
Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates deteriorate performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.
National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National-Geospatial Intelligence Agency (NMA201-01-1-2016)
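The neural model itself does not fit in a short snippet, but the underlying geometry can be illustrated. Under pure observer translation, every optic-flow vector points away from a single image point, the focus of expansion (FOE), whose position encodes the heading. The NumPy sketch below is the classical least-squares FOE estimate, a standard computer-vision technique swapped in for illustration and not the paper's neural model. It assumes no eye or camera rotation, flow sampled at known image points, and a pinhole camera for the angle conversion; the function names are hypothetical.

```python
import numpy as np


def estimate_foe(points, flows):
    """Least-squares focus of expansion from flow vectors under pure translation.

    points, flows: (N, 2) arrays of image positions (x, y) and flow vectors (vx, vy).
    Each flow vector is parallel to (p - e), so (px - ex) * vy - (py - ey) * vx = 0,
    which is linear in the unknown FOE e = (ex, ey).
    """
    px, py = points[:, 0], points[:, 1]
    vx, vy = flows[:, 0], flows[:, 1]
    A = np.stack([vy, -vx], axis=1)
    b = px * vy - py * vx
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe


def heading_azimuth_deg(foe, focal_length):
    """Horizontal heading angle in degrees for a pinhole camera (assumed model)."""
    return float(np.degrees(np.arctan2(foe[0], focal_length)))
```

Because the constraint only uses the direction of each flow vector, unknown scene depths (which scale the flow magnitudes) do not bias the estimate. Rotation adds a flow component that is not radial, which is one way to see why fast rotations degrade heading estimates.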
Discontinuity preserving image registration for breathing induced sliding organ motion
Image registration is a powerful tool in medical image analysis and facilitates
the clinical routine in several respects. It has become indispensable for
many medical applications, including image-guided therapy systems. The
basic goal of image registration is to spatially align two images that show a
similar region of interest. More specifically, a displacement field, or
equivalently a transformation, is estimated that relates the positions of the pixels or
feature points in one image to the corresponding positions in the other.
The resulting alignment of the images assists the doctor in comparing and
diagnosing them. There are different kinds of image registration methods:
those capable of estimating a rigid or, more generally,
an affine transformation between the images, and those able to
capture more complex motion by estimating a non-rigid transformation.
There are many well-established non-rigid registration methods, but those
that preserve discontinuities in the displacement field are rather
rare. Such discontinuities appear in particular at organ boundaries during
breathing-induced organ motion.
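To illustrate the distinction between affine and non-rigid motion, the following NumPy sketch represents an affine transformation as a dense displacement field u(x) = Ax + t - x, which is necessarily smooth, and builds a piecewise field from a binary region mask, which jumps at the region boundary in the way sliding organ motion does. The function names and the (row, column) coordinate convention are assumptions made for this example.

```python
import numpy as np


def affine_displacement(shape, A, t):
    """Displacement field u(x) = A x + t - x on an (H, W) pixel grid, x = (row, col)."""
    H, W = shape
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.stack([rows, cols], axis=-1).astype(float)  # (H, W, 2) pixel coordinates
    return x @ np.asarray(A, dtype=float).T + np.asarray(t, dtype=float) - x


def piecewise_displacement(mask, u_inside, u_outside):
    """Use u_inside where the binary mask is set and u_outside elsewhere.

    Unlike a single affine field, the result can jump across the mask boundary,
    which is the kind of discontinuity that sliding organs produce.
    """
    return np.where(mask[..., None], u_inside, u_outside)
```

A smoothness regulariser applied across the whole image would penalise exactly the jump that `piecewise_displacement` creates, which is why preserving such discontinuities needs extra machinery.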
In this thesis, we combine motion segmentation
with registration to tackle the problem of preserving the discontinuities in
the resulting displacement field. We introduce a binary function to represent
the motion segmentation, and the proposed discontinuity-preserving
non-rigid registration method is then formulated in a variational framework.
An energy functional is thus defined, and its minimisation with respect to
the displacement field and the motion segmentation yields the desired
result. In theory, one can prove that a global minimiser of the energy
functional with respect to the motion segmentation can be found if the
displacement field is given. The overall minimisation problem, however, is
non-convex, so a suitable optimisation strategy has to be chosen. Furthermore,
depending on whether we use the pure L1-norm or an approximation of it in the
formulation of the energy functional, we use different numerical methods to
solve the minimisation problem. More specifically, when using an approximation
of the L1-norm, the minimisation of the energy functional with respect to the displacement field is performed with Brox et al.'s fixed point iteration
scheme, and the minimisation with respect to the motion segmentation
with the dual algorithm of Chambolle. On the other hand, when we make
use of the pure L1-norm in the energy functional, the primal-dual algorithm
of Chambolle and Pock is used for both the minimisation with respect to
the displacement field and that with respect to the motion segmentation. This approach is clearly
faster than the one using the approximation of the L1-norm and also
theoretically more appealing. Finally, to support the registration method
during the minimisation process, in a later approach we additionally incorporate
the positions of certain landmarks into the formulation of
the energy functional that uses the pure L1-norm. As before,
the primal-dual algorithm of Chambolle and Pock is then used for both
the minimisation with respect to the displacement field and that with respect to the motion segmentation.
All the proposed non-rigid discontinuity-preserving registration
methods delivered promising results in experiments with synthetic images
and real MR images of breathing-induced liver motion.
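The primal-dual algorithm of Chambolle and Pock alternates a projected ascent step on a dual variable with a proximal descent step on the primal variable, followed by an over-relaxation. As an illustration only, the NumPy sketch below applies it to a relaxed binary segmentation energy, lam * TV(u) + sum(c * u) with u constrained to [0, 1], where `c` is a hypothetical per-pixel cost that is negative where a pixel prefers label 1. Thresholding the result at 0.5 gives a binary segmentation. This energy is our assumption and is much simpler than the thesis's functional, which couples the segmentation with the displacement field.

```python
import numpy as np


def grad(u):
    """Forward differences with zero (Neumann) boundary; returns (gx, gy)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy


def div(px, py):
    """Discrete divergence, the negative adjoint of grad: <grad u, p> = -<u, div p>.

    Assumes both image dimensions are at least 2.
    """
    dx = np.zeros_like(px)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy


def segment_tv(c, lam, n_iter=1000, tau=0.35, sigma=0.35):
    """Chambolle-Pock iterations for min over u in [0, 1] of lam * TV(u) + sum(c * u).

    Requires tau * sigma * 8 < 1, since ||grad||^2 <= 8 for this discretisation.
    """
    u = np.zeros_like(c, dtype=float)
    u_bar = u.copy()
    px = np.zeros_like(u)
    py = np.zeros_like(u)
    for _ in range(n_iter):
        # Dual ascent, then projection onto pointwise discs of radius lam.
        gx, gy = grad(u_bar)
        px += sigma * gx
        py += sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)
        px /= scale
        py /= scale
        # Primal descent: the prox of the linear term plus the box constraint is a clip.
        u_old = u
        u = np.clip(u + tau * div(px, py) - tau * c, 0.0, 1.0)
        # Over-relaxation step.
        u_bar = 2.0 * u - u_old
    return u
```

A larger `lam` removes more small isolated regions at the cost of following the data term less closely; for example, a single pixel with a weak preference for label 1 inside a label-0 region is absorbed when `lam` is large enough.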
- …