4,897 research outputs found
Segmentation-Aware Convolutional Networks Using Local Attention Masks
We introduce an approach to integrate segmentation information within a
convolutional neural network (CNN). This counter-acts the tendency of CNNs to
smooth information across regions and increases their spatial precision. To
obtain segmentation information, we set up a CNN to provide an embedding space
where region co-membership can be estimated based on Euclidean distance. We use
these embeddings to compute a local attention mask relative to every neuron
position. We incorporate such masks in CNNs and replace the convolution
operation with a "segmentation-aware" variant that allows a neuron to
selectively attend to inputs coming from its own region. We call the resulting
network a segmentation-aware CNN because it adapts its filters at each image
point according to local segmentation cues. We demonstrate the merit of our
method on two widely different dense prediction tasks, that involve
classification (semantic segmentation) and regression (optical flow). Our
results show that in semantic segmentation we can match the performance of
DenseCRFs while being faster and simpler, and in optical flow we obtain clearly
sharper responses than networks that do not use local attention masks. In both
cases, segmentation-aware convolution yields systematic improvements over
strong baselines. Source code for this work is available online at
http://cs.cmu.edu/~aharley/segaware
End-to-end weakly-supervised semantic alignment
We tackle the task of semantic alignment where the goal is to compute dense
semantic correspondence aligning two images depicting objects of the same
category. This is a challenging task due to large intra-class variation,
changes in viewpoint and background clutter. We present the following three
principal contributions. First, we develop a convolutional neural network
architecture for semantic alignment that is trainable in an end-to-end manner
from weak image-level supervision in the form of matching image pairs. The
outcome is that parameters are learnt from rich appearance variation present in
different but semantically related images without the need for tedious manual
annotation of correspondences at training time. Second, the main component of
this architecture is a differentiable soft inlier scoring module, inspired by
the RANSAC inlier scoring procedure, that computes the quality of the alignment
based on only geometrically consistent correspondences thereby reducing the
effect of background clutter. Third, we demonstrate that the proposed approach
achieves state-of-the-art performance on multiple standard benchmarks for
semantic alignment.Comment: In 2018 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR 2018
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
Denoising diffusion probabilistic models have transformed image generation
with their impressive fidelity and diversity. We show that they also excel in
estimating optical flow and monocular depth, surprisingly, without
task-specific architectures and loss functions that are predominant for these
tasks. Compared to the point estimates of conventional regression-based
methods, diffusion models also enable Monte Carlo inference, e.g., capturing
uncertainty and ambiguity in flow and depth. With self-supervised pre-training,
the combined use of synthetic and real data for supervised training, and
technical innovations (infilling and step-unrolled denoising diffusion
training) to handle noisy-incomplete training data, and a simple form of
coarse-to-fine refinement, one can train state-of-the-art diffusion models for
depth and optical flow estimation. Extensive experiments focus on quantitative
performance against benchmarks, ablations, and the model's ability to capture
uncertainty and multimodality, and impute missing values. Our model, DDVM
(Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth
error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26\%
on the KITTI optical flow benchmark, about 25\% better than the best published
method. For an overview see https://diffusion-vision.github.io
- …