Deep Projective 3D Semantic Segmentation
Semantic segmentation of 3D point clouds is a challenging problem with
numerous real-world applications. While deep learning has revolutionized the
field of image semantic segmentation, its impact on point cloud data has been
limited so far. Recent attempts, based on 3D deep learning approaches
(3D-CNNs), have achieved below-expected results. Such methods require
voxelization of the underlying point cloud data, which decreases spatial
resolution and increases memory consumption. Additionally, 3D-CNNs greatly
suffer from the limited availability of annotated datasets.
In this paper, we propose an alternative framework that avoids the
limitations of 3D-CNNs. Instead of directly solving the problem in 3D, we first
project the point cloud onto a set of synthetic 2D-images. These images are
then used as input to a 2D-CNN, designed for semantic segmentation. Finally,
the obtained prediction scores are re-projected to the point cloud to obtain
the segmentation results. We further investigate the impact of multiple
modalities, such as color, depth and surface normals, in a multi-stream network
architecture. Experiments are performed on the recent Semantic3D dataset. Our
approach sets a new state of the art, achieving a relative gain of 7.9%
compared to the previous best approach.
Comment: Submitted to CAIP 2017
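To make the projection/re-projection pipeline concrete, here is a minimal NumPy sketch, not the authors' implementation: it assumes a pinhole camera matrix K, z-buffers the points into a synthetic image while remembering which pixel each point hits, and later copies per-pixel class scores (from any 2D segmentation network) back onto the points. The multi-stream handling of color, depth and surface normals is omitted.

```python
import numpy as np

def project_points(points, colors, K, H, W):
    """Project 3D points (camera coordinates) into a synthetic RGB + depth image.

    Keeps the closest point per pixel (z-buffering) and records which pixel
    each point falls into so that prediction scores can be re-projected later.
    """
    z = points[:, 2]
    uv = (K @ points.T).T                      # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    px = np.round(uv).astype(int)

    valid = (z > 0) & (px[:, 0] >= 0) & (px[:, 0] < W) \
                    & (px[:, 1] >= 0) & (px[:, 1] < H)
    rgb = np.zeros((H, W, 3))
    depth = np.full((H, W), np.inf)
    pix_of_point = np.full(len(points), -1)

    for i in np.flatnonzero(valid):
        u, v = px[i]
        pix_of_point[i] = v * W + u
        if z[i] < depth[v, u]:                 # z-buffer: keep the nearest point
            depth[v, u] = z[i]
            rgb[v, u] = colors[i]
    return rgb, depth, pix_of_point

def reproject_scores(scores_2d, pix_of_point, n_classes):
    """Copy per-pixel class scores (H, W, n_classes) back onto the 3D points."""
    flat = scores_2d.reshape(-1, n_classes)
    out = np.zeros((len(pix_of_point), n_classes))
    hit = pix_of_point >= 0
    out[hit] = flat[pix_of_point[hit]]
    return out
```

In practice the point cloud would be rendered from many viewpoints and the re-projected scores accumulated per point before taking the final label.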
EWA Splatting
In this paper, we present a framework for high-quality splatting based on elliptical Gaussian kernels. To avoid aliasing artifacts, we introduce the concept of a resampling filter, combining a reconstruction kernel with a low-pass filter. Because of the similarity to Heckbert's EWA (elliptical weighted average) filter for texture mapping, we call our technique EWA splatting. Our framework allows us to derive EWA splat primitives for volume data and for point-sampled surface data. It provides high image quality without aliasing artifacts or excessive blurring for volume data and, additionally, features anisotropic texture filtering for point-sampled surfaces. It also handles nonspherical volume kernels efficiently; hence, it is suitable for regular, rectilinear, and irregular volume datasets. Moreover, our framework introduces a novel approach to compute the footprint function, facilitating efficient perspective projection of arbitrary elliptical kernels at very little additional cost. Finally, we show that EWA volume reconstruction kernels can be reduced to surface reconstruction kernels. This makes our splat primitive universal in rendering surface and volume data.
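The key quantity in EWA splatting is the screen-space resampling filter: the object-space Gaussian reconstruction kernel is warped by the locally affine approximation of the viewing transformation and then convolved with a Gaussian low-pass filter, and since a convolution of Gaussians is again a Gaussian, the two covariances simply add. A small numerical sketch of that combination (assuming a 2x3 Jacobian J of the local affine map and a unit low-pass filter, and leaving out the paper's full derivation) might look like this:

```python
import numpy as np

def ewa_resampling_kernel(V_obj, J, V_lowpass=None):
    """Combine a Gaussian reconstruction kernel with a screen-space low-pass filter.

    V_obj     : 3x3 covariance of the reconstruction kernel in object space
    J         : 2x3 Jacobian of the locally affine viewing transformation
    V_lowpass : 2x2 covariance of the low-pass (anti-aliasing) filter
    """
    if V_lowpass is None:
        V_lowpass = np.eye(2)                  # unit low-pass filter
    V_screen = J @ V_obj @ J.T + V_lowpass     # covariances of convolved Gaussians add
    conic = np.linalg.inv(V_screen)            # inverse covariance evaluates the footprint
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(V_screen)))
    return conic, norm

def footprint(conic, norm, dx, dy):
    """Evaluate the elliptical Gaussian footprint at offset (dx, dy) from the splat center."""
    d = np.array([dx, dy])
    return norm * np.exp(-0.5 * d @ conic @ d)
```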
Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks
Bilateral filters are widely used due to their edge-preserving properties. The
common use case is to manually choose a parametric filter type, usually a
Gaussian filter. In this paper, we generalize the parametrization and, in
particular, derive a gradient descent algorithm so that the filter parameters
can be learned from data. This derivation allows us to learn high-dimensional
linear filters that operate in sparsely populated feature
spaces. We build on the permutohedral lattice construction for efficient
filtering. The ability to learn more general forms of high-dimensional filters
can be used in several diverse applications. First, we demonstrate the use in
applications where single filter applications are desired for runtime reasons.
Further, we show how this algorithm can be used to learn the pairwise
potentials in densely connected conditional random fields and apply these to
different image segmentation tasks. Finally, we introduce layers of bilateral
filters in CNNs and propose bilateral neural networks for the use of
high-dimensional sparse data. This view provides new ways to encode model
structure into network architectures. A diverse set of experiments empirically
validates the use of general forms of filters.
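One way to picture "learning the filter from data" is the following deliberately brute-force stand-in for the permutohedral-lattice construction: the high-dimensional filter is expressed as a mixture of fixed Gaussian basis kernels over feature-space distances, and the mixing weights are learned by gradient descent against target outputs. The basis parametrization, function names and least-squares loss are assumptions for this sketch, not the paper's method.

```python
import numpy as np

def basis_responses(features, values, sigmas):
    """Brute-force high-dimensional filtering with a bank of Gaussian kernels.

    features : (N, d) feature vectors (e.g. position + color per pixel)
    values   : (N,)   signal to be filtered
    sigmas   : bandwidths of the fixed Gaussian basis kernels
    Returns an (N, K) matrix whose columns are the signal filtered by each basis kernel.
    """
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    cols = []
    for s in sigmas:
        W = np.exp(-0.5 * d2 / s ** 2)
        W /= W.sum(axis=1, keepdims=True)      # normalized filter weights
        cols.append(W @ values)
    return np.stack(cols, axis=1)

def learn_filter(features, values, target, sigmas, lr=0.1, steps=200):
    """Learn mixing weights over the basis kernels by gradient descent so the
    filtered output matches a target signal under a least-squares loss."""
    B = basis_responses(features, values, sigmas)
    w = np.zeros(len(sigmas))
    for _ in range(steps):
        residual = B @ w - target
        w -= lr * (B.T @ residual) / len(target)
    return w
```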
Lattice-Based High-Dimensional Gaussian Filtering and the Permutohedral Lattice
High-dimensional Gaussian filtering is a popular technique in image processing, geometry processing and computer graphics for smoothing data while preserving important features. For instance, the bilateral filter, cross bilateral filter and non-local means filter fall under the broad umbrella of high-dimensional Gaussian filters. Recent algorithmic advances have demonstrated that, by relying on a sampled representation of the underlying space, one can obtain speed-ups of orders of magnitude over the naïve approach. The simplest such sampled representation is a lattice, and it has been used successfully in the bilateral grid and the permutohedral lattice algorithms. In this paper, we analyze these lattice-based algorithms, developing a general theory of lattice-based high-dimensional Gaussian filtering. We consider a set of criteria that an optimal lattice for filtering should satisfy, balancing filtering quality against computational efficiency, and evaluate the existing lattices against these criteria. In particular, we give a rigorous exposition of the properties of the permutohedral lattice and argue that it is the optimal lattice for Gaussian filtering. Lastly, we explore further uses of the permutohedral-lattice-based Gaussian filtering framework, showing that it can be easily adapted to perform mean shift filtering and yield improvements over the traditional approach based on a Cartesian grid.
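All the lattice-based algorithms discussed here share a splat/blur/slice structure: values are accumulated onto lattice points, blurred on the lattice, and read back at the original feature locations. The sketch below shows that structure on the simplest lattice, a Cartesian grid with nearest-neighbor splatting (essentially a stripped-down bilateral grid); the permutohedral lattice replaces the grid precisely to avoid the exponential growth of this approach with the feature dimension. It is illustrative only and practical just for low-dimensional features.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grid_gaussian_filter(features, values, cell=1.0, blur=1.0):
    """Splat / blur / slice on a regular (Cartesian) grid.

    features : (N, d) feature vectors (d must be small, or the grid explodes)
    values   : (N,)   signal to be smoothed
    cell     : grid cell size in feature units
    blur     : Gaussian blur (in cells) applied on the grid
    """
    idx = np.round(features / cell).astype(int)
    idx -= idx.min(axis=0)                         # shift into non-negative range
    shape = tuple(idx.max(axis=0) + 1)

    num = np.zeros(shape)                          # splat: accumulate values ...
    den = np.zeros(shape)                          # ... and homogeneous weights
    for i, v in zip(idx, values):
        num[tuple(i)] += v
        den[tuple(i)] += 1.0

    num = gaussian_filter(num, sigma=blur)         # blur on the lattice
    den = gaussian_filter(den, sigma=blur)

    out = np.empty(len(values))                    # slice: read back at each point
    for k, i in enumerate(idx):
        out[k] = num[tuple(i)] / max(den[tuple(i)], 1e-12)
    return out
```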
Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials
Dense conditional random fields (CRFs) have become a popular framework for
modelling several problems in computer vision such as stereo correspondence and
multi-class semantic segmentation. By modelling long-range interactions, dense
CRFs provide a labelling that captures finer detail than their sparse
counterparts. Currently, the state-of-the-art algorithm performs mean-field
inference using a filter-based method but fails to provide a strong theoretical
guarantee on the quality of the solution. A question naturally arises as to
whether it is possible to obtain a maximum a posteriori (MAP) estimate of a
dense CRF using a principled method. In this paper, we show that this is
indeed possible: by using a filter-based method, continuous relaxations of the
MAP problem can be optimised efficiently with state-of-the-art algorithms.
Specifically, we solve a quadratic
programming (QP) relaxation using the Frank-Wolfe algorithm and a linear
programming (LP) relaxation by developing a proximal minimisation framework. By
exploiting labelling consistency in the higher-order potentials and utilising
the filter-based method, we are able to formulate the above algorithms such
that each iteration has a complexity linear in the number of classes and random
variables. The presented algorithms can be applied to any labelling problem
using a dense CRF with sparse higher-order potentials. In this paper, we use
semantic segmentation as an example application as it demonstrates the ability
of the algorithm to scale to dense CRFs with large dimensions. We perform
experiments on the Pascal dataset to indicate that the presented algorithms are
able to attain lower energies than the mean-field inference method.
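To make the QP relaxation and the Frank-Wolfe iteration concrete, here is a toy brute-force version for a pairwise CRF with a symmetric kernel and a symmetric label-compatibility matrix. In the paper the gradient is evaluated with the filter-based method, keeping each iteration linear in the number of variables and labels; the dense matrix product below stands in for that step and is only feasible for tiny problems.

```python
import numpy as np

def frank_wolfe_qp_crf(unary, K, mu, steps=50):
    """Frank-Wolfe on a QP relaxation of a pairwise CRF (brute-force gradient).

    unary : (N, L) unary potentials
    K     : (N, N) symmetric pairwise kernel (e.g. Gaussian on pixel features)
    mu    : (L, L) symmetric label-compatibility matrix (e.g. Potts)

    Relaxed energy:  E(Q) = <Q, unary> + 0.5 * <Q, K Q mu>,
    with each row of Q constrained to the probability simplex.
    """
    N, L = unary.shape
    Q = np.full((N, L), 1.0 / L)                   # start from the uniform labelling
    for t in range(steps):
        grad = unary + K @ Q @ mu                  # dE/dQ for symmetric K and mu
        # Linear minimisation oracle over the product of simplices:
        # put all mass on the label with the smallest gradient.
        S = np.zeros_like(Q)
        S[np.arange(N), grad.argmin(axis=1)] = 1.0
        gamma = 2.0 / (t + 2.0)                    # standard Frank-Wolfe step size
        Q = (1.0 - gamma) * Q + gamma * S
    return Q.argmax(axis=1)
```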
Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering
Scene parsing has attracted a lot of attention in computer vision. While
parametric models have proven effective for this task, they cannot easily
incorporate new training data. By contrast, nonparametric approaches, which
bypass any learning phase and directly transfer the labels from the training
data to the query images, can readily exploit new labeled samples as they
become available. Unfortunately, because of the computational cost of their
label transfer procedures, state-of-the-art nonparametric methods typically
filter out most training images to only keep a few relevant ones to label the
query. As such, these methods throw away many images that still contain
valuable information and generally obtain an unbalanced set of labeled samples.
In this paper, we introduce a nonparametric approach to scene parsing that
follows a sample-and-filter strategy. More specifically, we propose to sample
labeled superpixels according to an image similarity score, which allows us to
obtain a balanced set of samples. We then formulate label transfer as an
efficient filtering procedure, which lets us exploit more labeled samples than
existing techniques. Our experiments evidence the benefits of our approach over
state-of-the-art nonparametric methods on two benchmark datasets.
Comment: Please refer to the CVPR-2016 version of this manuscript.
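A rough sketch of the sample-and-filter strategy, under assumptions that are not from the paper (per-class sample counts, plain Gaussian feature-space weights as the filtering step): labelled superpixels are first sampled per class in proportion to an image-similarity score, giving a balanced pool, and labels are then transferred to query superpixels as a weighted vote in feature space.

```python
import numpy as np

def sample_balanced(labels, img_scores, per_class=50, rng=None):
    """Sample labelled superpixels per class, weighted by image similarity,
    to obtain a roughly class-balanced pool (img_scores must be positive)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        p = img_scores[idx] / img_scores[idx].sum()
        n = min(per_class, len(idx))
        keep.append(rng.choice(idx, size=n, replace=False, p=p))
    return np.concatenate(keep)

def transfer_labels(query_feats, pool_feats, pool_labels, n_classes, sigma=1.0):
    """Label transfer as filtering: each query superpixel collects Gaussian-weighted
    soft votes from the sampled pool and takes the strongest class."""
    d2 = ((query_feats[:, None, :] - pool_feats[None, :, :]) ** 2).sum(-1)
    W = np.exp(-0.5 * d2 / sigma ** 2)
    votes = W @ np.eye(n_classes)[pool_labels]     # soft class votes
    return votes.argmax(axis=1)
```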
Video Propagation Networks
We propose a technique that propagates information forward through video
data. The method is conceptually simple and can be applied to tasks that
require the propagation of structured information, such as semantic labels,
based on video content. We propose a 'Video Propagation Network' that processes
video frames in an adaptive manner. The model is applied online: it propagates
information forward without the need to access future frames. In particular, we
combine two components: a temporal bilateral network for dense, video-adaptive
filtering, followed by a spatial network that refines features and adds
flexibility. We present experiments on video object segmentation and semantic
video segmentation and show improved performance compared to the best previous
task-specific methods, while maintaining favorable runtimes.
Additionally, we demonstrate our approach on an example regression task of color
propagation in a grayscale video.
Comment: Appearing in Computer Vision and Pattern Recognition, 2017 (CVPR'17)
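The online character of the method (propagate forward, never look at future frames) can be illustrated with a brute-force stand-in for the two components: a bilateral weighting between consecutive frames in a hand-picked feature space, followed by a spatial Gaussian refinement. The feature choice, bandwidths and refinement are assumptions for this sketch; the actual model learns both components.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def propagate_video(feats_per_frame, init_soft_labels, frame_shape,
                    sigma_f=1.0, sigma_s=1.0):
    """Propagate soft labels frame by frame through a video.

    feats_per_frame  : list of (N, d) per-pixel features (position, color, time)
    init_soft_labels : (N, L) soft labels for the first frame
    frame_shape      : (H, W) with H * W == N, used for spatial refinement
    """
    soft = init_soft_labels
    outputs = [soft]
    for t in range(1, len(feats_per_frame)):
        prev_f, cur_f = feats_per_frame[t - 1], feats_per_frame[t]
        d2 = ((cur_f[:, None, :] - prev_f[None, :, :]) ** 2).sum(-1)
        W = np.exp(-0.5 * d2 / sigma_f ** 2)
        W /= W.sum(axis=1, keepdims=True)
        soft = W @ soft                            # temporal bilateral step
        soft = np.stack(                           # spatial refinement per class
            [gaussian_filter(soft[:, c].reshape(frame_shape), sigma_s).ravel()
             for c in range(soft.shape[1])], axis=1)
        outputs.append(soft)
    return outputs
```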
