Permutohedral Lattice CNNs
This paper presents a convolutional layer that can process sparse input
features. For image recognition problems, for example, this allows efficient
filtering of signals that are defined not over a dense grid (like pixel
positions) but over more general features (such as color values). The presented
algorithm makes use of the permutohedral lattice data structure. The
permutohedral lattice was originally introduced to efficiently implement the
bilateral filter, a commonly used image processing operation. Its use allows
for a generalization of the convolution type found in current (spatial)
convolutional network architectures.
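The bilateral filter that the permutohedral lattice accelerates can be stated directly. The sketch below is a brute-force 1-D version (function name hypothetical), not the lattice-based implementation: each output sample is a weighted average whose weights fall off with both spatial distance and difference in signal value, which is what preserves edges.

```python
import numpy as np

def bilateral_filter_1d(signal, spatial_sigma=2.0, range_sigma=0.5, radius=4):
    """Brute-force 1-D bilateral filter: each output sample is a weighted
    average of its neighbours, weighted by both spatial distance and
    difference in signal value (the "range" term)."""
    n = len(signal)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        idx = np.arange(lo, hi)
        spatial_w = np.exp(-((idx - i) ** 2) / (2 * spatial_sigma ** 2))
        range_w = np.exp(-((signal[idx] - signal[i]) ** 2) / (2 * range_sigma ** 2))
        w = spatial_w * range_w
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out
```

Run on a step signal, the filter smooths within the flat regions but keeps the jump sharp; the permutohedral lattice makes this operation tractable in higher-dimensional feature spaces.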
MouldingNet: Deep-learning for 3D Object Reconstruction
With the rise of deep neural networks, a number of approaches for learning over 3D data have gained popularity. In this paper, we take advantage of one of these approaches, bilateral convolutional layers, to propose a novel end-to-end deep auto-encoder architecture to efficiently encode and reconstruct 3D point clouds. Bilateral convolutional layers project the input point cloud onto an even tessellation of a hyperplane in the (d+1)-dimensional space known as the permutohedral lattice and perform convolutions over this representation. In contrast to existing point cloud based learning approaches, this allows us to learn over the underlying geometry of the object to create a robust global descriptor. We demonstrate its accuracy by evaluating across the ShapeNet and ModelNet datasets, in order to illustrate two main scenarios: known and unknown object reconstruction. These experiments show that our network generalises well from seen classes to unseen classes.
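The embedding of d-dimensional features into a hyperplane of (d+1)-dimensional space can be illustrated with a toy linear lift (function name hypothetical; this is a minimal sketch of the geometry, not the exact basis used by permutohedral lattice implementations):

```python
import numpy as np

def lift_to_hyperplane(points):
    """Embed d-dimensional feature points into the zero-sum hyperplane
    H_d = {y in R^(d+1) : sum(y) = 0}, the subspace that the
    permutohedral lattice tessellates.  Appending a zero coordinate and
    subtracting the per-point mean is a linear, invertible map into H_d."""
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    lifted = np.hstack([pts, np.zeros((n, 1))])      # (n, d+1)
    lifted -= lifted.mean(axis=1, keepdims=True)     # coordinates now sum to 0
    return lifted
```

Each lifted point sums to zero, so it lies exactly on the hyperplane, and the original coordinates are recoverable as y_i - y_{d+1}, so no information is lost by the projection.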
Superpixel Convolutional Networks using Bilateral Inceptions
In this paper we propose a CNN architecture for semantic image segmentation.
We introduce a new 'bilateral inception' module that can be inserted in
existing CNN architectures and performs bilateral filtering, at multiple
feature-scales, between superpixels in an image. The feature spaces for
bilateral filtering and other parameters of the module are learned end-to-end
using standard backpropagation techniques. The bilateral inception module
addresses two issues that arise with general CNN segmentation architectures.
First, this module propagates information between (super) pixels while
respecting image edges, thus using the structured information of the problem
for improved results. Second, the layer recovers a full resolution segmentation
result from the lower resolution solution of a CNN. In the experiments, we
modify several existing CNN architectures by inserting our inception module
between the last CNN (1x1 convolution) layers. Empirical results on three
different datasets show reliable improvements not only in comparison to the
baseline networks, but also in comparison to several dense-pixel prediction
techniques such as CRFs, while being competitive in time. Comment: European Conference on Computer Vision (ECCV), 2016
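The core operation of such a module, bilateral filtering between superpixels, reduces to a dense affinity-weighted average. A minimal NumPy sketch (function name hypothetical; the paper's module learns the feature space end-to-end, which this sketch omits):

```python
import numpy as np

def bilateral_inception_step(features, values, sigma=1.0):
    """One dense bilateral-filtering pass between superpixels: pairwise
    Gaussian affinities in a feature space, row-normalised and applied to
    the per-superpixel values (e.g. segmentation logits)."""
    f = np.asarray(features, float)                       # (n, k) feature vectors
    v = np.asarray(values, float)                         # (n, c) values to filter
    d2 = ((f[:, None, :] - f[None, :, :]) ** 2).sum(-1)   # (n, n) squared distances
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)                     # normalise rows to 1
    return w @ v
```

Superpixels with similar features exchange information while dissimilar ones (e.g. across an image edge) barely influence each other, which is how the module propagates labels while respecting boundaries.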
Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice
Dense prediction tasks typically employ encoder-decoder architectures, but
the prevalent convolutions in the decoder are not image-adaptive and can lead
to boundary artifacts. Different generalized convolution operations have been
introduced to counteract this. We go beyond these by leveraging guidance data
to redefine their inherent notion of proximity. Our proposed network layer
builds on the permutohedral lattice, which performs sparse convolutions in a
high-dimensional space allowing for powerful non-local operations despite small
filters. Multiple features with different characteristics span this
permutohedral space. In contrast to prior work, we learn these features in a
task-specific manner by generalizing the basic permutohedral operations to
learnt feature representations. As the resulting objective is complex, a
carefully designed framework and learning procedure are introduced, yielding
rich feature embeddings in practice. We demonstrate the general applicability
of our approach in different joint upsampling tasks. When adding our network
layer to state-of-the-art networks for optical flow and semantic segmentation,
boundary artifacts are removed and the accuracy is improved. Comment: To appear at GCPR 2019
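The idea of redefining proximity through guidance data amounts to assembling the feature vectors that span the permutohedral space. A toy sketch (all names hypothetical; in the paper, the projection and scales are learned end-to-end rather than fixed):

```python
import numpy as np

def build_lattice_features(xy, guidance, proj, scales):
    """Assemble feature vectors spanning the permutohedral space: spatial
    coordinates concatenated with a linear projection of the guidance
    signal (e.g. colour), each dimension scaled to control its influence
    on lattice proximity.  `proj` and `scales` stand in for learned
    parameters."""
    g = np.asarray(guidance, float) @ np.asarray(proj, float)  # project guidance
    feats = np.hstack([np.asarray(xy, float), g])
    return feats * np.asarray(scales, float)                   # per-dim scaling
```

Scaling a dimension up makes the lattice treat differences along it as larger distances, so points that differ there land in different lattice cells and stop sharing filter responses.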
What Matters for 3D Scene Flow Network
3D scene flow estimation from point clouds is a low-level 3D motion
perception task in computer vision. Flow embedding is a commonly used technique
in scene flow estimation, and it encodes the point motion between two
consecutive frames. Thus, it is critical for the flow embeddings to capture the
correct overall direction of the motion. However, previous works only search
locally to determine a soft correspondence, ignoring the distant points that
turn out to be the actual matching ones. In addition, the estimated
correspondence is usually from the forward direction of the adjacent point
clouds, and may not be consistent with the estimated correspondence acquired
from the backward direction. To tackle these problems, we propose a novel
all-to-all flow embedding layer with backward reliability validation during the
initial scene flow estimation. Besides, we investigate and compare several
design choices in key components of the 3D scene flow network, including the
point similarity calculation, input elements of predictor, and predictor &
refinement level design. After carefully choosing the most effective designs,
we are able to present a model that achieves the state-of-the-art performance
on FlyingThings3D and KITTI Scene Flow datasets. Our proposed model surpasses
all existing methods by at least 38.2% on FlyingThings3D dataset and 24.7% on
KITTI Scene Flow dataset for EPE3D metric. We release our codes at
https://github.com/IRMVLab/3DFlow. Comment: Accepted by ECCV 2022
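The all-to-all matching with backward reliability validation can be illustrated with a hard nearest-neighbour version (function name hypothetical; the paper uses learned soft correspondences, which this simplified sketch replaces with argmin matching):

```python
import numpy as np

def all_to_all_correspondence(p1, p2):
    """All-to-all correspondence between two point-cloud frames with
    backward reliability validation: every point in frame 1 is matched
    against *every* point in frame 2 (not just a local neighbourhood),
    and a match is marked reliable only if the backward match from the
    matched point returns to the original one (cycle consistency)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d2 = ((p1[:, None, :] - p2[None, :, :]) ** 2).sum(-1)  # (n1, n2) distances
    fwd = d2.argmin(axis=1)          # hard forward match: frame1 -> frame2
    bwd = d2.argmin(axis=0)          # hard backward match: frame2 -> frame1
    reliable = bwd[fwd] == np.arange(len(p1))  # cycle-consistency mask
    flow = p2[fwd] - p1              # coarse per-point flow estimate
    return flow, reliable
```

Searching globally rather than locally is what lets the estimator recover large motions where the true match lies outside any fixed-radius neighbourhood, and the consistency mask flags points whose forward and backward matches disagree.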
Deep Learning Applications in Medical Image and Shape Analysis
Deep learning is one of the most rapidly growing fields in computer and data science of the past few years, and it has been widely used for feature extraction and recognition in various applications. The training process uses deep neural networks as a black box, whose parameters are adjusted by minimizing the difference between the predicted output and labeled data (the so-called training dataset). The trained model is then applied to unknown inputs to predict results that mimic human decision-making. This technology has found tremendous success in many fields involving data analysis, such as images, shapes, texts, and audio and video signals. In medical applications, images have been regularly used by physicians for diagnosing diseases, making treatment plans, and tracking the progress of patient treatment. One of the most challenging and common problems in image processing is segmentation of features of interest, so-called feature extraction. To this end, we aim to develop a deep learning framework in the current thesis to extract regions of interest in wound images. In addition, we investigate deep learning approaches for segmentation of 3D surface shapes as a potential tool for surface analysis in our future work. Experiments are presented and discussed for both 2D image and 3D shape analysis using deep learning networks.