Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses
Many classical Computer Vision problems, such as essential matrix computation
and pose estimation from 3D to 2D correspondences, can be solved by finding the
eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix
representing a linear system. Incorporating this in deep learning frameworks
would allow us to explicitly encode known notions of geometry, instead of
having the network implicitly learn them from data. However, performing
eigendecomposition within a network requires the ability to differentiate this
operation. Unfortunately, while theoretically doable, this introduces numerical
instability in the optimization process in practice.
In this paper, we introduce an eigendecomposition-free approach to training a
deep network whose loss depends on the eigenvector corresponding to a zero
eigenvalue of a matrix predicted by the network. We demonstrate on several
tasks, including keypoint matching and 3D pose estimation, that our approach is
much more robust than explicit differentiation of the eigendecomposition. It
has better convergence properties and yields state-of-the-art results on both
tasks.
Comment: 25 pages
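The zero-eigenvalue formulation described in this abstract can be illustrated with a small NumPy sketch. It builds a matrix A whose null space is a known unit vector, recovers that vector as the eigenvector of A^T A with the smallest eigenvalue, and evaluates the quadratic residual x^T A^T A x, which needs no eigendecomposition. The function names and the synthetic data are assumptions made for illustration; the residual is only the basic quantity such a loss can build on, not the loss proposed in the paper.

```python
import numpy as np

def smallest_eigenvector(A):
    """Return the smallest eigenvalue of A^T A and its unit eigenvector."""
    M = A.T @ A                           # symmetric positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvals[0], eigvecs[:, 0]

def quadratic_residual(A, x):
    """x^T A^T A x for the normalized x; zero exactly when A x = 0."""
    x = x / np.linalg.norm(x)
    return float(np.sum((A @ x) ** 2))

rng = np.random.default_rng(0)
x_true = rng.normal(size=4)
x_true /= np.linalg.norm(x_true)

# Project random rows onto the complement of x_true, so that A @ x_true = 0.
R = rng.normal(size=(10, 4))
A = R - np.outer(R @ x_true, x_true)

lam, x_est = smallest_eigenvector(A)
residual = quadratic_residual(A, x_true)
```

The eigenvector is only defined up to sign, so any comparison with `x_true` should use the absolute value of their dot product.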
ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning
Many problems in computer vision require dealing with sparse, unordered data
in the form of point clouds. Permutation-equivariant networks have become a
popular solution: they operate on individual data points with simple perceptrons
and extract contextual information with global pooling. This can be achieved
with a simple normalization of the feature maps, a global operation that is
unaffected by the order. In this paper, we propose Attentive Context
Normalization (ACN), a simple yet effective technique to build
permutation-equivariant networks robust to outliers. Specifically, we show how
to normalize the feature maps with weights that are estimated within the
network, excluding outliers from this normalization. We use this mechanism to
leverage two types of attention: local and global. By combining them, our method
is able to find the essential data points in high-dimensional space to solve a
given task. We demonstrate through extensive experiments that our approach,
which we call Attentive Context Networks (ACNe), provides a significant leap in
performance compared to the state-of-the-art on camera pose estimation, robust
fitting, and point cloud classification under noise and outliers. Source code:
https://github.com/vcg-uvic/acne
Comment: CVPR 202
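The weighted normalization described in this abstract can be sketched as follows. Each feature channel is normalized with a weighted mean and variance, so points with zero weight do not influence the statistics. In the paper the weights are estimated inside the network; here they are set by hand, and the function name `weighted_context_norm` is an assumption made for illustration.

```python
import numpy as np

def weighted_context_norm(feats, weights, eps=1e-8):
    """Normalize (N, C) features with per-point weights of shape (N,)."""
    w = weights / weights.sum()                    # weights sum to one
    mean = (w[:, None] * feats).sum(axis=0)        # weighted mean per channel
    var = (w[:, None] * (feats - mean) ** 2).sum(axis=0)
    return (feats - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
inliers = rng.normal(size=(50, 3))
outliers = rng.normal(loc=100.0, size=(5, 3))
feats = np.vstack([inliers, outliers])

# Hand-set weights: one for inliers, zero for outliers.
weights = np.concatenate([np.ones(50), np.zeros(5)])
out = weighted_context_norm(feats, weights)

# Shuffling the points shuffles the output in the same way.
perm = rng.permutation(len(feats))
out_perm = weighted_context_norm(feats[perm], weights[perm])
```

Because the statistics are weighted sums over all points, the operation is permutation-equivariant: reordering the input reorders the output identically.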
Quantum Statistics-Inspired Neural Attention
Sequence-to-sequence (encoder-decoder) models with attention constitute a
cornerstone of deep learning research, as they have enabled unprecedented
sequential data modeling capabilities. This effectiveness largely stems from
the capacity of these models to infer salient temporal dynamics over long
horizons; these are encoded into the obtained neural attention (NA)
distributions. However, existing NA formulations essentially constitute
point-wise selection mechanisms over the observed source sequences; that is,
attention weights computation relies on the assumption that each source
sequence element is independent of the rest. Unfortunately, although
convenient, this assumption fails to account for higher-order dependencies
which might be prevalent in real-world data. This paper addresses these
limitations by leveraging Quantum-Statistical modeling arguments. Specifically,
our work broadens the notion of NA, by attempting to account for the case that
the NA model becomes inherently incapable of discerning between individual
source elements; this is assumed to be the case due to higher-order temporal
dynamics. On the contrary, we postulate that in some cases selection may be
feasible only at the level of pairs of source sequence elements. To this end,
we cast NA into inference of an attention density matrix (ADM) approximation.
We derive effective training and inference algorithms, and evaluate our
approach in the context of a machine translation (MT) application. We perform
experiments with challenging benchmark datasets. As we show, our approach
yields favorable outcomes in terms of several evaluation metrics.
Comment: Submitted to The 23rd Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD 2019)
3DRegNet: A Deep Neural Network for 3D Point Registration
We present 3DRegNet, a novel deep learning architecture for the registration
of 3D scans. Given a set of 3D point correspondences, we build a deep neural
network to address the following two challenges: (i) classification of the
point correspondences into inliers/outliers, and (ii) regression of the motion
parameters that align the scans into a common reference frame. With regard to
regression, we present two alternative approaches: (i) a Deep Neural Network
(DNN) registration and (ii) a Procrustes approach using SVD to estimate the
transformation. Our correspondence-based approach achieves a higher speedup
compared to competing baselines. We further propose the use of a refinement
network, which consists of a smaller 3DRegNet as a refinement to improve the
accuracy of the registration. Extensive experiments on two challenging datasets
demonstrate that we outperform other methods and achieve state-of-the-art
results. The code is available.
Comment: 15 pages, 8 figures, 6 tables
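The Procrustes alternative mentioned in this abstract estimates a rigid transformation from point correspondences with an SVD. The sketch below is the standard unweighted solution, assuming every correspondence is an inlier; it is not the 3DRegNet implementation, and the function name `procrustes_rigid` is hypothetical.

```python
import numpy as np

def procrustes_rigid(src, dst):
    """Find R, t minimizing sum ||R @ src_i + t - dst_i||^2 for (N, 3) inputs."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def rot_x(a):
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
R_true = rot_z(0.7) @ rot_x(0.3)
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true               # noise-free correspondences

R_est, t_est = procrustes_rigid(src, dst)
```

With noise-free correspondences the estimate matches the ground-truth motion up to floating-point error; with outliers, per-correspondence weights from a classifier would typically enter the centroids and the cross-covariance.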
Learning 3D-3D Correspondences for One-shot Partial-to-partial Registration
While 3D-3D registration is traditionally tackled by optimization-based
methods, recent work has shown that learning-based techniques could achieve
faster and more robust results. In this context, however, only PRNet can handle
the partial-to-partial registration scenario. Unfortunately, this is achieved
at the cost of relying on an iterative procedure, with a complex network
architecture. Here, we show that learning-based partial-to-partial registration
can be achieved in a one-shot manner, jointly reducing network complexity and
increasing registration accuracy. To this end, we propose an Optimal Transport
layer able to account for occluded points thanks to the use of outlier bins.
The resulting OPRNet framework outperforms the state of the art on standard
benchmarks, demonstrating better robustness and generalization ability than
existing techniques.
Comment: 11 pages
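The idea of an Optimal Transport layer with outlier bins can be sketched with log-domain Sinkhorn iterations on a score matrix augmented by one extra row and one extra column. The construction below follows the common "dustbin" formulation and is an assumption, not the OPRNet layer; the function names, the bin score, and the toy scores are all illustrative.

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    s = m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis)

def sinkhorn_with_bins(scores, bin_score, n_iters=200):
    """Soft assignment for an (n, m) score matrix with one outlier bin per side."""
    n, m = scores.shape
    Z = np.full((n + 1, m + 1), float(bin_score))
    Z[:n, :m] = scores
    # Each real point carries unit mass; each bin can absorb the whole other set.
    log_a = np.log(np.concatenate([np.ones(n), [m]]))
    log_b = np.log(np.concatenate([np.ones(m), [n]]))
    u = np.zeros(n + 1)
    v = np.zeros(m + 1)
    for _ in range(n_iters):
        u = log_a - logsumexp(Z + v[None, :], axis=1)   # match row marginals
        v = log_b - logsumexp(Z + u[:, None], axis=0)   # match column marginals
    return np.exp(Z + u[:, None] + v[None, :])

# Three source points, two target points; the third source has no good match.
scores = np.array([[0.0, -25.0],
                   [-25.0, 0.0],
                   [-25.0, -25.0]])
P = sinkhorn_with_bins(scores, bin_score=-3.0)
```

In this toy case the unmatched third source point sends almost all of its mass to the bin column, which is the mechanism that lets such a layer account for occluded points.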
Scribble-Supervised Semantic Segmentation by Random Walk on Neural Representation and Self-Supervision on Neural Eigenspace
Scribble-supervised semantic segmentation has gained much attention recently
for its promising performance without high-quality annotations. Many approaches
have been proposed. Typically, they handle this problem by either introducing a
well-labeled dataset from another related task, turning to iterative refinement
and post-processing with a graphical model, or manipulating the scribble labels.
This work aims to achieve semantic segmentation supervised by scribble labels
directly without auxiliary information and other intermediate manipulation.
Specifically, we impose diffusion on neural representation by random walk and
consistency on neural eigenspace by self-supervision, which forces the neural
network to produce dense and consistent predictions over the whole dataset. The
random walk embedded in the network will compute a probabilistic transition
matrix, with which the neural representation is diffused to be uniform. Moreover,
given the probabilistic transition matrix, we apply the self-supervision on its
eigenspace for consistency in the image's main parts. In addition to evaluating
on the common scribble dataset, we also conduct experiments on the modified
datasets that randomly shrink and even drop the scribbles on image objects. The
results demonstrate the superiority of the proposed method and are even
comparable to some full-label supervised ones. The code and datasets are
available at https://github.com/panzhiyi/RW-SS
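The random-walk diffusion described in this abstract can be sketched as a row-stochastic transition matrix applied to a set of feature vectors. Building the matrix from a softmax over dot-product similarities is an assumption made for illustration; the paper may construct it differently, and the function names are hypothetical.

```python
import numpy as np

def transition_matrix(feats, temperature=1.0):
    """Row-stochastic matrix from pairwise dot-product similarity of (N, C) features."""
    sim = feats @ feats.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)   # stabilize the exponentials
    K = np.exp(sim)
    return K / K.sum(axis=1, keepdims=True)      # each row sums to one

def diffuse(feats, P, steps=1):
    """Apply the random walk: every step replaces each feature by a weighted average."""
    out = feats
    for _ in range(steps):
        out = P @ out
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
P = transition_matrix(feats)
diffused = diffuse(feats, P, steps=3)
```

Each diffused feature is a convex combination of the original ones, so repeated steps pull similar locations toward a common representation.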
DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares
We propose a surface fitting method for unstructured 3D point clouds. This
method, called DeepFit, incorporates a neural network to learn point-wise
weights for weighted least squares polynomial surface fitting. The learned
weights act as a soft selection for the neighborhood of surface points, thus
avoiding the scale selection required by previous methods. To train the network,
we propose a novel surface consistency loss that improves point weight
estimation. The method enables extracting normal vectors and other geometrical
properties, such as principal curvatures, which were not presented as
ground truth during training. We achieve state-of-the-art results on a
benchmark normal and curvature estimation dataset, demonstrate robustness to
noise, outliers and density variations, and show its application on noise
removal.
Comment: arXiv admin note: text overlap with arXiv:1812.0070
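Weighted least squares polynomial surface fitting, as described in this abstract, can be sketched as follows. A degree-2 height function z = f(x, y) is fitted with per-point weights, and the normal at the origin is read off the linear coefficients. In DeepFit the weights come from a neural network; here they are set by hand, and the function names, the polynomial degree, and the synthetic data are assumptions.

```python
import numpy as np

def fit_weighted_quadric(xy, z, weights):
    """Weighted LS fit of z = b0 + b1 x + b2 y + b3 x^2 + b4 xy + b5 y^2."""
    x, y = xy[:, 0], xy[:, 1]
    M = np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=1)
    sw = np.sqrt(weights)                       # scale rows by sqrt(weight)
    beta, *_ = np.linalg.lstsq(sw[:, None] * M, sw * z, rcond=None)
    return beta

def normal_at_origin(beta):
    """Unit normal of the fitted surface at (0, 0): (-df/dx, -df/dy, 1), normalized."""
    n = np.array([-beta[1], -beta[2], 1.0])
    return n / np.linalg.norm(n)

rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(45, 2))
true_beta = np.array([0.1, 0.5, -0.2, 0.3, 0.0, 0.0])
z = (true_beta[0] + true_beta[1] * xy[:, 0] + true_beta[2] * xy[:, 1]
     + true_beta[3] * xy[:, 0] ** 2)
z[40:] += 5.0                                   # last five points are outliers

# Hand-set weights: zero weight removes the outliers from the fit.
weights = np.concatenate([np.ones(40), np.zeros(5)])
beta = fit_weighted_quadric(xy, z, weights)
normal = normal_at_origin(beta)
```

Zero-weight points drop out of the least-squares system entirely, which is how soft weights can stand in for an explicit neighborhood scale.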
Single-Stage 6D Object Pose Estimation
Most recent 6D pose estimation frameworks first rely on a deep network to
establish correspondences between 3D object keypoints and 2D image locations
and then use a variant of a RANSAC-based Perspective-n-Point (PnP) algorithm.
This two-stage process, however, is suboptimal: First, it is not end-to-end
trainable. Second, training the deep network relies on a surrogate loss that
does not directly reflect the final 6D pose estimation task.
In this work, we introduce a deep architecture that directly regresses 6D
poses from correspondences. It takes as input a group of candidate
correspondences for each 3D keypoint and accounts for the fact that the order
of the correspondences within each group is irrelevant, while the order of the
groups, that is, of the 3D keypoints, is fixed. Our architecture is generic and
can thus be exploited in conjunction with existing correspondence-extraction
networks so as to yield single-stage 6D pose estimation frameworks. Our
experiments demonstrate that these single-stage frameworks consistently
outperform their two-stage counterparts in terms of both accuracy and speed.
Comment: CVPR 202
Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem
Conventional absolute camera pose via a Perspective-n-Point (PnP) solver
often assumes that the correspondences between 2D image pixels and 3D points
are given. When the correspondences between 2D and 3D points are not known a
priori, the task becomes the much more challenging blind PnP problem. This
paper proposes a deep CNN model which simultaneously solves for both the 6-DoF
absolute camera pose and 2D-3D correspondences. Our model comprises three
neural modules connected in sequence. First, a two-stream PointNet-inspired
network is applied directly to both the 2D image keypoints and the 3D scene
points in order to extract discriminative point-wise features harnessing both
local and contextual information. Second, a global feature matching module is
employed to estimate a matchability matrix among all 2D-3D pairs. Third, the
obtained matchability matrix is fed into a classification module to
disambiguate inlier matches. The entire network is trained end-to-end, followed
by a robust model fitting (P3P-RANSAC) at test time only to recover the 6-DoF
camera pose. Extensive tests on both real and simulated data have shown that
our method substantially outperforms existing approaches, and is capable of
processing thousands of points a second with state-of-the-art accuracy.
Comment: A blind-PnP solve
Beyond Cartesian Representations for Local Descriptors
The dominant approach for learning local patch descriptors relies on small
image regions whose scale must be properly estimated a priori by a keypoint
detector. In other words, if two patches are not in correspondence, their
descriptors will not match. A strategy often used to alleviate this problem is
to "pool" the pixel-wise features over log-polar regions, rather than regularly
spaced ones. By contrast, we propose to extract the "support region" directly
with a log-polar sampling scheme. We show that this provides us with a better
representation by simultaneously oversampling the immediate neighbourhood of
the point and undersampling regions far away from it. We demonstrate that this
representation is particularly amenable to learning descriptors with deep
networks. Our models can match descriptors across a much wider range of scales
than was possible before, and also leverage much larger support regions without
suffering from occlusions. We report state-of-the-art results on three
different datasets.
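A log-polar sampling scheme of the kind described in this abstract can be sketched by generating sample coordinates whose radii are spaced geometrically around a keypoint. The grid sizes, radius range, and function name below are assumptions made for illustration, not the paper's settings.

```python
import numpy as np

def log_polar_grid(center, r_min, r_max, n_radii, n_angles):
    """Return (n_radii, n_angles, 2) sample coordinates and the radii used."""
    # Radii are uniform in log-space, so they grow geometrically.
    radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_radii))
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    xs = center[0] + rr * np.cos(aa)
    ys = center[1] + rr * np.sin(aa)
    return np.stack([xs, ys], axis=-1), radii

grid, radii = log_polar_grid((32.0, 32.0), r_min=1.0, r_max=16.0,
                             n_radii=5, n_angles=8)
ring_gaps = np.diff(radii)   # spacing between consecutive rings
```

The gaps between rings grow with distance from the center, so the immediate neighbourhood of the point is sampled densely and distant regions sparsely. An image patch would then be read at these coordinates with an interpolation routine of choice.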