Rotation Averaging and Strong Duality
In this paper we explore the role of duality principles within the problem of
rotation averaging, a fundamental task in a wide range of computer vision
applications. In its conventional form, rotation averaging is stated as a
minimization over multiple rotation constraints. As these constraints are
non-convex, this problem is generally considered challenging to solve globally.
We show how to circumvent this difficulty through the use of Lagrangian
duality. While such an approach is well known, it is normally not guaranteed to
provide a tight relaxation. Based on spectral graph theory, we analytically
prove that in many cases there is no duality gap unless the noise levels are
severe. This allows us to obtain certifiably global solutions to a class of
important non-convex problems in polynomial time.
We also propose an efficient, scalable algorithm that outperforms general-purpose
numerical solvers and is able to handle the large problem instances
commonly occurring in structure from motion settings. The potential of this
proposed method is demonstrated on a number of different problems, consisting
of both synthetic and real-world data.
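For orientation, the minimization mentioned in the abstract above is often written in the following chordal form. This is a sketch under assumptions: the symbols (absolute rotations R_i, measured relative rotations \tilde{R}_{ij}, edge set E of the view graph) and the multiplication convention are not given in the abstract and may differ from the paper's own notation.

```latex
% Assumed notation: R_i are the unknown absolute rotations,
% \tilde{R}_{ij} the measured relative rotations, E the edges of the view graph.
\min_{R_1,\dots,R_n \in SO(3)} \;
  \sum_{(i,j) \in E} \bigl\| R_i \tilde{R}_{ij} - R_j \bigr\|_F^2
```

The constraints R_i \in SO(3) are what make the problem non-convex; the quadratic objective on its own is convex in the matrix entries.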
Pose Proposal Critic: Robust Pose Refinement by Learning Reprojection Errors
In recent years, considerable progress has been made for the task of rigid
object pose estimation from a single RGB-image, but achieving robustness to
partial occlusions remains a challenging problem. Pose refinement via rendering
has shown promise in order to achieve improved results, in particular, when
data is scarce.
In this paper we focus our attention on pose refinement, and show how to push
the state-of-the-art further in the case of partial occlusions. The proposed
pose refinement method leverages a simplified learning task, where a CNN is
trained to estimate the reprojection error between an observed and a rendered
image. We experiment by training on purely synthetic data as well as a mixture
of synthetic and real data. The method outperforms current state-of-the-art
results on two out of three metrics on the Occlusion LINEMOD benchmark and
performs on par for the final metric.
Comment: Added acknowledgement
Generalized roof duality
The roof dual bound for quadratic unconstrained binary optimization is the basis for several methods for efficiently computing the solution to many hard combinatorial problems. It works by constructing the tightest possible lower-bounding submodular function and minimizing this relaxation instead of the original objective function. However, for higher-order problems the technique has been less successful. A standard technique is to first reduce the problem to a quadratic one by introducing auxiliary variables and then apply the quadratic roof dual bound, but this may lead to loose bounds. We generalize the roof duality technique to higher-order optimization problems. Similarly to the quadratic case, optimal relaxations are defined to be the ones that give the maximum lower bound. We show how submodular relaxations can efficiently be constructed in order to compute the generalized roof dual bound for general cubic and quartic pseudo-boolean functions. Further, we prove that important properties such as persistency still hold, which allows us to determine optimal values for some of the variables. From a practical point of view, we experimentally demonstrate that the technique outperforms the state of the art for a wide range of applications, both in terms of lower bounds and in the number of assigned variables.
Parallel and Distributed Graph Cuts
Graph cut methods are at the core of many state-of-the-art algorithms in computer vision due to their efficiency in computing globally optimal solutions. In this paper, we solve the maximum flow/minimum cut problem in parallel by splitting the graph into multiple parts, and hence further increase the computational efficiency of graph cuts. Optimality of the solution is guaranteed by dual decomposition, or more specifically, the solutions to the subproblems are constrained to be equal on the overlap with dual variables. We demonstrate that our approach allows both (i) faster processing on multi-core computers and (ii) handling of larger problems by splitting the graph across multiple computers on a distributed network. Even though our approach does not give a theoretical guarantee of speedup, an extensive empirical evaluation on several applications with many different data sets consistently shows good performance.
Triangulation of Points, Lines and Conics
The problem of reconstructing 3D scene features from multiple views with known camera motion and given image correspondences is considered. This is one of the classical and most basic geometric problems in computer vision and photogrammetry. Yet, previous methods fail to guarantee optimal reconstructions: they are either plagued by local minima or rely on a non-optimal cost function. A common framework for the triangulation problem of points, lines and conics is presented. We define what is meant by an optimal triangulation based on statistical principles and then derive an algorithm for computing the globally optimal solution. The method for achieving the global minimum is based on convex and concave relaxations for both fractionals and monomials. The performance of the method is evaluated on real image data.
A case for using rotation invariant features in state of the art feature matchers
The aim of this paper is to demonstrate that a state of the art feature
matcher (LoFTR) can be made more robust to rotations by simply replacing the
backbone CNN with a steerable CNN which is equivariant to translations and
image rotations. It is experimentally shown that this boost is obtained without
reducing performance on ordinary illumination and viewpoint matching sequences.
Comment: CVPRW 2022 camera ready
Mesh Types for Curvature Regularization
Length and area regularization are commonplace for inverse problems today. It has, however, turned out to be much more difficult to incorporate a curvature prior. In this paper we propose two improvements to a recently proposed framework based on global optimization. First, the mesh geometry is analyzed from both a theoretical and an experimental viewpoint, and hexagonal meshes are shown to be superior. Our second contribution is that we generalize the framework to handle mean curvature regularization for 3D surface completion and segmentation.
A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Are we using the right potential functions in the Conditional Random Field
models that are popular in the Vision community? Semantic segmentation and
other pixel-level labelling tasks have made significant progress recently due
to the deep learning paradigm. However, most state-of-the-art structured
prediction methods also include a random field model with a hand-crafted
Gaussian potential to model spatial priors, label consistencies and
feature-based image conditioning.
In this paper, we challenge this view by developing a new inference and
learning framework which can learn pairwise CRF potentials restricted only by
their dependence on the image pixel values and the size of the support. Both
standard spatial and high-dimensional bilateral kernels are considered. Our
framework is based on the observation that CRF inference can be achieved via
projected gradient descent and consequently, can easily be integrated in deep
neural networks to allow for end-to-end training. It is empirically
demonstrated that such learned potentials can improve segmentation accuracy and
that certain label class interactions are indeed better modelled by a
non-Gaussian potential. In addition, we compare our inference method to the
commonly used mean-field algorithm. Our framework is evaluated on several
public benchmarks for semantic segmentation with improved performance compared
to previous state-of-the-art CNN+CRF models.
Comment: Presented at EMMCVPR 2017 conference
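The observation that CRF inference can be carried out by projected gradient descent can be illustrated with a small sketch. This is not the authors' implementation: it assumes a relaxed energy of the form sum_i <u_i, q_i> + sum_(i,j) w_ij q_i^T C q_j over per-node label distributions q_i, and the names `project_to_simplex`, `pgd_inference`, `unary`, `edges` and `compat` are hypothetical. Each iteration takes a gradient step and then projects every q_i back onto the probability simplex.

```python
from typing import List, Tuple

Vector = List[float]


def project_to_simplex(v: Vector) -> Vector:
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid k
    return [max(x - theta, 0.0) for x in v]


def pgd_inference(
    unary: List[Vector],                   # unary[i][a]: cost of label a at node i
    edges: List[Tuple[int, int, float]],   # (i, j, weight) pairwise terms
    compat: List[Vector],                  # compat[a][b]: cost of labels (a, b)
    step: float = 0.5,
    iters: int = 50,
) -> List[Vector]:
    """Minimize the assumed relaxed CRF energy by projected gradient descent."""
    n, num_labels = len(unary), len(unary[0])
    # Start from the uniform distribution at every node.
    q = [[1.0 / num_labels] * num_labels for _ in range(n)]
    for _ in range(iters):
        # Gradient of the energy: unary costs plus pairwise messages.
        grad = [list(u) for u in unary]
        for i, j, w in edges:
            for a in range(num_labels):
                grad[i][a] += w * sum(compat[a][b] * q[j][b] for b in range(num_labels))
                grad[j][a] += w * sum(compat[b][a] * q[i][b] for b in range(num_labels))
        # Gradient step followed by projection of each node onto the simplex.
        q = [
            project_to_simplex([x - step * g for x, g in zip(q_i, g_i)])
            for q_i, g_i in zip(q, grad)
        ]
    return q


# Toy example (made-up numbers): two nodes, two labels, one Potts edge.
unary = [[0.0, 1.0], [0.6, 0.5]]
potts = [[0.0, 1.0], [1.0, 0.0]]
q = pgd_inference(unary, [(0, 1, 1.0)], potts)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
```

Because every operation here is differentiable almost everywhere, such an update can in principle be unrolled as layers of a network, which is the property the abstract relies on for end-to-end training. In the toy example, the Potts term pulls node 1 to the label preferred by node 0 even though its own unary cost slightly favours the other label.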