5,646 research outputs found
Clique descriptor of affine invariant regions for robust wide baseline image matching
Assuming that the image distortion between corresponding regions of a stereo pair of images with wide baseline can be approximated as an affine transformation if the regions are reasonably small, recent image matching algorithms have focused on affine invariant region (IR) detection and its description to increase the robustness in matching. However, the distinctiveness of an intensity-based region descriptor tends to deteriorate when an image includes homogeneous texture or repetitive pattern. To address this problem, we investigated the geometry of a local IR cluster (also called a clique) and propose a new clique-based image matching method. In the proposed method, the clique of an IR is estimated by Delaunay triangulation in a local affine frame and the Hausdorff distance is adopted for matching an inexact number of multiple descriptor vectors. We also introduce two adaptively weighted clique distances, where the neighbour distance in a clique is appropriately weighted according to characteristics of the local feature distribution. Experimental results show the clique-based matching method produces more tentative correspondences than variants of the SIFT-based method
DeepMatching: Hierarchical Deformable Dense Matching
We introduce a novel matching algorithm, called DeepMatching, to compute
dense correspondences between images. DeepMatching relies on a hierarchical,
multi-layer, correlational architecture designed for matching images and was
inspired by deep convolutional approaches. The proposed matching algorithm can
handle non-rigid deformations and repetitive textures and efficiently
determines dense correspondences in the presence of significant changes between
images. We evaluate the performance of DeepMatching, in comparison with
state-of-the-art matching algorithms, on the Mikolajczyk (Mikolajczyk et al
2005), the MPI-Sintel (Butler et al 2012) and the Kitti (Geiger et al 2013)
datasets. DeepMatching outperforms the state-of-the-art algorithms and shows
excellent results in particular for repetitive textures.We also propose a
method for estimating optical flow, called DeepFlow, by integrating
DeepMatching in the large displacement optical flow (LDOF) approach of Brox and
Malik (2011). Compared to existing matching algorithms, additional robustness
to large displacements and complex motion is obtained thanks to our matching
approach. DeepFlow obtains competitive performance on public benchmarks for
optical flow estimation
Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation
How do computers and intelligent agents view the world around them? Feature
extraction and representation constitutes one the basic building blocks towards
answering this question. Traditionally, this has been done with carefully
engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is
no ``one size fits all'' approach that satisfies all requirements. In recent
years, the rising popularity of deep learning has resulted in a myriad of
end-to-end solutions to many computer vision problems. These approaches, while
successful, tend to lack scalability and can't easily exploit information
learned by other systems. Instead, we propose SAND features, a dedicated deep
learning solution to feature extraction capable of providing hierarchical
context information. This is achieved by employing sparse relative labels
indicating relationships of similarity/dissimilarity between image locations.
The nature of these labels results in an almost infinite set of dissimilar
examples to choose from. We demonstrate how the selection of negative examples
during training can be used to modify the feature space and vary it's
properties. To demonstrate the generality of this approach, we apply the
proposed features to a multitude of tasks, each requiring different properties.
This includes disparity estimation, semantic segmentation, self-localisation
and SLAM. In all cases, we show how incorporating SAND features results in
better or comparable results to the baseline, whilst requiring little to no
additional training. Code can be found at:
https://github.com/jspenmar/SAND_featuresComment: CVPR201
Neighbourhood Consensus Networks
We address the problem of finding reliable dense correspondences between a
pair of images. This is a challenging task due to strong appearance differences
between the corresponding scene elements and ambiguities generated by
repetitive patterns. The contributions of this work are threefold. First,
inspired by the classic idea of disambiguating feature matches using semi-local
constraints, we develop an end-to-end trainable convolutional neural network
architecture that identifies sets of spatially consistent matches by analyzing
neighbourhood consensus patterns in the 4D space of all possible
correspondences between a pair of images without the need for a global
geometric model. Second, we demonstrate that the model can be trained
effectively from weak supervision in the form of matching and non-matching
image pairs without the need for costly manual annotation of point to point
correspondences. Third, we show the proposed neighbourhood consensus network
can be applied to a range of matching tasks including both category- and
instance-level matching, obtaining the state-of-the-art results on the PF
Pascal dataset and the InLoc indoor visual localization benchmark.Comment: In Proceedings of the 32nd Conference on Neural Information
Processing Systems (NeurIPS 2018
Stereo image processing system for robot vision
More and more applications (path planning, collision avoidance
methods) require 3D description of the surround world. This paper
describes a stereo vision system that uses 2D (grayscale or color) images
to extract simple 2D geometric entities (points, lines) applying a
low-level feature detector. The features are matched across views with a
graph matching algorithm. During the projective reconstruction the 3D
description of the scene is recovered. The developed system uses uncalibrated
cameras, therefore only projective 3D structure can be detected
defined up to a collineation. Using the Euclidean information about a
known set of predefined objects stored in database and the results of the
recognition algorithm, the description can be updated to a metric one
- …