287 research outputs found
Local Descriptors Optimized for Average Precision
Extraction of local feature descriptors is a vital stage in the solution
pipelines for numerous computer vision tasks. Learning-based approaches improve
performance in certain tasks, but still cannot replace handcrafted features in
general. In this paper, we improve the learning of local feature descriptors by
optimizing the performance of descriptor matching, which is a common stage that
follows descriptor extraction in local feature based pipelines, and can be
formulated as nearest neighbor retrieval. Specifically, we directly optimize a
ranking-based retrieval performance metric, Average Precision, using deep
neural networks. This general-purpose solution can also be viewed as a listwise
learning to rank approach, which is advantageous compared to recent local
ranking approaches. On standard benchmarks, descriptors learned with our
formulation achieve state-of-the-art results in patch verification, patch
retrieval, and image matching.
Comment: 13 pages, 8 figures. IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
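To make the metric named in the abstract above concrete, here is a minimal sketch of Average Precision for a single descriptor query, treating matching as nearest neighbor retrieval. It only evaluates the (non-differentiable) metric; it is not the paper's training procedure, and the function and argument names are assumptions made for illustration.

```python
def average_precision(distances, labels):
    """Average Precision of one query.

    distances[i]: descriptor distance from the query to candidate i
                  (smaller means more similar).
    labels[i]:    1 if candidate i is a true match of the query, else 0.
    """
    # Rank candidates by ascending distance (nearest neighbor first).
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            # Precision at this rank, accumulated only at true matches.
            precision_sum += hits / rank
    # By convention here, a query with no true match scores 0.
    return precision_sum / hits if hits else 0.0


# Hypothetical query with four candidates, two of which are true matches.
ap = average_precision([0.1, 0.4, 0.2, 0.9], [1, 0, 0, 1])
```

Because the score depends on the whole ranking rather than on individual pairs or triplets, optimizing it is a listwise objective, which is the property the abstract contrasts with local ranking approaches.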
Scanned surface to CAD design: matching, alignment and difference evaluation
This work proposes a tool to compare a high-precision surface scan with its original CAD model. The project is divided into three steps: processing the CAD files, computing and optimizing the registration, and developing tools for visualization. Because CAD files can contain multiple representations, we cannot work with them directly. Normally this is approached by triangulating the described components and simplifying the mesh for fast rendering, but this does not work for our high-density scans. Instead, we process the CAD model to obtain a point cloud with a parameterized distance between points, which gives a good starting point for the registration. Next, the registration is divided into two parts, coarse and fine. For the coarse registration, we adapt the Initial Alignment Sample Consensus algorithm (IA RanSac) to automate the configuration settings and optimize the running time for our input size. For the fine registration, we use the classic Iterative Closest Point (ICP) algorithm. Because the approach is a random consensus and the input consists of two large point clouds, reducing the number of points to a feasible number (statistically and computationally) is essential to finding a solution. For this, we developed a local optimizer that combines a set of levels of detail (LOD) to find a global solution. Finally, to analyze the result, we developed a color visualization interface with a set of modifier tools (colormaps, transparencies, range modifiers, etc.). This allows us to detect discrepancies between the two models that can be caused by wear or manufacturing imperfections.
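The fine registration stage mentioned above can be illustrated with a minimal ICP sketch. It assumes 2D points, brute-force nearest neighbor search, and a source that is already roughly aligned (the role the coarse stage plays); it is not the authors' implementation, and all names are hypothetical.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] -> dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    a = b = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        # Work with centered coordinates.
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        a += px * qx + py * qy  # coefficient of cos(theta)
        b += px * qy - py * qx  # coefficient of sin(theta)
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(src, dst, iterations=20):
    """Iteratively match each source point to its nearest target point and re-align."""
    cur = list(src)
    for _ in range(iterations):
        # Correspondence step: brute-force nearest neighbor in the target.
        matched = [
            min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in cur
        ]
        # Alignment step: closed-form rigid transform for the current matches.
        theta, tx, ty = best_rigid_2d(cur, matched)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
    return cur
```

The brute-force search costs one pass over the target per source point per iteration, which is why reducing the number of points matters for clouds of the size the abstract describes.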
Leveraging 3D City Models for Rotation Invariant Place-of-Interest Recognition
Given a cell phone image of a building, we address the problem of place-of-interest recognition in urban scenarios. We go beyond earlier approaches by exploiting 3D building information, which is now often available (e.g. from extruded floor plans), and massive street-level image data for database creation. Exploiting vanishing points in query images, and thus fully removing 3D rotation from the recognition problem, simplifies the required feature invariance to a purely homothetic problem, which we show enables more discriminative power in feature descriptors than classical SIFT. We rerank visual word based document queries using a fast stratified homothetic verification that in most cases boosts the correct document to the top positions if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real-world coordinates, ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches in city-scale experiments for different sources of street-level image data and a challenging set of cell phone image
Generic Primitive Detection in Point Clouds Using Novel Minimal Quadric Fits
We present a novel and effective method for detecting 3D primitives in
cluttered, unorganized point clouds, without auxiliary segmentation or type
specification. We consider the quadric surfaces for encapsulating the basic
building blocks of our environments - planes, spheres, ellipsoids, cones or
cylinders, in a unified fashion. Moreover, quadrics allow us to model higher
degree of freedom shapes, such as hyperboloids or paraboloids that could be
used in non-rigid settings.
We begin by contributing two novel quadric fits targeting 3D point sets that
are endowed with tangent space information. Based upon the idea of aligning the
quadric gradients with the surface normals, our first formulation is exact and
requires as few as four oriented points. The second fit approximates the first,
and reduces the computational effort. We theoretically analyze these fits with
rigor, and give algebraic and geometric arguments. Next, by re-parameterizing
the solution, we devise a new local Hough voting scheme on the null-space
coefficients that is combined with RANSAC, reducing the complexity from
$O(N^4)$ to $O(N^3)$ (three points). To the best of our knowledge, this is the
first method capable of performing a generic cross-type multi-object primitive
detection in difficult scenes without segmentation. Our extensive qualitative
and quantitative results show that our method is efficient and flexible, as
well as being accurate.
Comment: Submitted to IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI). arXiv admin note: substantial text overlap with
arXiv:1803.0719
Learning to Predict Dense Correspondences for 6D Pose Estimation
Object pose estimation is an important problem in computer vision with applications in robotics, augmented reality and many other areas. An established strategy for object pose estimation consists of, firstly, finding correspondences between the image and the object’s reference frame, and, secondly, estimating the pose from outlier-free correspondences using Random Sample Consensus (RANSAC). The first step, namely finding correspondences, is difficult because object appearance varies depending on perspective, lighting and many other factors. Traditionally, correspondences have been established using handcrafted methods like sparse feature pipelines.
In this thesis, we introduce a dense correspondence representation for objects, called object coordinates, which can be learned. By learning object coordinates, our pose estimation pipeline adapts to various aspects of the task at hand. It works well for diverse object types, from small objects to entire rooms, varying object attributes, like textured or texture-less objects, and different input modalities, like RGB-D or RGB images. The concept of object coordinates allows us to easily model and exploit uncertainty as part of the pipeline, such that even repeating structures or areas with little texture can contribute to a good solution. Although we can train object coordinate predictors independently of the full pipeline and achieve good results, training the pipeline in an end-to-end fashion is desirable. It enables the object coordinate predictor to adapt its output to the specifics of the subsequent steps in the pose estimation pipeline. Unfortunately, the RANSAC component of the pipeline is non-differentiable, which prohibits end-to-end training. Adopting techniques from reinforcement learning, we introduce Differentiable Sample Consensus (DSAC), a formulation of RANSAC which allows us to train the pose estimation pipeline in an end-to-end fashion by minimizing the expectation of the final pose error.
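The idea of minimizing the expectation of the final pose error can be sketched as follows. The sketch assumes that hypothesis selection is replaced by sampling from a softmax distribution over hypothesis scores, so the expected loss becomes differentiable with respect to those scores. It is a simplified illustration, not the thesis's implementation, and the names `scores` and `losses` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over hypothesis scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_loss(scores, losses):
    """Expected pose loss when hypothesis j is selected with probability softmax(scores)[j]."""
    probs = softmax(scores)
    return sum(p * l for p, l in zip(probs, losses))


def expected_loss_grad(scores, losses):
    """Gradient of the expected loss with respect to each hypothesis score.

    For a softmax distribution, dE/ds_j = p_j * (l_j - E).
    """
    probs = softmax(scores)
    e = sum(p * l for p, l in zip(probs, losses))
    return [p * (l - e) for p, l in zip(probs, losses)]
```

A gradient step on the scores shifts probability mass away from hypotheses with above-average loss, which is what lets the upstream predictor be trained through the selection step.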
CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation
Beyond novel view synthesis, Neural Radiance Fields are useful for
applications that interact with the real world. In this paper, we use them as
an implicit map of a given scene and propose a camera relocalization algorithm
tailored for this representation. The proposed method makes it possible to
compute, in real time, the precise position of a device using a single RGB
camera during its navigation. In contrast with previous work, we do not rely on pose
regression or photometric alignment but rather use dense local features
obtained through volumetric rendering which are specialized on the scene with a
self-supervised objective. As a result, our algorithm is more accurate than
competitors, able to operate in dynamic outdoor environments with changing
lighting conditions and can be readily integrated in any volumetric neural
renderer.
Comment: Accepted to ICCV 202
Computer Vision without Vision : Methods and Applications of Radio and Audio Based SLAM
The central problem of this thesis is estimating receiver-sender node positions from measured receiver-sender distances or equivalent measurements. This problem arises in many applications, such as microphone array calibration, radio antenna array calibration, mapping and positioning using ultra-wideband, and mapping and positioning using round-trip-time measurements between mobile phones and Wi-Fi units. Previous research has explored some of these problems, for instance by creating minimal solvers, but these solutions lack real-world implementation. Because different media are used, finding reliable receiver-sender distances is difficult, with many of the measurements being erroneous or, worse, missing. Therefore, in this thesis, we explore using minimal solvers to create robust solutions that tolerate small measurement errors and work around missing and grossly erroneous measurements. This thesis focuses mainly on Time-of-Arrival measurements using radio technologies such as Two-way-Ranging in Ultra-Wideband and the new IEEE 802.11mc standard found on many WiFi modules. The methods investigated also relate to Computer Vision problems such as Structure-from-Motion. As part of this thesis, a range of new commercial radio technologies are characterised in terms of ranging in real-world environments. In doing so, we have shown how these technologies can be used as a more accurate alternative to the Global Positioning System in indoor environments. Beyond these solutions, further methods are proposed for large-scale problems in which multiple users collect the data, commonly known as Big Data. In these cases, more data is not always better, so a method is proposed to try to find the relevant data to calibrate large systems.