1,749 research outputs found
Mapping, Localization and Path Planning for Image-based Navigation using Visual Features and Map
Building on progress in feature representations for image retrieval,
image-based localization has seen a surge of research interest. Image-based
localization has the advantage of being inexpensive and efficient, often
avoiding the use of 3D metric maps altogether. That said, the need to maintain
a large number of reference images to support localization effectively in a
scene nonetheless calls for organizing them in a map structure of some
kind.
The problem of localization often arises as part of a navigation process. We
are, therefore, interested in summarizing the reference images as a set of
landmarks, which meet the requirements for image-based navigation. A
contribution of this paper is to formulate such a set of requirements for the
two sub-tasks involved: map construction and self-localization. These
requirements are then exploited for compact map representation and accurate
self-localization, using the framework of a network flow problem. During this
process, we formulate the map construction and self-localization problems as
convex quadratic and second-order cone programs, respectively. We evaluate our
methods on publicly available indoor and outdoor datasets, where they
outperform existing methods significantly.
Comment: CVPR 2019, for implementation see https://github.com/janinethom
Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Many robotics applications require precise pose estimates despite operating
in large and changing environments. This can be addressed by visual
localization, using a pre-computed 3D model of the surroundings. The pose
estimation then amounts to finding correspondences between 2D keypoints in a
query image and 3D points in the model using local descriptors. However,
computational power is often limited on robotic platforms, making this task
challenging in large-scale environments. Binary feature descriptors
significantly speed up this 2D-3D matching, and have become popular in the
robotics community, but also strongly impair the robustness to perceptual
aliasing and changes in viewpoint, illumination and scene structure. In this
work, we propose to leverage recent advances in deep learning to perform an
efficient hierarchical localization. We first localize at the map level using
learned image-wide global descriptors, and subsequently estimate a precise pose
from 2D-3D matches computed in the candidate places only. This restricts the
local search and thus allows us to efficiently exploit powerful non-binary
descriptors usually dismissed on resource-constrained devices. Our approach
results in state-of-the-art localization performance while running in real-time
on a popular mobile platform, enabling new prospects for robotics research.
Comment: CoRL 2018 Camera-ready (fix typos and update citations)
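The coarse-to-fine idea in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the toy data layout (`places` as dictionaries with `global` and `locals` entries), the function names, the cosine ranking and the ratio test are all assumptions made for the example, and the final pose solver that would consume the 2D-3D matches is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two global descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_places(query_global, places, top_k=2):
    """Coarse step: rank map places by global-descriptor similarity."""
    ranked = sorted(places,
                    key=lambda p: cosine(query_global, p["global"]),
                    reverse=True)
    return ranked[:top_k]

def match_local(query_locals, candidates, ratio=0.8):
    """Fine step: match local descriptors only against 3D points that
    belong to the candidate places (hypothetical ratio test)."""
    pool = [(desc, pid) for p in candidates for desc, pid in p["locals"]]
    matches = []
    for qi, q in enumerate(query_locals):
        dists = sorted((math.dist(q, desc), pid) for desc, pid in pool)
        if not dists:
            continue
        # Accept the nearest neighbour if it is clearly better than the second.
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches

# Toy map: two places, each with one global and a few local descriptors.
places = [
    {"name": "A", "global": [1.0, 0.0],
     "locals": [([0.0, 0.0], "a0"), ([5.0, 5.0], "a1")]},
    {"name": "B", "global": [0.0, 1.0],
     "locals": [([0.1, 0.0], "b0")]},
]
candidates = retrieve_places([0.9, 0.1], places, top_k=1)
matches = match_local([[0.1, 0.1], [5.0, 4.9]], candidates)
```

Restricting the local search to the retrieved candidates keeps the matching pool small, which is what makes richer, non-binary descriptors affordable in this kind of pipeline.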
Learning View-Model Joint Relevance for 3D Object Retrieval
3D object retrieval has attracted extensive research effort and has become an important task in recent years. Measuring the relevance between 3D objects, however, remains difficult. Most existing methods employ either a model-based or a view-based approach alone, which may lead to an incomplete representation of the 3D objects. In this paper, we propose to jointly learn the view-model relevance among 3D objects for retrieval, with the 3D objects formulated in different graph structures. With the view information, the multiple views of 3D objects are used to formulate the relationship among 3D objects in an object hypergraph structure. With the model data, model-based features are extracted to construct an object graph that describes the relationship among the 3D objects. Learning is conducted on the two graphs to estimate the relevance among the 3D objects, and the view/model graph weights can also be optimized in the learning process. This is the first work to jointly explore the view-based and model-based relevance among 3D objects in a graph-based framework. The proposed method has been evaluated on three data sets. The experimental results and a comparison with state-of-the-art methods demonstrate the retrieval accuracy of the proposed 3D object retrieval method.
Video Registration in Egocentric Vision under Day and Night Illumination Changes
With the spread of wearable devices and head-mounted cameras, a wide range of
applications requiring precise user localization is now possible. In this paper,
we propose to treat the problem of obtaining the user position with respect to
a known environment as a video registration problem. Video registration, i.e.
the task of aligning an input video sequence to a pre-built 3D model, relies on
a matching process of local keypoints extracted on the query sequence to a 3D
point cloud. The overall registration performance is strictly tied to the
actual quality of this 2D-3D matching, and can degrade if environmental
conditions such as steep changes in lighting like the ones between day and
night occur. To effectively register an egocentric video sequence under these
conditions, we propose to tackle the source of the problem: the matching
process. To overcome the shortcomings of standard matching techniques, we
introduce a novel embedding space that allows us to obtain robust matches by
jointly taking into account local descriptors, their spatial arrangement and
their temporal robustness. The proposal is evaluated using unconstrained
egocentric video sequences both in terms of matching quality and resulting
registration performance using different 3D models of historical landmarks. The
results show that the proposed method can outperform state-of-the-art
registration algorithms, in particular when dealing with the challenges of
night and day sequences.
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple
applications ranging from robotic manipulation to augmented reality. Prior
methods have tackled this problem through generative models which predict 3D
reconstructions as voxels or point clouds. However, these methods can be
computationally expensive and miss fine details. We introduce a new
differentiable layer for 3D data deformation and use it in DeformNet to learn a
model for 3D reconstruction-through-deformation. DeformNet takes an image
input, searches the nearest shape template from a database, and deforms the
template to match the query image. We evaluate our approach on the ShapeNet
dataset and show that (a) the Free-Form Deformation layer is a powerful new
building block for Deep Learning models that manipulate 3D data; (b) DeformNet
uses this FFD layer combined with shape retrieval for smooth and
detail-preserving 3D reconstruction of qualitatively plausible point clouds
with respect to a single query image; and (c) compared to other state-of-the-art 3D
reconstruction methods, DeformNet quantitatively matches or outperforms their
benchmarks by significant margins. For more information, visit:
https://deformnet-site.github.io/DeformNet-website/.
Comment: 11 pages, 9 figures, NIP
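To make the notion of free-form deformation concrete, here is a minimal sketch of the classical trivariate Bernstein formulation, in which a point inside the unit cube is moved by blending a lattice of control points. Whether DeformNet's FFD layer uses exactly this formulation is an assumption; the function names and the dictionary-based lattice are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import comb

def bernstein(i, n, t):
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * (t ** i) * ((1 - t) ** (n - i))

def identity_lattice(l, m, n):
    """Regular (l+1) x (m+1) x (n+1) control lattice over the unit cube."""
    return {
        (i, j, k): (i / l, j / m, k / n)
        for i, j, k in product(range(l + 1), range(m + 1), range(n + 1))
    }

def ffd(point, lattice, l, m, n):
    """Deform a point given in lattice coordinates (s, t, u) in [0, 1]^3."""
    s, t, u = point
    out = [0.0, 0.0, 0.0]
    for (i, j, k), ctrl in lattice.items():
        # Weight of this control point for the given (s, t, u).
        w = bernstein(i, l, s) * bernstein(j, m, t) * bernstein(k, n, u)
        for axis in range(3):
            out[axis] += w * ctrl[axis]
    return tuple(out)

lattice = identity_lattice(2, 2, 2)
unchanged = ffd((0.25, 0.5, 0.75), lattice, 2, 2, 2)

# Shifting every control point along x translates the whole shape.
shifted_lattice = {key: (x + 1.0, y, z) for key, (x, y, z) in lattice.items()}
moved = ffd((0.25, 0.5, 0.75), shifted_lattice, 2, 2, 2)
```

Because the output is a weighted sum of control points with weights that depend smoothly on the input point, the mapping is differentiable with respect to the control-point positions, which is the property a learnable deformation layer would rely on.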
On Rearrangement of Items Stored in Stacks
There are $n \ge 2$ stacks, each filled with $d$ items, and one empty stack.
Every stack has capacity $d > 0$. A robot arm, in one stack operation (step),
may pop one item from the top of a non-empty stack and subsequently push it
onto a stack not at capacity. In a labeled problem, all $nd$ items are
distinguishable and are initially randomly scattered in the $n$ stacks. The
items must be rearranged using pop-and-pushes so that in the end, the $k$-th
stack holds items $(k-1)d + 1, \ldots, kd$, in that order, from the top to
the bottom for all $1 \le k \le n$. In an unlabeled problem, the $nd$
items are of $n$ types of $d$ each. The goal is to rearrange items so that
items of type $k$ are located in the $k$-th stack for all $1 \le k \le n$.
In carrying out the rearrangement, a natural question is to find the least
number of required pop-and-pushes.
Our main contributions are: (1) an algorithm for restoring the order of
$n^2$ items stored in an $n \times n$ table using only $2n$ column and row
permutations, and its generalization, and (2) an algorithm with a guaranteed
upper bound of $O(nd)$ steps for solving both versions of the stack
rearrangement problem when $d \le \lceil cn \rceil$ for arbitrary fixed
positive number $c$. In terms of the required number of steps, the labeled and
unlabeled versions have lower bounds $\Omega(nd + nd\frac{\log d}{\log n})$
and $\Omega(nd)$, respectively.
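The pop-and-push model described above can be made concrete with a small brute-force search. This is only an illustrative sketch, not the algorithm from the paper: it uses breadth-first search, which is feasible only for tiny instances. The names `n` (number of filled stacks) and `d` (stack capacity), the item numbering in `goal_state`, and all function names are assumptions made for the example.

```python
from collections import deque

def goal_state(n, d):
    """Labeled goal: stack k (0-indexed) holds items k*d+1 .. (k+1)*d from
    top to bottom. Stacks are tuples listed bottom-to-top; the extra
    buffer stack ends up empty. The numbering is an assumed convention."""
    filled = tuple(tuple(range((k + 1) * d, k * d, -1)) for k in range(n))
    return filled + ((),)

def neighbors(state, d):
    """All states reachable with one pop-and-push step."""
    for src, s in enumerate(state):
        if not s:
            continue  # cannot pop from an empty stack
        for dst, t in enumerate(state):
            if dst == src or len(t) >= d:
                continue  # cannot push onto a stack at capacity
            nxt = list(state)
            nxt[src] = s[:-1]
            nxt[dst] = t + (s[-1],)
            yield tuple(nxt)

def min_steps(start, n, d):
    """Least number of pop-and-push steps from start to the labeled goal,
    found by breadth-first search (exponential; toy sizes only)."""
    goal = goal_state(n, d)
    start = tuple(tuple(s) for s in start)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for nxt in neighbors(cur, d):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # goal unreachable from this start

# Two stacks of capacity 2 plus the buffer; the second stack is reversed.
steps = min_steps([(2, 1), (3, 4), ()], n=2, d=2)
```

A search like this gives exact minima for small cases, which can serve as a reference when checking a faster heuristic or constructive algorithm.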
Survey of Object Detection Methods in Camouflaged Image
Camouflage is an attempt to conceal the signature of a target object in the background image. Camouflage detection
methods, also called decamouflaging methods, are used to detect a foreground object hidden in the background image. In this
research paper, the authors present a survey of camouflage detection methods for different applications and areas.