
    Mapping, Localization and Path Planning for Image-based Navigation using Visual Features and Map

    Building on progress in feature representations for image retrieval, image-based localization has seen a surge of research interest. Image-based localization has the advantage of being inexpensive and efficient, often avoiding the use of 3D metric maps altogether. That said, the need to maintain a large number of reference images to effectively support localization in a scene nonetheless calls for them to be organized in a map structure of some kind. The problem of localization often arises as part of a navigation process. We are therefore interested in summarizing the reference images as a set of landmarks that meet the requirements for image-based navigation. A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: map construction and self-localization. These requirements are then exploited for compact map representation and accurate self-localization, using the framework of a network flow problem. In this process, we formulate the map construction and self-localization problems as convex quadratic and second-order cone programs, respectively. We evaluate our methods on publicly available indoor and outdoor datasets, where they significantly outperform existing methods. Comment: CVPR 2019; for implementation see https://github.com/janinethom
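
    The abstract names two convex solver classes rather than giving the formulations, so the following is a minimal sketch of how a convex quadratic program and a second-order cone program are posed and solved with the cvxpy modeling library. The data and objectives below are placeholders, not the paper's network-flow formulation.

```python
# Hedged sketch: the two solver classes named in the abstract, posed in
# cvxpy. Objectives and constraints are illustrative placeholders only.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))   # placeholder data matrix
b = rng.standard_normal(20)

# Convex quadratic program (the class used for map construction):
x = cp.Variable(8)
qp = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)),
                [x >= 0, cp.sum(x) == 1])
qp.solve()

# Second-order cone program (the class used for self-localization):
y = cp.Variable(8)
t = cp.Variable()
socp = cp.Problem(cp.Minimize(t), [cp.norm(A @ y - b, 2) <= t, y >= 0])
socp.solve()
print(qp.value, socp.value)
```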

    Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization

    Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. Pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching and have become popular in the robotics community, but they also strongly impair robustness to perceptual aliasing and to changes in viewpoint, illumination, and scene structure. In this work, we propose to leverage recent advances in deep learning to perform efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows us to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach achieves state-of-the-art localization performance while running in real time on a popular mobile platform, enabling new prospects for robotics research. Comment: CoRL 2018 camera-ready (fix typos and update citations)
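
    As a rough illustration of the coarse-to-fine pipeline described above, here is a minimal sketch: retrieve candidate places by global-descriptor similarity, match local descriptors only against the 3D points visible from those candidates, and estimate the pose with PnP + RANSAC. The helpers extract_global, extract_local, and covisible_points are hypothetical stand-ins, not the authors' API.

```python
# Minimal coarse-to-fine localization sketch; helper names are hypothetical.
import numpy as np
import cv2

def localize(query_img, db_global, db_images, K, num_candidates=10):
    """db_global: (N, D) global descriptors of N reference images;
    K: 3x3 camera intrinsics."""
    q_global = extract_global(query_img)          # e.g. a NetVLAD-style vector
    # Coarse step: retrieve candidate places by global-descriptor similarity.
    sims = db_global @ q_global
    candidates = np.argsort(-sims)[:num_candidates]

    # Fine step: match local descriptors only against 3D points seen from
    # the candidate places, then estimate a pose with PnP + RANSAC.
    kps, desc = extract_local(query_img)          # (M,2) keypoints, (M,D') descriptors
    pts3d, desc3d = covisible_points(db_images, candidates)
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc.astype(np.float32), desc3d.astype(np.float32))
    obj = np.float32([pts3d[m.trainIdx] for m in matches])
    img = np.float32([kps[m.queryIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, None)
    return (rvec, tvec) if ok else None
```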

    Learning View-Model Joint Relevance for 3D Object Retrieval

    3D object retrieval has attracted extensive research effort and has become an important task in recent years. However, measuring the relevance between 3D objects remains a difficult issue. Most existing methods employ only model-based or view-based approaches, which may provide incomplete information for 3D object representation. In this paper, we propose to jointly learn view-model relevance among 3D objects for retrieval, in which the 3D objects are formulated in different graph structures. On the view side, the multiple views of the 3D objects are employed to formulate the relationships among objects in an object hypergraph. On the model side, model-based features are extracted to construct an object graph describing the relationships among the 3D objects. Learning is conducted on the two graphs to estimate the relevance among the 3D objects, and the view/model graph weights can also be optimized during learning. This is the first work to jointly explore view-based and model-based relevance among 3D objects in a graph-based framework. The proposed method has been evaluated on three datasets. The experimental results and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed 3D object retrieval method in terms of retrieval accuracy.
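
    The abstract does not spell out the learning objective; as background, the following is a sketch of the standard transductive hypergraph relevance propagation (in the style of Zhou et al.) that object-hypergraph retrieval methods commonly build on. The paper's joint view/model formulation with learned graph weights is more involved than this.

```python
# Standard hypergraph relevance propagation sketch (not the paper's exact
# joint view/model objective).
import numpy as np

def hypergraph_relevance(H, w, y, alpha=0.9):
    """H: (n_objects, n_edges) incidence matrix; w: (n_edges,) edge weights;
    y: (n_objects,) query indicator; returns relevance scores f."""
    De = H.sum(axis=0)                     # hyperedge degrees
    Dv = H @ w                             # vertex degrees
    Dv_isqrt = np.diag(1.0 / np.sqrt(Dv))
    # Normalized propagation operator: Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    Theta = Dv_isqrt @ H @ np.diag(w / De) @ H.T @ Dv_isqrt
    n = H.shape[0]
    # Closed-form minimizer of the regularized propagation objective.
    return np.linalg.solve(np.eye(n) - alpha * Theta, (1 - alpha) * y)
```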

    Video Registration in Egocentric Vision under Day and Night Illumination Changes

    With the spread of wearable devices and head-mounted cameras, a wide range of applications requiring precise user localization is now possible. In this paper, we propose to treat the problem of obtaining the user's position with respect to a known environment as a video registration problem. Video registration, i.e., the task of aligning an input video sequence to a pre-built 3D model, relies on matching local keypoints extracted from the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the quality of this 2D-3D matching and can degrade under environmental conditions such as steep changes in lighting, like those between day and night. To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that allows us to obtain robust matches by jointly taking into account local descriptors, their spatial arrangement, and their temporal robustness. The proposal is evaluated on unconstrained egocentric video sequences, both in terms of matching quality and resulting registration performance, using different 3D models of historical landmarks. The results show that the proposed method can outperform state-of-the-art registration algorithms, in particular when dealing with the challenges of night and day sequences.
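
    For contrast, a minimal version of the "standard matching techniques" the abstract refers to is appearance-only nearest-neighbor matching with Lowe's ratio test, sketched below. It uses descriptors alone and ignores the spatial and temporal cues the proposed embedding additionally exploits.

```python
# Appearance-only baseline: nearest-neighbor 2D-3D matching with a ratio test.
import numpy as np

def ratio_test_matches(desc_2d, desc_3d, ratio=0.8):
    """desc_2d: (M, D) query descriptors; desc_3d: (P, D) model descriptors
    (P >= 2); returns (query_idx, model_idx) pairs passing the test."""
    dists = np.linalg.norm(desc_2d[:, None, :] - desc_3d[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_2d))
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(i, int(best[i])) for i in np.flatnonzero(keep)]
```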

    DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image

    3D reconstruction from a single image is a key problem in multiple applications ranging from robotic manipulation to augmented reality. Prior methods have tackled this problem through generative models which predict 3D reconstructions as voxels or point clouds. However, these methods can be computationally expensive and miss fine details. We introduce a new differentiable layer for 3D data deformation and use it in DeformNet to learn a model for 3D reconstruction-through-deformation. DeformNet takes an image as input, searches for the nearest shape template in a database, and deforms the template to match the query image. We evaluate our approach on the ShapeNet dataset and show that (a) the Free-Form Deformation layer is a powerful new building block for deep learning models that manipulate 3D data; (b) DeformNet uses this FFD layer combined with shape retrieval for smooth and detail-preserving 3D reconstruction of qualitatively plausible point clouds with respect to a single query image; and (c) compared to other state-of-the-art 3D reconstruction methods, DeformNet quantitatively matches or outperforms their benchmarks by significant margins. For more information, visit: https://deformnet-site.github.io/DeformNet-website/. Comment: 11 pages, 9 figures, NIPS
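
    To make the FFD idea concrete, below is a small sketch of classical free-form deformation: points in a unit cube are displaced by a lattice of control-point offsets weighted by Bernstein polynomials, so the output is differentiable in the offsets. The lattice size and layout here are illustrative assumptions, not the paper's exact layer.

```python
# Classical free-form deformation over a Bernstein control lattice
# (illustrative; lattice size l is an assumption, not the paper's choice).
import numpy as np
from math import comb

def ffd(points, delta, l=4):
    """points: (N, 3) in [0,1]^3; delta: (l+1, l+1, l+1, 3) control offsets."""
    def bernstein(k, t):
        return comb(l, k) * (t ** k) * ((1 - t) ** (l - k))
    out = points.copy()
    for i in range(l + 1):
        Bi = bernstein(i, points[:, 0])
        for j in range(l + 1):
            Bj = bernstein(j, points[:, 1])
            for k in range(l + 1):
                Bk = bernstein(k, points[:, 2])
                w = (Bi * Bj * Bk)[:, None]   # (N, 1) trivariate weight
                out += w * delta[i, j, k]     # displace by weighted offset
    return out
```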

    On Rearrangement of Items Stored in Stacks

    There are $n \ge 2$ stacks, each filled with $d$ items, and one empty stack. Every stack has capacity $d > 0$. A robot arm, in one stack operation (step), may pop one item from the top of a non-empty stack and subsequently push it onto a stack not at capacity. In a "labeled" problem, all $nd$ items are distinguishable and are initially randomly scattered in the $n$ stacks. The items must be rearranged using pop-and-pushes so that, in the end, the $k^{\rm th}$ stack holds items $(k-1)d+1, \ldots, kd$, in that order, from top to bottom, for all $1 \le k \le n$. In an "unlabeled" problem, the $nd$ items are of $n$ types, $d$ of each. The goal is to rearrange the items so that items of type $k$ are located in the $k^{\rm th}$ stack for all $1 \le k \le n$. In carrying out the rearrangement, a natural question is to find the least number of required pop-and-pushes. Our main contributions are: (1) an algorithm for restoring the order of $n^2$ items stored in an $n \times n$ table using only $2n$ column and row permutations, and its generalization, and (2) an algorithm with a guaranteed upper bound of $O(nd)$ steps for solving both versions of the stack rearrangement problem when $d \le \lceil cn \rceil$ for an arbitrary fixed positive number $c$. In terms of the required number of steps, the labeled and unlabeled versions have lower bounds of $\Omega(nd + nd\frac{\log d}{\log n})$ and $\Omega(nd)$, respectively.
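
    To make the operation model and the labeled goal condition concrete, here is a tiny simulator of the setup above (illustrative only; it is not the paper's $O(nd)$ algorithm):

```python
# Minimal simulator of the pop-and-push model and the labeled goal test.
def pop_and_push(stacks, src, dst, capacity):
    """One step: pop the top of a non-empty stack and push it onto a stack
    not at capacity. Stacks are lists with the top at the end."""
    assert stacks[src] and len(stacks[dst]) < capacity
    stacks[dst].append(stacks[src].pop())

def labeled_goal(stacks, n, d):
    # Stack k (1-indexed) must hold (k-1)d+1, ..., kd from top to bottom,
    # i.e. read bottom-to-top it is kd, kd-1, ..., (k-1)d+1.
    return all(stacks[k - 1] == list(range(k * d, (k - 1) * d, -1))
               for k in range(1, n + 1))
```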

    Survey of Object Detection Methods in Camouflaged Image

    Camouflage is an attempt to conceal the signature of a target object within a background image. Camouflage detection (decamouflaging) methods are used to detect foreground objects hidden in a background image. In this paper, the authors present a survey of camouflage detection methods for different applications and areas.