
    Compact Environment-Invariant Codes for Robust Visual Place Recognition

    Robust visual place recognition (VPR) requires scene representations that are invariant to environmental challenges such as seasonal changes and variations in ambient lighting between day and night. Moreover, a practical VPR system requires compact representations of environmental features. To satisfy these requirements, in this paper we propose a modification to the existing VPR pipeline that incorporates supervised hashing. The modified system learns, in a supervised setting, compact binary codes from image feature descriptors. These binary codes absorb robustness to the visual variations seen during the training phase, thereby making the system adaptive to severe environmental changes. Incorporating supervised hashing also makes VPR computationally more efficient and easy to implement on simple hardware, because binary embeddings can be learned over simple-to-compute features and distances are computed in the low-dimensional Hamming space of binary codes. We have performed experiments on several challenging datasets covering seasonal, illumination and viewpoint variations. We also compare two widely used supervised hashing methods, CCA-ITQ and MLH, and show that this new pipeline outperforms or closely matches state-of-the-art deep learning VPR methods based on high-dimensional features extracted from pre-trained deep convolutional neural networks.
    Comment: Conference on Computer and Robot Vision (CRV) 201
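
    The Hamming-space matching that this pipeline relies on can be illustrated with a short sketch (my own, not the paper's code): binary codes are packed into bytes, and the database is ranked by XOR-and-popcount distance. The code length and the random data are placeholders.

```python
import numpy as np

def hamming_rank(query_bits, db_bits):
    """Rank database images by Hamming distance to the query code.

    query_bits: (d,) boolean array holding the learned binary code.
    db_bits:    (n, d) boolean array, one code per database image.
    """
    q = np.packbits(query_bits)                 # pack bits into uint8 bytes
    db = np.packbits(db_bits, axis=1)
    # Popcount of the XOR gives the number of differing bits.
    dists = np.unpackbits(np.bitwise_xor(db, q), axis=1).sum(axis=1)
    return np.argsort(dists), dists

# Toy usage: 32-bit codes for 1000 database images and one query.
rng = np.random.default_rng(0)
db_codes = rng.integers(0, 2, size=(1000, 32)).astype(bool)
query_code = rng.integers(0, 2, size=32).astype(bool)
order, dists = hamming_rank(query_code, db_codes)
print("best match:", order[0], "at distance", dists[order[0]])
```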

    Self-localization from Images with Small Overlap

    With the recent success of visual features from deep convolutional neural networks (DCNNs) in visual robot self-localization, it has become important and practical to address more general self-localization scenarios. In this paper, we address the scenario of self-localization from images with small overlap. We explicitly introduce a localization difficulty index as a decreasing function of the view overlap between query and relevant database images, and investigate performance versus difficulty for challenging cross-view self-localization tasks. We then reformulate self-localization as scalable bag-of-visual-features (BoVF) scene retrieval and present an efficient solution, called PCA-NBNN, that aims to facilitate fast yet discriminative correspondence between partially overlapping images. The proposed approach adopts recent findings on discriminativity-preserving encoding of DCNN features using principal component analysis (PCA) and on cross-domain scene matching using the naive Bayes nearest neighbor (NBNN) distance metric. We experimentally demonstrate that the proposed PCA-NBNN framework frequently achieves results comparable to previous DCNN features and that the BoVF model is significantly more efficient. We further address an important alternative scenario of "self-localization from images with NO overlap" and report the result.
    Comment: 8 pages, 9 figures, Draft of a paper submitted to an International Conferenc
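
    For readers unfamiliar with NBNN, the following sketch (my own illustration with random placeholder descriptors, not the paper's implementation) scores each candidate place by summing, over the query's local descriptors, the squared distance to that place's nearest database descriptor; the PCA compression is shown as a plain scikit-learn projection.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def nbnn_scores(query_desc, place_descs):
    """Naive Bayes nearest neighbor: lower total distance means a better match.

    query_desc:  (m, d) local descriptors of the query image.
    place_descs: dict mapping place id -> (n_i, d) descriptors of that place.
    """
    scores = {}
    for place_id, descs in place_descs.items():
        nn = NearestNeighbors(n_neighbors=1).fit(descs)
        dists, _ = nn.kneighbors(query_desc)           # (m, 1) nearest distances
        scores[place_id] = float((dists ** 2).sum())   # image-to-place distance
    return scores

# Toy usage: random stand-ins for DCNN descriptors, compressed by PCA.
rng = np.random.default_rng(1)
raw = {p: rng.normal(size=(200, 256)) for p in ["place_a", "place_b"]}
query_raw = rng.normal(size=(50, 256))
pca = PCA(n_components=32).fit(np.vstack(list(raw.values())))
places = {p: pca.transform(x) for p, x in raw.items()}
scores = nbnn_scores(pca.transform(query_raw), places)
print("best place:", min(scores, key=scores.get))
```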

    Vision-based Pose Estimation for Augmented Reality: A Comparison Study

    Augmented reality aims to enrich our real world by inserting 3D virtual objects. To accomplish this goal, it is important that virtual elements are rendered and aligned in the real scene in an accurate and visually acceptable way. Solving this problem amounts to pose estimation and 3D camera localization. This paper presents a survey of different approaches to 3D pose estimation in augmented reality and gives a classification of keypoint-based techniques. The study given in this paper may help both developers and researchers in the field of augmented reality.
    Comment: IEEE International Conference on Pattern Analysis and Intelligent Systems PAIS'201
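
    As a concrete instance of the keypoint-based family the survey classifies, the sketch below (not taken from the paper; the model points, pose and intrinsics are invented) estimates a camera pose from 2D-3D correspondences with OpenCV's PnP solver.

```python
import numpy as np
import cv2

# Hypothetical 3D model points in the object frame (>= 6 non-coplanar points).
object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                          [0.5, 0.5, 0.5], [0.2, 0.8, 0.3]], dtype=np.float64)

# Assumed pinhole intrinsics, no lens distortion.
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)

# Simulate an observation: project the model with a known ground-truth pose.
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([0.3, -0.1, 4.0])
image_points, _ = cv2.projectPoints(object_points, rvec_gt, tvec_gt, K, dist)

# Solve Perspective-n-Point from the 2D-3D correspondences.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
print("recovered rotation vector:", rvec.ravel())
print("recovered translation:   ", tvec.ravel())
```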

    Point-cloud-based place recognition using CNN feature extraction

    This paper proposes a novel point-cloud-based place recognition system that adopts a deep learning approach for feature extraction. By using a convolutional neural network pre-trained on color images to extract features from a range image, without fine-tuning on extra range images, significant improvement is observed compared to using hand-crafted features. The resulting system is illumination invariant, rotation invariant and robust against moving objects that are unrelated to the place identity. Apart from the system itself, we also bring to the community a new place recognition dataset containing both point clouds and grayscale images covering a full 360° environmental view. In addition, the dataset is organized in such a way that it facilitates experimental validation of rotation invariance and robustness against unrelated moving objects separately.
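
    The core idea of reusing a colour-image CNN on range data can be sketched as follows (my own illustration, assuming torchvision and a synthetic range image rather than the authors' dataset): the range image is normalised, replicated to three channels, and pushed through a frozen pretrained backbone to obtain a global place descriptor.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained backbone used as a frozen feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()        # drop the classifier, keep 512-d features
backbone.eval()

# Synthetic range image (depths in metres) standing in for a real LiDAR scan.
range_img = torch.rand(64, 360) * 50.0

# Normalise to [0, 1] and replicate to 3 channels so the RGB-trained CNN accepts it.
x = (range_img / range_img.max()).unsqueeze(0).repeat(3, 1, 1)
x = T.Resize((224, 224), antialias=True)(x).unsqueeze(0)

with torch.no_grad():
    descriptor = backbone(x)             # (1, 512) global place descriptor
print(descriptor.shape)
```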

    Scalable Change Retrieval Using Deep 3D Neural Codes

    We present a novel scalable framework for image change detection (ICD) from an on-board 3D imagery system. We argue that existing ICD systems are constrained by the time required to align a given query image with individual reference image coordinates. We utilize an invariant coordinate system (ICS) to replace the time-consuming image alignment with an offline pre-processing procedure. Our key contribution is an extension of the traditional image-comparison-based ICD task to an image retrieval (IR) setup. We replace each component of the 3D ICD system, i.e., (1) image modeling, (2) image alignment, and (3) image differencing, with significantly more efficient variants from the bag-of-words (BoW) IR paradigm. Further, we train a deep 3D feature extractor in an unsupervised manner using a Siamese network and automatically collected training data. We conducted experiments on a challenging cross-season ICD task using a publicly available dataset and thereby validate the efficacy of the proposed approach.
    Comment: 5 pages, 1 figure, technical repor
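
    To make the bag-of-words retrieval side concrete, here is a generic BoW sketch (not the authors' code; the vocabulary size and descriptors are placeholders): local descriptors are quantised against a k-means vocabulary, images become TF-IDF-weighted histograms, and retrieval is a cosine-similarity ranking with no per-pair image alignment.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Placeholder local descriptors: one (n_i, 64) array per database image.
db_descriptors = [rng.normal(size=(100, 64)) for _ in range(20)]
query_descriptors = rng.normal(size=(100, 64))

# 1) Build a visual vocabulary by clustering all database descriptors.
vocab = KMeans(n_clusters=32, n_init=4, random_state=0).fit(np.vstack(db_descriptors))

def bow_histogram(desc):
    words = vocab.predict(desc)
    return np.bincount(words, minlength=vocab.n_clusters).astype(float)

# 2) TF-IDF weighting over the database histograms.
db_hists = np.stack([bow_histogram(d) for d in db_descriptors])
idf = np.log(len(db_hists) / (1.0 + (db_hists > 0).sum(axis=0)))
db_tfidf = db_hists * idf
q_tfidf = bow_histogram(query_descriptors) * idf

# 3) Retrieve by cosine similarity.
sims = db_tfidf @ q_tfidf / (np.linalg.norm(db_tfidf, axis=1) * np.linalg.norm(q_tfidf) + 1e-12)
print("closest database image:", int(np.argmax(sims)))
```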

    VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change

    Visual Place Recognition (VPR) is the process of recognising a previously visited place using visual information, often under varying appearance conditions and viewpoint changes and with computational constraints. VPR is a critical component of many autonomous navigation systems ranging from autonomous vehicles to drones. While the concept of place recognition has been around for many years, VPR research has grown rapidly as a field over the past decade, due both to improving camera hardware and to its suitability for deep-learning-based techniques. With this growth, however, has come field fragmentation, a lack of standardisation, and a disconnect between current performance metrics and the actual utility of a VPR technique at deployment. In this paper we address these key challenges through a new comprehensive open-source evaluation framework, dubbed 'VPR-Bench'. VPR-Bench introduces two much-needed capabilities for researchers: firstly, quantification of viewpoint and illumination variation, replacing what has largely been assessed qualitatively in the past, and secondly, new metrics: 'Extended Precision' (EP), 'Performance-Per-Compute-Unit' (PCU) and 'Number of Prospective Place Matching Candidates' (NPPMC). These new metrics address the limitations of traditional precision-recall curves by providing measures that are more informative for the wide range of potential VPR applications. Mechanistically, we develop new unified templates that facilitate the implementation, deployment and evaluation of a wide range of VPR techniques and datasets. We incorporate the most comprehensive combination of state-of-the-art VPR techniques and datasets to date into VPR-Bench and demonstrate how it provides a rich range of previously inaccessible insights, such as the nuanced relationship between viewpoint invariance, different types of VPR techniques and datasets.
    Comment: Currently under review, 25 pages, 16 figure
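
    The evaluation machinery behind such metrics can be sketched from match scores and ground-truth labels (the numbers below are made up). Traditional precision-recall comes from scikit-learn; the 'Extended Precision' combination is written out under the assumption that it averages precision at the lowest recall with recall at 100% precision, which should be checked against the VPR-Bench definition before reuse.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up VPR outputs: 1 = correct place retrieved, scores = match confidence.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
scores = np.array([0.9, 0.8, 0.75, 0.7, 0.65, 0.5, 0.45, 0.4, 0.35, 0.3])

precision, recall, _ = precision_recall_curve(y_true, scores)

# Assumed Extended Precision: precision at the smallest non-zero recall,
# averaged with the largest recall reached while precision is still 1.0.
min_recall = recall[recall > 0].min()
p_r0 = precision[recall == min_recall].max()
r_p100 = recall[precision == 1.0].max()
ep = 0.5 * (p_r0 + r_p100)
print(f"P@R0={p_r0:.2f}  R@P100={r_p100:.2f}  EP={ep:.2f}")
```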

    From BoW to CNN: Two Decades of Texture Representation for Texture Classification

    Texture is a fundamental characteristic of many types of images, and texture representation is one of the essential and challenging problems in computer vision and pattern recognition that has attracted extensive research attention. Since 2000, texture representations based on Bag of Words (BoW) and on Convolutional Neural Networks (CNNs) have been extensively studied with impressive performance. Given this period of remarkable evolution, this paper aims to present a comprehensive survey of advances in texture representation over the last two decades. More than 200 major publications are cited in this survey, covering different aspects of the research, including (i) problem description; (ii) recent advances in the broad categories of BoW-based, CNN-based and attribute-based methods; and (iii) evaluation issues, specifically benchmark datasets and state-of-the-art results. Reflecting on what has been achieved so far, the survey discusses open challenges and directions for future research.
    Comment: Accepted by IJC
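
    As one concrete point on the handcrafted, pre-CNN side of this evolution, the sketch below (my own illustration, not from the survey; scikit-image and the synthetic textures are assumptions) turns an image into an orderless local binary pattern histogram and compares two textures with a chi-squared-style distance.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(gray, points=8, radius=1.0):
    """Orderless texture descriptor: histogram of uniform LBP codes."""
    codes = local_binary_pattern(gray, points, radius, method="uniform")
    n_bins = points + 2                  # uniform patterns plus one non-uniform bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# Two synthetic textures: smooth random noise versus a stripe pattern.
rng = np.random.default_rng(3)
noise = rng.random((128, 128))
stripes = np.sin(np.linspace(0, 40 * np.pi, 128))[None, :].repeat(128, axis=0)

d1, d2 = lbp_descriptor(noise), lbp_descriptor(stripes)
chi2 = 0.5 * np.sum((d1 - d2) ** 2 / (d1 + d2 + 1e-12))
print("chi-squared distance between textures:", round(float(chi2), 3))
```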

    From handcrafted to deep local features

    This paper presents an overview of the evolution of local features from handcrafted to deep-learning-based methods, followed by a discussion of several benchmarks and papers evaluating such local features. Our investigation is motivated by 3D reconstruction problems, where the precise location of the features is important. As we describe these methods, we highlight and explain the challenges of feature extraction and potential ways to overcome them. We first present handcrafted methods, followed by methods based on classical machine learning, and finally we discuss methods based on deep learning. This largely chronologically ordered presentation will help the reader to fully understand the topic of image and region description in order to make the best use of it in modern computer vision applications. In particular, understanding handcrafted methods and their motivation can help in understanding modern approaches and how machine learning is used to improve the results. We also provide references to most of the relevant literature and code.
    Comment: Preprin
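
    For the handcrafted end of the spectrum discussed here, a minimal sketch (assuming OpenCV with SIFT available, and synthetic images rather than any benchmark from the paper) of detecting, describing and matching local features with Lowe's ratio test:

```python
import numpy as np
import cv2

# Synthetic grayscale images: the second is a shifted copy of the first.
rng = np.random.default_rng(4)
img1 = cv2.GaussianBlur((rng.random((240, 320)) * 255).astype(np.uint8), (5, 5), 1.5)
img2 = np.roll(img1, shift=(10, 15), axis=(0, 1))

# Handcrafted detector/descriptor: SIFT keypoints with 128-d descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# k-NN matching plus Lowe's ratio test to keep only distinctive correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(f"{len(kp1)} / {len(kp2)} keypoints, {len(good)} ratio-test matches")
```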

    Large scale visual place recognition with sub-linear storage growth

    Robotic and animal mapping systems share many of the same objectives and challenges, but differ in one key aspect: where much of the research in robotic mapping has focused on solving the data association problem, the grid cell neurons underlying maps in the mammalian brain appear to intentionally break data association by encoding many locations with a single grid cell neuron. One potential benefit of this intentional aliasing is sub-linear growth of both map storage and computational requirements with environment size, which we demonstrated in a previous proof-of-concept study that detected and encoded mutually complementary co-prime pattern frequencies in the visual map data. In this research, we solve several of the key theoretical and practical limitations of that prototype model and achieve significantly better sub-linear storage growth, a factor reduction in storage requirements per map location, scalability to large datasets on standard compute equipment, and improved robustness to environments with visually challenging appearance change. These improvements are achieved through several innovations, including a flexible user-driven choice mechanism for the periodic patterns underlying the new encoding method, a parallelized chunking technique that splits the map into sub-sections processed in parallel, and a novel feature selection approach that selects only the image information most relevant to the encoded temporal patterns. We evaluate our techniques on two large benchmark datasets, comparing against the previous state-of-the-art system, and provide a detailed analysis of system performance with respect to parameters such as the required precision performance and the number of cyclic patterns encoded.
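
    The aliasing idea behind the co-prime periodic patterns can be illustrated with a toy sketch (my own, unrelated to the paper's actual encoder): each place index is stored only as its small residues modulo a few mutually co-prime periods, and the Chinese remainder theorem recovers the unique index as long as it is smaller than the product of the periods.

```python
from math import prod

PERIODS = (7, 11, 13)                # mutually co-prime "grid" periods
CAPACITY = prod(PERIODS)             # 1001 distinct place indices representable

def encode(place_index):
    """Aliased code: one small residue per periodic pattern."""
    return tuple(place_index % p for p in PERIODS)

def decode(residues):
    """Chinese remainder theorem reconstruction of the place index."""
    x = 0
    for p, r in zip(PERIODS, residues):
        m = CAPACITY // p
        x += r * m * pow(m, -1, p)   # pow(m, -1, p) is the modular inverse of m mod p
    return x % CAPACITY

code = encode(875)
print(code, "->", decode(code))      # residues stay tiny; the index is recovered exactly
```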

    Place recognition survey: An update on deep learning approaches

    Autonomous Vehicles (AVs) are becoming more capable of navigating in complex environments with dynamic and changing conditions. A key component that enables these intelligent vehicles to overcome such conditions and become more autonomous is the sophistication of the perception and localization systems. As part of the localization system, place recognition has benefited from recent developments in other perception tasks such as place categorization and object recognition, namely with the emergence of deep learning (DL) frameworks. This paper surveys recent approaches and methods used in place recognition, particularly those based on deep learning. The contributions of this work are twofold: surveying recent sensors, such as 3D LiDARs and RADARs, applied in place recognition; and categorizing the various DL-based place recognition works into supervised, unsupervised, semi-supervised, parallel, and hierarchical categories. First, this survey introduces key place recognition concepts to contextualize the reader. Then, sensor characteristics are addressed. The survey proceeds by elaborating on the various DL-based works, presenting summaries for each framework. Some lessons learned from this survey include: the importance of NetVLAD for supervised end-to-end learning; the advantages of unsupervised approaches in place recognition, namely for cross-domain applications; and the increasing tendency of recent works to seek not only higher performance but also higher efficiency.
    Comment: Under review in IEEE Transactions on Intelligent Vehicles. This work was submitted on 13/01/2021 to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Upon acceptance of the article by IEEE, the preprint article will be replaced with the accepted versio
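
    Since NetVLAD recurs as the key supervised building block in this survey, a compact sketch of the layer is given below (a generic re-implementation under my own assumptions, not tied to any surveyed codebase): local CNN features are softly assigned to learned cluster centres, residuals are aggregated per cluster, and the result is intra- and L2-normalised.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Minimal NetVLAD aggregation over a (B, D, H, W) feature map."""

    def __init__(self, num_clusters=16, dim=256):
        super().__init__()
        self.num_clusters = num_clusters
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)   # soft-assignment logits
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):
        b, d, _, _ = x.shape
        assign = F.softmax(self.conv(x), dim=1).view(b, self.num_clusters, -1)  # (B, K, N)
        feats = x.view(b, d, -1)                                                # (B, D, N)
        # Aggregate residuals to each centroid, weighted by the soft assignment.
        weighted_sum = torch.bmm(assign, feats.transpose(1, 2))                 # (B, K, D)
        residuals = weighted_sum - assign.sum(dim=2, keepdim=True) * self.centroids
        vlad = F.normalize(residuals, p=2, dim=2)        # intra-normalisation per cluster
        return F.normalize(vlad.flatten(1), p=2, dim=1)  # (B, K*D) place descriptor

# Toy usage: aggregate a backbone feature map into one descriptor per image.
layer = NetVLAD(num_clusters=16, dim=256)
print(layer(torch.randn(2, 256, 14, 14)).shape)          # torch.Size([2, 4096])
```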