
    Aggregated Deep Local Features for Remote Sensing Image Retrieval

    Remote sensing image retrieval remains a challenging topic due to the special nature of remote sensing imagery. Such images contain many different semantic objects, which clearly complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters, such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that, even without training, the local convolutional features and global representation outperform other systems. After system tuning, we achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g. 50% faster than current systems.

    Comment: Published in Remote Sensing. The first two authors have equal contributions.
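    As an illustration of the aggregation step, here is a minimal sketch of VLAD pooling over a set of local descriptors, assuming a codebook of cluster centres has already been learned (e.g. with k-means); the function name and normalisation choices are illustrative, not taken from the paper's code.

```python
import numpy as np

def vlad_aggregate(local_descs, codebook):
    """Aggregate local descriptors into a single VLAD vector.

    local_descs: (N, D) array of local convolutional features.
    codebook:    (K, D) array of cluster centres (e.g. from k-means).
    Returns a flattened, L2-normalised (K*D,) global descriptor.
    """
    # Assign each local descriptor to its nearest cluster centre.
    dists = np.linalg.norm(local_descs[:, None, :] - codebook[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)

    K, D = codebook.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = local_descs[assignments == k]
        if len(members):
            # Accumulate residuals between descriptors and their centre.
            vlad[k] = (members - codebook[k]).sum(axis=0)

    vlad = vlad.flatten()
    # Signed square-root (power) normalisation, then global L2 normalisation.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

    Concatenating per-cluster residuals in this way yields the K*D-dimensional global descriptor, which can then be compressed, e.g. by dimensionality reduction, for compact storage and fast retrieval.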

    ProSLAM: Graph SLAM from a Programmer's Perspective

    In this paper we present ProSLAM, a lightweight stereo visual SLAM system designed with simplicity in mind. Our work stems from the experience gathered by the authors while teaching SLAM to students and aims at providing a highly modular system that can be easily implemented and understood. Rather than focusing on the well-known mathematical aspects of stereo visual SLAM, in this work we highlight the data structures and the algorithmic aspects that one needs to tackle during the design of such a system. We implemented ProSLAM in C++ using a minimal set of well-known external libraries. In addition to an open-source implementation, we provide several code snippets that address the core aspects of our approach directly in this paper. The results of a thorough validation on standard benchmark datasets show that our approach achieves accuracy comparable to state-of-the-art methods while requiring substantially fewer computational resources.

    Comment: 8 pages, 8 figures.
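    ProSLAM itself is written in C++, but the data-structure view the paper emphasises can be sketched language-agnostically. The following Python sketch shows plausible pose-graph primitives (keyframe nodes and relative-pose constraints); all names are hypothetical and not taken from the ProSLAM sources.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Keyframe:
    """A pose-graph node: a camera pose plus the landmarks observed from it."""
    id: int
    pose: np.ndarray                  # 4x4 homogeneous world-from-camera transform
    landmark_ids: list = field(default_factory=list)

@dataclass
class PoseConstraint:
    """A pose-graph edge: a measured relative transform between two keyframes."""
    from_id: int
    to_id: int
    relative_pose: np.ndarray         # 4x4 transform from tracking or loop closure
    information: np.ndarray           # 6x6 information (inverse covariance) matrix

class PoseGraph:
    def __init__(self):
        self.keyframes = {}           # keyframe id -> Keyframe
        self.constraints = []         # list of PoseConstraint

    def add_keyframe(self, kf):
        self.keyframes[kf.id] = kf

    def add_constraint(self, c):
        # Odometry edges chain consecutive keyframes; loop-closure edges
        # connect a new keyframe to a much older one, closing the graph.
        self.constraints.append(c)
```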

    Automated annotation of landmark images using community contributed datasets and web resources

    A novel solution to the challenge of automatic image annotation is described. Given an image with GPS data of its location of capture, our system returns a semantically rich annotation comprising tags which both identify the landmark in the image and provide an interesting fact about it, e.g. "A view of the Eiffel Tower, which was built in 1889 for an international exhibition in Paris". This exploits visual and textual web mining in combination with content-based image analysis and natural language processing. In the first stage, an input image is matched to a set of community-contributed images (with keyword tags) on the basis of its GPS information and image classification techniques. The depicted landmark is inferred from the keyword tags of the matched set. The system then takes advantage of the information written about landmarks available on the web at large to extract a fact about the landmark in the image. We report component evaluation results from an implementation of our solution on a mobile device. Image localisation and matching offers 93.6% classification accuracy; the selection of appropriate tags for use in annotation performs well (F1M of 0.59), and the system subsequently identifies a correct toponym for use in captioning and fact extraction in 69.0% of the tested cases; finally, fact extraction returns an interesting caption in 78% of cases.
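    The first stage hinges on restricting candidate community images by GPS proximity before visual matching. A minimal sketch of such a filter, using the haversine great-circle distance, is shown below; the search radius and dictionary keys are assumptions for illustration, not details from the paper.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS fixes."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_images(query_gps, tagged_images, radius_km=1.0):
    """Keep only community images captured within radius_km of the query fix."""
    lat, lon = query_gps
    return [img for img in tagged_images
            if haversine_km(lat, lon, img["lat"], img["lon"]) <= radius_km]
```

    Only the surviving candidates would then be passed to the (much more expensive) content-based matching and tag-inference stages.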

    A comparative evaluation of interest point detectors and local descriptors for visual SLAM

    In this paper we compare the behavior of different interest point detectors and descriptors under the conditions needed to use them as landmarks in vision-based simultaneous localization and mapping (SLAM). We evaluate the repeatability of the detectors, as well as the invariance and distinctiveness of the descriptors, under different perceptual conditions, using sequences of images representing planar objects as well as 3D scenes. We believe that this information will be useful when selecting an appropriate detector and descriptor for visual SLAM.
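    Detector repeatability is commonly measured as the fraction of keypoints from one image that reappear, within a pixel tolerance, at the geometrically corresponding location in a second image. A sketch under the planar-scene assumption (so a single homography relates the two views) follows; the tolerance value and function name are illustrative.

```python
import numpy as np

def repeatability(kps_a, kps_b, H, eps=2.5):
    """Fraction of keypoints from image A re-detected in image B.

    kps_a, kps_b: (N, 2) and (M, 2) arrays of (x, y) keypoint locations.
    H:            3x3 homography mapping image-A coordinates into image B.
    eps:          pixel tolerance for declaring a correspondence.
    """
    # Project A's keypoints into image B via the homography.
    pts = np.hstack([kps_a, np.ones((len(kps_a), 1))]) @ H.T
    pts = pts[:, :2] / pts[:, 2:3]

    # A projected keypoint is "repeated" if some detection in B lies within eps.
    dists = np.linalg.norm(pts[:, None, :] - kps_b[None, :, :], axis=2)
    repeated = (dists.min(axis=1) <= eps).sum()
    return repeated / max(len(kps_a), 1)
```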