
    Representation learning for street-view and aerial image retrieval


    Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation

    Remote sensing (RS) image retrieval is of great significance for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, focusing mainly on three core issues: feature extraction, similarity metrics, and relevance feedback. Due to the complexity and diversity of ground objects in high-resolution remote sensing (HRRS) images, there is still room for improvement in current retrieval approaches. In this paper, we analyze the three core issues of RS image retrieval and provide a comprehensive review of existing methods. Furthermore, with the goal of advancing the state of the art in HRRS image retrieval, we focus on the feature extraction issue and investigate how powerful deep representations can be used to address this task. We conduct a systematic investigation of the factors that may affect the performance of deep features. By optimizing each factor, we achieve remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental findings in detail and draw conclusions from our analysis. Our work can serve as a guide for research on content-based RS image retrieval.
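
    To make the feature-extraction step concrete, here is a minimal retrieval sketch: a pretrained CNN serves as the feature extractor, and the archive is ranked by cosine similarity over L2-normalised descriptors. The ResNet-50 backbone, the preprocessing values, and the `extract`/`rank` helpers are illustrative assumptions, not the configuration the paper evaluates.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained backbone with the classification head removed, so a
# forward pass yields a 2048-d pooled descriptor per image.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract(path):
    """Return an L2-normalised deep descriptor for one image file."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    f = backbone(x).squeeze(0)
    return f / f.norm()

def rank(query_path, archive):
    """archive: {name: descriptor}; returns names sorted by similarity."""
    q = extract(query_path)
    # Cosine similarity reduces to a dot product on unit vectors.
    scores = {name: float(q @ f) for name, f in archive.items()}
    return sorted(scores, key=scores.get, reverse=True)
```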

    Content-based image retrieval by ensembles of deep learning object classifiers.

    Copyright owner. Definitive version available at the DOI indicated. Hamreras, S., Boucheham, B., Molina-Cabello, M. A., Benitez-Rochel, R., & Lopez-Rubio, E. (2020). Content-based image retrieval by ensembles of deep learning object classifiers. Integrated Computer-Aided Engineering, 27(3), 317-331. Ensemble learning has demonstrated its efficiency in many computer vision tasks. In this paper, we apply this paradigm to content-based image retrieval (CBIR). We propose to build an ensemble of convolutional neural networks (CNNs), either by training the CNNs on different bags of images, or by using CNNs trained on the same dataset but having different architectures. Each network is used to extract a class probability vector from each image, which serves as its representation. The final image representation is then generated by combining the class probability vectors extracted by the ensemble. We show that CNN ensembles are much more effective at generating a powerful image representation than individual CNNs. Moreover, we propose an Average Query Expansion technique for our approach to enhance the retrieval results. Several experiments were conducted to extensively evaluate the application of ensemble learning in CBIR. Results in terms of precision, recall, and mean average precision show that our proposal outperforms the state of the art.
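
    A minimal sketch of the class-probability representation described above, assuming three ImageNet-pretrained torchvision models as the ensemble and simple averaging as the combination rule (one illustrative choice, not necessarily the paper's):

```python
import torch
import torchvision.models as models

# Three pretrained CNNs with different architectures stand in for the
# ensemble; the paper also builds ensembles by training the same
# architecture on different bags of images.
nets = [
    models.resnet18(weights="IMAGENET1K_V1"),
    models.vgg16(weights="IMAGENET1K_V1"),
    models.densenet121(weights="IMAGENET1K_V1"),
]
for net in nets:
    net.eval()

@torch.no_grad()
def ensemble_descriptor(x):
    """x: preprocessed image batch (N, 3, 224, 224) -> (N, 1000).

    Each network contributes its softmax class-probability vector;
    the vectors are combined here by averaging.
    """
    probs = [torch.softmax(net(x), dim=1) for net in nets]
    return torch.stack(probs).mean(dim=0)
```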

    Optimization of Historic Buildings Recognition: CNN Model Supported by Pre-processing Methods

    Several cities in Indonesia, such as Cirebon, Bandung, and Bogor, contain historical buildings that date back to the Dutch colonial period. These heritage buildings attract both foreign and local tourists, but recognizing and identifying them, including their characteristics, is a problem in itself, so supporting technology is needed. The model implemented in this research is the Convolutional Neural Network (CNN), a branch of Artificial Intelligence focused on image processing and pattern recognition. The process consists of several stages. The initial stage applies the Gaussian Blur, SuCK, and CLAHE methods, which are useful for image sharpening and enhancement. The second stage extracts features describing the characteristics of the buildings. The extracted features support the third stage, the retrieval of building images based on their characteristics. Together, these three main processes help local and foreign tourists recognize the historic buildings in an area. In the experiments, the Euclidean distance and Manhattan distance were used in the retrieval process. The highest retrieval accuracies were obtained with DenseNet-121 feature extraction: 88.96% and 88.46% with Gaussian Blur pre-processing, 88.3% and 87.8% with the SuCK method, and 87.7% and 87.6% with CLAHE. We hope that this research can be continued to identify buildings with more complex characteristics and models.
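
    A sketch of the described pipeline, assuming OpenCV for the Gaussian Blur and CLAHE pre-processing steps and a torchvision DenseNet-121 as the feature extractor; all parameter values are illustrative rather than the paper's settings, and SuCK is omitted since it is not a standard library routine.

```python
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

def gaussian_blur(bgr):
    # Kernel size is an illustrative choice, not the paper's setting.
    return cv2.GaussianBlur(bgr, (5, 5), 0)

def clahe_enhance(bgr):
    # Apply CLAHE to the luminance channel only; clipLimit and
    # tileGridSize are illustrative defaults.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[:, :, 0] = clahe.apply(lab[:, :, 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

# DenseNet-121 with the classifier removed yields 1024-d pooled features.
densenet = models.densenet121(weights="IMAGENET1K_V1")
densenet.classifier = torch.nn.Identity()
densenet.eval()

to_tensor = T.Compose([
    T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def features(bgr):
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    return densenet(to_tensor(rgb).unsqueeze(0)).squeeze(0)

# Retrieval ranks the archive by either distance to the query features.
def euclidean(a, b):
    return torch.dist(a, b, p=2).item()

def manhattan(a, b):
    return torch.dist(a, b, p=1).item()
```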

    Deep Hashing Based on Class-Discriminated Neighborhood Embedding

    Deep-hashing methods have drawn significant attention in recent years in the field of remote sensing (RS) owing to their prominent capabilities for capturing the semantics of complex RS scenes and generating the associated hash codes in an end-to-end manner. Most existing deep-hashing methods exploit pairwise and triplet losses to learn hash codes that preserve semantic similarities, which requires constructing image pairs and triplets from supervised information (e.g., class labels). However, the Hamming spaces learned with these losses may not be optimal, because image pairs and triplets cannot be sampled sufficiently from scalable RS archives. To address this limitation, we propose a new deep-hashing technique based on class-discriminated neighborhood embedding, which can properly capture the locality structure among RS scenes and distinguish images class-wise in the Hamming space. Extensive experiments have been conducted to validate the effectiveness of the proposed method by comparing it with several state-of-the-art conventional and deep-hashing methods. The code for this article will be made publicly available for reproducible research by the community.
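
    To illustrate the retrieval mechanics common to deep-hashing methods, the sketch below binarises a network's real-valued outputs into hash codes and ranks an archive by Hamming distance. The binarisation-by-sign scheme and the `net` placeholder are generic assumptions; the class-discriminated neighborhood embedding loss itself is specific to the paper and not reproduced here.

```python
import torch

@torch.no_grad()
def hash_codes(net, x):
    """x: image batch (N, 3, H, W) -> binary codes (N, K) in {0, 1}."""
    # Real-valued activations are squashed and binarised by sign.
    return (torch.tanh(net(x)) > 0).to(torch.uint8)

def hamming(query_code, db_codes):
    """query_code: (K,), db_codes: (N, K) -> (N,) Hamming distances."""
    return (query_code.unsqueeze(0) ^ db_codes).sum(dim=1)

# Retrieval over the archive is then a cheap bitwise comparison, e.g.:
# order = hamming(hash_codes(net, query)[0], db_codes).argsort()
```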

    SMAN: Stacked Multi-Modal Attention Network for cross-modal image-text retrieval

    This article tackles cross-modal image-text retrieval, an interdisciplinary topic spanning the computer vision and natural language processing communities. Existing methods based on global representation alignment fail to pinpoint the semantically meaningful portions of images and texts, while local representation alignment schemes suffer from the huge computational burden of exhaustively aggregating the similarities of visual fragments and textual words. In this article, we propose a Stacked Multimodal Attention Network (SMAN) that uses a stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal and multimodal information as guidance to perform multi-step attention reasoning, so that the fine-grained correlation between image and text can be modeled. As a consequence, we can discover the semantically meaningful visual regions or words in a sentence, which contributes to measuring the cross-modal similarity more precisely. Moreover, we present a novel bidirectional ranking loss that pulls matched multimodal instance pairs closer together, allowing us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that SMAN consistently yields competitive performance compared to state-of-the-art methods.
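
    As a sketch of what a bidirectional ranking loss can look like, the function below penalises, in both retrieval directions, any non-matching pair whose similarity comes within a margin of the matched pair's similarity. The hinge form and the margin value are common conventions in cross-modal retrieval, not necessarily SMAN's exact formulation.

```python
import torch

def bidirectional_ranking_loss(im, txt, margin=0.2):
    """im, txt: L2-normalised embeddings of N matched pairs, shape (N, D)."""
    sim = im @ txt.t()                # (N, N) cosine similarities
    pos = sim.diag().view(-1, 1)      # similarity of each matched pair
    # image -> text: every non-matching caption acts as a negative
    cost_i2t = (margin + sim - pos).clamp(min=0)
    # text -> image: every non-matching image acts as a negative
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)
    # Zero out the diagonal so matched pairs contribute no cost.
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = cost_i2t.masked_fill(mask, 0)
    cost_t2i = cost_t2i.masked_fill(mask, 0)
    return cost_i2t.sum() + cost_t2i.sum()
```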