Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation
Remote sensing (RS) image retrieval is of great significance for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, focusing mainly on three core issues: feature extraction, similarity metrics, and relevance feedback. Due to the complexity and diversity of ground objects in high-resolution remote sensing (HRRS) images, there is still room for improvement in current retrieval approaches. In this paper, we analyze the three core issues of RS image retrieval and provide a comprehensive review of existing methods. Furthermore, with the goal of advancing the state of the art in HRRS image retrieval, we focus on the feature extraction issue and investigate how powerful deep representations can be used to address this task. We systematically evaluate the factors that may affect the performance of deep features. By optimizing each factor, we obtain remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental phenomena in detail and draw conclusions from our analysis. Our work can serve as a guide for research on content-based RS image retrieval.
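The retrieval pipeline the abstract describes (extract a feature vector per image, then rank by a similarity metric) can be sketched with plain cosine similarity over precomputed feature vectors. This is a minimal illustration, not the paper's method: the toy 4-D vectors stand in for real deep features, and `retrieve` is a hypothetical helper name.

```python
import numpy as np

def retrieve(query_feat, db_feats, top_k=3):
    """Rank database images by cosine similarity to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per database image
    return np.argsort(-sims)[:top_k]   # indices of the most similar images

# Toy 4-D "deep features" for five database images and one query
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 4))
query = db[2] + 0.01 * rng.normal(size=4)   # near-duplicate of image 2
print(retrieve(query, db))                   # image 2 should rank first
```

In practice the feature vectors would come from a pretrained CNN, and the factors the paper studies (layer choice, aggregation, normalization) change how `db_feats` is produced, not this ranking step.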
Content-based image retrieval by ensembles of deep learning object classifiers.
Hamreras, S., Boucheham, B., Molina-Cabello, M. A., Benitez-Rochel, R., & Lopez-Rubio, E. (2020). Content based image retrieval by ensembles of deep learning object classifiers. Integrated Computer-Aided Engineering, 27(3), 317-331.

Ensemble learning has demonstrated its efficiency in many computer vision tasks. In this paper, we address this paradigm within content-based image retrieval (CBIR). We propose to build an ensemble of convolutional neural networks (CNNs), either by training the CNNs on different bags of images or by using CNNs trained on the same dataset but having different architectures. Each network extracts a class-probability vector from each image, which serves as its representation. The final image representation is then generated by combining the class-probability vectors extracted by the ensemble. We show that CNN ensembles are very effective at generating a powerful image representation compared to individual CNNs. Moreover, we propose an Average Query Expansion technique to further enhance the retrieval results. Several experiments were conducted to extensively evaluate the application of ensemble learning in CBIR. Results in terms of precision, recall, and mean average precision show that our proposal outperforms the state of the art.
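The two core ideas here (averaging class-probability vectors across an ensemble, and expanding the query with its nearest neighbours) can be shown in a few lines. This is a hedged sketch with invented toy numbers, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np

def ensemble_representation(prob_vectors):
    """Combine class-probability vectors from several CNNs by averaging."""
    return np.mean(prob_vectors, axis=0)

def average_query_expansion(query_rep, db_reps, top_k=2):
    """Replace the query with the mean of itself and its top-k neighbours."""
    dists = np.linalg.norm(db_reps - query_rep, axis=1)
    nearest = np.argsort(dists)[:top_k]
    return np.mean(np.vstack([query_rep, db_reps[nearest]]), axis=0)

# Two hypothetical CNNs, each emitting a 3-class probability vector
cnn_outputs = np.array([[0.7, 0.2, 0.1],
                        [0.5, 0.3, 0.2]])
query_rep = ensemble_representation(cnn_outputs)   # [0.6, 0.25, 0.15]

db = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.1, 0.7]])
expanded = average_query_expansion(query_rep, db, top_k=1)
```

The expansion step borrows information from the closest database items, which tends to make the query representation more robust to noise in any single CNN's output.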
Optimization of Historic Buildings Recognition: CNN Model and Supported by Pre-processing Methods
Several cities in Indonesia, such as Cirebon, Bandung, and Bogor, have historical buildings dating back to the Dutch colonial period. These colonial-heritage buildings can be found across several areas, and their existence attracts foreign and local tourists. Recognizing and identifying such buildings and their characteristics is a challenge in itself, so supporting technology is needed. The model implemented in this research is the Convolutional Neural Network (CNN), a branch of Artificial Intelligence focused on image processing and pattern recognition. The process consists of several stages. The initial stage uses the Gaussian Blur, SuCK, and CLAHE methods, which are useful for image sharpening and enhancement. The second stage is feature extraction of the buildings' image characteristics. The results of this stage support the third stage, namely retrieving building images based on their characteristics. Together, these three main processes help local and foreign tourists recognize historic buildings in the area. In this experiment, the Euclidean distance and Manhattan distance methods were used in the retrieval process. The highest retrieval accuracy was obtained with DenseNet-121 features: 88.96% and 88.46% with Gaussian Blur preprocessing, 88.3% and 87.8% with the SuCK method, and 87.7% and 87.6% with CLAHE. We hope that this research can be continued to identify buildings with more complex characteristics and models.
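The retrieval step compares feature vectors with either Euclidean or Manhattan distance. A minimal sketch of that ranking step, with made-up 2-D features standing in for the DenseNet-121 descriptors:

```python
import numpy as np

def rank_by_distance(query, db, metric="euclidean"):
    """Rank database feature vectors by their distance to the query."""
    if metric == "euclidean":
        dists = np.sqrt(((db - query) ** 2).sum(axis=1))
    elif metric == "manhattan":
        dists = np.abs(db - query).sum(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dists)   # nearest first

features = np.array([[1.0, 0.0],    # image 0
                     [0.0, 1.0],    # image 1
                     [0.9, 0.1]])   # image 2
query = np.array([1.0, 0.0])
print(rank_by_distance(query, features, "euclidean"))  # [0 2 1]
print(rank_by_distance(query, features, "manhattan"))  # [0 2 1]
```

The two metrics often agree on the ranking, as here, but Manhattan distance weights large per-dimension differences less heavily, which is presumably why the paper reports slightly different accuracies for each.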
Deep Hashing Based on Class-Discriminated Neighborhood Embedding
Deep-hashing methods have drawn significant attention in recent years in the field of remote sensing (RS) owing to their prominent capability to capture the semantics of complex RS scenes and to generate the associated hash codes in an end-to-end manner. Most existing deep-hashing methods exploit pairwise and triplet losses to learn hash codes that preserve semantic similarities, which requires constructing image pairs and triplets from supervised information (e.g., class labels). However, the Hamming spaces learned with these losses may not be optimal due to insufficient sampling of image pairs and triplets in scalable RS archives. To overcome this limitation, we propose a new deep-hashing technique based on class-discriminated neighborhood embedding, which can properly capture the locality structures among RS scenes and distinguish images class-wise in the Hamming space. Extensive experiments have been conducted to validate the effectiveness of the proposed method by comparing it with several state-of-the-art conventional and deep-hashing methods. The code for this article will be made publicly available for reproducible research by the community.
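The basic machinery every deep-hashing method shares (binarizing a learned embedding into a hash code and comparing codes in the Hamming space) can be illustrated without the learned network. This is a generic sketch, not the class-discriminated embedding the paper proposes; the sign-based binarization shown here is only one common convention.

```python
import numpy as np

def to_hash_code(embedding):
    """Binarize a real-valued embedding into a {0,1} hash code by sign."""
    return (embedding > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two hash codes."""
    return int(np.count_nonzero(a != b))

# Toy 4-D embeddings for two similar scenes
e1 = np.array([0.8, -0.3, 0.1, -0.9])
e2 = np.array([0.7, -0.2, -0.4, -0.5])
h1, h2 = to_hash_code(e1), to_hash_code(e2)   # [1 0 1 0] vs [1 0 0 0]
print(hamming_distance(h1, h2))                # 1
```

Because Hamming distance reduces to bit counting, retrieval over large archives can use XOR and popcount instructions, which is the efficiency argument behind hashing in scalable RS retrieval.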
SMAN : Stacked Multi-Modal Attention Network for cross-modal image-text retrieval
This article focuses on the task of cross-modal image-text retrieval, which has been an interdisciplinary topic in both the computer vision and natural language processing communities. Existing global representation alignment-based methods fail to pinpoint the semantically meaningful portions of images and texts, while local representation alignment schemes suffer from the huge computational burden of exhaustively aggregating the similarities between visual fragments and textual words. In this article, we propose a stacked multimodal attention network (SMAN) that uses a stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal and multimodal information as guidance to perform multiple-step attention reasoning, so that the fine-grained correlation between image and text can be modeled. As a consequence, we can discover the semantically meaningful visual regions or words in a sentence, which contributes to measuring cross-modal similarity more precisely. Moreover, we present a novel bidirectional ranking loss that pulls matched multimodal instances closer together. Doing so allows us to make full use of pairwise supervision to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that our SMAN consistently yields competitive performance compared to state-of-the-art methods.
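A bidirectional ranking loss generally applies a hinge penalty in both directions: each image should be closer to its matched text than to any other text, and vice versa. The sketch below shows that standard hinge form over a similarity matrix with matched pairs on the diagonal; it is an assumption-laden illustration of the general idea, not the specific loss defined in the paper.

```python
import numpy as np

def bidirectional_ranking_loss(sim, margin=0.2):
    """Hinge ranking loss over a similarity matrix sim[i, j] between
    image i and text j, where matched pairs lie on the diagonal.
    Penalizes negatives in both retrieval directions."""
    n = sim.shape[0]
    pos = np.diag(sim)   # similarity of each matched image-text pair
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            loss += max(0.0, margin - pos[i] + sim[i, j])  # image -> text
            loss += max(0.0, margin - pos[j] + sim[i, j])  # text -> image
    return loss

# Matched pairs (diagonal) are already well separated from negatives
sim = np.array([[0.9, 0.1],
                [0.2, 0.8]])
print(bidirectional_ranking_loss(sim))   # 0.0: every positive beats every
                                         # negative by at least the margin
```

When the margin constraint is violated in either direction, the corresponding hinge term becomes positive and gradients push matched pairs closer in the common space.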