Deep image representations for instance search
We address the problem of visual instance search, which consists in retrieving all
the images within a dataset that contain a particular visual example provided to
the system. The traditional approach to processing the image content for this task
relied on extracting local low-level information within images that was "manually
engineered" to be invariant to different image conditions. One of the most popular
approaches uses the Bag of Visual Words (BoW) model on the local features to
aggregate the local information into a single representation. Usually, a final reranking stage is included in the pipeline to refine the search results. Since the
emergence of deep learning as the dominant technique in computer vision in 2012,
much research attention has been focused on deriving image representations from
Convolutional Neural Networks (CNN) models for the task of instance search as a
"data-driven" approach to designing image representations. However, one of the main
challenges in the instance search task is the lack of annotated datasets with which
to fit CNN model parameters.
This work explores the capabilities of descriptors derived from pre-trained CNN
models for image classification to address the task of instance retrieval. First, we
conduct an investigation of the traditional bag of visual words encoding on local
CNN features to produce a scalable image retrieval framework that generalizes well
across different retrieval domains. Second, we propose to improve the capacity of the
obtained representations by exploring an unsupervised fine-tuning strategy that allows
us to obtain better-performing representations at the price of losing their generality.
of the representations. Finally, we propose using visual attention models to weight
the contribution of the relevant parts of an image to obtain a very powerful image
representation for instance retrieval without requiring the construction of a large
and suitable training dataset for fine-tuning CNN architectures.
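The Bag of Visual Words encoding of local CNN features described above can be sketched as follows. This is a minimal illustration, assuming a precomputed codebook of visual words; the codebook size, feature dimensionality, and random data are placeholders, not the work's actual configuration:

```python
import numpy as np

def bow_encode(local_features, codebook):
    """Quantize local features against a visual codebook and aggregate
    them into a single L2-normalized bag-of-words histogram."""
    # Assign each local descriptor to its nearest visual word.
    dists = np.linalg.norm(
        local_features[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Count occurrences of each visual word over the whole image.
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    # L2-normalize so images with different feature counts are comparable.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Illustrative usage: random data standing in for local CNN activations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))    # 200 local CNN features, 64-D (toy)
codebook = rng.normal(size=(16, 64))  # 16 visual words (toy vocabulary)
vec = bow_encode(feats, codebook)     # single global representation
```

In a real pipeline the codebook would be learned (e.g. by k-means clustering over local features from a training set), and the vocabulary would be far larger than 16 words.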
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain a wide variety of
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.
Comment: Published in Remote Sensing. The first two authors have equal contribution.
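The VLAD aggregation used in this pipeline can be sketched as follows. This is a minimal illustration with toy data, assuming a precomputed codebook and the standard signed-square-root plus L2 normalization; the dimensions and data are placeholders:

```python
import numpy as np

def vlad_encode(local_features, codebook):
    """Aggregate local features into a VLAD descriptor: for each visual
    word, sum the residuals of the features assigned to it."""
    k, d = codebook.shape
    # Hard-assign each local feature to its nearest codebook center.
    dists = np.linalg.norm(
        local_features[:, None, :] - codebook[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = local_features[assign == i]
        if len(members):
            # Residual sum for this visual word.
            vlad[i] = (members - codebook[i]).sum(axis=0)
    vlad = vlad.ravel()
    # Signed square-root (power-law) normalization, then L2, as is
    # customary in VLAD pipelines.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Illustrative usage with random stand-ins for attentive local features.
rng = np.random.default_rng(1)
feats = rng.normal(size=(300, 32))   # toy local convolutional features
centers = rng.normal(size=(8, 32))   # toy codebook of 8 visual words
desc = vlad_encode(feats, centers)   # global descriptor, 8 * 32 = 256-D
```

The resulting k * d dimensional descriptor is what would subsequently be reduced (e.g. by PCA) to obtain the compact representations discussed in the abstract.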
Place recognition: An Overview of Vision Perspective
Place recognition is one of the most fundamental topics in computer vision
and robotics communities, where the task is to accurately and efficiently
recognize the location of a given query image. Despite years of wisdom
accumulated in this field, place recognition still remains an open problem due
to the various ways in which the appearance of real-world places may differ.
This paper presents an overview of the place recognition literature. Since
condition-invariant and viewpoint-invariant features are essential for a
long-term robust visual place recognition system, we start with the traditional
image description methodologies developed in the past, which exploit techniques
from the image retrieval field. Recently, rapid advances in related fields such
as object detection and image classification have inspired a new technique for
improving visual place recognition systems, namely convolutional neural
networks (CNNs). We then introduce recent progress in CNN-based visual place
recognition systems that automatically learn better image representations for
places. Finally, we close with a discussion and future directions for place
recognition.
Comment: Applied Sciences (2018)
Image Reconstruction from Bag-of-Visual-Words
The objective of this work is to reconstruct an original image from
Bag-of-Visual-Words (BoVW). Image reconstruction from features can be a means
of identifying the characteristics of features. Additionally, it enables us to
generate novel images via features. Although BoVW is the de facto standard
feature for image recognition and retrieval, successful image reconstruction
from BoVW has not been reported yet. What complicates this task is that BoVW
lacks the spatial information of its constituent visual words. As described in this
paper, to estimate an original arrangement, we propose an evaluation function
that incorporates the naturalness of local adjacency and the global position,
with a method to obtain related parameters using an external image database. To
evaluate the performance of our method, we reconstruct images of objects of 101
kinds. Additionally, we apply our method to analyze object classifiers and to
generate novel images via BoVW.
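The abstract does not specify the form of its evaluation function, but the idea of combining a local-adjacency term with a global-position term can be illustrated with a hypothetical scoring of a candidate grid arrangement of visual words. All names, table shapes, and the additive scoring form below are assumptions for illustration only:

```python
import numpy as np

def arrangement_score(grid, adjacency, position_prior):
    """Toy score for a spatial arrangement of visual words.

    grid           : 2-D array of visual-word indices (candidate layout)
    adjacency      : adjacency[w1, w2] = naturalness of word w2 appearing
                     next to word w1 (would be learned from a database)
    position_prior : position_prior[w, r, c] = plausibility of word w
                     occurring at grid cell (r, c)
    """
    h, w = grid.shape
    score = 0.0
    for r in range(h):
        for c in range(w):
            word = grid[r, c]
            score += position_prior[word, r, c]          # global position term
            if c + 1 < w:
                score += adjacency[word, grid[r, c + 1]]  # local adjacency (right)
            if r + 1 < h:
                score += adjacency[word, grid[r + 1, c]]  # local adjacency (down)
    return score

# Toy check: with an adjacency table that rewards identical neighbors and a
# flat position prior, a uniform layout should outscore a scrambled one.
adjacency = np.eye(4)                 # hypothetical learned adjacency table
position_prior = np.zeros((4, 2, 2))  # flat (uninformative) position prior
uniform = np.zeros((2, 2), dtype=int)
mixed = np.array([[0, 1], [2, 3]])
s_uniform = arrangement_score(uniform, adjacency, position_prior)
s_mixed = arrangement_score(mixed, adjacency, position_prior)
```

A reconstruction method would then search over candidate arrangements for one that maximizes such a score; the search procedure itself is beyond this sketch.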