A comparative evaluation of interest point detectors and local descriptors for visual SLAM
Abstract In this paper we compare the behavior of different interest point detectors and descriptors under the
conditions required for their use as landmarks in vision-based simultaneous localization and mapping (SLAM).
We evaluate the repeatability of the detectors, as well as the invariance and distinctiveness of the descriptors,
under different perceptual conditions using sequences of images representing planar objects as well as 3D scenes.
We believe that this information will be useful when selecting an appropriate
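The detector repeatability that this evaluation measures can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the function name, the pixel tolerance `eps`, and the assumption that a ground-truth homography between the two views is known are ours.

```python
import numpy as np

def repeatability(kps1, kps2, H, eps=2.0):
    """Fraction of keypoints from image 1 that, once mapped into image 2
    by the homography H, have a detected keypoint within eps pixels.
    kps1, kps2: (N, 2) arrays of (x, y) positions; H: 3x3 homography."""
    pts = np.hstack([kps1, np.ones((len(kps1), 1))])   # homogeneous coords
    proj = (H @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]                  # back to Cartesian
    # distance from every projected keypoint to every keypoint in image 2
    d = np.linalg.norm(proj[:, None, :] - kps2[None, :, :], axis=2)
    return float((d.min(axis=1) <= eps).mean())

# Identity homography: every keypoint should be "re-detected" exactly.
kps = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
score = repeatability(kps, kps, np.eye(3))
```

A detector scores well under this criterion when the same scene points are found again despite viewpoint or illumination changes.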
From handcrafted to deep local features
This paper presents an overview of the evolution of local features from
handcrafted to deep-learning-based methods, followed by a discussion of several
benchmarks and papers evaluating such local features. Our investigations are
motivated by 3D reconstruction problems, where the precise location of the
features is important. As we describe these methods, we highlight and explain
the challenges of feature extraction and potential ways to overcome them. We
first present handcrafted methods, followed by methods based on classical
machine learning, and finally methods based on deep learning. This
largely chronological presentation will help the reader to fully
understand the topic of image and region description in order to make best use
of it in modern computer vision applications. In particular, understanding
handcrafted methods and their motivation can help to understand modern
approaches and how machine learning is used to improve the results. We also
provide references to most of the relevant literature and code.
Comment: Preprint
Local Jet Pattern: A Robust Descriptor for Texture Classification
Methods based on local image features have recently shown promise for texture
classification tasks, especially in the presence of large intra-class variation
due to illumination, scale, and viewpoint changes. Inspired by the theories of
image structure analysis, this paper presents a simple, efficient, yet robust
descriptor, termed local jet pattern (LJP), for texture classification. In this
approach, a jet-space representation of a texture image is derived from a set
of derivative-of-Gaussian (DtG) filter responses up to second order, the
so-called local jet vectors (LJV), which also satisfy scale-space properties.
The LJP is obtained by relating the center pixel to its local neighborhood
in jet space. Finally, the feature vector of a texture region is formed by
concatenating the histograms of LJP for all elements of the LJV. Together, the
DtG responses up to second order preserve the intrinsic local image structure
and achieve invariance to scale, rotation, and reflection. This allows us to
develop a texture classification framework that is both discriminative and
robust. In extensive experiments on five standard texture image databases with
the nearest subspace classifier (NSC), the proposed descriptor achieves 100%,
99.92%, 99.75%, 99.16%, and 99.65% accuracy on Outex_TC-00010 (Outex_TC10),
Outex_TC-00012 (Outex_TC12), KTH-TIPS, Brodatz, and CUReT, respectively,
outperforming state-of-the-art methods.
Comment: Accepted in Multimedia Tools and Applications, Springer
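The jet-space representation described above can be approximated in a few lines. This is a sketch under our own assumptions: true derivative-of-Gaussian filtering is replaced by finite differences of a Gaussian-smoothed image, and the function names are ours.

```python
import numpy as np

def gauss1d(sigma, radius=None):
    """1-D Gaussian kernel, truncated at ~3 sigma."""
    r = radius or int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing (rows, then columns)."""
    k = gauss1d(sigma)
    out = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, out, k, mode='same')

def local_jet(img, sigma=1.0):
    """2-jet per pixel: the smoothed image plus derivatives up to 2nd order,
    approximated here by finite differences of the smoothed image."""
    L = smooth(img.astype(float), sigma)
    Lx, Ly = np.gradient(L)
    Lxx, Lxy = np.gradient(Lx)
    _, Lyy = np.gradient(Ly)
    return np.stack([L, Lx, Ly, Lxx, Lxy, Lyy], axis=-1)  # (H, W, 6)

jet = local_jet(np.random.rand(32, 32), sigma=1.5)
```

Each pixel thus carries a 6-component local jet vector; the LJP histogramming step described in the abstract would then operate on these channels.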
Learning Spread-out Local Feature Descriptors
We propose a simple, yet powerful regularization technique that can be used
to significantly improve both the pairwise and triplet losses in learning local
feature descriptors. The idea is that in order to fully utilize the expressive
power of the descriptor space, good local feature descriptors should be
sufficiently "spread-out" over the space. In this work, we propose a
regularization term that maximizes the spread of the feature descriptors,
inspired by the properties of the uniform distribution. We show that the proposed regularization
with triplet loss outperforms existing Euclidean distance based descriptor
learning techniques by a large margin. As an extension, the proposed
regularization technique can also be used to improve image-level deep feature
embedding.
Comment: ICCV 2017. 9 pages, 7 figures
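One way to encourage this "spread-out" property is to match the moments of descriptor inner products to those of uniformly distributed unit vectors (mean near 0, second moment near 1/d). The sketch below follows that idea; the function name and the exact penalty form are our assumptions, not necessarily the paper's formulation.

```python
import numpy as np

def spread_out_loss(desc_a, desc_b):
    """Regularizer pushing randomly paired, non-matching descriptors toward
    the statistics of uniform unit vectors: mean inner product ~ 0 and
    second moment ~ 1/d. desc_a, desc_b: (N, d) L2-normalized descriptors
    whose rows at the same index do NOT match."""
    d = desc_a.shape[1]
    dots = np.sum(desc_a * desc_b, axis=1)   # inner products of the pairs
    m1 = dots.mean()                         # first moment
    m2 = (dots ** 2).mean()                  # second moment
    return m1 ** 2 + max(0.0, m2 - 1.0 / d)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 128))
x /= np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize rows
y = rng.normal(size=(256, 128))
y /= np.linalg.norm(y, axis=1, keepdims=True)
loss = spread_out_loss(x, y)
```

Random high-dimensional unit vectors are already nearly uniform, so this toy loss is close to zero; clustered descriptors would be penalized.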
A Review of Codebook Models in Patch-Based Visual Object Recognition
The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to provide a way to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. This step is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook that constructs a discriminant codebook in a one-pass design procedure, slightly outperforming more traditional approaches at drastically reduced computing times. In this review we survey several approaches proposed over the last decade, examining their use of feature detectors, descriptors, codebook construction schemes, and classifiers for recognising objects, along with the datasets used to evaluate the proposed methods.
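The core codebook mapping described above, from a variable-size set of local features to a fixed-length histogram, can be sketched as follows (a minimal hard-assignment version with a toy codebook; real systems build the codebook by clustering a large descriptor sample):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Map a variable-size set of local descriptors to a fixed-length
    histogram over codebook entries (hard assignment, L1-normalized)."""
    # squared Euclidean distance from each descriptor to each codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)          # nearest codeword per descriptor
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy codebook of 4 codewords in 2-D, and 6 local descriptors.
codebook = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
descs = np.array([[0.1, 0.1], [0.9, 0.1], [0.9, 0.0],
                  [0.1, 0.9], [1.0, 1.0], [0.0, 0.0]])
h = bow_histogram(descs, codebook)
```

The resulting fixed-length vector is what standard classifiers (SVMs, nearest-neighbour rules) consume, regardless of how many local features the image produced.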
SOSNet: Second Order Similarity Regularization for Local Descriptor Learning
Despite the fact that Second Order Similarity (SOS) has been used with
significant success in tasks such as graph matching and clustering, it has not
been exploited for learning local descriptors. In this work, we explore the
potential of SOS in the field of descriptor learning by building upon the
intuition that a positive pair of matching points should exhibit similar
distances with respect to other points in the embedding space. Thus, we propose
a novel regularization term, named Second Order Similarity Regularization
(SOSR), that follows this principle. By incorporating SOSR into training, our
learned descriptor achieves state-of-the-art performance on several challenging
benchmarks containing distinct tasks ranging from local patch retrieval to
structure from motion. Furthermore, by designing a von Mises-Fisher
distribution based evaluation method, we link the utilization of the descriptor
space to the matching performance, thus demonstrating the effectiveness of our
proposed SOSR. Extensive experimental results, empirical evidence, and in-depth
analysis are provided, indicating that SOSR can significantly boost the
matching performance of the learned descriptor.
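The second-order intuition stated above, that a matching pair should exhibit similar distances to the other points in a batch, can be written down directly. This is a sketch of that principle in NumPy under our own naming; it is not the paper's training code.

```python
import numpy as np

def sosr(anchors, positives):
    """Second-order similarity penalty: row i of `anchors` and `positives`
    is a matching pair; the penalty grows when the two points of a pair
    sit at different distances from the other points in the batch."""
    def pdist(x):
        # pairwise Euclidean distance matrix within the batch
        return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    da, dp = pdist(anchors), pdist(positives)
    diff2 = (da - dp) ** 2
    np.fill_diagonal(diff2, 0.0)             # exclude the j == i terms
    return float(np.sqrt(diff2.sum(axis=1)).sum())

a = np.random.default_rng(1).normal(size=(8, 16))
r_same = sosr(a, a.copy())                   # identical embeddings
r_diff = sosr(a, 2 * a)                      # all distances doubled
```

When the two embeddings induce identical distance structure the penalty vanishes; any distortion of relative distances makes it positive.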
Appearance Descriptors for Person Re-identification: a Comprehensive Review
In video-surveillance, person re-identification is the task of recognising
whether an individual has already been observed over a network of cameras.
Typically, this is achieved by exploiting the clothing appearance, as classical
biometric traits like the face are impractical in real-world video surveillance
scenarios. Clothing appearance is represented by means of low-level
\textit{local} and/or \textit{global} features of the image, usually extracted
according to some part-based body model to treat different body parts (e.g.
torso and legs) independently. This paper provides a comprehensive review of
current approaches to build appearance descriptors for person
re-identification. The most relevant techniques are described in detail, and
categorised according to the body models and features used. The aim of this
work is to provide a structured body of knowledge and a starting point for
researchers willing to conduct novel investigations on this challenging topic.
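A part-based appearance descriptor of the kind surveyed above can be illustrated in a few lines. The band proportions, bin count, and function name below are our assumptions for a toy fixed-stripe body model (head, torso, legs), not any specific method from the review.

```python
import numpy as np

def part_histograms(img, parts=((0.0, 0.2), (0.2, 0.55), (0.55, 1.0)), bins=8):
    """Split a pedestrian image into horizontal bands (toy head/torso/legs
    proportions), compute a per-channel intensity histogram in each band,
    and concatenate. img: (H, W, 3) array with values in [0, 1]."""
    h = img.shape[0]
    feats = []
    for top, bot in parts:
        band = img[int(top * h):int(bot * h)]
        for c in range(3):                   # one histogram per color channel
            hist, _ = np.histogram(band[..., c], bins=bins, range=(0.0, 1.0))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)             # 3 parts * 3 channels * bins

desc = part_histograms(np.random.default_rng(2).random((128, 48, 3)))
```

Treating bands separately makes the descriptor partly robust to pose changes, since torso colors are never mixed with leg colors; two detections are then compared by a distance between these fixed-length vectors.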
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state of the art results. In retrospect of
what has been achieved so far, the survey discusses open challenges and
directions for future research.
Comment: Accepted by IJC
Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification
Person re-identification is generally divided into two parts: first, how to
represent a pedestrian by discriminative visual descriptors, and second, how to
compare them with suitable distance metrics. Conventional methods treat these
two parts in isolation, with the first usually unsupervised and the second supervised.
The Bag-of-Words (BoW) model is a widely used image representation for
the first part. Its codebook is typically generated by clustering visual features in
Euclidean space. In this paper, we propose to apply the metric learning
techniques of the second part in the codebook generation phase of BoW. In particular, the proposed
codebook is clustered under a Mahalanobis distance learned in a supervised manner.
Extensive experiments show that our proposed method is effective. With several
low-level features extracted on superpixels and fused together, our method
outperforms the state of the art on person re-identification benchmarks including
VIPeR, PRID450S, and Market1501
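Clustering under a learned Mahalanobis distance reduces to ordinary k-means after a linear projection, since (x-y)ᵀM(x-y) = ||Lx - Ly||² when M = LᵀL. The sketch below shows this reduction with a deliberately simplistic deterministic initialization; the function name and the tiny k-means loop are ours, and M is assumed already learned and positive definite.

```python
import numpy as np

def mahalanobis_kmeans(X, M, k, iters=20):
    """Cluster X under the Mahalanobis metric induced by M: project with a
    Cholesky factor of M, then run plain k-means in the projected space."""
    L = np.linalg.cholesky(M).T              # M = L.T @ L
    Z = X @ L.T                              # now Euclidean distance applies
    # naive deterministic init: evenly spaced samples (k-means++ is better)
    centers = Z[np.linspace(0, len(Z) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d2 = ((Z[:, None] - centers[None]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)           # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated toy groups; identity M recovers plain k-means.
X = np.vstack([np.zeros((10, 2)), np.ones((10, 2)) * 5])
labels, _ = mahalanobis_kmeans(X, np.eye(2), k=2)
```

With a learned (non-identity) M, the projection stretches discriminative directions before clustering, which is precisely what makes the resulting codewords supervision-aware.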
A Review on Near Duplicate Detection of Images using Computer Vision Techniques
Nowadays, digital content is widespread and easily redistributable, either
lawfully or unlawfully. For example, after images are posted on the internet,
other web users can modify them and then repost their versions, thereby
generating near-duplicate images. The presence of near-duplicates critically
affects the performance of search engines. Computer vision is concerned with
the automatic extraction, analysis and understanding of useful information from
digital images. The main application of computer vision is image understanding,
which involves several tasks such as feature extraction, object detection,
object recognition, image cleaning, image transformation, etc. There is no
comprehensive survey in the literature on near-duplicate detection of images.
In this paper, we review the state-of-the-art computer vision-based approaches
and feature extraction methods for the detection of near-duplicate images. We
also discuss the main challenges in this field and how other researchers have
addressed them. This review provides research directions for fellow researchers
interested in working in this field.
Comment: 37 pages, 7 figures. For the online-first version, see
https://link.springer.com/article/10.1007/s11831-020-09400-w
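A common baseline for near-duplicate detection, not claimed to be from this survey, is perceptual hashing: two images are near-duplicates when their hashes differ in few bits. The sketch below implements a difference hash (dHash) in plain Python, assuming the image has already been downscaled to a small grayscale grid.

```python
def dhash(pixels):
    """Difference hash over a small grayscale grid (rows of equal length,
    values 0-255): one bit per horizontally adjacent pixel pair, set when
    the left pixel is brighter. Near-duplicate images yield bit strings
    with small Hamming distance."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# An 8x9 toy "image" and a near-duplicate with a few pixels slightly edited.
grid = [[(x * 30 + y * 7) % 256 for x in range(9)] for y in range(8)]
near = [[v if v < 250 else v - 3 for v in row] for row in grid]
h1, h2 = dhash(grid), dhash(near)
```

Because only brightness *orderings* are encoded, small edits, recompression, or mild brightness changes leave the hash almost untouched, while unrelated images differ in many bits; thresholding the Hamming distance then flags near-duplicates.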