Local Color Contrastive Descriptor for Image Classification
Image representation and classification are two fundamental tasks in
multimedia content retrieval and understanding. The idea that shape and texture
information (e.g. edge or orientation) are the key features for visual
representation is deeply ingrained in the current multimedia and computer
vision communities. A number of low-level features based on local gradients
(e.g. SIFT, LBP and HOG) have been proposed and have achieved great success in
numerous multimedia applications. In this paper, we present a simple yet
efficient local descriptor for image classification, referred to as the
Local Color Contrastive Descriptor (LCCD), which leverages the neural mechanisms
of color contrast. The idea originates from the observation in neuroscience
that color and shape information are linked inextricably in visual cortical
processing. Color contrast yields key information for visual color
perception and provides a strong linkage between color and shape. We propose a
novel contrastive mechanism that computes the color contrast across both spatial
locations and multiple channels. The color contrast is computed by measuring the
\emph{f}-divergence between the color distributions of two regions. Our
descriptor thus enriches the local image representation with both color and
contrast information. We verify experimentally that it strongly complements
shape-based descriptors (e.g. SIFT) while remaining computationally simple.
Extensive image classification experiments show that our descriptor
substantially improves the performance of SIFT when the two are combined, and
achieves state-of-the-art performance on three challenging benchmark datasets.
It improves the recent deep learning model DeCAF [1] considerably, from 40.94%
to 49.68% accuracy, on the large-scale SUN397 database. Code for the LCCD will
be available
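As a rough illustration of the contrast computation described above, the sketch below measures the KL divergence (one member of the f-divergence family) between normalized color histograms of two image regions. The bin count, the smoothing constant, and the choice of regions are illustrative assumptions, not the exact LCCD formulation.

```python
import numpy as np

def color_histogram(region, bins=16):
    """Normalized per-channel color histogram of an image region (H x W x 3, uint8)."""
    hists = []
    for c in range(region.shape[2]):
        h, _ = np.histogram(region[..., c], bins=bins, range=(0, 255))
        hists.append(h)
    h = np.concatenate(hists).astype(np.float64)
    h += 1e-8                      # smoothing to avoid zero bins
    return h / h.sum()

def kl_contrast(region_a, region_b, bins=16):
    """Color contrast as the KL divergence (an f-divergence) between
    the color distributions of two neighboring regions."""
    p = color_histogram(region_a, bins)
    q = color_histogram(region_b, bins)
    return float(np.sum(p * np.log(p / q)))

# Example: contrast between a central patch and a larger surrounding patch
# img = ...  # H x W x 3 uint8 image
# center, surround = img[20:40, 20:40], img[10:50, 10:50]
# print(kl_contrast(center, surround))
```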
Robust Face Recognition with Structural Binary Gradient Patterns
This paper presents a computationally efficient yet powerful binary framework
for robust facial representation based on image gradients, termed structural
binary gradient patterns (SBGP). To discover the underlying local structures in
the gradient domain, we compute image gradients from multiple directions and
simplify them into a set of binary strings. The SBGP is derived from those
binary strings that have meaningful local structures and are capable of
capturing fundamental textural information. They detect micro orientational
edges and possess strong orientation and locality properties, enabling high
discrimination. The SBGP also benefits from the advantages of the gradient
domain and exhibits strong robustness against illumination variations. The
binary strategy, realized through pixel correlations in a small neighborhood,
substantially reduces the computational complexity and achieves extremely
efficient processing, taking only 0.0032 s in Matlab for a typical face image.
Furthermore, the discriminative power of the SBGP can be enhanced by computing
it on a set of defined orientational image gradient magnitudes, further
enforcing locality and orientation. Results of extensive experiments on various
benchmark databases show significant improvements of the SBGP-based
representations over existing state-of-the-art local descriptors in terms of
discrimination, robustness and complexity. Code for the SBGP methods
will be available at
http://www.eee.manchester.ac.uk/research/groups/sisp/software/
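The following is a minimal sketch of the general idea of binarizing multi-directional gradients into short binary strings per pixel. The direction set, the threshold, and the code packing are simplifying assumptions and do not reproduce the exact SBGP construction.

```python
import numpy as np

def binary_gradient_pattern(img, threshold=0.0):
    """Simplified sketch: compute directional gradients at each pixel of a
    2-D grayscale image and binarize their signs into a compact code
    (not the exact SBGP definition)."""
    img = img.astype(np.float64)
    # Directional differences: horizontal, vertical, and the two diagonals
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]
    codes = np.zeros(img.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        grad = img - shifted
        codes |= ((grad > threshold).astype(np.uint8) << bit)
    return codes  # each pixel holds a 4-bit binary string

# A face representation could then be built from histograms of these codes
# over local cells, e.g. np.histogram(codes, bins=16, range=(0, 16)).
```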
Evaluation of Feature Detector-Descriptor for Real Object Matching under Various Conditions of Illumination and Affine Transformation
This study provides explanations, descriptions and evaluations of some of the
most popular current combinations of keypoint detector and descriptor
frameworks, namely SIFT, SURF, MSER, and BRISK as keypoint extractors and
SIFT, SURF, BRISK, and FREAK as descriptors. Evaluations are based on the
number of keypoint matches and on repeatability under various image variations;
these serve as the main parameters for assessing how well each combination of
algorithms matches objects under different conditions. Many papers compare
feature detection and description methods for detecting objects in images under
various conditions, but the combinations of algorithms paired together have
received little discussion. The problem domain is limited to different
illumination levels and affine transformations from different perspectives. To
evaluate the robustness of all combinations of algorithms, we use a stereo
image matching case.
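A hedged sketch of how such detector/descriptor pairings might be scored by match counts is given below, using OpenCV's SIFT detector combined with the BRISK descriptor and a Lowe ratio test. The particular pairing, the ratio threshold, and the file names are illustrative assumptions rather than the study's protocol.

```python
import cv2

def count_matches(img1, img2, ratio=0.75):
    """Sketch: pair a keypoint detector with a different descriptor and
    count ratio-test matches, one way to compare detector/descriptor combos."""
    detector = cv2.SIFT_create()      # keypoint extractor
    descriptor = cv2.BRISK_create()   # binary descriptor computed on SIFT keypoints
    kp1 = detector.detect(img1, None)
    kp2 = detector.detect(img2, None)
    kp1, des1 = descriptor.compute(img1, kp1)
    kp2, des2 = descriptor.compute(img2, kp2)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return len(good)

# Hypothetical input pair differing in illumination and viewpoint:
# img_a = cv2.imread("scene_normal.png", cv2.IMREAD_GRAYSCALE)
# img_b = cv2.imread("scene_dark_rotated.png", cv2.IMREAD_GRAYSCALE)
# print(count_matches(img_a, img_b))
```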
Covariance of Motion and Appearance Features for Spatio-Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis aimed at
practical scenarios, built on theoretical foundations from sparse
representation and including a novel descriptor for general-purpose video
analysis. In our approach, we compute kinematic features from optical flow and
first- and second-order derivatives of intensities to represent motion and
appearance, respectively. These features are then used to construct covariance
matrices that capture the joint statistics of the low-level motion and
appearance features extracted from a video. Using an over-complete dictionary
of these covariance-based descriptors built from labeled training samples, we
formulate low-level event recognition as a sparse linear approximation problem.
Within this, we pose the sparse decomposition of a covariance matrix, which
must remain in the space of positive semi-definite matrices, as a determinant
maximization problem. Since covariance matrices also lie on non-linear
Riemannian manifolds, we compare this approach with a sparse linear
approximation alternative that operates in an equivalent vector space of
covariance matrices, obtained by searching for the best projection of the
query data onto a dictionary with an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains, namely low-level event recognition in unconstrained scenarios and
gesture recognition using one-shot learning. Our experiments provide promising
insights into large-scale video analysis.
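A minimal sketch of a region covariance descriptor over per-pixel motion and appearance features is shown below. The feature dimensionality, the regularization term, and the patch size are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def covariance_descriptor(features):
    """Sketch: covariance matrix of per-pixel feature vectors as a region descriptor.
    `features` has shape (N, d): N pixels, each with d motion/appearance features
    (e.g. optical-flow components and intensity derivatives)."""
    mean = features.mean(axis=0, keepdims=True)
    centered = features - mean
    cov = centered.T @ centered / (features.shape[0] - 1)
    # Small regularization keeps the matrix positive definite for manifold operations
    return cov + 1e-6 * np.eye(cov.shape[1])

# Example with d = 5 features per pixel over a 32x32 patch:
# feats = np.random.randn(32 * 32, 5)
# C = covariance_descriptor(feats)   # 5 x 5 symmetric positive definite matrix
```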
WxBS: Wide Baseline Stereo Generalizations
We have presented a new problem, wide multiple baseline stereo (WxBS), which
considers the matching of images that simultaneously differ in more than one
image acquisition factor, such as viewpoint, illumination, or sensor type, or
where object appearance changes significantly, e.g. over time. A new dataset
with ground truth for the evaluation of matching algorithms has been introduced
and will be made public.
We have extensively tested a large set of popular and recent detectors and
descriptors and show that the combination of RootSIFT and HalfRootSIFT as
descriptors with MSER and Hessian-Affine detectors works best across many
different nuisance factors. We show that simple adaptive thresholding improves
the Hessian-Affine, DoG, and MSER (and possibly other) detectors and allows
them to be used on infrared and low-contrast images.
A novel matching algorithm addressing the WxBS problem has been
introduced. We have shown experimentally that the WxBS-M matcher dominates the
state-of-the-art methods on both the new and existing datasets.
Comment: Descriptor and detector evaluation expanded
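For reference, RootSIFT (Arandjelovic and Zisserman) is a simple transform of standard SIFT descriptors: L1-normalize each descriptor, then take the element-wise square root. The sketch below assumes the descriptors arrive as a NumPy array; the HalfRootSIFT variant mentioned above is not reproduced here.

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """RootSIFT: L1-normalize each SIFT descriptor, then take the
    element-wise square root; downstream matching stays Euclidean."""
    descriptors = descriptors.astype(np.float64)
    descriptors /= (np.abs(descriptors).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(descriptors)

# des = np.random.rand(100, 128)   # stand-in for SIFT descriptors from any detector
# des_root = root_sift(des)        # use with Euclidean distance as usual
```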
Learning Whole-Image Descriptors for Real-time Loop Detection and Kidnap Recovery under Large Viewpoint Difference
We present a real-time stereo visual-inertial SLAM system that is able to
recover from complicated kidnap scenarios and failures online, in real time. We
propose to learn the whole-image descriptor in a weakly supervised manner based
on NetVLAD and decoupled convolutions. We analyse the training difficulties in
using standard loss formulations, propose an all-pair loss, and show its effect
through extensive experiments. Compared to standard NetVLAD, our network takes
an order of magnitude fewer computations and model parameters and, as a result,
runs about three times faster. We evaluate the representation power of our
descriptor on standard datasets with precision-recall. Unlike previous loop
detection methods, which have been evaluated only on fronto-parallel revisits,
we evaluate the performance of our method against competing methods on
scenarios involving large viewpoint differences. Finally, we present the fully
functional system, with relative computation and handling of multiple world
coordinate systems, which is able to reduce odometry drift and recover from
complicated kidnap scenarios and random odometry failures. We open-source our
fully functional system as an add-on for the popular VINS-Fusion.
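As a rough sketch of the retrieval step in loop detection with whole-image descriptors, the code below ranks previously seen keyframes by cosine similarity to the query descriptor. The descriptor dimensionality and the similarity threshold are assumptions; this is not the paper's full system.

```python
import numpy as np

def detect_loop(query_desc, database, threshold=0.85):
    """Sketch: loop-closure candidate retrieval by cosine similarity between a
    query whole-image descriptor and descriptors of previously visited keyframes."""
    db = np.stack(database)                         # (N, D) past descriptors
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    q = query_desc / np.linalg.norm(query_desc)
    sims = db @ q                                   # cosine similarities
    best = int(np.argmax(sims))
    if sims[best] > threshold:
        return best, float(sims[best])
    return None, float(sims[best])

# database = [np.random.randn(4096) for _ in range(200)]  # e.g. NetVLAD-style vectors
# idx, score = detect_loop(np.random.randn(4096), database)
```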
Exploiting SIFT Descriptor for Rotation Invariant Convolutional Neural Network
This paper presents a novel approach to exploiting distinctive invariant
features in a convolutional neural network. The proposed CNN model uses the
Scale Invariant Feature Transform (SIFT) descriptor instead of the max-pooling
layer. The max-pooling layer discards the pose, i.e., the translational and
rotational relationships between the low-level features, and is hence unable to
capture the spatial hierarchies between low- and high-level features. The SIFT
descriptor layer captures the orientation and the spatial relationships of the
features extracted by the convolutional layer. The proposed SIFT Descriptor CNN
therefore combines the feature extraction capabilities of a CNN model with the
rotation invariance of the SIFT descriptor. Experimental results on the MNIST
and Fashion-MNIST datasets indicate reasonable improvements over conventional
methods available in the literature.
Comment: Accepted in IEEE INDICON 201
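To illustrate the general idea of describing a convolutional activation map with SIFT rather than max-pooling it, the sketch below computes OpenCV SIFT descriptors on a dense grid over a single-channel map. The grid step, keypoint size, and normalization are assumptions; this is not the layer proposed in the paper.

```python
import cv2
import numpy as np

def sift_describe_feature_map(fmap, step=4, size=8):
    """Illustration only (not the paper's exact layer): compute SIFT descriptors
    on a dense grid over a single-channel feature map, instead of max-pooling it."""
    # Rescale the activation map to 8-bit, as OpenCV's SIFT expects
    fmap = cv2.normalize(fmap, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    h, w = fmap.shape
    keypoints = [cv2.KeyPoint(float(x), float(y), size)
                 for y in range(0, h, step) for x in range(0, w, step)]
    sift = cv2.SIFT_create()
    _, descriptors = sift.compute(fmap, keypoints)
    return descriptors  # one 128-D orientation-aware descriptor per grid point

# fmap = np.random.rand(28, 28).astype(np.float32)  # stand-in for a conv activation
# print(sift_describe_feature_map(fmap).shape)
```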
Multi-feature Distance Metric Learning for Non-rigid 3D Shape Retrieval
In the past decades, feature-learning-based 3D shape retrieval approaches have
received widespread attention in the computer graphics community. These
approaches usually use hand-crafted distance metrics or conventional distance
metric learning methods to compute the similarity of a single feature. A single
feature captures only one kind of geometric information and cannot characterize
3D shapes well. Therefore, multiple features should be used for the retrieval
task to overcome the limitations of a single feature and further improve
performance. However, most conventional distance metric learning methods fail
to integrate the complementary information from multiple features when
constructing the distance metric. To address this issue, a novel multi-feature
distance metric learning method for non-rigid 3D shape retrieval is presented
in this study, which makes full use of the complementary geometric information
from multiple shape features by utilizing KL-divergences. Minimizing the
KL-divergence between the metric of each feature and a common metric serves as
a consistency constraint, which leads to a consistent shared latent feature
space for the multiple features. We apply the proposed method to 3D model
retrieval and test it on well-known benchmark databases. The results show that
our method substantially outperforms state-of-the-art non-rigid 3D shape
retrieval methods.
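One plausible way to realize a KL-based consistency term between a feature-specific metric and a common metric is to treat each metric as the inverse covariance of a zero-mean Gaussian and use the closed-form Gaussian KL divergence, as sketched below. The paper's exact formulation may differ, and the example matrices are hypothetical.

```python
import numpy as np

def kl_between_metrics(M1, M2):
    """Sketch: KL divergence between two zero-mean Gaussians whose inverse
    covariances are the Mahalanobis metric matrices M1 and M2
    (one way to compare metrics; the paper's formulation may differ)."""
    d = M1.shape[0]
    S1, S2 = np.linalg.inv(M1), np.linalg.inv(M2)     # covariances
    term_trace = np.trace(np.linalg.solve(S2, S1))    # tr(S2^{-1} S1)
    _, logdet = np.linalg.slogdet(S2 @ np.linalg.inv(S1))
    return 0.5 * (term_trace - d + logdet)

# M_feature = np.eye(3) * 2.0      # hypothetical metric learned for one shape feature
# M_common  = np.eye(3)            # hypothetical shared common metric
# print(kl_between_metrics(M_feature, M_common))
```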
LOAD: Local Orientation Adaptive Descriptor for Texture and Material Classification
In this paper, we propose a novel local feature, called the Local Orientation
Adaptive Descriptor (LOAD), to capture regional texture in an image. In LOAD,
we define the point description on an Adaptive Coordinate System (ACS), adopt a
binary sequence descriptor to capture the relationships between a point and its
neighbors, and use a multi-scale strategy to enhance the discriminative power
of the descriptor. The proposed LOAD not only has the discriminative power to
capture texture information, but also exhibits strong robustness to
illumination variation and image rotation. Extensive experiments on benchmark
data sets for texture classification and real-world material recognition show
that the proposed LOAD yields state-of-the-art performance. It is worth
mentioning that we achieve a 65.4\% classification accuracy on the Flickr
Material Database using a single feature, which is, to the best of our
knowledge, the highest result reported so far. Moreover, by combining LOAD with
features extracted by Convolutional Neural Networks (CNN), we obtain
significantly better performance than either LOAD or the CNN alone. This result
confirms that LOAD is complementary to learning-based features.
Comment: 13 pages, 7 figures
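A simplified sketch of a binary sequence descriptor comparing a point against its neighbors is given below. It omits LOAD's adaptive coordinate system and multi-scale strategy, and the radius and neighbor ordering are illustrative assumptions.

```python
import numpy as np

def binary_neighbor_code(img, y, x, radius=1):
    """Simplified sketch of a binary sequence descriptor: compare a point with
    its 8 neighbors and pack the comparison bits into one code
    (LOAD additionally uses an adaptive coordinate system and multiple scales)."""
    center = img[y, x]
    offsets = [(-radius, -radius), (-radius, 0), (-radius, radius), (0, radius),
               (radius, radius), (radius, 0), (radius, -radius), (0, -radius)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy, x + dx] >= center:
            code |= 1 << bit
    return code

# A texture descriptor is then a histogram of such codes over the image:
# img = np.random.randint(0, 256, (64, 64))
# hist = np.bincount([binary_neighbor_code(img, y, x)
#                     for y in range(1, 63) for x in range(1, 63)], minlength=256)
```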
Recent Advances in Features Extraction and Description Algorithms: A Comprehensive Survey
Computer vision is one of the most active research fields in information
technology today. Giving machines and robots the ability to see and comprehend
the surrounding world at the speed of sight creates endless potential
applications and opportunities. Feature detection and description algorithms
can indeed be considered the retina of the eyes of such machines and robots.
However, these algorithms are typically computationally intensive, which
prevents them from achieving speed-of-sight, real-time performance. In
addition, they differ in their capabilities, and some may work better on a
specific type of input than others. As such, it is essential to compactly
report their pros and cons, their performance, and recent advances. This paper
provides a comprehensive overview of the state-of-the-art and recent advances
in feature detection and description algorithms. Specifically, it starts by
reviewing fundamental concepts, then compares, reports and discusses their
performance and capabilities. The Maximally Stable Extremal Regions (MSER)
algorithm and the Scale Invariant Feature Transform (SIFT) algorithm, being two
of the best of their type, are selected so that their recent algorithmic
derivatives can be reported.
Comment: Annual IEEE Industrial Electronics Society's 18th International Conf.
on Industrial Technology (ICIT), 22-25 March, 201
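For readers who want to try one of the two highlighted detectors directly, the sketch below runs OpenCV's MSER with default parameters on a grayscale image. The input file name is hypothetical.

```python
import cv2

def detect_mser_regions(gray):
    """Sketch: Maximally Stable Extremal Regions with OpenCV's default parameters,
    one of the two detectors the survey singles out (SIFT being the other)."""
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(gray)
    return regions, bboxes

# gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
# regions, boxes = detect_mser_regions(gray)
# print(len(regions), "stable regions detected")
```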