Local Multi-Grouped Binary Descriptor with Ring-based Pooling Configuration and Optimization
Local binary descriptors are attracting increasing attention due to their
great advantage in computational speed, enabling real-time performance in
numerous image/vision applications. Various methods have been
proposed to learn data-dependent binary descriptors. However, most existing
binary descriptors focus overly on computational simplicity at the expense of
significant information loss, which causes ambiguity in similarity measurement
using the Hamming distance. In this paper, by considering that multiple
features may share complementary information, we present a novel local binary
descriptor, referred to as the Ring-based Multi-Grouped Descriptor (RMGD), to
successfully bridge the performance gap between current binary and
floating-point descriptors. Our
contributions are two-fold. Firstly, we introduce a new pooling configuration
based on spatial ring-region sampling, allowing binary tests on
the full set of pairwise regions with different shapes, scales and distances.
This leads to a more meaningful description than existing methods which
normally apply a limited set of pooling configurations. Then, an extended
AdaBoost is proposed for efficient bit selection by emphasizing high variance
and low correlation, achieving a highly compact representation. Secondly, the
RMGD is computed from multiple image properties where binary strings are
extracted. We cast multi-grouped feature integration as a rankSVM or sparse
SVM learning problem, so that different features can compensate strongly for each
other, which is the key to discriminativeness and robustness. The performance
of RMGD was evaluated on a number of publicly available benchmarks, where the
RMGD significantly outperforms state-of-the-art binary descriptors.
Comment: To appear in IEEE Trans. on Image Processing, 201
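As a simple illustration of the Hamming-distance matching that binary descriptors such as the RMGD rely on, here is a generic sketch with toy 8-bit descriptors; this shows only the similarity measure, not the RMGD itself:

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two binary descriptors given as 0/1 bit arrays."""
    return int(np.count_nonzero(a != b))

# Two toy 8-bit binary descriptors (real descriptors are much longer)
d1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])
d2 = np.array([1, 1, 1, 0, 0, 0, 1, 1])
dist = hamming_distance(d1, d2)  # the descriptors differ in bits 1, 3 and 7
```

In practice the bits are packed into machine words and compared with XOR plus popcount instructions, which is what makes binary descriptors so fast to match.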
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
When visual content is ignored as a ranking clue, applying text search
techniques to visual retrieval may suffer from inconsistency between the text
words and the visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention over the past two decades. Such a problem is
challenging due to the intention gap and the semantic gap. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.
Comment: 22 pages
LDOP: Local Directional Order Pattern for Robust Face Retrieval
Local descriptors have gained wide attention due to their enhanced
discriminative abilities. It has been shown that considering a multi-scale
local neighborhood improves the performance of a descriptor,
though at the cost of increased dimension. This paper proposes a novel method
to construct a local descriptor using multi-scale neighborhood by finding the
local directional order among the intensity values at different scales in a
particular direction. Local directional order is the multi-radius relationship
factor in a particular direction. The proposed local directional order pattern
(LDOP) for a particular pixel is computed by finding the relationship between
the center pixel and local directional order indexes. It is required to
transform the center value into the range of neighboring orders. Finally, the
histogram of LDOP is computed over the whole image to construct the descriptor. In
contrast to the state-of-the-art descriptors, the dimension of the proposed
descriptor does not depend upon the number of neighbors involved in computing the
order; it only depends upon the number of directions. The introduced descriptor
is evaluated over the image retrieval framework and compared with the
state-of-the-art descriptors over challenging face databases such as PaSC, LFW,
PubFig, FERET, AR, AT&T, and ExtendedYale. The experimental results confirm the
superiority and robustness of the LDOP descriptor.
Comment: Published in Multimedia Tools and Applications, Springer
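The "local directional order" idea can be sketched generically: sample intensities at several radii along one direction and keep only their rank order. This is an illustrative reading of the abstract, not the exact LDOP encoding; the image, coordinates and radii below are arbitrary:

```python
import numpy as np

def directional_order(img, y, x, dy, dx, radii=(1, 2, 3)):
    """Sketch: rank order of intensities sampled at multiple radii along one
    direction (dy, dx) from the pixel (y, x). The resulting order tuple encodes
    the multi-radius relationship in that direction, independent of the
    absolute intensity values."""
    samples = [img[y + r * dy, x + r * dx] for r in radii]
    # argsort of argsort gives each sample's rank within the sampled set
    ranks = np.argsort(np.argsort(samples))
    return tuple(ranks)

img = np.array([[10, 50, 20, 40],
                [30, 60, 10, 70],
                [90, 20, 80, 50],
                [40, 30, 60, 10]])
# Samples along the main diagonal from (0, 0): 60, 80, 10
order = directional_order(img, 0, 0, 1, 1)
```

Because only the rank order is kept, the code for one direction is bounded by the number of radii, which echoes the abstract's point that the descriptor dimension depends on the number of directions rather than the number of neighbors.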
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents futuristic challenges discussed in the cvpaper.challenge.
In 2015 and 2016, we thoroughly studied 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Point Context: An Effective Shape Descriptor for RST-invariant Trajectory Recognition
Motion trajectory recognition is important for characterizing the moving
property of an object. The speed and accuracy of trajectory recognition rely on
a compact and discriminative feature representation, and variations in
rotation, scaling and translation have to be specially considered. In
this paper, we propose a novel feature extraction method for trajectories.
Firstly, a trajectory is represented by the proposed point context, which is a
rotation-scale-translation (RST) invariant shape descriptor with a flexible
tradeoff between computational complexity and discrimination, yet we prove that
it is a complete shape descriptor. Secondly, the shape context is nonlinearly
mapped to a subspace by kernel nonparametric discriminant analysis (KNDA) to
get a compact feature representation, and thus a trajectory is projected to a
single point in a low-dimensional feature space. Experimental results show
that the proposed trajectory feature achieves encouraging improvement over
state-of-the-art methods.
Comment: 11 pages, 10 figures
Covariance of Motion and Appearance Features for Spatio-Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis
focused on practical scenarios, built on theoretical foundations from
sparse representation, including a novel descriptor for general purpose video
analysis. In our approach, we compute kinematic features from optical flow and
first and second-order derivatives of intensities to represent motion and
appearance respectively. These features are then used to construct covariance
matrices which capture joint statistics of both low-level motion and appearance
features extracted from a video. Using an over-complete dictionary of the
covariance based descriptors built from labeled training samples, we formulate
low-level event recognition as a sparse linear approximation problem. Within
this, we pose the sparse decomposition of a covariance matrix, which also
conforms to the space of positive semi-definite matrices, as a determinant
maximization problem. Also, since covariance matrices lie on non-linear
Riemannian manifolds, we compare our former approach with a sparse linear
approximation alternative that is suitable for equivalent vector spaces of
covariance matrices. This is done by searching for the best projection of the
query data on a dictionary using an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains - namely low-level event recognition in unconstrained scenarios and
gesture recognition using one-shot learning. Our experiments provide promising
insights into large-scale video analysis.
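The covariance descriptor at the core of this approach can be sketched generically: stack per-pixel feature vectors and take their sample covariance, which captures joint statistics of the channels and is symmetric positive semi-definite. The random toy features below stand in for the optical-flow and intensity-derivative channels the abstract describes:

```python
import numpy as np

def covariance_descriptor(features: np.ndarray) -> np.ndarray:
    """Sketch of a region covariance descriptor: 'features' is an (N, d) array
    of per-pixel feature vectors; the descriptor is their d-by-d sample
    covariance matrix."""
    mean = features.mean(axis=0)
    centered = features - mean
    return centered.T @ centered / (features.shape[0] - 1)

# Toy example: 5 pixels, 3 feature channels (random stand-ins)
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 3))
C = covariance_descriptor(feats)  # symmetric positive semi-definite 3x3 matrix
```

The positive semi-definiteness is what forces the sparse-coding step in the abstract to respect the geometry of the covariance manifold rather than plain Euclidean space.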
From handcrafted to deep local features
This paper presents an overview of the evolution of local features from
handcrafted to deep-learning-based methods, followed by a discussion of several
benchmarks and papers evaluating such local features. Our investigations are
motivated by 3D reconstruction problems, where the precise location of the
features is important. As we describe these methods, we highlight and explain
the challenges of feature extraction and potential ways to overcome them. We
first present handcrafted methods, followed by methods based on classical
machine learning, and finally we discuss methods based on deep learning. This
largely chronologically-ordered presentation will help the reader to fully
understand the topic of image and region description in order to make best use
of it in modern computer vision applications. In particular, understanding
handcrafted methods and their motivation can help to understand modern
approaches and how machine learning is used to improve the results. We also
provide references to most of the relevant literature and code.
Comment: Preprint
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help understand the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Library.
Con-Patch: When a Patch Meets its Context
Measuring the similarity between patches in images is a fundamental building
block in various tasks. Naturally, the patch-size has a major impact on the
matching quality, and on the consequent application performance. Under the
assumption that our patch database is sufficiently sampled, using large patches
(e.g. 21-by-21) should be preferred over small ones (e.g. 7-by-7). However,
this "dense-sampling" assumption is rarely true; in most cases large patches
cannot find relevant nearby examples. This phenomenon is a consequence of the
curse of dimensionality, which states that the database-size should grow
exponentially with the patch-size to ensure proper matches. This explains the
favored choice of small patch-size in most applications.
Is there a way to keep the simplicity and work with small patches while
getting some of the benefits that large patches provide? In this work we offer
such an approach. We propose to concatenate the regular content of a
conventional (small) patch with a compact representation of its (large)
surroundings - its context. Therefore, with a minor increase in the
dimensions (e.g. 10 additional values in the patch representation), we
implicitly/softly describe the information of a large patch. The additional
descriptors are computed based on a self-similarity behavior of the patch
surrounding.
We show that this approach achieves better matches, compared to the use of
conventional-size patches, without the need to increase the database-size.
Also, the effectiveness of the proposed method is tested on three distinct
problems: (i) External natural image denoising, (ii) Depth image
super-resolution, and (iii) Motion-compensated frame-rate up-conversion.
Comment: Accepted to IEEE Transactions on Image Processing
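The concatenation idea can be sketched generically: append a few self-similarity scores, computed against patches in the larger surroundings, to the raw small-patch vector. The offsets and the Gaussian similarity below are illustrative assumptions, not the paper's exact context descriptor:

```python
import numpy as np

def con_patch(img, y, x, p=3, context_offsets=((-4, 0), (4, 0), (0, -4), (0, 4))):
    """Sketch of a context-augmented patch: the raw p-by-p patch around (y, x)
    is concatenated with a compact context vector of self-similarity scores
    between the center patch and a few patches in its larger surroundings."""
    h = p // 2
    center = img[y - h:y + h + 1, x - h:x + h + 1].astype(float).ravel()
    context = []
    for dy, dx in context_offsets:
        nb = img[y + dy - h:y + dy + h + 1,
                 x + dx - h:x + dx + h + 1].astype(float).ravel()
        # Gaussian-weighted self-similarity from the squared L2 distance
        context.append(np.exp(-np.sum((center - nb) ** 2) / (2.0 * p * p)))
    return np.concatenate([center, np.asarray(context)])

img = np.arange(15 * 15).reshape(15, 15) % 16
desc = con_patch(img, 7, 7)  # 9 patch values + 4 context similarities = 13
```

The descriptor stays close to the small-patch dimension (13 here instead of 9), which is the point: context is added at a cost of only a handful of extra values.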
Robust Face Recognition with Structural Binary Gradient Patterns
This paper presents a computationally efficient yet powerful binary framework
for robust facial representation based on image gradients, termed
structural binary gradient patterns (SBGP). To discover underlying local
structures in the gradient domain, we compute image gradients from multiple
directions and simplify them into a set of binary strings. The SBGP is derived
from certain types of these binary strings that have meaningful local
structures and are capable of resembling fundamental textural information. They
detect micro orientational edges and possess strong orientation and locality
capabilities, thus enabling great discrimination. The SBGP also benefits from
the advantages of the gradient domain and exhibits profound robustness against
illumination variations. The binary strategy realized by pixel correlations in
a small neighborhood substantially reduces the computational complexity and
achieves extremely efficient processing, requiring only 0.0032 s in Matlab for a
typical face image. Furthermore, the discrimination power of the SBGP can be
enhanced on a set of defined orientational image gradient magnitudes, further
enforcing locality and orientation. Results of extensive experiments on various
benchmark databases illustrate significant improvements of the SBGP based
representations over the existing state-of-the-art local descriptors in terms
of discrimination, robustness and complexity. Codes for the SBGP methods
will be available at
http://www.eee.manchester.ac.uk/research/groups/sisp/software/
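The binarise-the-gradient idea behind such descriptors can be sketched generically: compare pixel differences across the center pixel in several directions and keep only their signs. This illustrates the general principle, not the exact SBGP encoding:

```python
import numpy as np

def binary_gradient_pattern(img, y, x):
    """Sketch: for each of four directions (horizontal, vertical, two
    diagonals), emit 1 if the directional difference across the center
    pixel (y, x) is positive. Illustrative only, not the SBGP encoding."""
    img = img.astype(int)
    dirs = ((0, 1), (1, 0), (1, 1), (1, -1))
    return [int(img[y + dy, x + dx] - img[y - dy, x - dx] > 0)
            for dy, dx in dirs]

img = np.array([[5, 1, 9],
                [2, 4, 8],
                [3, 7, 6]])
code = binary_gradient_pattern(img, 1, 1)  # one sign bit per direction
```

Because only pixel differences and sign tests are involved, such patterns are cheap to compute and largely insensitive to monotonic illumination changes, which matches the robustness claims in the abstract.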