A Review of Codebook Models in Patch-Based Visual Object Recognition
The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook for constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing time. In this review we survey several approaches that have been proposed over the last decade, covering their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and the datasets that were used in evaluating the proposed methods.
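The mapping from a variable-size set of low-level features to a fixed-length histogram can be illustrated with a minimal sketch. It assumes the codebook is already given (for example, as cluster centres) and uses hard nearest-codeword assignment with Euclidean distance. The function name `encode_histogram` and the toy data are hypothetical and are not taken from the review.

```python
from math import dist


def encode_histogram(descriptors, codebook):
    """Map a variable-size set of descriptors to a fixed-length histogram.

    Each descriptor votes for its nearest codeword (Euclidean distance);
    the counts are normalised so that images with different numbers of
    descriptors are comparable.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy 2-D "descriptors"; real descriptors are much higher-dimensional.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
image_b = [(5.2, 4.9), (4.9, 5.0)]

hist_a = encode_histogram(image_a, codebook)  # [0.25, 0.5, 0.25]
hist_b = encode_histogram(image_b, codebook)  # [0.0, 0.0, 1.0]
```

Both images end up as vectors of length `len(codebook)` regardless of how many descriptors they contain, which is what allows a standard classifier to be applied.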
Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels
In this paper we propose two distinct enhancements to the basic
"bag-of-keypoints" image categorisation scheme proposed in [4]. In this
approach images are represented as a variable sized set of local image
features (keypoints). Thus, we require machine learning tools which
can operate on sets of vectors. In [4] this is achieved by representing
the set as a histogram over bins found by k-means. We show how this
approach can be improved and generalised using Gaussian Mixture Models
(GMMs). Alternatively, the set of keypoints can be represented directly
as a probability density function, over which a kernel can be defined. This
approach is shown to give state-of-the-art categorisation performance.
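One way to read the step from k-means bins to a mixture model is to replace hard bin counts with soft assignments. The sketch below is only an illustration under strong assumptions: the mixture parameters are taken as given rather than fitted, all components share one isotropic variance, and the name `soft_histogram` is hypothetical. It is not the formulation used in the paper.

```python
from math import exp, log


def soft_histogram(keypoints, means, variance=1.0, weights=None):
    """Average posterior responsibility of each mixture component.

    Assumes isotropic Gaussians with a shared variance, so the Gaussian
    normalising constant cancels in the posterior.
    """
    k = len(means)
    weights = weights or [1.0 / k] * k
    hist = [0.0] * k
    if not keypoints:
        return hist
    for x in keypoints:
        # Log of (weight * unnormalised Gaussian likelihood) per component.
        log_lik = [
            log(w) - sum((a - b) ** 2 for a, b in zip(x, m)) / (2.0 * variance)
            for w, m in zip(weights, means)
        ]
        top = max(log_lik)  # subtract the maximum for numerical stability
        resp = [exp(v - top) for v in log_lik]
        z = sum(resp)
        for j in range(k):
            hist[j] += resp[j] / z
    return [h / len(keypoints) for h in hist]


means = [(0.0, 0.0), (2.0, 0.0)]
midpoint = soft_histogram([(1.0, 0.0)], means)  # equidistant: split evenly
near_first = soft_histogram([(0.1, 0.0)], means)
```

A keypoint halfway between two components contributes equally to both bins, whereas a hard k-means assignment would have to pick one of them.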
Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences
This paper introduces sparse coding and dictionary learning for Symmetric
Positive Definite (SPD) matrices, which are often used in machine learning,
computer vision and related areas. Unlike traditional sparse coding schemes
that work in vector spaces, in this paper we discuss how SPD matrices can be
described by sparse combination of dictionary atoms, where the atoms are also
SPD matrices. We propose to seek sparse coding by embedding the space of SPD
matrices into Hilbert spaces through two types of Bregman matrix divergences.
This not only leads to an efficient way of performing sparse coding, but also
an online and iterative scheme for dictionary learning. We apply the proposed
methods to several computer vision tasks where images are represented by region
covariance matrices. Our proposed algorithms outperform state-of-the-art
methods on a wide range of classification tasks, including face recognition,
action recognition, material classification and texture categorization.
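The abstract does not name its two divergences. As an assumed example of a log-determinant-based divergence between SPD matrices, the sketch below computes the Jensen-Bregman LogDet (Stein) divergence, S(X, Y) = log det((X + Y)/2) - (1/2) log det(XY). The helper names are hypothetical, and the code covers only the divergence itself, not the sparse coding or dictionary learning steps.

```python
from math import log, sqrt


def logdet_spd(a):
    """Log-determinant of an SPD matrix via its Cholesky factor."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    # det(A) = prod(diag(L))^2, hence log det(A) = 2 * sum(log(diag(L))).
    return 2.0 * sum(log(chol[i][i]) for i in range(n))


def stein_divergence(x, y):
    """Jensen-Bregman LogDet divergence between two SPD matrices."""
    n = len(x)
    mid = [[(x[i][j] + y[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    return logdet_spd(mid) - 0.5 * (logdet_spd(x) + logdet_spd(y))


# Toy 2x2 SPD matrices standing in for region covariance descriptors.
cov_a = [[2.0, 0.5], [0.5, 1.0]]
cov_b = [[1.0, 0.0], [0.0, 3.0]]
d_ab = stein_divergence(cov_a, cov_b)
d_ba = stein_divergence(cov_b, cov_a)
d_aa = stein_divergence(cov_a, cov_a)
```

The divergence is symmetric in its arguments and is zero when both matrices are equal, which the toy values `d_ab`, `d_ba` and `d_aa` can be used to check.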
Kernel Cross-Correlator
The cross-correlator plays a significant role in many visual perception tasks,
such as object detection and tracking. Beyond the linear cross-correlator, this
paper proposes a kernel cross-correlator (KCC) that breaks traditional
limitations. First, by introducing the kernel trick, the KCC extends the linear
cross-correlation to non-linear space, which is more robust to signal noise
and distortions. Second, the connection to existing works shows that KCC
provides a unified solution for correlation filters. Third, KCC is applicable
to any kernel function and is not limited to circulant structure on training
data, thus it is able to predict affine transformations with customized
properties. Last, by leveraging the fast Fourier transform (FFT), KCC
eliminates direct calculation of kernel vectors, and thus achieves better
performance at a reasonable computational cost. Comprehensive
experiments on visual tracking and human activity recognition using wearable
devices demonstrate its robustness, flexibility, and efficiency. The source
code for both experiments is released at https://github.com/wang-chen/KCC
Comment: The Thirty-Second AAAI Conference on Artificial Intelligence
(AAAI-18)
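For orientation, the plain linear cross-correlation that the KCC generalises can be computed in the frequency domain. The sketch below is not the KCC itself. It shows circular cross-correlation of two real signals via the correlation theorem, using a small radix-2 FFT written out so the example has no dependencies. All function names are hypothetical, and the signal length is assumed to be a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_xcorr_fft(x, z):
    """c[s] = sum_i x[i] * z[(i + s) % n], computed as IFFT(conj(X) * Z)."""
    n = len(x)
    spectrum = [a.conjugate() * b for a, b in zip(fft(x), fft(z))]
    return [v.real / n for v in fft(spectrum, inverse=True)]


def circular_xcorr_direct(x, z):
    """Same quantity by direct summation, for comparison."""
    n = len(x)
    return [sum(x[i] * z[(i + s) % n] for i in range(n)) for s in range(n)]


template = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0]  # template delayed by 2

response = circular_xcorr_fft(template, shifted)
best_shift = max(range(len(response)), key=lambda s: response[s])
```

The peak of `response` sits at the shift that best aligns the template with the signal, which is the basic mechanism correlation-based trackers rely on.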