6,359 research outputs found
GOLD: Gaussians of Local Descriptors for Image Representation
The Bag of Words paradigm has been the baseline from which several successful image classification solutions were developed in the last decade. These represent images by quantizing local descriptors and summarizing their distribution. The quantization step introduces a dependency on the dataset, that even if in some contexts significantly boosts the performance, severely limits its generalization capabilities. Differently, in this paper, we propose to model the local features distribution with a multivariate Gaussian, without any quantization. The full rank covariance matrix, which lies on a Riemannian manifold, is projected on the tangent Euclidean space and concatenated to the mean vector. The resulting representation, a Gaussian of local descriptors (GOLD), allows to use the dot product to closely approximate a distance between distributions without the need for expensive kernel computations.
We describe an image by an improved spatial pyramid, which avoids boundary effects with soft assignment: local descriptors contribute to neighboring Gaussians, forming a weighted spatial pyramid of GOLD descriptors.
In addition, we extend the model leveraging dataset characteristics in a mixture of Gaussian formulation further improving the classification accuracy. To deal with large scale datasets and high dimensional feature spaces the Stochastic Gradient Descent solver is adopted. Experimental results on several publicly available datasets show that the proposed method obtains state-of-the-art performance
Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation
In the traditional object recognition pipeline, descriptors are densely
sampled over an image, pooled into a high dimensional non-linear representation
and then passed to a classifier. In recent years, Fisher Vectors have proven
empirically to be the leading representation for a large variety of
applications. The Fisher Vector is typically taken as the gradients of the
log-likelihood of descriptors, with respect to the parameters of a Gaussian
Mixture Model (GMM). Motivated by the assumption that different distributions
should be applied for different datasets, we present two other Mixture Models
and derive their Expectation-Maximization and Fisher Vector expressions. The
first is a Laplacian Mixture Model (LMM), which is based on the Laplacian
distribution. The second Mixture Model presented is a Hybrid Gaussian-Laplacian
Mixture Model (HGLMM) which is based on a weighted geometric mean of the
Gaussian and Laplacian distribution. An interesting property of the
Expectation-Maximization algorithm for the latter is that in the maximization
step, each dimension in each component is chosen to be either a Gaussian or a
Laplacian. Finally, by using the new Fisher Vectors derived from HGLMMs, we
achieve state-of-the-art results for both the image annotation and the image
search by a sentence tasks.Comment: new version includes text synthesis by an RNN and experiments with
the COCO benchmar
Statistically Motivated Second Order Pooling
Second-order pooling, a.k.a.~bilinear pooling, has proven effective for deep
learning based visual recognition. However, the resulting second-order networks
yield a final representation that is orders of magnitude larger than that of
standard, first-order ones, making them memory-intensive and cumbersome to
deploy. Here, we introduce a general, parametric compression strategy that can
produce more compact representations than existing compression techniques, yet
outperform both compressed and uncompressed second-order models. Our approach
is motivated by a statistical analysis of the network's activations, relying on
operations that lead to a Gaussian-distributed final representation, as
inherently used by first-order deep networks. As evidenced by our experiments,
this lets us outperform the state-of-the-art first-order and second-order
models on several benchmark recognition datasets.Comment: Accepted to ECCV 2018. Camera ready version. 14 page, 5 figures, 3
table
Towards Effective Codebookless Model for Image Classification
The bag-of-features (BoF) model for image classification has been thoroughly
studied over the last decade. Different from the widely used BoF methods which
modeled images with a pre-trained codebook, the alternative codebook free image
modeling method, which we call Codebookless Model (CLM), attracted little
attention. In this paper, we present an effective CLM that represents an image
with a single Gaussian for classification. By embedding Gaussian manifold into
a vector space, we show that the simple incorporation of our CLM into a linear
classifier achieves very competitive accuracy compared with state-of-the-art
BoF methods (e.g., Fisher Vector). Since our CLM lies in a high dimensional
Riemannian manifold, we further propose a joint learning method of low-rank
transformation with support vector machine (SVM) classifier on the Gaussian
manifold, in order to reduce computational and storage cost. To study and
alleviate the side effect of background clutter on our CLM, we also present a
simple yet effective partial background removal method based on saliency
detection. Experiments are extensively conducted on eight widely used databases
to demonstrate the effectiveness and efficiency of our CLM method
Fuzzy spectral and spatial feature integration for classification of nonferrous materials in hyperspectral data
Hyperspectral data allows the construction of more elaborate models to sample the properties of the nonferrous materials than the standard RGB color representation. In this paper, the nonferrous waste materials are studied as they cannot be sorted by classical procedures due to their color, weight and shape similarities. The experimental results presented in this paper reveal that factors such as the various levels of oxidization of the waste materials and the slight differences in their chemical composition preclude the use of the spectral features in a simplistic manner for robust material classification. To address these problems, the proposed FUSSER (fuzzy spectral and spatial classifier) algorithm detailed in this paper merges the spectral and spatial features to obtain a combined feature vector that is able to better sample the properties of the nonferrous materials than the single pixel spectral features when applied to the construction of multivariate Gaussian distributions. This approach allows the implementation of statistical region merging techniques in order to increase the performance of the classification process. To achieve an efficient implementation, the dimensionality of the hyperspectral data is reduced by constructing bio-inspired spectral fuzzy sets that minimize the amount of redundant information contained in adjacent hyperspectral bands. The experimental results indicate that the proposed algorithm increased the overall classification rate from 44% using RGB data up to 98% when the spectral-spatial features are used for nonferrous material classification
- …