Towards Effective Codebookless Model for Image Classification
The bag-of-features (BoF) model for image classification has been thoroughly
studied over the last decade. Unlike the widely used BoF methods, which model
images with a pre-trained codebook, the alternative codebook-free image
modeling method, which we call the Codebookless Model (CLM), has attracted little
attention. In this paper, we present an effective CLM that represents an image
with a single Gaussian for classification. By embedding the Gaussian manifold into
a vector space, we show that the simple incorporation of our CLM into a linear
classifier achieves very competitive accuracy compared with state-of-the-art
BoF methods (e.g., Fisher Vector). Since our CLM lies on a high-dimensional
Riemannian manifold, we further propose a method for jointly learning a low-rank
transformation and a support vector machine (SVM) classifier on the Gaussian
manifold, in order to reduce computational and storage cost. To study and
alleviate the side effect of background clutter on our CLM, we also present a
simple yet effective partial background removal method based on saliency
detection. Extensive experiments on eight widely used databases
demonstrate the effectiveness and efficiency of our CLM method.
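The idea of representing an image with a single Gaussian and embedding it into a vector space can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not state which embedding is used, so the sketch assumes a common one in which the Gaussian N(mu, Sigma) is mapped to a symmetric positive definite (SPD) matrix and then flattened with the matrix logarithm. The function name `gaussian_embedding` and the regularizer `eps` are hypothetical.

```python
import numpy as np

def gaussian_embedding(features, eps=1e-3):
    """Map local features (n x d) to one vector via a single Gaussian.

    Assumed embedding (not confirmed by the abstract): N(mu, S) is mapped to
    the SPD matrix [[S + mu mu^T, mu], [mu^T, 1]], then to a vector space
    with the matrix logarithm.
    """
    mu = features.mean(axis=0)
    d = mu.size
    # Regularize so the covariance is strictly positive definite.
    sigma = np.cov(features, rowvar=False) + eps * np.eye(d)
    P = np.empty((d + 1, d + 1))
    P[:d, :d] = sigma + np.outer(mu, mu)
    P[:d, d] = mu
    P[d, :d] = mu
    P[d, d] = 1.0
    # Matrix logarithm of an SPD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(P)
    log_P = (V * np.log(w)) @ V.T
    # Keep the upper triangle; scale off-diagonal entries by sqrt(2)
    # so the vector norm equals the Frobenius norm of log_P.
    iu = np.triu_indices(d + 1)
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return log_P[iu] * scale
```

The resulting vector has (d + 1)(d + 2) / 2 entries and could be passed to any linear classifier.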
Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework
The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems), which allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, which provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-set registration task. In this context, our main contribution is the development of an optimization-based method, at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, compared with the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, which allows the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian Force Field exerted by one point-set on the other. This formulation proved useful for controlling and increasing the region of convergence, and hence for allowing more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor, derived from visual saliency principles, which significantly enhanced the performance of the registration algorithm.
The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications.
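A smooth registration criterion of the kind described above can be illustrated with a short sketch. The abstract does not give the formula, so the sum of Gaussians of pairwise distances used here is an assumption, as are the function name `gaussian_field_criterion` and the parameter `sigma`. The sketch evaluates the sum by brute force in O(NM) time; it does not implement the Fast Gauss Transform mentioned in the abstract.

```python
import numpy as np

def gaussian_field_criterion(P, Q, R, t, sigma=1.0):
    """Assumed criterion: sum over all pairs of exp(-||R p_i + t - q_j||^2 / sigma^2).

    P: (N, 3) moving points, Q: (M, 3) fixed points,
    R: (3, 3) rotation matrix, t: (3,) translation.
    Larger values mean better overlap; the function is smooth in R and t,
    so gradient-based optimizers can be applied to it.
    """
    moved = P @ R.T + t
    # Squared distances between every transformed point and every fixed point.
    d2 = ((moved[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma**2).sum()
```

Increasing `sigma` widens each Gaussian, which is one way to read the abstract's remark about enlarging the region of convergence.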
Predicting human eye fixations via an LSTM-Based saliency attentive model
Data-driven saliency has recently gained a lot of attention thanks to the use of convolutional neural networks for predicting gaze fixations. In this paper, we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and present a novel model that predicts accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a convolutional long short-term memory that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. In addition, to tackle the center bias typical of human eye fixations, our model can learn a set of prior maps generated with Gaussian functions. We show, through an extensive evaluation, that the proposed architecture outperforms the current state of the art on public saliency prediction datasets. We further study the contribution of each key component to demonstrate its robustness in different scenarios.
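Prior maps generated with Gaussian functions can be sketched as below. This only shows how such maps might be built from given means and standard deviations; in the model described above these parameters would be learned, which the sketch does not do. The function name `gaussian_prior_maps` and the normalized [0, 1] coordinate convention are assumptions.

```python
import numpy as np

def gaussian_prior_maps(height, width, means, sigmas):
    """Build one 2D Gaussian map per (mean, sigma) pair.

    means, sigmas: sequences of (y, x) pairs in normalized [0, 1] coordinates.
    Returns an array of shape (num_maps, height, width) with peak value 1.
    """
    ys = np.linspace(0.0, 1.0, height)[:, None]
    xs = np.linspace(0.0, 1.0, width)[None, :]
    maps = []
    for (mu_y, mu_x), (s_y, s_x) in zip(means, sigmas):
        g = np.exp(-((ys - mu_y) ** 2 / (2 * s_y**2)
                     + (xs - mu_x) ** 2 / (2 * s_x**2)))
        maps.append(g)
    return np.stack(maps)
```

A centered map such as `gaussian_prior_maps(9, 9, [(0.5, 0.5)], [(0.2, 0.2)])` peaks at the image center, which matches the center bias the abstract refers to.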
DAC: Detector-Agnostic Spatial Covariances for Deep Local Features
Current deep visual local feature detectors do not model the spatial
uncertainty of detected features, producing suboptimal results in downstream
applications. In this work, we propose two post-hoc covariance estimates that
can be plugged into any pretrained deep feature detector: a simple, isotropic
covariance estimate that uses the predicted score at a given pixel location,
and a full covariance estimate via the local structure tensor of the learned
score maps. Both methods are easy to implement and can be applied to any deep
feature detector. We show that these covariances are directly related to errors
in feature matching, leading to improvements in downstream tasks, including
solving the perspective-n-point problem and motion-only bundle adjustment. Code
is available at https://github.com/javrtg/DA
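The two post-hoc estimates described above can be sketched as follows. The abstract only says that one estimate uses the predicted score and the other uses the local structure tensor of the score map; the specific choices here (inverse score for the isotropic case, inverse of a windowed structure tensor for the full case) are assumptions, as are the function names, the window `radius`, and the regularizer `eps`.

```python
import numpy as np

def isotropic_cov(score_map, y, x, eps=1e-6):
    """Assumed isotropic estimate: identity scaled by the inverse score."""
    return np.eye(2) / (score_map[y, x] + eps)

def structure_tensor_cov(score_map, y, x, radius=2, eps=1e-6):
    """Assumed full estimate: inverse of the local structure tensor.

    The tensor sums outer products of score gradients over a square window.
    The returned 2x2 covariance is in (x, y) order.
    """
    gy, gx = np.gradient(score_map.astype(float))
    ys = slice(max(y - radius, 0), y + radius + 1)
    xs = slice(max(x - radius, 0), x + radius + 1)
    wy, wx = gy[ys, xs].ravel(), gx[ys, xs].ravel()
    S = np.array([[wx @ wx, wx @ wy],
                  [wx @ wy, wy @ wy]])
    # eps keeps the tensor invertible where the score map is flat.
    return np.linalg.inv(S + eps * np.eye(2))
```

Under these assumptions, a score map that changes sharply along x but is flat along y yields a small variance in x and a large variance in y, so the feature is treated as well localized only along x.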
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds into Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy compared with several state-of-the-art methods.
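A Log-Euclidean BoW histogram can be sketched as below, assuming the diffeomorphism is the matrix logarithm (suggested by the title, not stated in the abstract). The codebook is taken as given, for example from k-means run on log-mapped training matrices; the function names `log_euclidean_vec` and `bow_histogram` are hypothetical.

```python
import numpy as np

def log_euclidean_vec(spd):
    """Map an SPD matrix to a Euclidean vector with the matrix logarithm."""
    w, V = np.linalg.eigh(spd)
    L = (V * np.log(w)) @ V.T
    # Upper triangle, with off-diagonals scaled by sqrt(2) so Euclidean
    # distance between vectors equals the Log-Euclidean distance.
    iu = np.triu_indices(spd.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return L[iu] * scale

def bow_histogram(spd_matrices, codebook):
    """Assign each log-mapped matrix to its nearest codeword and count.

    codebook: (K, D) array of codewords in the log domain (assumed given).
    Returns a normalized histogram of length K.
    """
    X = np.stack([log_euclidean_vec(m) for m in spd_matrices])
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Because both codebook construction and assignment happen in the log domain, ordinary Euclidean tools such as k-means can be reused while still respecting the SPD geometry.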