717 research outputs found
Spatial image polynomial decomposition with application to video classification
International audienceThis paper addresses the use of orthogonal polynomial basis transform in video classification due to its multiple advantages, especially for multiscale and multiresolution analysis similar to the wavelet transform. In our approach, we benefit from these advantages to reduce the resolution of the video by using a multiscale/multiresolution decomposition to define a new algorithm that decomposes a color image into geometry and texture component by projecting the image on a bivariate polynomial basis and considering the geometry component as the partial reconstruction and the texture component as the remaining part, and finally to model the features (like motion and texture) extracted from reduced image sequences by projecting them into a bivariate polynomial basis in order to construct a hybrid polynomial motion texture video descriptor. To evaluate our approach, we consider two visual recognition tasks, namely the classification of dynamic textures and recognition of human actions. The experimental section shows that the proposed approach achieves a perfect recognition rate in the Weizmann database and highest accuracy in the Dyntex++ database compared to existing methods
Pyramidal Fisher Motion for Multiview Gait Recognition
The goal of this paper is to identify individuals by analyzing their gait.
Instead of using binary silhouettes as input data (as done in many previous
works) we propose and evaluate the use of motion descriptors based on densely
sampled short-term trajectories. We take advantage of state-of-the-art people
detectors to define custom spatial configurations of the descriptors around the
target person. Thus, obtaining a pyramidal representation of the gait motion.
The local motion features (described by the Divergence-Curl-Shear descriptor)
extracted on the different spatial areas of the person are combined into a
single high-level gait descriptor by using the Fisher Vector encoding. The
proposed approach, coined Pyramidal Fisher Motion, is experimentally validated
on the recent `AVA Multiview Gait' dataset. The results show that this new
approach achieves promising results in the problem of gait recognition.Comment: Submitted to International Conference on Pattern Recognition, ICPR,
201
A Unified framework for local visual descriptors evaluation
International audienceLocal descriptors are the ground layer of recognition feature based systems for still images and video. We propose a new framework to explain local descriptors. This framework is based on the descriptors decomposition in three levels: primitive extraction, primitive coding and code aggregation. With this framework, we are able to explain most of the popular descriptors in the literature such as HOG, HOF, SURF. We propose two new projection methods based on approximation with oscillating functions basis (sinus and Legendre polynomials). Using our framework, we are able to extend usual descriptors by changing the code aggregation or adding new primitive coding method. The experiments are carried out on images (VOC 2007) and videos datasets (KTH, Hollywood2 and UCF11), and achieve equal or better performances than the literature
ObjectFlow: A Descriptor for Classifying Traffic Motion
Abstract—We present and evaluate a novel scene descriptor for classifying urban traffic by object motion. Atomic 3D flow vectors are extracted and compensated for the vehicle’s egomo-tion, using stereo video sequences. Votes cast by each flow vector are accumulated in a bird’s eye view histogram grid. Since we are directly using low-level object flow, no prior object detection or tracking is needed. We demonstrate the effectiveness of the proposed descriptor by comparing it to two simpler baselines on the task of classifying more than 100 challenging video sequences into intersection and non-intersection scenarios. Our experiments reveal good classification performance in busy traffic situations, making our method a valuable complement to traditional approaches based on lane markings. I
The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning
A diverse number of tasks in computer vision and machine learning
enjoy from representations of data that are compact yet
discriminative, informative and robust to critical measurements.
Two notable representations are offered by Region Covariance
Descriptors (RCovD) and linear subspaces which are naturally
analyzed through the manifold of Symmetric Positive Definite
(SPD) matrices and the Grassmann manifold, respectively, two
widely used types of Riemannian manifolds in computer vision.
As our first objective, we examine image and video-based
recognition applications where the local descriptors have the
aforementioned Riemannian structures, namely the SPD or linear
subspace structure. Initially, we provide a solution to compute
Riemannian version of the conventional Vector of Locally
aggregated Descriptors (VLAD), using geodesic distance of the
underlying manifold as the nearness measure. Next, by having a
closer look at the resulting codes, we formulate a new concept
which we name Local Difference Vectors (LDV). LDVs enable us to
elegantly expand our Riemannian coding techniques to any
arbitrary metric as well as provide intrinsic solutions to
Riemannian sparse coding and its variants when local structured
descriptors are considered.
We then turn our attention to two special types of covariance
descriptors namely infinite-dimensional RCovDs and rank-deficient
covariance matrices for which the underlying Riemannian
structure, i.e. the manifold of SPD matrices is out of reach to
great extent. %Generally speaking, infinite-dimensional RCovDs
offer better discriminatory power over their low-dimensional
counterparts.
To overcome this difficulty, we propose to approximate the
infinite-dimensional RCovDs by making use of two feature
mappings, namely random Fourier features and the Nystrom method.
As for the rank-deficient covariance matrices, unlike most
existing approaches that employ inference tools by predefined
regularizers, we derive positive definite kernels that can be
decomposed into the kernels on the cone of SPD matrices and
kernels on the Grassmann manifolds and show their effectiveness
for image set classification task.
Furthermore, inspired by attractive properties of Riemannian
optimization techniques, we extend the recently introduced Keep
It Simple and Straightforward MEtric learning (KISSME) method to
the scenarios where input data is non-linearly distributed. To
this end, we make use of the infinite dimensional covariance
matrices and propose techniques towards projecting on the
positive cone in a Reproducing Kernel Hilbert Space (RKHS).
We also address the sensitivity issue of the KISSME to the input
dimensionality. The KISSME algorithm is greatly dependent on
Principal Component Analysis (PCA) as a preprocessing step which
can lead to difficulties, especially when the dimensionality is
not meticulously set.
To address this issue, based on the KISSME algorithm, we develop
a Riemannian framework to jointly learn a mapping performing
dimensionality reduction and a metric in the induced space.
Lastly, in line with the recent trend in metric learning, we
devise end-to-end learning of a generic deep network for metric
learning using our derivation
Symmetry Detection in Large Scale City Scans
In this report we present a novel method for detecting partial symmetries in very large point clouds of 3D city scans. Unlike previous work, which was limited to data sets of a few hundred megabytes maximum, our method scales to very large scenes. We map the detection problem to a nearestneighbor search in a low-dimensional feature space, followed by a cascade of tests for geometric clustering of potential matches. Our algorithm robustly handles noisy real-world scanner data, obtaining a recognition performance comparable to state-of-the-art methods. In practice, it scales linearly with the scene size and achieves a high absolute throughput, processing half a terabyte of raw scanner data over night on a dual socket commodity PC
- …