Generalized Rank Pooling for Activity Recognition
Most popular deep models for action recognition split video sequences into
short sub-sequences consisting of a few frames; frame-based features are then
pooled for recognizing the activity. Usually, this pooling step discards the
temporal order of the frames, which could otherwise be used for better
recognition. Towards this end, we propose a novel pooling method, generalized
rank pooling (GRP), that takes as input features from the intermediate layers
of a CNN that is trained on tiny sub-sequences, and produces as output the
parameters of a subspace which (i) provides a low-rank approximation to the
features and (ii) preserves their temporal order. We propose to use these
parameters as a compact representation for the video sequence, which is then
used in a classification setup. We formulate an objective for computing this
subspace as a Riemannian optimization problem on the Grassmann manifold, and
propose an efficient conjugate gradient scheme for solving it. Experiments on
several activity recognition datasets show that our scheme leads to
state-of-the-art performance.
Comment: Accepted at the IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR), 2017
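The low-rank subspace idea at the core of GRP can be sketched with plain linear algebra. The snippet below is a simplified stand-in, assuming per-frame CNN features stacked in a NumPy array: it computes only the best low-rank subspace via SVD (criterion (i) above); the paper's temporal-order constraint and Riemannian conjugate gradient scheme are omitted.

```python
import numpy as np

def subspace_descriptor(features, k=3):
    """Orthonormal basis of the top-k subspace of frame features.

    Simplified stand-in for GRP: the SVD gives the best low-rank
    approximation; the temporal-ordering term from the paper is omitted.
    features: (T, d) array of per-frame features (hypothetical input).
    """
    # Center the features and take the top-k right singular vectors.
    X = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:k].T  # (d, k) orthonormal basis: a point on the Grassmannian
    return U

# Usage: 30 frames of 128-dim features -> a compact (128, 3) descriptor
# that could feed a downstream classifier.
rng = np.random.default_rng(0)
desc = subspace_descriptor(rng.standard_normal((30, 128)), k=3)
```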
Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective
This paper addresses the task of dense non-rigid structure-from-motion
(NRSfM) using multiple images. State-of-the-art methods for this problem are
often hindered by limited scalability, expensive computations, and noisy measurements.
Further, recent methods for NRSfM usually either assume a small number of sparse
feature points or ignore local non-linearities of shape deformations, and thus
cannot reliably model complex non-rigid deformations. To address these issues,
in this paper, we propose a new approach for dense NRSfM by modeling the
problem on a Grassmann manifold. Specifically, we assume the complex non-rigid
deformations lie on a union of local linear subspaces both spatially and
temporally. This naturally allows for a compact representation of the complex
non-rigid deformation over frames. We provide experimental results on several
synthetic and real benchmark datasets. The procured results clearly demonstrate
that our method, apart from being scalable and more accurate than
state-of-the-art methods, is also more robust to noise and generalizes to
highly non-linear deformations.
Comment: 10 pages, 7 figures, 4 tables. Accepted for publication in the Conference
on Computer Vision and Pattern Recognition (CVPR) 2018; typos fixed and
acknowledgement added
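The union-of-local-subspaces modeling can be illustrated with generic linear algebra: represent a local block of trajectory data by an orthonormal basis (a point on the Grassmann manifold) and compare two such subspaces via their principal angles. This is a minimal sketch of the representation, not the paper's reconstruction algorithm.

```python
import numpy as np

def grassmann_point(block, p=2):
    """Orthonormal basis spanning a local linear subspace (Grassmann point)."""
    Q, _ = np.linalg.qr(block)
    return Q[:, :p]

def principal_angles(U1, U2):
    """Principal angles between two subspaces, via the SVD of U1^T U2."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# Two local blocks of stacked trajectory data, 10-dim, 4 samples each.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))
U1 = grassmann_point(A)
U2 = grassmann_point(A)  # identical block -> all principal angles are zero
```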
Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure from Motion
Given dense image feature correspondences of a non-rigidly moving object
across multiple frames, this paper proposes an algorithm to estimate its 3D
shape for each frame. To solve this problem accurately, the recent
state-of-the-art algorithm reduces this task to a set of local linear subspace
reconstruction and clustering problems using a Grassmann manifold representation
\cite{kumar2018scalable}. Unfortunately, their method misses some of the
critical issues associated with the modeling of surface deformations, e.g.,
the dependence of a local surface deformation on its neighbors. Furthermore,
their representation for grouping high-dimensional data points inevitably introduces
the drawbacks of categorizing samples on the high-dimensional Grassmann
manifold \cite{huang2015projection, harandi2014manifold}. Hence, to deal with
the limitations of \cite{kumar2018scalable}, we propose an algorithm that
jointly exploits the benefit of high-dimensional Grassmann manifold to perform
reconstruction, and its equivalent lower-dimensional representation to infer
suitable clusters. To accomplish this, we project each Grassmann point onto a
lower-dimensional Grassmann manifold which preserves and respects the
deformation of the structure w.r.t. its neighbors. These Grassmann points in the
lower dimension then act as representatives for the selection of
high-dimensional Grassmann samples to perform each local reconstruction. In
practice, our algorithm provides a geometrically efficient way to solve dense
NRSfM by switching between manifolds according to where each is most beneficial.
Experimental results show that the proposed algorithm is very effective in
handling noise with reconstruction accuracy as good as or better than the
competing methods.
Comment: New version with corrected typo. 10 pages, 7 figures, 1 table.
Accepted for publication in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2019. Acknowledgement added. Supplementary material is
available at https://suryanshkumar.github.io
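The switch between a high-dimensional Grassmannian and its lower-dimensional counterpart can be sketched generically: map a basis through a semi-orthogonal projection and re-orthonormalize, so the image is again a valid Grassmann point. The projection below is random for illustration; the paper's mapping is chosen to respect neighboring deformations, which this sketch does not model.

```python
import numpy as np

def project_grassmann(Y, m, seed=0):
    """Map a Grassmann point Y (n x p) onto a lower-dimensional Grassmannian.

    Generic sketch (not the paper's neighbor-aware mapping): apply a
    random semi-orthogonal projection A (n x m), then re-orthonormalize
    so the result is a valid point on Gr(p, m).
    """
    n, p = Y.shape
    rng = np.random.default_rng(seed)
    A, _ = np.linalg.qr(rng.standard_normal((n, m)))  # orthonormal columns
    Z = A.T @ Y                                       # m x p
    Q, _ = np.linalg.qr(Z)
    return Q[:, :p]

# A 3-dim subspace of R^50, mapped to a point on the Grassmannian Gr(3, 10).
Y, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((50, 3)))
B = project_grassmann(Y, m=10)
```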
Integral Geometry and Holography
We present a mathematical framework which underlies the connection between
information theory and the bulk spacetime in the AdS/CFT
correspondence. A key concept is kinematic space: an auxiliary Lorentzian
geometry whose metric is defined in terms of conditional mutual informations
and which organizes the entanglement pattern of a CFT state. When the field
theory has a holographic dual obeying the Ryu-Takayanagi proposal, kinematic
space has a direct geometric meaning: it is the space of bulk geodesics studied
in integral geometry. Lengths of bulk curves are computed by kinematic volumes,
giving a precise entropic interpretation of the length of any bulk curve. We
explain how basic geometric concepts -- points, distances and angles -- are
reflected in kinematic space, allowing one to reconstruct a large class of
spatial bulk geometries from boundary entanglement entropies. In this way,
kinematic space translates between information theoretic and geometric
descriptions of a CFT state. As an example, we discuss in detail the static
slice of AdS whose kinematic space is two-dimensional de Sitter space.
Comment: 23 pages + appendices, including 23 figures and an exercise sheet
with solutions; a Mathematica visualization tool
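Concretely, on a constant-time slice the kinematic space metric is built from the entanglement entropy S(u, v) of the boundary interval (u, v); up to normalization conventions, which vary in the literature, it takes the form

```latex
ds^2 \;=\; \frac{\partial^2 S(u,v)}{\partial u\,\partial v}\, du\, dv .
```

For the vacuum of a two-dimensional CFT with central charge c, where S(u,v) = (c/3)\log[(v-u)/\mu], this yields ds^2 = \frac{c}{3}\,\frac{du\,dv}{(v-u)^2}, which is the metric of two-dimensional de Sitter space in lightcone coordinates, matching the AdS example discussed above.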
On landmark selection and sampling in high-dimensional data analysis
In recent years, the spectral analysis of appropriately defined kernel
matrices has emerged as a principled way to extract the low-dimensional
structure often prevalent in high-dimensional data. Here we provide an
introduction to spectral methods for linear and nonlinear dimension reduction,
emphasizing ways to overcome the computational limitations currently faced by
practitioners with massive datasets. In particular, a data subsampling or
landmark selection process is often employed to construct a kernel based on
partial information, followed by an approximate spectral analysis termed the
Nyström extension. We provide a quantitative framework to analyse this
procedure, and use it to demonstrate algorithmic performance bounds on a range
of practical approaches designed to optimize the landmark selection process. We
compare the practical implications of these bounds by way of real-world
examples drawn from the field of computer vision, whereby low-dimensional
manifold structure is shown to emerge from high-dimensional video data streams.
Comment: 18 pages, 6 figures, submitted for publication
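The Nyström construction described above admits a compact sketch: from the kernel columns C between all n points and m landmarks, and the landmark-landmark block W, approximate the full kernel as K ≈ C W⁺ Cᵀ. This is the standard construction in minimal form, not a specific optimized variant from the paper.

```python
import numpy as np

def nystrom(K_landmark_cols, landmark_idx):
    """Nystrom approximation K ~ C W^+ C^T from landmark columns.

    C: kernel values between all n points and the m landmarks (n x m);
    W: the landmark-landmark block of the kernel (m x m).
    """
    C = K_landmark_cols
    W = C[landmark_idx, :]
    return C @ np.linalg.pinv(W) @ C.T

# Rank-2 linear kernel: the approximation is exact whenever the landmark
# columns span the kernel's column space (here, 3 landmarks > rank 2).
rng = np.random.default_rng(3)
X = rng.standard_normal((40, 2))
K = X @ X.T
idx = [0, 1, 2]
K_hat = nystrom(K[:, idx], idx)
```

The example also illustrates why landmark selection matters: with fewer landmarks than the kernel's effective rank, or with poorly spread landmarks, the reconstruction error grows, which is exactly what the bounds analysed in the paper quantify.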
Reduction, Symmetry and Phases in Mechanics
Various holonomy phenomena are shown to be instances of the reconstruction procedure
for mechanical systems with symmetry. We systematically exploit this point of view for fixed
systems (for example with controls on the internal, or reduced, variables) and for slowly moving
systems in an adiabatic context. For the latter, we obtain the phases as the holonomy for a
connection which synthesizes the Cartan connection for moving mechanical systems with the
Hannay-Berry connection for integrable systems. This synthesis allows one to treat in a natural
way examples like the ball in the slowly rotating hoop and also non-integrable mechanical systems.
On-Manifold Preintegration for Real-Time Visual-Inertial Odometry
Current approaches for visual-inertial odometry (VIO) are able to attain
highly accurate state estimation via nonlinear optimization. However, real-time
optimization quickly becomes infeasible as the trajectory grows over time; this
problem is further exacerbated by the fact that inertial measurements come at a
high rate, hence leading to fast growth of the number of variables in the
optimization. In this paper, we address this issue by preintegrating inertial
measurements between selected keyframes into single relative motion
constraints. Our first contribution is a \emph{preintegration theory} that
properly addresses the manifold structure of the rotation group. We formally
discuss the generative measurement model as well as the nature of the rotation
noise and derive the expression for the \emph{maximum a posteriori} state
estimator. Our theoretical development enables the computation of all necessary
Jacobians for the optimization and a-posteriori bias correction in analytic
form. The second contribution is to show that the preintegrated IMU model can
be seamlessly integrated into a visual-inertial pipeline under the unifying
framework of factor graphs. This enables the application of
incremental-smoothing algorithms and the use of a \emph{structureless} model
for visual measurements, which avoids optimizing over the 3D points, further
accelerating the computation. We perform an extensive evaluation of our
monocular VIO pipeline on real and simulated datasets. The results confirm
that our modelling effort leads to accurate state estimation in real-time,
outperforming state-of-the-art approaches.
Comment: 20 pages, 24 figures, accepted for publication in IEEE Transactions
on Robotics (TRO), 2017
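The rotation part of preintegration can be sketched directly: gyroscope samples between two keyframes are composed on SO(3) via the exponential map into one relative rotation ΔR = ∏ Exp(ω_k Δt). This is a minimal sketch of that idea; the bias and noise terms, and the analytic Jacobians derived in the paper, are omitted.

```python
import numpy as np

def hat(w):
    """Skew-symmetric (cross-product) matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Exponential map from so(3) to SO(3) (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)  # first-order term near identity
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def preintegrate_rotation(gyro, dt):
    """Accumulate gyroscope samples into a single relative rotation.

    Bias and noise terms are omitted: dR = prod_k Exp(omega_k * dt).
    """
    dR = np.eye(3)
    for w in gyro:
        dR = dR @ exp_so3(np.asarray(w) * dt)
    return dR

# 100 samples rotating about z at 1 rad/s for 1 s -> 1 rad about z.
gyro = [[0.0, 0.0, 1.0]] * 100
dR = preintegrate_rotation(gyro, dt=0.01)
```

Because the accumulation stays on the manifold (each factor is a proper rotation), ΔR never needs re-normalization, which is one of the points of working with the rotation group directly.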