28 research outputs found
Generalized Rank Pooling for Activity Recognition
Most popular deep models for action recognition split video sequences into
short sub-sequences consisting of a few frames; frame-based features are then
pooled for recognizing the activity. Usually, this pooling step discards the
temporal order of the frames, which could otherwise be used for better
recognition. Towards this end, we propose a novel pooling method, generalized
rank pooling (GRP), that takes as input, features from the intermediate layers
of a CNN that is trained on tiny sub-sequences, and produces as output the
parameters of a subspace which (i) provides a low-rank approximation to the
features and (ii) preserves their temporal order. We propose to use these
parameters as a compact representation for the video sequence, which is then
used in a classification setup. We formulate an objective for computing this
subspace as a Riemannian optimization problem on the Grassmann manifold, and
propose an efficient conjugate gradient scheme for solving it. Experiments on
several activity recognition datasets show that our scheme leads to
state-of-the-art performance.Comment: Accepted at IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR), 201
End-to-end representation learning for Correlation Filter based tracking
The Correlation Filter is an algorithm that trains a linear template to
discriminate between images and their translations. It is well suited to object
tracking because its formulation in the Fourier domain provides a fast
solution, enabling the detector to be re-trained once per frame. Previous works
that use the Correlation Filter, however, have adopted features that were
either manually designed or trained for a different task. This work is the
first to overcome this limitation by interpreting the Correlation Filter
learner, which has a closed-form solution, as a differentiable layer in a deep
neural network. This enables learning deep features that are tightly coupled to
the Correlation Filter. Experiments illustrate that our method has the
important practical benefit of allowing lightweight architectures to achieve
state-of-the-art performance at high framerates.Comment: To appear at CVPR 201