Second-order Democratic Aggregation
Aggregated second-order features extracted from deep convolutional networks
have been shown to be effective for texture generation, fine-grained
recognition, material classification, and scene understanding. In this paper,
we study a class of orderless aggregation functions designed to minimize
interference or equalize contributions in the context of second-order features.
We show that they can be computed just as efficiently as their first-order
counterparts and that they have favorable properties over aggregation by summation.
Another line of work has shown that matrix power normalization after
aggregation can significantly improve the generalization of second-order
representations. We show that matrix power normalization implicitly equalizes
contributions during aggregation, thus establishing a connection between matrix
normalization techniques and prior work on minimizing interference. Based on
the analysis, we present γ-democratic aggregators that interpolate
between sum (γ=1) and democratic pooling (γ=0), outperforming both
on several classification tasks. Moreover, unlike power normalization, the
γ-democratic aggregations can be computed in a low-dimensional space by
sketching, which allows the use of very high-dimensional second-order features.
This results in state-of-the-art performance on several datasets.
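The interpolation between sum and democratic pooling described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes (the abstract does not state this) that the weights α are chosen so that each feature's weighted contribution α_i Σ_j K_ij α_j equals (Σ_j K_ij)^γ, where K_ij = (x_i·x_j)² is the second-order kernel, and that a damped Sinkhorn-style iteration is used to find them. The function names and the `n_iter` parameter are hypothetical.

```python
import numpy as np

def gamma_democratic_weights(X, gamma, n_iter=50):
    """Per-feature weights alpha for an (n, d) array of local features X.

    Assumed constraint (not given in the abstract):
        alpha_i * sum_j K_ij * alpha_j == (sum_j K_ij) ** gamma
    with K_ij = (x_i . x_j) ** 2.
    """
    K = (X @ X.T) ** 2                  # kernel between outer products x_i x_i^T
    target = K.sum(axis=1) ** gamma     # gamma=1: sum pooling, gamma=0: equal contributions
    alpha = np.ones(X.shape[0])
    for _ in range(n_iter):
        sigma = alpha * (K @ alpha)     # current contribution of each feature
        alpha = alpha * np.sqrt(target / sigma)  # damped multiplicative update
    return alpha

def gamma_democratic_pool(X, gamma, n_iter=50):
    """Weighted second-order aggregate sum_i alpha_i * x_i x_i^T, shape (d, d)."""
    alpha = gamma_democratic_weights(X, gamma, n_iter)
    return (X * alpha[:, None]).T @ X
```

Under these assumptions, γ=1 leaves all weights at 1 and reproduces plain sum pooling of outer products, while γ=0 drives every feature's contribution toward the same value.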
Generalized Rank Pooling for Activity Recognition
Most popular deep models for action recognition split video sequences into
short sub-sequences consisting of a few frames; frame-based features are then
pooled for recognizing the activity. Usually, this pooling step discards the
temporal order of the frames, which could otherwise be used for better
recognition. Towards this end, we propose a novel pooling method, generalized
rank pooling (GRP), which takes as input features from the intermediate layers
of a CNN that is trained on tiny sub-sequences, and produces as output the
parameters of a subspace which (i) provides a low-rank approximation to the
features and (ii) preserves their temporal order. We propose to use these
parameters as a compact representation for the video sequence, which is then
used in a classification setup. We formulate an objective for computing this
subspace as a Riemannian optimization problem on the Grassmann manifold, and
propose an efficient conjugate gradient scheme for solving it. Experiments on
several activity recognition datasets show that our scheme leads to
state-of-the-art performance.
Comment: Accepted at IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR), 201