Deep hierarchical pooling design for cross-granularity action recognition
In this paper, we introduce a novel hierarchical aggregation design that
captures different levels of temporal granularity in action recognition. Our
design principle is coarse-to-fine and achieved using a tree-structured
network; as we traverse this network top-down, pooling operations become
less invariant but temporally more resolute and better localized. Learning the
combination of operations in this network -- which best fits a given
ground-truth -- is obtained by solving a constrained minimization problem whose
solution corresponds to the distribution of weights that capture the
contribution of each level (and thereby temporal granularity) in the global
hierarchical pooling process. Besides being principled and well grounded, the
proposed hierarchical pooling is also video-length agnostic and resilient to
misalignments in actions. Extensive experiments conducted on the challenging
UCF-101 database corroborate these statements.
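
The coarse-to-fine pooling and the constrained learning of level weights described above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and not the paper's formulation: the function names are hypothetical, each level of a binary temporal tree is summarized by concatenated segment averages, levels are compared with linear kernels, and the weights are fitted by projected gradient descent on a Frobenius loss under a simplex constraint (non-negative weights summing to one).

```python
import numpy as np

def level_descriptor(frames, level):
    """Concatenate mean features of 2**level equal temporal segments.

    frames has shape (T, d) with T >= 2**level. The output length is
    2**level * d, so it does not depend on the number of frames T.
    """
    segments = np.array_split(frames, 2 ** level, axis=0)
    return np.concatenate([seg.mean(axis=0) for seg in segments])

def level_kernel(videos, level):
    """Linear Gram matrix between videos at one level, scaled to unit norm."""
    D = np.stack([level_descriptor(v, level) for v in videos])
    K = D @ D.T
    return K / np.linalg.norm(K)

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - css / idx > 0][-1]
    return np.maximum(v - css[rho - 1] / rho, 0.0)

def learn_level_weights(kernels, target, steps=200, lr=0.1):
    """Projected gradient descent on ||sum_k w_k K_k - target||_F^2."""
    w = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(steps):
        residual = np.tensordot(w, kernels, axes=1) - target
        grad = 2.0 * np.array([np.sum(residual * K) for K in kernels])
        w = project_to_simplex(w - lr * grad)
    return w

# Toy data (assumed): six clips of different lengths, 4-dim frame features.
# The class label sets the direction of a temporal trend, so the global
# average (level 0) carries little class information.
rng = np.random.default_rng(0)
lengths = [16, 23, 37, 18, 29, 40]
labels = np.array([1, 1, 1, -1, -1, -1])
videos = [rng.normal(size=(T, 4)) + y * np.linspace(-1, 1, T)[:, None]
          for T, y in zip(lengths, labels)]

kernels = np.stack([level_kernel(videos, k) for k in range(3)])
target = np.outer(labels, labels).astype(float)
target /= np.linalg.norm(target)

weights = learn_level_weights(kernels, target)
print(weights)  # one weight per level of the hierarchy
```

Because every level produces a descriptor of fixed size, clips of 16 and 40 frames are compared directly, which is one simple way to read the "video-length agnostic" property. The learned weights indicate how much each temporal granularity contributes to the combined similarity.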
End-to-end training of deep kernel map networks for image classification
Deep kernel map networks have shown excellent performance in various
classification problems including image annotation. Their general recipe
consists in aggregating several layers of singular value decompositions (SVDs)
-- that map data from input spaces into high dimensional spaces -- while
preserving the similarity of the underlying kernels. However, the potential of
these deep map networks has not been fully explored as the original setting of
these networks focuses mainly on the approximation quality of their kernels and
ignores their discrimination power. In this paper, we introduce a novel
"end-to-end" design for deep kernel map learning that balances the
approximation quality of kernels and their discrimination power. Our method
proceeds in two steps: first, layerwise SVD is applied to build initial deep
kernel map approximations, and then "end-to-end" supervised learning is
employed to further enhance their discrimination power while
maintaining their efficiency. Extensive experiments, conducted on the
challenging ImageCLEF annotation benchmark, show that this two-step process is
highly efficient and outperforms different related methods.
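
The layerwise step, in which an SVD yields an explicit map whose inner products preserve a kernel, can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the choice of an RBF kernel for the first layer and a degree-2 polynomial kernel for the second are hypothetical, the maps are computed on the training points only, and the supervised "end-to-end" refinement is not shown.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian kernel matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_map(K, tol=1e-10):
    """Return Phi such that Phi @ Phi.T approximates the PSD matrix K.

    For a symmetric positive semi-definite K, the SVD K = U diag(s) U^T
    gives the explicit map Phi = U diag(sqrt(s)). Negligible singular
    values are dropped to keep the map compact.
    """
    U, s, _ = np.linalg.svd(K)
    keep = s > tol * s.max()
    return U[:, keep] * np.sqrt(s[keep])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))        # toy inputs (assumed): 20 samples, 5 dims

# Layer 1: explicit map of an RBF kernel on the raw inputs.
K1 = rbf_kernel(X)
Phi1 = kernel_map(K1)

# Layer 2: explicit map of a polynomial kernel applied to layer-1 outputs.
K2 = (Phi1 @ Phi1.T + 1.0) ** 2
Phi2 = kernel_map(K2)

print(Phi1.shape, Phi2.shape)
print(np.allclose(Phi1 @ Phi1.T, K1), np.allclose(Phi2 @ Phi2.T, K2))
```

Each layer reproduces its kernel through plain inner products, so the stacked maps can serve as the initialization of a network whose parameters are then tuned with a supervised loss.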
Action Recognition with Deep Multiple Aggregation Networks
Most of the current action recognition algorithms are based on deep networks
which stack multiple convolutional, pooling and fully connected layers. While
convolutional and fully connected operations have been widely studied in the
literature, the design of pooling operations that handle action recognition,
with different sources of temporal granularity in action categories, has
received comparatively less attention, and existing solutions rely mainly on
max or averaging operations. The latter cannot fully capture
the actual temporal granularity of action categories and thereby constitute a
bottleneck in classification performance. In this paper, we introduce a novel
hierarchical pooling design that captures different levels of temporal
granularity in action recognition. Our design principle is coarse-to-fine and
achieved using a tree-structured network; as we traverse this network top-down,
pooling operations become less invariant but temporally more resolute and better
localized. Learning the combination of operations in this network -- which best
fits a given ground-truth -- is obtained by solving a constrained minimization
problem whose solution corresponds to the distribution of weights that capture
the contribution of each level (and thereby temporal granularity) in the global
hierarchical pooling process. Besides being principled and well grounded, the
proposed hierarchical pooling is also video-length and resolution agnostic.
Extensive experiments conducted on the challenging UCF-101, HMDB-51 and
JHMDB-21 databases corroborate all these statements.
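
The limitation of global max or average pooling mentioned above can be seen in a small toy check. The setup is an assumption made for illustration (random frame features, a clip compared with its time-reversed copy, a hypothetical `segment_means` helper): global pooling gives the same result for both clips, whereas pooling over two temporal segments separates them.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(32, 8))   # toy clip: 32 frames, 8-dim features
reversed_clip = frames[::-1]        # same frames, played backwards

# Global pooling ignores frame order, so both clips look identical.
same_avg = np.allclose(frames.mean(axis=0), reversed_clip.mean(axis=0))
same_max = np.array_equal(frames.max(axis=0), reversed_clip.max(axis=0))

def segment_means(x, n_segments):
    """Average-pool x over n_segments consecutive temporal segments."""
    return np.stack([s.mean(axis=0) for s in np.array_split(x, n_segments)])

# One level down the hierarchy (two halves), the order becomes visible.
differs_at_level_1 = not np.allclose(segment_means(frames, 2),
                                     segment_means(reversed_clip, 2))

print(same_avg, same_max, differs_at_level_1)
```

Finer levels trade invariance for temporal resolution, which is the motivation for weighting several levels instead of committing to a single one.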