Generalized Max Pooling
State-of-the-art patch-based image representations involve a pooling
operation that aggregates statistics computed from local descriptors. Standard
pooling operations include sum- and max-pooling. Sum-pooling lacks
discriminability because the resulting representation is strongly influenced by
frequent yet often uninformative descriptors, but only weakly influenced by
rare yet potentially highly-informative ones. Max-pooling equalizes the
influence of frequent and rare descriptors but is only applicable to
representations that rely on count statistics, such as the bag-of-visual-words
(BOV) and its soft- and sparse-coding extensions. We propose a novel pooling
mechanism that achieves the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector -- hence the name
Generalized Max Pooling (GMP). It involves equalizing the similarity between
each patch and the pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public image
classification benchmarks that the proposed GMP can lead to significant
performance gains with respect to heuristic alternatives.
Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern
Recognition (2014
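The abstract describes GMP as equalizing the similarity between each patch and the pooled representation, which it says is equivalent to re-weighting the per-patch statistics. The sketch below illustrates that idea under stated assumptions: similarity is taken to be a plain dot product, the target similarity is 1, and the weights come from a ridge-regularized linear solve. The function names and the `lam` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def sum_pool(phi):
    """Sum-pooling: add up the per-patch statistics (rows of phi)."""
    return phi.sum(axis=0)

def gmp_pool(phi, lam=1e-3):
    """Sketch of a GMP-style pooling (assumed formulation, not the paper's code).

    phi: array of shape (n_patches, dim), one row of statistics per patch.
    Finds per-patch weights so that the weighted sum of the rows has a
    dot-product similarity close to 1 with every patch.
    """
    n = phi.shape[0]
    kernel = phi @ phi.T  # patch-to-patch similarities
    # Ridge-regularized solve (assumption): (K + lam * I) w = 1
    weights = np.linalg.solve(kernel + lam * np.eye(n), np.ones(n))
    pooled = phi.T @ weights  # re-weighted sum of per-patch statistics
    return pooled, weights

# Toy data: one descriptor repeated nine times, plus one rare descriptor.
phi = np.array([[1.0, 0.0]] * 9 + [[0.0, 1.0]])
print(sum_pool(phi))          # dominated by the frequent descriptor
pooled, weights = gmp_pool(phi, lam=1e-6)
print(phi @ pooled)           # similarities are roughly equal across patches
```

In this toy case the rare descriptor receives a larger weight than each copy of the frequent one, which is the equalizing effect the abstract attributes to max-pooling.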
Enumeration of max-pooling responses with generalized permutohedra
We investigate the combinatorics of max-pooling layers, which are functions
that downsample input arrays by taking the maximum over shifted windows of
input coordinates, and which are commonly used in convolutional neural
networks. We obtain results on the number of linearity regions of these
functions by equivalently counting the number of vertices of certain Minkowski
sums of simplices. We characterize the faces of such polytopes and obtain
generating functions and closed formulas for the number of vertices and facets
in a 1D max-pooling layer depending on the size of the pooling windows and
stride, and for the number of vertices in a special case of 2D max-pooling.
Comment: 35 pages, 11 figures, 4 tables. V2: Improved exposition, added
computations in Section 4, and expanded analysis of dat
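For readers unfamiliar with the object being counted, a 1D max-pooling layer as described in the abstract (the maximum over shifted windows, parameterized by window size and stride) can be sketched as follows. This is a minimal illustration with no padding, which is an assumption; the function name is hypothetical and the code does not reproduce any of the paper's enumeration results.

```python
def max_pool_1d(x, window, stride):
    """Downsample a sequence by taking the max over shifted windows.

    Assumes no padding: only windows that fit entirely inside x are used.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        max(x[start:start + window])
        for start in range(0, len(x) - window + 1, stride)
    ]

# Windows of size 2 with stride 1 overlap; with stride 2 they are disjoint.
print(max_pool_1d([1, 3, 2, 5, 4], window=2, stride=1))
print(max_pool_1d([1, 3, 2, 5, 4], window=2, stride=2))
```

Each output coordinate is a maximum of linear functions of the input, so the layer is piecewise linear; the linearity regions mentioned in the abstract are the regions of input space on which the same input coordinate attains each maximum.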