Deep Fishing: Gradient Features from Deep Nets
Convolutional Networks (ConvNets) have recently improved image recognition
performance thanks to end-to-end learning of deep feed-forward models from raw
pixels. Deep learning is a marked departure from the previous state of the art,
the Fisher Vector (FV), which relied on gradient-based encoding of local
hand-crafted features. In this paper, we discuss a novel connection between
these two approaches. First, we show that one can derive gradient
representations from ConvNets in a similar fashion to the FV. Second, we show
that this gradient representation actually corresponds to a structured matrix
that allows for efficient similarity computation. We experimentally study the
benefits of transferring this representation over the outputs of ConvNet
layers, and find consistent improvements on the Pascal VOC 2007 and 2012
datasets. Comment: To appear at BMVC 201
Compact Bilinear Pooling
Bilinear models have been shown to achieve impressive performance on a wide
range of visual tasks, such as semantic segmentation, fine-grained recognition
and face recognition. However, bilinear features are high dimensional,
typically on the order of hundreds of thousands to a few million, which makes
them impractical for subsequent analysis. We propose two compact bilinear
representations with the same discriminative power as the full bilinear
representation but with only a few thousand dimensions. Our compact
representations allow back-propagation of classification errors enabling an
end-to-end optimization of the visual recognition system. The compact bilinear
representations are derived through a novel kernelized analysis of bilinear
pooling, which provides insights into the discriminative power of bilinear
pooling and a platform for further research in compact pooling methods.
Experiments illustrate the utility of the proposed representations for
image classification and few-shot learning across several datasets. Comment: Camera ready version for CVP
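To make the dimensionality problem and the kernel view concrete, here is a minimal NumPy sketch of full (non-compact) bilinear pooling. It is an illustration under stated assumptions, not the authors' code: the function names `bilinear_pool` and `bilinear_kernel` and the toy sizes are hypothetical, and the compact approximations proposed in the paper are not reproduced. It only shows that the inner product of two pooled c*c vectors equals a sum of squared pairwise dot products between local descriptors, which can be computed without forming the high-dimensional feature.

```python
import numpy as np

def bilinear_pool(features):
    """Sum-pool the outer product of each local descriptor with itself.

    features: array of shape (num_locations, channels).
    Returns a flattened vector of length channels * channels.
    """
    # X^T X equals the sum over locations of x_s x_s^T.
    return (features.T @ features).reshape(-1)

def bilinear_kernel(x, y):
    """Inner product of two bilinear features, computed without
    building the channels * channels vectors explicitly."""
    # Sum over all location pairs (s, u) of <x_s, y_u>^2.
    return float(np.sum((x @ y.T) ** 2))

# Toy stand-ins for local activations (assumed sizes, random values).
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))  # 6 locations, 8 channels
y = rng.standard_normal((5, 8))  # 5 locations, 8 channels

explicit = float(bilinear_pool(x) @ bilinear_pool(y))
implicit = bilinear_kernel(x, y)
```

With 8 channels the pooled vector already has 64 entries; with 512 channels it would have 262,144, which is the order of magnitude the abstract describes.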
Multi-scale Orderless Pooling of Deep Convolutional Activation Features
Deep convolutional neural networks (CNN) have shown their promise as a
universal representation for recognition. However, global CNN activations lack
geometric invariance, which limits their robustness for classification and
matching of highly variable scenes. To improve the invariance of CNN
activations without degrading their discriminative power, this paper presents a
simple but effective scheme called multi-scale orderless pooling (MOP-CNN).
This scheme extracts CNN activations for local patches at multiple scale
levels, performs orderless VLAD pooling of these activations at each level
separately, and concatenates the result. The resulting MOP-CNN representation
can be used as a generic feature for either supervised or unsupervised
recognition tasks, from image classification to instance-level retrieval; it
consistently outperforms global CNN activations without requiring any joint
training of prediction layers for a particular target dataset. In absolute
terms, it achieves state-of-the-art results on the challenging SUN397 and MIT
Indoor Scenes classification datasets, and competitive results on
ILSVRC2012/2013 classification and INRIA Holidays retrieval datasets.
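The pooling scheme described above (per-level orderless VLAD pooling followed by concatenation) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the names `vlad` and `mop_pool` are hypothetical, random arrays stand in for CNN activations of local patches, and the codebooks are assumed to have been learned beforehand (for example with k-means). Any additional processing used in the paper is omitted.

```python
import numpy as np

def vlad(descriptors, centers):
    """Orderless VLAD pooling of local descriptors.

    descriptors: (num_patches, dim) activations for one scale level.
    centers:     (num_centers, dim) codebook for that level.
    Returns an L2-normalized vector of length num_centers * dim.
    """
    # Squared distance from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    out = np.zeros_like(centers, dtype=float)
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            # Accumulate residuals of descriptors assigned to center k.
            out[k] = (members - centers[k]).sum(axis=0)
    out = out.reshape(-1)
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

def mop_pool(levels):
    """Pool each scale level separately, then concatenate the results."""
    return np.concatenate([vlad(d, c) for d, c in levels])

# Toy data: three scale levels with 4, 9, and 16 patches (assumed sizes).
rng = np.random.default_rng(0)
dim, num_centers = 16, 3
levels = [
    (rng.standard_normal((n, dim)), rng.standard_normal((num_centers, dim)))
    for n in (4, 9, 16)
]
feature = mop_pool(levels)
```

Because each level is pooled without regard to patch position, the resulting vector does not change when patches within a level are reordered, which is the sense in which the pooling is orderless.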