3,642 research outputs found
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification
and retrieval, using features learned by deep convolutional neural networks
(CNNs). In object and scene analysis, deep neural nets are capable of learning
a hierarchical chain of abstraction from pixel inputs to concise and
descriptive representations. The current work explores this capacity in the
realm of document analysis, and confirms that this representation strategy is
superior to a variety of popular hand-crafted alternatives. Experiments also
show that (i) features extracted from CNNs are robust to compression, (ii) CNNs
trained on non-document images transfer well to document analysis tasks, and
(iii) enforcing region-specific feature-learning is unnecessary given
sufficient training data. This work also makes available a new labelled subset
of the IIT-CDIP collection, containing 400,000 document images across 16
categories, useful for training new CNNs for document analysis
Deep Fishing: Gradient Features from Deep Nets
Convolutional Networks (ConvNets) have recently improved image recognition
performance thanks to end-to-end learning of deep feed-forward models from raw
pixels. Deep learning is a marked departure from the previous state of the art,
the Fisher Vector (FV), which relied on gradient-based encoding of local
hand-crafted features. In this paper, we discuss a novel connection between
these two approaches. First, we show that one can derive gradient
representations from ConvNets in a similar fashion to the FV. Second, we show
that this gradient representation actually corresponds to a structured matrix
that allows for efficient similarity computation. We experimentally study the
benefits of transferring this representation over the outputs of ConvNet
layers, and find consistent improvements on the Pascal VOC 2007 and 2012
datasets.Comment: To appear at BMVC 201
Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
We present an exhaustive investigation of recent Deep Learning architectures,
algorithms, and strategies for the task of document image classification to
finally reduce the error by more than half. Existing approaches, such as the
DeepDocClassifier, apply standard Convolutional Network architectures with
transfer learning from the object recognition domain. The contribution of the
paper is threefold: First, it investigates recently introduced very deep neural
network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from
real images). Second, it proposes transfer learning from a huge set of document
images, i.e. 400,000 documents. Third, it analyzes the impact of the amount of
training data (document images) and other parameters to the classification
abilities. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP
dataset. We achieve an accuracy of 91.13% for the Tobacco-3482 dataset while
earlier approaches reach only 77.6%. Thus, a relative error reduction of more
than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is
achieved, corresponding to a relative error reduction of 11.5%
OnionNet: Sharing Features in Cascaded Deep Classifiers
The focus of our work is speeding up evaluation of deep neural networks in
retrieval scenarios, where conventional architectures may spend too much time
on negative examples. We propose to replace a monolithic network with our novel
cascade of feature-sharing deep classifiers, called OnionNet, where subsequent
stages may add both new layers as well as new feature channels to the previous
ones. Importantly, intermediate feature maps are shared among classifiers,
preventing them from the necessity of being recomputed. To accomplish this, the
model is trained end-to-end in a principled way under a joint loss. We validate
our approach in theory and on a synthetic benchmark. As a result demonstrated
in three applications (patch matching, object detection, and image retrieval),
our cascade can operate significantly faster than both monolithic networks and
traditional cascades without sharing at the cost of marginal decrease in
precision.Comment: Accepted to BMVC 201
- …