Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification
and retrieval, using features learned by deep convolutional neural networks
(CNNs). In object and scene analysis, deep neural nets are capable of learning
a hierarchical chain of abstraction from pixel inputs to concise and
descriptive representations. The current work explores this capacity in the
realm of document analysis, and confirms that this representation strategy is
superior to a variety of popular hand-crafted alternatives. Experiments also
show that (i) features extracted from CNNs are robust to compression, (ii) CNNs
trained on non-document images transfer well to document analysis tasks, and
(iii) enforcing region-specific feature-learning is unnecessary given
sufficient training data. This work also makes available a new labelled subset
of the IIT-CDIP collection, containing 400,000 document images across 16
categories, useful for training new CNNs for document analysis.
Recovering Homography from Camera Captured Documents using Convolutional Neural Networks
Removing perspective distortion from hand-held camera captured document
images is one of the primitive tasks in document analysis, but unfortunately,
no existing method can reliably remove the perspective distortion from
document images automatically. In this paper, we propose a convolutional neural
network based method for recovering homography from hand-held camera captured
documents.
Our proposed method works independently of the document's underlying content and is
trained end-to-end in a fully automatic way. Specifically, this paper makes the
following three contributions: firstly, we introduce a large-scale synthetic
dataset for recovering homography from document images captured under
different geometric and photometric transformations; secondly, we show that a
generic convolutional neural network based architecture can be successfully
used for regressing the corner positions of documents captured in the wild;
thirdly, we show that L1 loss can be reliably used for corner
regression. Our proposed method gives state-of-the-art performance on the
tested datasets, and has the potential to become an integral part of the document
analysis pipeline. Comment: 10 pages, 8 figures
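To illustrate how four regressed corner positions determine a homography, here is a minimal NumPy sketch. It is not the authors' implementation: it uses the textbook direct linear transform (DLT), and the function names `homography_from_corners`, `warp_points`, and `l1_corner_loss` are hypothetical names chosen for this example. Corners are assumed to be given in normalized image coordinates.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Estimate the 3x3 homography mapping 4 src corners to 4 dst corners (DLT).

    Assumption for illustration: corners are (x, y) pairs, no three collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the null vector of A: the last right singular vector.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the scale; assumes H[2, 2] is non-zero (true for non-degenerate views).
    return H / H[2, 2]


def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def l1_corner_loss(pred, target):
    """Mean absolute error between predicted and ground-truth corner coordinates."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.abs(pred - target).mean())


# Hypothetical corners of a document seen under perspective (normalized coordinates),
# ordered top-left, top-right, bottom-right, bottom-left.
corners = [[0.05, 0.10], [0.90, 0.08], [0.95, 0.92], [0.02, 0.85]]
unit_square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
H = homography_from_corners(corners, unit_square)
rectified = warp_points(H, corners)
```

In this sketch a network would only need to output the eight corner coordinates; the homography then follows in closed form, and `l1_corner_loss` shows the kind of L1 objective the abstract refers to.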
Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks
In this paper, we present a novel approach to perform deep neural networks
layer-wise weight initialization using Linear Discriminant Analysis (LDA).
Typically, the weights of a deep neural network are initialized with random
values, by greedy layer-wise pre-training (usually as a Deep Belief Network or an
auto-encoder), or by re-using the layers from another network (transfer
learning). Hence, many training epochs are needed before meaningful weights are
learned, or a rather similar dataset is required to seed the fine-tuning in
transfer learning. In this paper, we describe how to turn an LDA into either a
neural layer or a classification layer. We analyze the initialization technique
on historical documents. First, we show that an LDA-based initialization is
quick and leads to a very stable initialization. Furthermore, for the task of
layout analysis at pixel level, we investigate the effectiveness of LDA-based
initialization and show that it outperforms state-of-the-art random weight
initialization methods. Comment: 5 pages
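The idea of turning an LDA into a neural layer can be sketched as follows. This is a generic textbook LDA written with NumPy, not the procedure from the paper: the function name `lda_layer_init`, the regularization constant `reg`, and the choice of bias (centering the projected data) are assumptions made for this example.

```python
import numpy as np


def lda_layer_init(X, y, n_units=None, reg=1e-6):
    """Return (W, b) for a linear layer z = X @ W + b whose columns are LDA directions.

    Assumptions for illustration: X is (n_samples, n_features), y holds class
    labels, and at most (n_classes - 1) directions are meaningful.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    mean = X.mean(axis=0)

    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)

    # Small ridge term keeps Sw positive definite (an assumption of this sketch).
    Sw += reg * np.eye(d)

    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor Sw = L L^T.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1]  # most discriminative directions first

    k = n_units if n_units is not None else len(classes) - 1
    W = L_inv.T @ evecs[:, order[:k]]  # (n_features, k) weight matrix
    b = -mean @ W                      # bias that centers the projected data
    return W, b


# Toy data: two classes separated along the first feature, noisy second feature.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 5.0], size=(200, 2))
X1 = rng.normal(loc=[4.0, 0.0], scale=[1.0, 5.0], size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

W, b = lda_layer_init(X, y)
z = X @ W + b  # layer pre-activations
```

The returned `W` and `b` could then be copied into the weights of a fully connected layer before training starts, so that the layer begins from class-discriminative directions rather than random values.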
Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
We present an exhaustive investigation of recent Deep Learning architectures,
algorithms, and strategies for the task of document image classification to
finally reduce the error by more than half. Existing approaches, such as the
DeepDocClassifier, apply standard Convolutional Network architectures with
transfer learning from the object recognition domain. The contribution of the
paper is threefold: First, it investigates recently introduced very deep neural
network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from
real images). Second, it proposes transfer learning from a huge set of document
images, i.e. 400,000 documents. Third, it analyzes the impact of the amount of
training data (document images) and other parameters on classification
performance. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP
dataset. We achieve an accuracy of 91.13% for the Tobacco-3482 dataset while
earlier approaches reach only 77.6%. Thus, a relative error reduction of more
than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is
achieved, corresponding to a relative error reduction of 11.5%.
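The relative error reduction quoted for Tobacco-3482 can be checked from the two accuracies given in the abstract. The helper name `relative_error_reduction` is a hypothetical name for this example; the RVL-CDIP figure is not recomputed because the abstract does not state the baseline accuracy.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline error removed, with accuracies given in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Tobacco-3482: 77.6% (earlier approaches) vs. 91.13% (reported result).
rer = relative_error_reduction(77.6, 91.13)
print(f"{rer:.1%}")  # prints 60.4%
```

The error drops from 22.4% to 8.87%, which is consistent with the "more than 60%" claim.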