Multilevel Context Representation for Improving Object Recognition
In this work, we propose the combined usage of low- and high-level blocks of
convolutional neural networks (CNNs) for improving object recognition. While
recent research has focused either on propagating context from all layers,
e.g., ResNet (including the very low-level layers), or on having multiple loss
layers (e.g., GoogLeNet), the importance of the features close to the higher
layers has been ignored. This paper postulates that using context closer to
the high-level layers provides scale and translation invariance and works
better than using the top layer only. In particular, we extend AlexNet and
GoogLeNet by
additional connections in the top layers. In order to demonstrate the
effectiveness of the proposed approach, we evaluated it on the standard
ImageNet task. The relative reduction of the classification error is around
1-2% without affecting the computational cost. Furthermore, we show that this
approach is orthogonal to typical test data augmentation techniques, as
recently introduced by Szegedy et al. (leading to a runtime reduction by a
factor of 144 during test time).
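A minimal sketch of the core idea, written in PyTorch: the classifier consumes a concatenation of the top convolutional block and the block just below it, rather than the top block alone. The module name, channel sizes, and global pooling are illustrative assumptions, not the authors' exact extension of AlexNet/GoogLeNet.

```python
import torch
import torch.nn as nn

class TwoLevelContextHead(nn.Module):
    """Classify from the two highest convolutional blocks instead of only the top one."""
    def __init__(self, c_prev=384, c_top=256, num_classes=1000):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # collapse spatial dimensions
        self.fc = nn.Linear(c_prev + c_top, num_classes)

    def forward(self, feat_prev, feat_top):
        # feat_prev: activations from a layer close to the top (e.g. conv4)
        # feat_top:  activations from the top convolutional layer (e.g. conv5)
        z = torch.cat([self.pool(feat_prev).flatten(1),
                       self.pool(feat_top).flatten(1)], dim=1)
        return self.fc(z)
```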
Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines
This paper presents an approach for real-time training and testing for
document image classification. In production environments, it is crucial to
perform accurate and (time-)efficient training. Existing deep learning
approaches for classifying documents do not meet these requirements, as they
require much time for training and fine-tuning the deep architectures.
Motivated by Computer Vision, we propose a two-stage approach. The first
stage trains a deep network that works as a feature extractor, and in the second
stage, Extreme Learning Machines (ELMs) are used for classification. The
proposed approach outperforms all previously reported structural and deep
learning-based methods with a final accuracy of 83.24% on the Tobacco-3482 dataset,
leading to a relative error reduction of 25% when compared to a previous
Convolutional Neural Network (CNN) based approach (DeepDocClassifier). More
importantly, the training time of the ELM is only 1.176 seconds and the overall
prediction time for 2,482 images is 3.066 seconds. As such, this novel approach
makes deep learning-based document classification suitable for large-scale
real-time applications.
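The speed comes from the second stage: an Extreme Learning Machine keeps random, untrained hidden weights and fits only its output weights in closed form. A minimal NumPy sketch under assumed hyperparameters (hidden size, tanh activation, ridge regularization); X stands for the CNN features already extracted in the first stage.

```python
import numpy as np

def train_elm(X, y, n_hidden=1000, n_classes=10, reg=1e-3, seed=0):
    """X: (n_samples, n_features) deep features; y: integer class labels."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    T = np.eye(n_classes)[y]                          # one-hot targets
    # The only "training": a ridge-regularized least-squares solve for the output weights.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```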
Sparse Radial Sampling LBP for Writer Identification
In this paper we present the use of Sparse Radial Sampling Local Binary
Patterns, a variant of Local Binary Patterns (LBP), for text-as-texture
classification. By adapting and extending the standard LBP operator to the
particularities of text, we obtain a generic text-as-texture classification
scheme and apply it to writer identification. In experiments on the CVL and
ICDAR 2013 datasets, the proposed feature set demonstrates state-of-the-art (SOA)
performance. Among the SOA, the proposed method is the only one that is based
on dense extraction of a single local feature descriptor. This makes it fast
and applicable at the earliest stages in a DIA pipeline without the need for
segmentation, binarization, or extraction of multiple features.
Comment: Submitted to the 13th International Conference on Document Analysis
and Recognition (ICDAR 2015).
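The abstract does not spell out the sparse radial sampling scheme itself; as a hedged illustration of the dense single-descriptor pipeline, the sketch below computes a multi-radius uniform-LBP histogram with scikit-image, sampling only a sparse set of radii. The radii, point count, and binning are assumptions, not the paper's exact operator.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_texture_descriptor(gray_image, radii=(1, 3, 5), n_points=8):
    """Densely extract uniform LBP codes at a few radii, pooled into one histogram per radius."""
    feats = []
    for r in radii:                                   # sparse set of radii instead of all of them
        codes = local_binary_pattern(gray_image, n_points, r, method='uniform')
        hist, _ = np.histogram(codes, bins=n_points + 2,
                               range=(0, n_points + 2), density=True)
        feats.append(hist)
    return np.concatenate(feats)                      # one global texture vector per page
```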
Identifying Cross-Depicted Historical Motifs
Cross-depiction is the problem of identifying the same object even when it is
depicted in a variety of manners. This is a common problem in handwritten
historical documents image analysis, for instance when the same letter or motif
is depicted in several different ways. It is a simple task for humans, yet
conventional heuristic computer vision methods struggle to cope with it. In
this paper we address this problem using state-of-the-art deep learning
techniques on a dataset of historical watermarks containing images created with
different methods of reproduction, such as hand tracing, rubbing, and
radiography. To study the robustness of deep learning based approaches to the
cross-depiction problem, we measure their performance on two different tasks:
classification and similarity rankings. For the former, we achieve a
classification accuracy of 96% using deep convolutional neural networks. For
the latter, we obtain a false positive rate of 0.11 at a 95% true positive rate.
These results outperform state-of-the-art methods by a significant margin.
Comment: 6 pages, 6 figures
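For reference, the similarity-ranking figure quoted above (false positive rate at a fixed true positive rate) can be computed from pairwise similarity scores as sketched below; the input layout and the convention that higher scores mean more similar are assumptions, not the authors' evaluation code.

```python
import numpy as np

def fpr_at_tpr(scores, labels, target_tpr=0.95):
    """scores: pairwise similarity scores; labels: True where the pair shows the same motif."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores)                       # rank pairs by descending similarity
    labels = labels[order]
    tpr = np.cumsum(labels) / labels.sum()            # true positive rate as the threshold loosens
    fpr = np.cumsum(~labels) / (~labels).sum()        # false positive rate at the same thresholds
    idx = np.searchsorted(tpr, target_tpr)            # first threshold reaching the target TPR
    return fpr[min(idx, len(fpr) - 1)]
```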
Open Evaluation Tool for Layout Analysis of Document Images
This paper presents an open tool for standardizing the evaluation process of
the layout analysis task of document images at the pixel level. We introduce a new
evaluation tool that is both available as a standalone Java application and as
a RESTful web service. This evaluation tool is free and open-source in order to
be a common tool that anyone can use and contribute to. It aims at providing as
many metrics as possible to investigate layout analysis predictions, and also
provides an easy way of visualizing the results. This tool evaluates document
segmentation at the pixel level and supports multi-labeled pixel ground truth.
Finally, this tool has been successfully used for the ICDAR2017 competition on
Layout Analysis for Challenging Medieval Manuscripts.
Comment: The 14th IAPR International Conference on Document Analysis and
Recognition (ICDAR), HIP: 4th International Workshop on Historical Document
Imaging and Processing, Kyoto, Japan, 2017
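The tool itself ships as a Java application and a RESTful web service; as a hedged illustration of what pixel-level, multi-label evaluation involves, the sketch below computes a per-class intersection-over-union from boolean masks. The mask encoding and the choice of metric are assumptions, not the tool's actual metric set.

```python
import numpy as np

def per_class_iou(pred, gt):
    """pred, gt: boolean arrays of shape (n_classes, H, W); a pixel may carry
    several labels at once, matching multi-labeled ground truth."""
    inter = np.logical_and(pred, gt).sum(axis=(1, 2))
    union = np.logical_or(pred, gt).sum(axis=(1, 2))
    # Classes absent from both prediction and ground truth get NaN instead of a score.
    return np.where(union > 0, inter / np.maximum(union, 1), np.nan)
```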
