Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks
In this paper, we present a novel approach to layer-wise weight initialization
of deep neural networks using Linear Discriminant Analysis (LDA).
Typically, the weights of a deep neural network are initialized with: random
values, greedy layer-wise pre-training (usually as Deep Belief Network or as
auto-encoder) or by re-using the layers from another network (transfer
learning). Hence, many training epochs are needed before meaningful weights are
learned, or a rather similar dataset is required for seeding a fine-tuning of
transfer learning. In this paper, we describe how to turn an LDA into either a
neural layer or a classification layer. We analyze the initialization technique
on historical documents. First, we show that an LDA-based initialization is
quick and leads to a very stable initialization. Furthermore, for the task of
layout analysis at pixel level, we investigate the effectiveness of LDA-based
initialization and show that it outperforms state-of-the-art random weight
initialization methods.
Comment: 5 pages
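The idea of turning an LDA into a classification layer can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two-class Fisher discriminant on 2-D points, whereas the paper addresses the multi-class, layer-wise case. The function names `fisher_lda_layer` and `forward` are hypothetical. The closed-form LDA solution supplies the weights and bias of a single linear unit directly, with no training epochs.

```python
def fisher_lda_layer(class0, class1):
    """Return (weights, bias) of one linear unit from two-class Fisher LDA.

    Assumption: inputs are lists of 2-D points; this is the textbook
    two-class case, not the multi-class procedure of the paper.
    """
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    def scatter(points, mu):
        # 2x2 scatter matrix: sum of outer products of the centered points.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            d = [p[0] - mu[0], p[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    # Within-class scatter S_w = S_0 + S_1.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    # Fisher direction: w = S_w^{-1} (mu1 - mu0).
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Bias places the decision boundary at the midpoint of the class means.
    mid = [(mu0[0] + mu1[0]) / 2, (mu0[1] + mu1[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def forward(w, b, x):
    """Linear unit output: negative suggests class 0, positive class 1."""
    return w[0] * x[0] + w[1] * x[1] + b
```

In this sketch the returned `w` and `b` could be copied into the weight and bias of a linear layer as its starting point, after which ordinary gradient-based training would continue from there.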
Identifying Cross-Depicted Historical Motifs
Cross-depiction is the problem of identifying the same object even when it is
depicted in a variety of manners. This is a common problem in the analysis of
historical handwritten document images, for instance when the same letter or motif
is depicted in several different ways. It is a simple task for humans yet
conventional heuristic computer vision methods struggle to cope with it. In
this paper we address this problem using state-of-the-art deep learning
techniques on a dataset of historical watermarks containing images created with
different methods of reproduction, such as hand tracing, rubbing, and
radiography. To study the robustness of deep learning based approaches to the
cross-depiction problem, we measure their performance on two different tasks:
classification and similarity rankings. For the former we achieve a
classification accuracy of 96% using deep convolutional neural networks. For
the latter we have a false positive rate at 95% true positive rate of 0.11.
These results outperform state-of-the-art methods by a significant margin.
Comment: 6 pages, 6 figures
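The similarity-ranking figure quoted above, the false positive rate at a 95% true positive rate, can be computed as in the following sketch. It assumes that higher scores mean "more similar" and that labels are 1 for matching pairs and 0 for non-matching pairs; the function name `fpr_at_tpr` and the toy data in the usage note are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def fpr_at_tpr(scores, labels, target_tpr=0.95):
    """False positive rate at the first threshold whose TPR reaches target_tpr.

    Assumptions: higher score = more similar; label 1 = matching pair,
    label 0 = non-matching pair.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every pair tied at this score before evaluating the rates.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / positives >= target_tpr:
            return fp / negatives
    return 1.0  # only reached if target_tpr is greater than 1
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, all three matching pairs are only recovered once the threshold drops to 0.6, at which point one of the three non-matching pairs has been accepted, giving a rate of 1/3.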
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In the current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at pixel-level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval.
Subword Semantic Hashing for Intent Classification on Small Datasets
In this paper, we introduce the use of Semantic Hashing as embedding for the
task of Intent Classification and achieve state-of-the-art performance on three
frequently used benchmarks. Intent Classification on a small dataset is a
challenging task for data-hungry state-of-the-art Deep Learning based systems.
Semantic Hashing is an attempt to overcome such a challenge and learn robust
text classification. Current word embedding based methods are dependent on
vocabularies. One of the major drawbacks of such methods is out-of-vocabulary
terms, especially when having small training datasets and using a wider
vocabulary. This is the case in Intent Classification for chatbots, where
typically small datasets are extracted from internet communication. Two
problems arise from the use of internet communication. First, such datasets miss
a lot of terms in the vocabulary to use word embeddings efficiently. Second,
users frequently make spelling errors. Typically, the models for intent
classification are not trained with spelling errors and it is difficult to
think about ways in which users will make mistakes. Models depending on a word
vocabulary will always face such issues. An ideal classifier should handle
spelling errors inherently. With Semantic Hashing, we overcome these challenges
and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and
Web Application. Our benchmarks are available online:
https://github.com/kumar-shridhar/Know-Your-Intent
Comment: Accepted at IJCNN 2019 (Oral Presentation)
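A minimal sketch of the subword idea follows. It assumes the commonly described formulation in which each word is wrapped in `#` boundary markers and split into character trigrams; the abstract does not spell out these details, so the marker, the n-gram size, and the function name `subword_tokens` are assumptions here. The point it illustrates is that a misspelled word still shares subword tokens with its correct form, so no word-level vocabulary lookup can fail.

```python
def subword_tokens(text, n=3):
    """Split text into character n-grams of '#'-delimited words.

    Assumption: '#' boundary markers and trigrams (n=3), following the
    usual description of subword semantic hashing.
    """
    tokens = []
    for word in text.lower().split():
        padded = "#" + word + "#"
        if len(padded) < n:
            # Word shorter than the window: keep it as a single token.
            tokens.append(padded)
            continue
        tokens.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return tokens
```

For instance, `subword_tokens("hello")` yields `['#he', 'hel', 'ell', 'llo', 'lo#']`, and the misspelling "restrat" still shares the tokens `#re`, `res`, and `est` with "restart". In a full pipeline these tokens would be counted into a feature vector and passed to a classifier, a step omitted from this sketch.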
DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments
We introduce DeepDIVA: an infrastructure designed to enable quick and
intuitive setup of reproducible experiments with a large range of useful
analysis functionality. Reproducing scientific results can be a frustrating
experience, not only in document image analysis but in machine learning in
general. Using DeepDIVA a researcher can either reproduce a given experiment
with a very limited amount of information or share their own experiments with
others. Moreover, the framework offers a large range of functions, such as
boilerplate code, keeping track of experiments, hyper-parameter optimization,
and visualization of data and results. To demonstrate the effectiveness of this
framework, this paper presents case studies in the area of handwritten document
analysis where researchers benefit from the integrated functionality. DeepDIVA
is implemented in Python and uses the deep learning framework PyTorch. It is
completely open source, and accessible as a Web Service through DIVAServices.
Comment: Submitted at the 16th International Conference on Frontiers in
Handwriting Recognition (ICFHR), 6 pages, 6 Figures
Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass
In the last decades we have witnessed the success of applications of
Artificial Intelligence to playing games. In this work we address the
challenging field of games with hidden information and card games in
particular. Jass is a very popular card game in Switzerland and is closely
connected with Swiss culture. To the best of our knowledge, Artificial
Intelligence agents do not yet outperform top players in the game of Jass.
Our contribution to the community is two-fold. First, we provide
an overview of the current state-of-the-art of Artificial Intelligence methods
for card games in general. Second, we discuss their application to the use-case
of the Swiss card game Jass. This paper aims to be an entry point for both
seasoned researchers and new practitioners who want to join in the Jass
challenge.
Improving Reproducible Deep Learning Workflows with DeepDIVA
The field of deep learning is experiencing a trend towards producing
reproducible research. Nevertheless, it is still often a frustrating experience
to reproduce scientific results. This is especially true in the machine
learning community, where it is considered acceptable to have black boxes in
your experiments. We present DeepDIVA, a framework designed to facilitate easy
experimentation and the reproduction of experiments. This framework allows researchers to
share their experiments with others, while providing functionality that allows
for easy experimentation, such as: boilerplate code, experiment management,
hyper-parameter optimization, verification of data integrity and visualization
of data and results. Additionally, the code of DeepDIVA is well-documented and
supported by several tutorials that allow a new user to quickly familiarize
themselves with the framework.
Historical Document Synthesis with Generative Adversarial Networks
This work tackles a particular image-to-image translation problem, where the goal is to transform an image from a source domain (modern printed electronic document) to a target domain (historical handwritten document). The main motivation of this task is to generate massive synthetic datasets of "historic" documents which can be used for the training of document analysis systems. By completing this task, it becomes possible to consider the generation of a tremendous amount of synthetic training data using only one single deep learning algorithm. Existing approaches for synthetic document generation rely on heuristics, or on 2D and 3D geometric transformation functions, and are typically targeted at degrading the document. We tackle the problem of document synthesis and propose to train a particular form of Generative Adversarial Neural Networks to learn a mapping function from an input image to an output image. With several experiments, we show that our algorithm generates an artificial historical document image that looks like a real historical document, for expert and non-expert eyes, by transferring the "historical style" to the classical electronic document.
Cross-Depicted Historical Motif Categorization and Retrieval with Deep Learning
In this paper, we tackle the problem of categorizing and identifying cross-depicted historical motifs using recent deep learning techniques, with the aim of developing a content-based image retrieval system. By cross-depiction, we mean the problem that the same object can be represented (depicted) in various ways. The objects of interest in this research are watermarks, which are crucial for dating manuscripts. For watermarks, cross-depiction arises for two reasons: (i) there are many similar representations of the same motif, and (ii) there are several ways of capturing the watermarks, i.e., as the watermarks are not visible on a scan or photograph, they are typically retrieved via hand tracing, rubbing, or special photographic techniques. This leads to different representations of the same (or similar) objects, making it hard for pattern recognition methods to recognize the watermarks. While this is a simple problem for human experts, computer vision techniques have problems generalizing from the various depiction possibilities. In this paper, we present a study where we use deep neural networks for categorization of watermarks with varying levels of detail. The macro-averaged F1-score on an imbalanced 12-category classification task is 88.3%, and the multi-labelling performance (Jaccard Index) on a 622-label task is 79.5%. To analyze the usefulness of an image-based system for assisting humanities scholars in cataloguing manuscripts, we also measure the performance of similarity matching on expert-crafted test sets of varying sizes (50 and 1000 watermark samples). A significant outcome is that all relevant results belonging to the same super-class are found by our system (Mean Average Precision of 100%), despite the cross-depicted nature of the motifs. This result has not been achieved in the literature so far.
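The two categorization metrics named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the macro-averaged F1-score is taken as the unweighted mean of per-class F1 values, and the Jaccard Index is averaged per sample over label sets, which may differ from the averaging used in the paper. The function names `macro_f1` and `mean_jaccard` are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def mean_jaccard(true_sets, pred_sets):
    """Sample-averaged Jaccard Index for multi-label predictions.

    Assumption: each sample is a set of labels and scores are averaged
    per sample.
    """
    scores = []
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        # Two empty label sets are treated as a perfect match.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

Macro averaging is the relevant choice for an imbalanced task because a classifier that ignores small categories is penalized for each one it misses, regardless of how few samples that category contains.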
