Bench-Marking Information Extraction in Semi-Structured Historical Handwritten Records
In this report, we present our findings from benchmarking experiments on
information extraction from the Esposalles historical handwritten marriage
records of the ICDAR 2017 IEHHR robust reading competition. Information
extraction is modeled as semantic labeling of the sequence with two sets of
labels. This can be achieved by applying handwritten text recognition (HTR) and
named entity recognition (NER) either sequentially or jointly. We deploy a
pipeline approach: we first apply a state-of-the-art HTR system and then use its
output as input for NER. We show that, given the low-resource setup and the
simple structure of the records, high HTR performance ensures high overall
performance. We explore various configurations of conditional random fields and
neural networks to benchmark NER on this noisy input. The best model, on both
10-fold cross-validation and blind test data, uses n-gram features with a
bidirectional long short-term memory (BiLSTM) network.
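The abstract gives no implementation details for the NER stage. As an
illustration only, the sketch below shows two ingredients it mentions: n-gram
features for each token of the noisy HTR output (assumed here to be character
n-grams in a CRF-style feature dictionary, since the abstract does not say
whether they are character or word n-grams), and one way to handle two label
sets (assumed here to be merged into a single joint tag per token, so that one
sequence labeler predicts both). The label names and all function names are
hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple


def char_ngrams(token: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Character n-grams of a lowercased token padded with boundary markers."""
    padded = f"<{token.lower()}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Hypothetical CRF-style feature dict for token i: its n-grams plus neighbours.
    Character n-grams tolerate HTR misspellings better than whole-word identity."""
    feats: Dict[str, float] = {"bias": 1.0}
    for g in char_ngrams(tokens[i]):
        feats[f"ng={g}"] = 1.0
    feats["prev=" + (tokens[i - 1].lower() if i > 0 else "<BOS>")] = 1.0
    feats["next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>")] = 1.0
    return feats


def joint_tags(categories: List[str], persons: List[str]) -> List[str]:
    """Merge the two per-token label sequences into one joint label per token."""
    if len(categories) != len(persons):
        raise ValueError("label sequences must have equal length")
    return [f"{c}|{p}" for c, p in zip(categories, persons)]


def split_tags(tags: List[str]) -> Tuple[List[str], List[str]]:
    """Invert joint_tags, recovering the two label sequences for evaluation."""
    cats, pers = zip(*(t.split("|", 1) for t in tags)) if tags else ((), ())
    return list(cats), list(pers)


if __name__ == "__main__":
    tokens = ["Joan", "Pujol", "pages"]  # illustrative tokens, as if from HTR output
    cats = ["name", "surname", "occupation"]  # hypothetical category labels
    pers = ["husband", "husband", "husband"]  # hypothetical person labels
    print(joint_tags(cats, pers))  # ['name|husband', 'surname|husband', 'occupation|husband']
    print(len(token_features(tokens, 1)))
```

Merging the label sets keeps a single tagger but multiplies the size of the tag
inventory; predicting each label set with its own output layer is an equally
plausible alternative that the abstract does not rule out.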
Comparing Machine Learning Approaches for Table Recognition in Historical Register Books
In this paper, we present experiments on table recognition in handwritten
registry books. We first explain how the problem of row and column detection is
modeled, and then compare two machine learning approaches (Conditional Random
Fields and Graph Convolutional Networks) for detecting these table elements.
Evaluation was conducted on death records provided by the Archive of the
Diocese of Passau. Both methods achieve similar results, an F1 score of 89, a
level of quality sufficient for information extraction. The software and
dataset are openly available.
Comment: DAS 201
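The abstract does not describe how the graph is built or what the nodes and
labels are, so the sketch below only illustrates the general Graph
Convolutional Network idea under stated assumptions: each node is a text line
given by a bounding box (x0, y0, x1, y1), two lines are linked when their
vertical or horizontal extents overlap (a hypothetical rule standing in for
"same row" and "same column" neighbourhoods), and one layer applies the
normalized propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W),
where D is the degree matrix of A + I. Function names and the edge rule are
hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def build_adjacency(boxes: np.ndarray) -> np.ndarray:
    """Hypothetical edge rule over text-line boxes (n, 4) as x0, y0, x1, y1:
    link two lines if their vertical extents overlap (row candidates)
    or their horizontal extents overlap (column candidates)."""
    n = len(boxes)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            x0i, y0i, x1i, y1i = boxes[i]
            x0j, y0j, x1j, y1j = boxes[j]
            v_overlap = min(y1i, y1j) > max(y0i, y0j)
            h_overlap = min(x1i, x1j) > max(x0i, x0j)
            if v_overlap or h_overlap:
                adj[i, j] = adj[j, i] = 1.0
    return adj


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^-1/2 as a vector
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)


if __name__ == "__main__":
    # Three illustrative text lines: two side by side, one below the first.
    boxes = np.array([[0, 0, 10, 5], [20, 0, 30, 5], [0, 10, 10, 15]], dtype=float)
    adj = build_adjacency(boxes)
    feats = boxes / boxes.max()  # crude geometric node features
    w = np.random.default_rng(0).normal(size=(4, 3))  # untrained weights
    hidden = gcn_layer(adj, feats, w)
    print(adj)
    print(hidden.shape)  # (3, 3): one hidden vector per text line
```

In a real model, a classifier over these hidden vectors (for example, a softmax
layer predicting a per-line row or column label) would be trained end to end;
the CRF alternative in the abstract would instead score label assignments over
the same kind of graph.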