Handwritten Text Recognition for Historical Documents in the tranScriptorium Project
© Owner/Author 2014. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM, Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage (pp. 111-117), http://dx.doi.org/10.1145/2595188.2595193
Transcription of historical handwritten documents is a crucial
problem for making these documents more accessible to the
general public. Huge amounts of historical handwritten documents
are currently being made available through online portals
worldwide. Transcribing them manually is not realistic, so
automatic techniques have to be used. tranScriptorium is
a project that aims to research modern Handwritten
Text Recognition (HTR) technology for transcribing historical
handwritten documents. The HTR technology used in
tranScriptorium is based on models that are learnt automatically
from examples. This HTR technology has been applied to
a 15th-century Dutch collection selected for the
tranScriptorium project. This paper provides preliminary
HTR results on this Dutch collection. The results are very
encouraging, taking into account that minimal resources have
been deployed to develop the transcription system.
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 600707 - tranScriptorium and from the Spanish MEC under the STraDa (TIN2012-37475-C02-01) research project.
Sánchez Peiró, JA.; Bosch Campos, V.; Romero Gómez, V.; Depuydt, K.; De Does, J. (2014). Handwritten Text Recognition for Historical Documents in the tranScriptorium Project. ACM. https://doi.org/10.1145/2595188.2595193
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars have designed tools to perform automatic scansion of poetry in many languages, but none of these tools
deals with Old Saxon or Old English. This project aims to be a first attempt to create such a tool for these languages. We
implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform automatic scansion of Old Saxon
and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and
used the resulting corpus as a labeled dataset to train the model. In the evaluation, the algorithm reached 97% accuracy
and a weighted average of 99% for precision, recall and F1 score. In addition, we tested the model on some verses
from the Old Saxon Genesis and some from The Battle of Brunanburh. We observed that the model predicted almost all
Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
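The evaluation figures above (accuracy plus weighted-average precision, recall and F1) can be illustrated with a small pure-Python sketch. It assumes the model outputs one metrical label per token. The label names "S" (stressed position) and "x" (unstressed position), the toy sequences and the function names are hypothetical and are not taken from the project. "Weighted average" is read here in its common sense, as in scikit-learn's `average='weighted'`: per-class scores are averaged with weights proportional to each class's support (its count in the gold labels).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label matches the gold label."""
    assert len(y_true) == len(y_pred), "sequences must be aligned"
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_prf(y_true, y_pred):
    """Per-class precision/recall/F1, averaged with support-based weights."""
    assert len(y_true) == len(y_pred), "sequences must be aligned"
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)       # gold count per label
    predicted = Counter(y_pred)     # predicted count per label
    total = len(y_true)
    p_avg = r_avg = f_avg = 0.0
    for label in labels:
        # True positives: tokens where gold and prediction are both this label
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label == p)
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = support[label] / total  # labels absent from gold get weight 0
        p_avg += weight * precision
        r_avg += weight * recall
        f_avg += weight * f1
    return p_avg, r_avg, f_avg


if __name__ == "__main__":
    # Hypothetical gold and predicted labels for a six-token sequence
    gold = ["S", "x", "x", "x", "S", "x"]
    pred = ["S", "x", "x", "S", "S", "x"]
    print("accuracy:", accuracy(gold, pred))
    print("weighted P/R/F1:", weighted_prf(gold, pred))
```

Support weighting keeps the more frequent labels from being drowned out by rare ones. It also means a model that handles the majority label well can report a high weighted F1 even if a rare metrical label is often wrong. A per-class breakdown therefore complements the averaged figures.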