    Creating ground truth for historical manuscripts with document graphs and scribbling interaction

    Ground truth is indispensable for training and evaluating document analysis methods, yet it is very tedious to create manually. This holds especially true for complex historical manuscripts that exhibit challenging layouts with interfering and overlapping handwriting. In this paper, we propose a novel semi-automatic system to support layout annotations in such a scenario based on document graphs and a pen-based scribbling interaction. On the one hand, document graphs provide a sparse page representation that is already close to the desired ground truth; on the other hand, scribbling facilitates an efficient and convenient pen-based interaction with the graph. The performance of the system is demonstrated in the context of a newly introduced database of historical manuscripts with complex layouts.

    Recognition of Historical Greek Polytonic Scripts Using LSTM Networks

    This paper reports on high-performance Optical Character Recognition (OCR) experiments using Long Short-Term Memory (LSTM) networks for Greek polytonic script. Even though there are many Greek polytonic manuscripts, the digitization of such documents has not been widely applied, and very limited work has been done on the recognition of such scripts. We have collected a large number of diverse document pages of Greek polytonic scripts in a novel database, called Polyton-DB, containing 15,689 textlines of synthetic and authentic printed scripts, and performed baseline experiments using LSTM networks. Evaluation results show that the character error rate obtained with LSTM varies from 5.51% to 14.68% (depending on the document) and is better than that of two well-known OCR engines, namely Tesseract and ABBYY FineReader.