2 research outputs found

    DH Benelux Journal 2. Digital Humanities in Society

    The second volume of the DH Benelux Journal. This volume includes four full-length, peer-reviewed articles that are based on accepted contributions to the 2019 DH Benelux conference in Liège (Belgium) on Digital Humanities in Society. Contents: 1. Editors' Preface (Wout Dillen, Marijn Koolen, Marieke van Erp); 2. Introduction: Digital Humanities in Society (Ingrid Mayeur and Claartje Rasterhoff); 3. A Corpus-Based Approach to Michelangelo’s Epistolary Language (Gianluca Valenti); 4. The Datafication of Early Modern Ordinances (C. Annemieke Romein, Sara Veldhoen, and Michel de Gruijter); 5. A-poetic Technology. #GraphPoem and the Social Function of Computational Performance (Chris Tanasescu, Diana Inkpen, Vaibhav Kesarwani, and Prasadith Kirinde Gamaarachchige); 6. Decomplexifying the network pipeline: a tool for RDF/Wikidata to network analysis (Julie M. Birkholz and Albert Meroño-Peñuel

    Reading order detection on handwritten documents

    [EN] Recent advances in Handwritten Text Recognition and Document Layout Analysis have made it possible to convert digital images of manuscripts into electronic text. However, giving this text the correct structure and context is still an open problem, and it must be solved before the relevant information conveyed by the text can be extracted. The most important structure needed for a set of text elements is their reading order. Most studies of the reading order problem use rule-based approaches and focus on printed documents. Much less attention has been paid so far to handwritten text documents, where the problem becomes particularly important and challenging. In this work, we propose a new approach to automatically determine the reading order of text regions and lines in handwritten text documents. The task is approached as a sorting problem where the order-relation operator is automatically learned from examples. We experimentally demonstrate the effectiveness of our method on three different datasets at different hierarchical levels.

    The authors thank the Centre de Recerca d'Història Rural, the National Archives of Finland and Déjean Hervé for providing the datasets used in this work, and Juan Miguel Vilar for his enlightening comments. This work was partially supported by Universitat Politècnica de València under grant FPI-II/900, by Generalitat Valenciana under the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025 'Sistemas de fabricación inteligente para la indústria 4.0', by the Ministerio de Ciencia, Innovación y Universidades project DocTIUM (Ref. RTI2018-095645-B-C22), by the BBVA Foundation through the 2019 Digital Humanities research grant 'HistWeather - Dos Siglos de Datos Climáticos' and by the Agencia Estatal de Investigación under project SimancasSearch (PID2020-116813RB-I00). Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

    Quirós, L.; Vidal, E. (2022).
Reading order detection on handwritten documents. Neural Computing and Applications. 34(12):9593-9611. https://doi.org/10.1007/s00521-022-06948-5
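
The abstract for this entry frames reading order as a sorting problem driven by a pairwise order relation learned from examples. The following is a minimal Python sketch of that general idea only, not the authors' implementation. The `Region` feature (an x, y position), the hand-written `toy_precedes` rule standing in for a trained model, and the aggregation by counting pairwise "wins" are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical feature: (x, y) of a region's top-left corner.
Region = Tuple[float, float]


def toy_precedes(a: Region, b: Region) -> float:
    """Stand-in for a learned model: estimated probability that a is read before b.

    A trained pairwise classifier would replace this hand-written rule.
    """
    (ax, ay), (bx, by) = a, b
    if abs(ay - by) > 10:  # clearly different rows: the upper one comes first
        return 1.0 if ay < by else 0.0
    return 1.0 if ax < bx else 0.0  # same row: the left one comes first


def reading_order(regions: Sequence[Region],
                  precedes: Callable[[Region, Region], float]) -> List[int]:
    """Order regions by how many others each one is estimated to precede."""
    n = len(regions)
    wins = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                wins[i] += precedes(regions[i], regions[j])
    # More "wins" means earlier in the reading order; ties keep input order.
    return sorted(range(n), key=lambda i: -wins[i])


regions = [(300.0, 12.0), (20.0, 200.0), (15.0, 10.0), (310.0, 205.0)]
order = reading_order(regions, toy_precedes)
print(order)
```

Summing pairwise scores rather than passing the relation directly to a comparison sort is one possible design choice: an estimated relation is not guaranteed to be transitive, and a score-based ranking still yields a total order in that case.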