
    A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

    Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether this pre-training helps to analyse historical documents, which have fundamentally different image properties compared with ImageNet. In this paper, we present a comprehensive empirical survey of the effect of ImageNet pre-training on diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for pixel-level semantic segmentation, we observe a clear trend across different network architectures: ImageNet pre-training has a positive effect on both classification and content-based retrieval.
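The transfer-learning recipe evaluated above — reuse a pre-trained backbone, then adapt it to the target domain — can be sketched in miniature. In this illustrative, pure-Python sketch the frozen "backbone" is a stand-in feature extractor and only a new linear classification head is trained; the data, feature map, and hyper-parameters are all invented for the example.

```python
import math
import random

random.seed(0)

def backbone(x):
    # Stand-in for a frozen, pre-trained feature extractor:
    # maps a raw input to a fixed feature vector (never updated below).
    return [x[0], x[1], x[0] * x[1]]

# Toy target-domain data: two classes, separable in feature space.
data = [([random.gauss(1, 0.3), random.gauss(1, 0.3)], 1) for _ in range(50)] + \
       [([random.gauss(-1, 0.3), random.gauss(-1, 0.3)], 0) for _ in range(50)]

# "Fine-tune" only a new classification head (logistic regression).
w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.1
for _ in range(200):
    for x, y in data:
        f = backbone(x)                        # frozen features
        z = sum(wi * fi for wi, fi in zip(w, f)) + b
        p = 1.0 / (1.0 + math.exp(-z))         # sigmoid
        g = p - y                              # gradient of the log-loss
        w = [wi - lr * g * fi for wi, fi in zip(w, f)]
        b -= lr * g

def predict(x):
    z = sum(wi * fi for wi, fi in zip(w, backbone(x))) + b
    return 1.0 / (1.0 + math.exp(-z))

correct = sum((predict(x) > 0.5) == (y == 1) for x, y in data)
accuracy = correct / len(data)
print(accuracy)
```

The same division of labor — frozen ImageNet features, trainable task-specific head — is the "fine-tune only the classifier" end of the spectrum the paper compares against full fine-tuning.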

    Unsupervised feature learning for writer identification

    Our work presents research on unsupervised feature learning methods for writer identification and retrieval. We study the impact of deep learning alternatives in this field by proposing methodologies that explore different uses of autoencoder networks. Taking a patch extraction algorithm as a starting point, we aim to obtain characteristics from patches of handwritten documents in an unsupervised way, meaning no label information is used for the task. To verify whether the extracted features are valid for writer identification, the proposed approaches are evaluated and compared with state-of-the-art methods on the ICDAR2013 and ICDAR2017 writer identification datasets.
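The pipeline above starts from a patch extraction algorithm; a minimal sliding-window sketch is below. The patch size, stride, and the tiny 2-D list standing in for a scanned page are illustrative, not the paper's actual settings.

```python
def extract_patches(image, patch=3, stride=2):
    """Slide a patch x patch window over a 2-D grayscale image
    (a list of equal-length rows) and return flattened patches,
    i.e. the vectors that would later be fed to an autoencoder."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            window = [image[top + r][left + c]
                      for r in range(patch) for c in range(patch)]
            patches.append(window)
    return patches

# 6x6 toy "image": yields 4 patches, each a 9-dimensional vector.
img = [[(r * 6 + c) % 256 for c in range(6)] for r in range(6)]
patches = extract_patches(img)
print(len(patches), len(patches[0]))  # prints "4 9"
```

Because no labels are involved at any point, the same extractor works unchanged in the unsupervised setting the abstract describes.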

    Feature Mixing for Writer Retrieval and Identification on Papyri Fragments

    This paper proposes a deep-learning-based approach to writer retrieval and identification for papyri, with a focus on identifying fragments associated with a specific writer and fragments corresponding to the same image. We present a novel neural network architecture that combines a residual backbone with a feature mixing stage to improve retrieval performance; the final descriptor is derived from a projection layer. The methodology is evaluated on two benchmarks: PapyRow, where we achieve a mAP of 26.6 % and 24.9 % on writer and page retrieval, respectively, and HisFragIR20, where we show state-of-the-art performance (44.0 % and 29.3 % mAP). Furthermore, our network achieves an accuracy of 28.7 % for writer identification. Additionally, we conduct experiments on the influence of two binarization techniques on fragments and show that binarizing does not enhance performance. Our code and models are available to the community.
    Comment: accepted for HIP@ICDAR202
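The retrieval numbers above are mean average precision (mAP). As a reference, here is the standard mAP computation over ranked result lists; the two example queries and their relevance labels are illustrative.

```python
def average_precision(ranked_relevant):
    """ranked_relevant: list of booleans, True where the i-th
    retrieved item is relevant. Returns AP for one query."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)  # precision at each relevant hit
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(rankings):
    # mAP: the mean of per-query average precisions.
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two queries: a perfect ranking and a mixed one.
queries = [
    [True, True, False],          # AP = (1/1 + 2/2) / 2 = 1.0
    [False, True, False, True],   # AP = (1/2 + 2/4) / 2 = 0.5
]
print(mean_average_precision(queries))  # prints 0.75
```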

    DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

    We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, experiment tracking, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source and accessible as a web service through DIVAServices.
    Comment: Submitted to the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 6 pages, 6 figures

    Re-ranking for Writer Identification and Writer Retrieval

    Automatic writer identification is a common problem in document analysis. State-of-the-art methods typically focus on the feature extraction step, using traditional or deep-learning-based techniques. In retrieval problems, re-ranking is a commonly used technique to improve the results. Re-ranking refines an initial ranking by using the knowledge contained in the ranked result, e.g., by exploiting nearest-neighbor relations. To the best of our knowledge, re-ranking has not been used for writer identification/retrieval. A possible reason is that publicly available benchmark datasets contain only a few samples per writer, which makes re-ranking less promising. We show that a re-ranking step based on k-reciprocal nearest-neighbor relationships is advantageous for writer identification, even if only a few samples per writer are available. We use these reciprocal relationships in two ways: encoding them into new vectors, as originally proposed, or integrating them via query expansion. We show that both techniques outperform the baseline results in terms of mAP on three writer identification datasets.
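The query-expansion variant of this re-ranking step can be sketched directly: collect the query's k-reciprocal nearest neighbors (samples that rank the query highly in return), average them into an expanded query, and re-rank against it. The tiny 2-D descriptors, the value of k, and the greedy ranking are illustrative, not the paper's exact method.

```python
def dist(a, b):
    # Euclidean distance between two descriptors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn(idx, vectors, k):
    """Indices of the k nearest neighbors of vectors[idx] (excluding itself)."""
    order = sorted((i for i in range(len(vectors)) if i != idx),
                   key=lambda i: dist(vectors[idx], vectors[i]))
    return order[:k]

def k_reciprocal(idx, vectors, k):
    """Neighbors j of idx such that idx is also among j's k nearest."""
    return [j for j in knn(idx, vectors, k) if idx in knn(j, vectors, k)]

def expanded_ranking(query_idx, vectors, k=2):
    # Query expansion: average the query with its k-reciprocal
    # neighbors, then re-rank the gallery against the expanded query.
    recip = k_reciprocal(query_idx, vectors, k)
    group = [vectors[query_idx]] + [vectors[j] for j in recip]
    expanded = [sum(col) / len(group) for col in zip(*group)]
    return sorted((i for i in range(len(vectors)) if i != query_idx),
                  key=lambda i: dist(expanded, vectors[i]))

# Tiny gallery: samples 0-2 from one writer, 3-4 from another.
gallery = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 4.9]]
print(expanded_ranking(0, gallery))
```

The reciprocity test is what makes the expansion conservative: a gallery sample only contributes to the expanded query if the attraction is mutual, which matters when each writer has just a handful of samples.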

    Advances in Document Layout Analysis

    [EN] Handwritten Text Segmentation (HTS) is a task within the Document Layout Analysis field that aims to detect and extract the different page regions of interest found in handwritten documents. HTS remains an active topic that has gained importance over the years, due to the increasing demand to provide textual access to the myriad handwritten document collections held by archives and libraries. This thesis considers HTS as a task that must be tackled in two specialized phases: detection and extraction. We see the detection phase fundamentally as a recognition problem that yields the vertical positions of each region of interest as a by-product. The extraction phase consists in calculating the best contour coordinates of the region using the position information provided by the detection phase. Our proposed detection approach allows us to attack both higher-level regions (paragraphs, diagrams, etc.) and lower-level regions such as text lines. In the case of text line detection, we model the problem to ensure that the system's yielded vertical position approximates the fictitious line that connects the lower part of the grapheme bodies in a text line, commonly known as the baseline. One of the main contributions of this thesis is that the proposed modelling approach allows us to include prior information regarding the layout of the documents being processed. This is performed via a Vertical Layout Model (VLM). We develop a Hidden Markov Model (HMM) based framework to tackle both region detection and classification as an integrated task, and study the performance and ease of use of the proposed approach on many corpora. We review the modelling simplicity of our approach to process regions at different levels of information: text lines, paragraphs, titles, etc. We study the impact of adding deterministic and/or probabilistic prior information and restrictions via the VLM that our approach provides.
    Having a separate phase that accurately yields the detection position (baselines in the case of text lines) of each region greatly simplifies the problem that must be tackled during the extraction phase. In this thesis we propose to use a distance map that takes into consideration the grey-scale information in the image. This allows us to yield extraction frontiers which are equidistant to the adjacent text regions. We study how our approach scales its accuracy in proportion to the quality of the provided detection vertical position. Our extraction approach gives near-perfect results when human-reviewed baselines are provided.
    Bosch Campos, V. (2020). Advances in Document Layout Analysis [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/138397
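The extraction idea above — frontiers equidistant from the adjacent text regions — rests on a distance map. Below is a minimal BFS distance transform over a binary image; the tiny image, the 4-neighborhood, and the row-wise frontier pick are illustrative (the thesis additionally weights the map with grey-scale information).

```python
from collections import deque

def distance_map(binary):
    """4-connected BFS distance from every background pixel (0)
    to the nearest foreground/text pixel (1)."""
    h, w = len(binary), len(binary[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 1:
                dist[r][c] = 0          # text pixels are the BFS sources
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist

# Two "text lines" (rows 0 and 4); the equidistant frontier is row 2.
img = [[1, 1, 1, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 1]]
d = distance_map(img)
frontier = max(range(5), key=lambda r: min(d[r]))  # row farthest from both lines
print(frontier)  # prints 2
```

Tracing the ridge of this map between two detected baselines yields a cut that stays as far as possible from both neighboring text regions, which is exactly the equidistance property the abstract claims for the extraction frontiers.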

    Layout Analysis for Handwritten Documents. A Probabilistic Machine Learning Approach

    [EN] Document Layout Analysis, applied to handwritten documents, aims to automatically obtain the intrinsic structure of a document. Its development as a research field spans from the character segmentation systems developed in the early 1960s to the complex systems designed nowadays, where the goal is to analyze high-level structures (lines of text, paragraphs, tables, etc.) and the relationships between them. This thesis first defines the goal of Document Layout Analysis from a probabilistic perspective. Then, the complexity of the problem is reduced into a set of well-known complementary subproblems, so that it can be handled by modern computing resources. More precisely, three of the main subproblems of Document Layout Analysis are addressed following a probabilistic formulation, namely Baseline Detection, Region Segmentation and Reading Order Determination. One of the main contributions of this thesis is the formalization of the Baseline Detection and Region Segmentation problems under a probabilistic framework, where both problems can be handled separately or in an integrated way by the proposed models. The latter approach proves very useful for handling large document collections under restricted computing resources. Later, the Reading Order Determination subproblem is addressed. It is one of the most important, yet underestimated, subproblems of Document Layout Analysis, since it is the bridge that allows us to convert the data extracted by Automatic Text Recognition systems into useful information. Therefore, Reading Order Determination is addressed and formalized as a pairwise probabilistic sorting problem. Moreover, we propose two different decoding algorithms that reduce the computational complexity of the problem. Furthermore, different statistical models are used to represent the probability distribution over the structure of the documents. These models, based on Artificial Neural Networks (from a simple Multilayer Perceptron to complex Convolutional and Region Proposal Networks), are estimated from training data using supervised Machine Learning algorithms. Finally, all the contributions are experimentally evaluated, not only on standard academic benchmarks but also on collections of thousands of images. We consider handwritten text documents and handwritten musical documents, as together they represent the majority of documents in libraries and archives. The results show that the proposed methods are accurate and versatile across a very wide range of handwritten documents.
    Quirós Díaz, L. (2022). Layout Analysis for Handwritten Documents. A Probabilistic Machine Learning Approach [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18148
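The pairwise probabilistic sorting formulation of reading order can be sketched with a matrix of "region i is read before region j" probabilities and a simple decoder. The matrix values are illustrative, and this greedy score-based decoder is only a stand-in for the thesis's own decoding algorithms (in practice a model would supply the pairwise probabilities).

```python
def decode_reading_order(P):
    """P[i][j] = estimated probability that region i is read before
    region j. Greedy decoder: rank regions by their total
    'precedes' mass, highest first."""
    n = len(P)
    score = [sum(P[i][j] for j in range(n) if j != i) for i in range(n)]
    return sorted(range(n), key=lambda i: -score[i])

# Three regions; the pairwise matrix encodes the order 2 -> 0 -> 1.
P = [[0.0, 0.9, 0.2],
     [0.1, 0.0, 0.1],
     [0.8, 0.9, 0.0]]
print(decode_reading_order(P))  # prints [2, 0, 1]
```

Summing a region's "precedes" probabilities turns the quadratic number of pairwise judgments into a single sortable score per region, which is one simple way a decoder can keep the computational complexity of the ordering step low.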