6 research outputs found

    A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

    Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether this pre-training helps to analyse historical documents, which have fundamentally different image properties from ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.

    DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

    We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as a Web Service through DIVAServices. Comment: Submitted at the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 6 pages, 6 figures.

    Ein Ontologie-Entwurf für die Klassifikation von historischen Wasserzeichen [A Draft Ontology for the Classification of Historical Watermarks]

    The classification of historical watermarks is an important issue for codicological research as well as for the libraries and archives that provide the data. The existing watermark databases work with hierarchical classification systems, which often do not do justice to the complexity of the historical material. For this reason, this contribution presents a draft of an ontology, a more flexible classification system that allows the data to be modelled and searched in a more differentiated way. Furthermore, the ontology can be integrated into other portals and reused by other digital humanities research projects.

    Cross-Depicted Historical Motif Categorization and Retrieval with Deep Learning

    In this paper, we tackle the problem of categorizing and identifying cross-depicted historical motifs using recent deep learning techniques, with the aim of developing a content-based image retrieval system. By cross-depiction, we mean the problem that the same object can be represented (depicted) in various ways. The objects of interest in this research are watermarks, which are crucial for dating manuscripts. For watermarks, cross-depiction arises for two reasons: (i) there are many similar representations of the same motif, and (ii) there are several ways of capturing the watermarks, i.e., as the watermarks are not visible on a scan or photograph, they are typically retrieved via hand tracing, rubbing, or special photographic techniques. This leads to different representations of the same (or similar) objects, making it hard for pattern recognition methods to recognize the watermarks. While this is a simple problem for human experts, computer vision techniques have problems generalizing from the various depiction possibilities. In this paper, we present a study where we use deep neural networks for categorization of watermarks with varying levels of detail. The macro-averaged F1-score on an imbalanced 12-category classification task is 88.3%, and the multi-labelling performance (Jaccard Index) on a 622-label task is 79.5%. To analyze the usefulness of an image-based system for assisting humanities scholars in cataloguing manuscripts, we also measure the performance of similarity matching on expert-crafted test sets of varying sizes (50 and 1000 watermark samples). A significant outcome is that all relevant results belonging to the same super-class are found by our system (Mean Average Precision of 100%), despite the cross-depicted nature of the motifs. This result has not been achieved in the literature so far.
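    The three metrics named in this abstract (macro-averaged F1, Jaccard Index, and Mean Average Precision) can be illustrated with a minimal sketch. It uses the common textbook definitions, which are assumed here and not taken from the paper; the function names and the toy inputs are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence, Set


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_jaccard(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Sample-averaged Jaccard index for multi-label predictions (assumed variant)."""
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        # Two empty label sets are treated as a perfect match.
        total += len(t & p) / len(union) if union else 1.0
    return total / len(true_sets)


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Average of per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
    print(mean_jaccard([{"bull", "head"}, {"anchor"}], [{"bull"}, {"anchor"}]))
    print(mean_average_precision([[True, True, False], [True, False, False]]))
```

    Under these definitions, a Mean Average Precision of 100% means that for every query all relevant items are ranked ahead of every irrelevant one, which is how the last toy call above is constructed.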