
    A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

    Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a largely open question whether this pre-training helps to analyse historical documents, which have fundamentally different image properties from ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.

    Exploring Medieval Manuscripts Writer Predictability: A Study on Scribe and Letter Identification

    Handwriting communication is a long-established human activity that has survived into the 21st century. Accordingly, research interest in handwritten documents, both historical and modern, is significant. The way we write has changed considerably over the past few centuries. For example, texts of the Middle Ages were often written and copied by anonymous scribes. The writing of each scribe, known as his or her "scribal hand", is unique and can be differentiated using a variety of consciously and unconsciously produced features. Distinguishing between these scribal hands is a central focus of the humanities research field known as "paleography". Character recognition within each scribal hand also poses an interesting challenge. Several factors make these digital processes difficult, such as paper degradation and soiling of the manuscript page. In this paper, we therefore investigate both perspectives, character recognition and writer identification, in medieval manuscripts to better understand the specific behaviour of two scribes, based on their 800-year-old manuscripts, in comparison with a modern calligrapher. The experiments demonstrated that degradation and tremor can influence the analysis of medieval handwritten documents. Nevertheless, the results showed good overall accuracy, with a higher accuracy rate in letter classification than in writer identification.

    Jewish Studies in the Digital Age

    The digitisation boom of the last two decades, and the rapid advancement of digital tools to analyse data in myriad ways, have opened up new avenues for humanities research. This volume discusses how the so-called digital turn has affected the field of Jewish Studies, explores the current state of the art, and probes how digital developments can be harnessed to address the specific questions, challenges and problems in the field.