3 research outputs found

    Distinction between handwritten and machine-printed text based on the bag of visual words model

    In a variety of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may coexist in the same document image, raising significant issues within the recognition pipeline. It is, therefore, necessary to separate the two types of text so that it becomes feasible to apply different recognition methodologies to each modality. In this paper, a new approach is proposed which strives towards identifying and separating handwritten from machine printed text using the Bag of Visual Words model (BoVW). Initially, blocks of interest are detected in the document image. For each block, a descriptor is calculated based on the BoVW. The final characterization of the blocks as Handwritten, Machine Printed or Noise is made by a decision scheme which relies upon the combination of binary SVM classifiers. The promising performance of the proposed approach is shown by using a consistent evaluation methodology which couples meaningful measures along with new datasets dedicated to the problem under consideration.
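The pipeline in this abstract (local features per block, a BoVW descriptor, then a combination of binary classifiers) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the codebook is built or how the binary SVMs are combined, so the nearest-codeword assignment, the pairwise majority vote, and the names `nearest_word`, `bovw_descriptor` and `classify_block` are all assumptions. Plain Python callables stand in for trained SVM decision functions.

```python
from collections import Counter
from itertools import combinations


def nearest_word(feature, codebook):
    """Return the index of the codebook entry closest to `feature`
    (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(feature, codebook[i])),
    )


def bovw_descriptor(features, codebook):
    """Build an L1-normalised histogram of visual-word counts for one block."""
    counts = Counter(nearest_word(f, codebook) for f in features)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(codebook)
    return [counts.get(i, 0) / total for i in range(len(codebook))]


def classify_block(descriptor, binary_classifiers, labels):
    """Combine pairwise binary classifiers by majority vote (assumed scheme).

    `binary_classifiers[(a, b)]` is a callable returning True when the
    descriptor looks more like class `a` than class `b`.
    """
    votes = Counter()
    for a, b in combinations(labels, 2):
        winner = a if binary_classifiers[(a, b)](descriptor) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Toy data: a 3-word codebook and four 2-D local features from one block.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
labels = ("Handwritten", "Machine Printed", "Noise")

# Hypothetical stand-ins for trained binary SVM decision functions.
toy_classifiers = {
    ("Handwritten", "Machine Printed"): lambda d: d[1] >= d[0],
    ("Handwritten", "Noise"): lambda d: d[1] >= d[2],
    ("Machine Printed", "Noise"): lambda d: d[0] >= d[2],
}

descriptor = bovw_descriptor(features, codebook)
label = classify_block(descriptor, toy_classifiers, labels)
```

In a real system the codebook would typically be learned by clustering local features from training blocks, and each callable would be replaced by the decision function of a trained SVM.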

    Transforming Scholarship in the Archives Through Handwritten Text Recognition: Transkribus as a Case Study

    This paper uses the Transkribus platform as a case study, focusing on the development, application and impact of HTR technology. It demonstrates that HTR has the capacity to make a significant contribution to the archival mission by making it easier for anyone to read, transcribe, process and mine historical documents. It shows that the technology fits neatly into the archival workflow, making direct use of growing repositories of digitised images of historical texts. By providing examples of institutions and researchers who are generating new resources with Transkribus, the paper shows how HTR can extend the existing research infrastructure of the archives, libraries and humanities domain. Looking to the future, this paper argues that this form of machine learning has the potential to change the nature and scope of historical research. Finally, it suggests that a cooperative approach from the archives, library and humanities community is the best way to support and sustain the benefits of the technology offered through Transkribus.