38 research outputs found

    READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

    Full text link
    Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform good on well-established datasets since they either comprise clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy demand for binarized data that is annotated on a pixel level. Producing ground truth by these means is laborious and not needed to determine a method's quality. In this paper we propose a new evaluation scheme that is based on baselines. The proposed scheme has no need for binarization and it can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm.Comment: Submitted to DAS201

    Transforming scholarship in the archives through handwritten text recognition:Transkribus as a case study

    Get PDF
    Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the affect HTR may have on scholarship, and evidences this turning point of the advanced use of digitised heritage content. The paper aims to discuss these issues. - Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material. - Findings: Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified. - Research limitations/implications: The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc. - Practical implications: Only HTR provided via Transkribus is covered: however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state-of-the-art in this field. - Social implications: The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals. - Originality/value: This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector

    Kernels of Minimum Size Gossip Schemes

    Get PDF
    The main part of gossip schemes are the kernels of their minimal orders. We give a complete characterization of all kernels that may appear in gossip schemes on simple graphs with a minimum number of calls. As consequences we prove several results on gossip schemes, e.g. the minimum number of rounds of a gossip scheme with a minimum number of calls is computed. Moreover, in the new context we give proofs of known results, e.g. the well-known Four-Cycle-Theorem. In the last part, we deal with order theoretic questions for such kernel posets. After describing all p-grid-kernels in terms of permutations and subsets, isomorphism is investigated and they are enumerated. Then we compute the order dimension and the jump number of all possible kernels, and finally, we show how to determine the numbers of their linear extensions. Research supported by a special grant of the Alexander-von-Humboldt foundation, and SFB 303 (DFG), Institute of Discrete Mathematics/ Operations Research, University..
    corecore