
    The Robust Reading Competition Annotation and Evaluation Platform

    The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de-facto evaluation standard for robust reading systems and algorithms. Concurrent with its second incarnation in 2011, a continuous effort started to develop an on-line framework to facilitate the hosting and management of competitions. This paper outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the competitions. The RRC Annotation and Evaluation Platform is a modular framework, fully accessible through on-line interfaces. It comprises a collection of tools and services for managing all processes involved with defining and evaluating a research task, from dataset definition to annotation management, evaluation specification and results analysis. Although the framework has been designed with robust reading research in mind, many of the provided tools are generic by design. All aspects of the RRC Annotation and Evaluation Framework are available for research use. Comment: 6 pages, accepted to DAS 201

    ZoneMapAlt: An alternative to the ZoneMap metric for zone segmentation and classification

    This paper proposes a new evaluation metric based on the existing ZoneMap metric. The ZoneMap method, designed to evaluate zone segmentation and classification, is considered in the context of OCR evaluation. Its limitations are identified and described, and a new algorithm, ZoneMapAlt (ZoneMap Alternative), is proposed to address them while preserving the properties of the original metric. To validate the new metric, experiments were conducted on a dataset of scientific articles. Results demonstrate that the ZoneMapAlt algorithm provides more detailed reporting of segmentation errors and is able to detect critical segmentation errors.
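
    The abstract above does not spell out the ZoneMap formulation. As a rough illustration of the general idea behind overlap-based zone evaluation, the Python sketch below links reference and hypothesis zones by area overlap and reports misses, splits, and false alarms. The box format, the threshold, and the error categories are illustrative assumptions, not the published ZoneMap or ZoneMapAlt definition.

```python
# Hypothetical sketch of overlap-based zone matching, the general idea behind
# ZoneMap-style segmentation metrics. NOT the published ZoneMap/ZoneMapAlt
# algorithm; boxes are (x1, y1, x2, y2) and the threshold is an assumption.

def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def overlap(a, b):
    """Intersection area of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0, x2 - x1) * max(0, y2 - y1)

def classify_links(reference, hypothesis, threshold=0.1):
    """Link each reference zone to hypothesis zones that cover a significant
    share of its area, then report misses (unmatched reference zones),
    false alarms (unmatched hypothesis zones) and splits (reference zones
    linked to several hypothesis zones)."""
    links = {i: [] for i in range(len(reference))}
    matched_hyp = set()
    for i, ref in enumerate(reference):
        ref_area = area(ref)
        if ref_area == 0:
            continue  # degenerate reference box: left unmatched, counted as a miss
        for j, hyp in enumerate(hypothesis):
            if overlap(ref, hyp) / ref_area >= threshold:
                links[i].append(j)
                matched_hyp.add(j)
    return {
        "links": links,
        "misses": [i for i, js in links.items() if not js],
        "splits": [i for i, js in links.items() if len(js) > 1],
        "false_alarms": [j for j in range(len(hypothesis)) if j not in matched_hyp],
    }

# Toy usage: one reference zone split across two hypothesis zones
print(classify_links([(0, 0, 100, 50)], [(0, 0, 55, 50), (55, 0, 100, 50)]))
```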

    Metrics for Complete Evaluation of OCR Performance

    In this paper, we study metrics for evaluating OCR performance both in terms of physical segmentation and in terms of textual content recognition. These metrics depend on the input formats of the OCR output (hypothesis) and of the reference (also called ground truth). Two evaluation criteria are considered: the quality of segmentation and the character recognition rate. Three pairs of input formats are selected among two types of inputs: text only (text) and text with spatial information (xml). These reference-to-hypothesis pairs are: 1) text-to-text, 2) xml-to-xml and 3) text-to-xml. For the text-to-text pair, we selected the RETAS method to perform experiments and show its limits. Regarding text-to-xml, a new method based on unique word anchors is proposed to solve the problem of aligning texts that carry different information. We define the ZoneMapAltCnt metric for the xml-to-xml approach and show that it offers the most reliable and complete evaluation compared to the other two. Open-source OCR engines, Tesseract and OCRopus, are selected to perform the experiments. The datasets used are a collection of documents from the ISTEX document database, from the French newspaper "Le Nouvel Observateur", as well as invoices and administrative documents gathered from different collaborations.
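
    As a rough illustration of the character-recognition side of such an evaluation (the text-to-text setting mentioned above), the following Python sketch computes a character recognition rate from the Levenshtein edit distance between ground-truth text and the OCR hypothesis. It is a minimal toy example under that assumption, not the RETAS or ZoneMapAltCnt implementation.

```python
# Hedged sketch: character recognition rate as 1 - CER, where CER is the
# Levenshtein edit distance normalised by the ground-truth length. This
# illustrates text-to-text evaluation in general, not the paper's methods.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_recognition_rate(ground_truth: str, hypothesis: str) -> float:
    """Fraction of ground-truth characters correctly recovered (1 - CER)."""
    if not ground_truth:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - levenshtein(ground_truth, hypothesis) / len(ground_truth)

# Example: one substituted character ('l' read as '1')
print(character_recognition_rate("Le Nouvel Observateur", "Le Nouve1 Observateur"))
```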