The Robust Reading Competition Annotation and Evaluation Platform
The ICDAR Robust Reading Competition (RRC), initiated in 2003 and
re-established in 2011, has become a de facto evaluation standard for robust
reading systems and algorithms. Concurrent with its second incarnation in 2011,
a continuous effort started to develop an on-line framework to facilitate the
hosting and management of competitions. This paper outlines the Robust Reading
Competition Annotation and Evaluation Platform, the backbone of the
competitions. The RRC Annotation and Evaluation Platform is a modular
framework, fully accessible through on-line interfaces. It comprises a
collection of tools and services for managing all processes involved with
defining and evaluating a research task, from dataset definition to annotation
management, evaluation specification and results analysis. Although the
framework has been designed with robust reading research in mind, many of the
provided tools are generic by design. All aspects of the RRC Annotation and
Evaluation Framework are available for research use.
Comment: 6 pages, accepted to DAS 201
ZoneMapAlt: An alternative to the ZoneMap metric for zone segmentation and classification
This paper proposes a new evaluation metric based on the existing ZoneMap metric. The ZoneMap method, designed to evaluate zone segmentation and classification, is considered in the context of OCR evaluation. Its limitations are identified and described, and a new algorithm, ZoneMapAlt (ZoneMap Alternative), is proposed to address them while preserving the properties of the original. To validate the new metric, experiments were conducted on a dataset of scientific articles. The results demonstrate that the ZoneMapAlt algorithm provides greater detail on segmentation errors and is able to detect critical segmentation errors.
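Zone-segmentation metrics of this family start by pairing hypothesis zones with reference zones according to their spatial overlap; unmatched reference zones become misses and unmatched hypothesis zones become false alarms. The following is a minimal, generic sketch of such overlap-based matching, not the actual ZoneMap or ZoneMapAlt algorithm; the IoU criterion, the greedy strategy, and the 0.5 threshold are illustrative assumptions.

```python
# Hypothetical sketch of overlap-based zone matching, the pairing step a
# zone-segmentation metric builds on. Boxes are (x1, y1, x2, y2) tuples.
# NOT the ZoneMap/ZoneMapAlt algorithm: the IoU score, greedy strategy,
# and threshold value are illustrative assumptions.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_zones(reference, hypothesis, threshold=0.5):
    """Greedily pair each reference zone with its best-overlapping free
    hypothesis zone; leftovers are misses and false alarms."""
    matches, used = [], set()
    for i, ref in enumerate(reference):
        best, best_j = 0.0, None
        for j, hyp in enumerate(hypothesis):
            if j not in used and iou(ref, hyp) > best:
                best, best_j = iou(ref, hyp), j
        if best_j is not None and best >= threshold:
            matches.append((i, best_j, best))
            used.add(best_j)
    matched_refs = {m[0] for m in matches}
    misses = [i for i in range(len(reference)) if i not in matched_refs]
    false_alarms = [j for j in range(len(hypothesis)) if j not in used]
    return matches, misses, false_alarms
```

A classification-aware metric would additionally compare the labels (text, figure, table, ...) of each matched pair; a split or merge error shows up here as one reference zone overlapping several hypothesis zones, or vice versa.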
Metrics for Complete Evaluation of OCR Performance
In this paper, we study metrics for evaluating OCR performance both in terms of physical segmentation and in terms of textual content recognition. These metrics rely on the input formats of the OCR output (hypothesis) and the reference (also called ground truth). Two evaluation criteria are considered: the quality of the segmentation and the character recognition rate. Three reference-to-hypothesis pairs of input formats are selected from two types of input, text only (text) and text with spatial information (xml): 1) text-to-text, 2) xml-to-xml and 3) text-to-xml. For the text-to-text pair, we selected the RETAS method to perform experiments and show its limits. For text-to-xml, a new method based on unique word anchors is proposed to solve the problem of aligning texts that carry different information. We define the ZoneMapAltCnt metric for the xml-to-xml approach and show that it offers the most reliable and complete evaluation of the three. The open-source OCR engines Tesseract and OCRopus are selected for the experiments. The datasets used are a collection of documents from the ISTEX document database and from the French newspaper "Le Nouvel Observateur", as well as invoices and administrative documents gathered from different collaborations.
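The character-recognition side of such an evaluation is conventionally computed from the Levenshtein edit distance between the reference text and the OCR hypothesis. The sketch below shows this standard character error rate (CER) computation; it illustrates the general criterion only, not the specific ZoneMapAltCnt metric.

```python
# Standard character-error-rate computation for OCR evaluation:
# CER = (edit distance between reference and hypothesis) / |reference|,
# and the character recognition rate is 1 - CER.
# Illustrates the general criterion, not the ZoneMapAltCnt metric.

def edit_distance(ref, hyp):
    """Levenshtein distance via dynamic programming (two-row table)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1]

def character_error_rate(ref, hyp):
    """Edits per reference character; 0.0 means a perfect transcription."""
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

Text-only (text-to-text) evaluation reduces to exactly this computation after the two texts have been aligned; the harder problem, which the word-anchor method above addresses, is establishing that alignment when the hypothesis and reference do not cover the same content.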