Automatic Ground-truth Generation for Document Image Analysis and Understanding

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

An Open Architecture for End-to-End Document Analysis Benchmarking

Author: Lamiroy Bart, Lopresti Daniel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2011

ISBN: 978-1-4577-1350-7

In this paper we present a fully operational, scalable and open architecture that allows end-to-end document analysis benchmarking to be performed without developing the whole pipeline. By decomposing the whole analysis process into coarse-grained tasks, and by building upon community-provided state-of-the-art algorithms, our architecture allows virtually any combination of elementary document analysis algorithms, regardless of their running system environment, programming language or data structures. Its flexible structure makes it very straightforward to plug in new experimental algorithms, compare them to other equivalent algorithms, and observe their effects on end-to-end tasks without needing to install, compile or otherwise interact with any software other than one's own.

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Arabic Character Recognition using 1-D slices of the Character Spectrum

Author: Ashebeili S., Mahmoud S., Nabawi A.

KFUPM ePrints

Arabic Database for Automatic Printed Arabic Text Recognition Research and Benchmarking

Author: Al-Hashim Amin Ghalib S.
Publication date: 15/06/2009

KFUPM ePrints

Improving Digital Library Support for Historic Newspaper Collections

Author: Lin Leo
Publication venue: The University of Waikato
Publication date: 01/01/2009

DVD-ROM Appendix available with the print copy of this thesis.

National and international initiatives are underway around the globe to digitise vast treasure troves of historical artefacts and make them available as digital libraries (DLs). The developed DLs are often constructed from facsimile pages with pre-existing metadata, such as historic newspapers stored on microfiche or generated from the non-destructive scanning of precious manuscripts. Access to the source documents is therefore limited to methods constructed from the metadata. Other projects look to introduce full-text indexing through the application of off-the-shelf commercial Optical Character Recognition (OCR) software. While this has greater potential for the end user experience than the metadata-only versions, the approach currently taken is a best effort in the time available rather than a process informed by detailed analysis of the issues. In this thesis, we investigate whether a richer level of support and service can be achieved by more closely integrating image processing techniques with DL software. The thesis presents a variety of experiments, implemented within the recently published open-source OCR system (OCRopus). In particular, existing segmentation algorithms are compared against our own, based on the Hough transform, using a corpus we created from several major online digital historic newspaper archives.

Research Commons@Waikato

Proceedings of the 2003 APRS Workshop on Digital Image Computing

Publication date: 01/01/2003

University of Queensland eSpace