Analysis and Character Recognition

Abstract

We describe a computer database being developed at the University of Nevada, Las Vegas (UNLV) to support experiments in the recognition and analysis of information from printed documents. The history and economic significance of the database are discussed. It is a page-oriented database of mostly technical documents, with approximately 9300 pages currently on-line. Methods of access are described. A set of software tools has been developed that automates much of the drudgery of performing experiments with optical character recognition (OCR) systems. UNLV plans to encourage each succeeding researcher to add value to the database. The authors believe that GT1 will become an increasingly valuable standard for evaluating systems and an important tool for research in document analysis. At the same time, the experimental tools described can be used to automate experiments with new ground-truth databases as they are added.
