Partitioning of the Degradation Space for OCR Training

Andersen, Tim; Barney Smith, Elisa H.

Authors: Tim Andersen; Elisa H. Barney Smith
Publication date: 16 January 2006
Publisher: IUScholarWorks

Abstract

Optical character recognition (OCR) algorithms generally perform better when presented with homogeneous data. This paper studies a method designed to increase the homogeneity of training data, based on an understanding of the types of degradation that occur during printing and scanning and of how these degradations affect the homogeneity of the data. It has been shown that dividing the degradation space by edge spread improves recognition accuracy over dividing it by threshold or point spread function width alone; the challenge lies in deciding how many partitions to use and at what values of edge spread to place the divisions. Clustering of different types of character features, fonts, sizes, resolutions, and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.

Available Versions

Boise State University - ScholarWorks

oai:scholarworks.boisestate.ed...

Last time updated on 17/11/2016