Evaluation of approximate pattern matching algorithms for OCR texts

By Manolis Christodoulakis, Gerhard Brey and Rizwan Ahmed Uppal

Abstract

In recent years, a large-scale effort to digitise old books, articles and newspapers has been under way. These documents are scanned and then processed with Optical Character Recognition (OCR) software to obtain their text equivalent. However, because the original papers are usually of poor quality, the OCR software produces text that is not 100% accurate. A simple search for a pattern in the resulting text retrieves only those occurrences that were accurately interpreted and ignores incorrectly spelled or distorted variations. In this paper we use the recently devised algorithm of Christodoulakis and Brey (2008), on the edit distance with combinations and splits, to perform approximate pattern matching for OCR texts. We then compare its performance against classic general-purpose approximate matching algorithms.
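
To make the search problem concrete, the following is a minimal sketch of one classic general-purpose approach: a dynamic-programming search for a pattern with at most k unit-cost edits (insertions, deletions, substitutions), in the style usually attributed to Sellers. It is not the Christodoulakis and Brey algorithm and not necessarily one of the baselines evaluated in the paper; the function name `approx_find` and the sample text are illustrative assumptions.

```python
def approx_find(pattern, text, k):
    """Return the 1-based end positions in `text` where some substring
    lies within unit-cost edit distance `k` of `pattern`.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(
                col[i - 1] + cost,  # match or substitution
                col[i] + 1,         # extra character in the text
                new[i - 1] + 1,     # pattern character missing from the text
            )
        col = new
        if col[m] <= k:
            hits.append(j)
    return hits


# An OCR-style misreading: "m" recognised as "rn".
ocr_text = "the rnodern age"
exact = approx_find("modern", ocr_text, 0)       # exact search
tolerant = approx_find("modern", ocr_text, 1)    # one edit allowed
```

Here the exact search finds nothing, while allowing a single edit recovers the distorted word. A unit-cost model has no notion that "rn" is a plausible misreading of "m"; it simply counts character edits, which is the kind of limitation an edit distance with combinations and splits is meant to address.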

Year: 2009
OAI identifier: oai:roar.uel.ac.uk:908
