Pergamon Press Ltd.
Abstract
The automatic recognition of printed Farsi (Persian) texts is complicated by several properties of the Farsi script: (a) connectivity of symbols, (b) similarity of groups of symbols, (c) highly variable widths, (d) subword overlap, and (e) line overlap. In this paper, a technique for the automatic recognition of printed Farsi texts is presented and its steps are discussed as follows: (1) digitization, (2) editing, (3) line separation, (4) subword separation, (5) symbol separation, (6) recognition, and (7) postprocessing. The most notable contributions of this work are in algorithms for steps (5) and (6) above. Practical application of the technique to Farsi newspaper headlines has been 100% successful. However, smaller type fonts, which could not be handled by the coarse digitization hardware used, will no doubt result in less than perfect recognition. The technique is also applicable, with little or no modification, to printed Arabic and Urdu texts, which use the same alphabet as Farsi.

Keywords: Character recognition; Computer input; Document input; Optical character recognition; Pattern recognition; Persian

BACKGROUND

Automatic recognition of printed or handwritten texts provides a convenient means of communication wit