Word level script identification for scanned document images

David Doermann; Huanfeng Ma

Abstract

In this paper, we compare the performance of three classifiers for identifying the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers, Support Vector Machines (SVM), Gaussian Mixture Model (GMM), and k-Nearest-Neighbor (k-NN), are used to identify scripts at the word level, where a word is a group of glyphs separated by white space. The classifiers are applied to a variety of bilingual dictionaries and their performance is compared. Experimental results show the capability of the Gabor filter to capture script features and the effectiveness of the three classifiers for script identification at the word level.
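
The pipeline described in the abstract (Gabor filtering into 16 feature channels, followed by a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the authors' configuration: the 16 channels are formed as 4 frequencies by 4 orientations, each channel is summarized by its mean absolute response, and only k-NN, the simplest of the three classifiers, is shown. The helper `make_stripes` is hypothetical and produces synthetic striped images that stand in for word images of two scripts.

```python
import numpy as np

def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Even (cosine) Gabor kernel: a Gaussian envelope times an oriented carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * frequency * x_rot)
    return kernel - kernel.mean()  # zero mean: flat regions give no response

def gabor_features(image, frequencies=(0.05, 0.1, 0.2, 0.4), n_orientations=4):
    """One feature per channel; 4 frequencies x 4 orientations = 16 (assumed layout)."""
    image = np.asarray(image, dtype=float)
    image_fft = np.fft.fft2(image)
    features = []
    for frequency in frequencies:
        for i in range(n_orientations):
            theta = i * np.pi / n_orientations
            # Circular convolution via FFT; the kernel is zero-padded to the image size.
            kernel_fft = np.fft.fft2(gabor_kernel(frequency, theta), s=image.shape)
            response = np.real(np.fft.ifft2(image_fft * kernel_fft))
            features.append(np.mean(np.abs(response)))
    return np.array(features)

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    distances = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(distances)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

def make_stripes(vertical, rng, size=32, frequency=0.1, noise=0.1):
    """Hypothetical stand-in for a word image: noisy vertical or horizontal stripes."""
    wave = np.cos(2.0 * np.pi * frequency * np.arange(size))
    image = np.tile(wave, (size, 1))  # intensity varies along x: vertical stripes
    if not vertical:
        image = image.T
    return image + noise * rng.standard_normal((size, size))

rng = np.random.default_rng(0)
kinds = [True] * 5 + [False] * 5
train_X = np.array([gabor_features(make_stripes(v, rng)) for v in kinds])
train_y = np.array(["script_A"] * 5 + ["script_B"] * 5)

query = gabor_features(make_stripes(True, rng))
print(knn_predict(train_X, train_y, query))
```

In a real setting, each segmented word image would take the place of the synthetic stripes, and the same 16-dimensional feature vectors could be passed to an SVM or a GMM instead of the k-NN vote.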

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.86.63...

Last time updated on 22/10/2014