Recognizing Degraded Handwritten Characters
In this paper, Slavonic manuscripts from the 11th
century written in Glagolitic script are
investigated. State-of-the-art optical character recognition methods produce poor results
for degraded handwritten document images. This is largely due to a lack of suitable
results from basic pre-processing steps such as binarization and image segmentation.
Therefore, a new, binarization-free approach is presented that is independent of
pre-processing deficiencies. It additionally incorporates local information in order to
also recognize fragmented or faded characters. The proposed algorithm consists of
two steps: character classification and character localization. First, scale-invariant
feature transform (SIFT) features are extracted and classified using support vector machines.
On this basis, interest points are clustered according to their spatial information. Then,
characters are localized and eventually recognized by a weighted voting scheme of
pre-classified local descriptors. Preliminary results show that the proposed system can
handle highly degraded manuscript images with background noise, e.g. stains, tears,
and faded characters.
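The localization and voting stage described in this abstract can be illustrated with a minimal sketch. It assumes each interest point already carries a class label and a confidence (for example from an SVM applied to its descriptor). The distance-threshold clustering, the `radius` parameter, and the confidence-sum vote are illustrative assumptions, not the authors' exact procedure.

```python
from collections import defaultdict
from math import hypot


def cluster_points(points, radius):
    """Group points so that any two points within `radius` share a cluster."""
    labels = [-1] * len(points)
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:  # flood-fill over the neighbourhood graph
            a = stack.pop()
            for b in range(len(points)):
                if labels[b] != -1:
                    continue
                dist = hypot(points[a][0] - points[b][0],
                             points[a][1] - points[b][1])
                if dist <= radius:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels


def vote(points, predictions, radius):
    """predictions[i] is a (label, confidence) pair for points[i].

    Returns one ((cx, cy), winning_label) entry per spatial cluster.
    """
    groups = defaultdict(list)
    for idx, cluster_id in enumerate(cluster_points(points, radius)):
        groups[cluster_id].append(idx)

    results = []
    for cluster_id in sorted(groups):
        members = groups[cluster_id]
        scores = defaultdict(float)
        for i in members:
            label, confidence = predictions[i]
            scores[label] += confidence  # weighted vote
        cx = sum(points[i][0] for i in members) / len(members)
        cy = sum(points[i][1] for i in members) / len(members)
        results.append(((cx, cy), max(scores, key=scores.get)))
    return results
```

In this sketch a single confident descriptor can outweigh several weak ones, which is the point of weighting the votes rather than counting them.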
A Comparative study of Arabic handwritten characters invariant feature
This paper is concerned with invariant features of Arabic
handwritten characters. It presents the results of a comparative study of
several feature extraction techniques for handwritten characters, based on the Hough
transform, Fourier transform, wavelet transform, and Gabor filter. The
results show that the Hough transform and Gabor filter are insensitive to
rotation and translation, and that the Fourier transform is sensitive to rotation but
insensitive to translation. In contrast to the Hough transform and Gabor filter,
the wavelet transform is sensitive to both rotation and translation.
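The translation insensitivity reported for the Fourier transform presumably refers to its magnitude spectrum. The following sketch checks that property on a 1D signal under a circular shift. It is an illustration under that assumption, covers translation only (not rotation), and is not the paper's experimental setup.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitude of each coefficient of the discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def circular_shift(signal, s):
    """Shift the signal to the right by s samples, wrapping around."""
    s %= len(signal)
    return signal[-s:] + signal[:-s]


profile = [0, 1, 3, 2, 0, 0, 0, 0]   # hypothetical stroke profile
shifted = circular_shift(profile, 3)

# A shift only changes the phase of each coefficient, so magnitudes agree.
same = all(abs(a - b) < 1e-9
           for a, b in zip(dft_magnitudes(profile), dft_magnitudes(shifted)))
```

Here `same` evaluates to `True`, because a circular shift multiplies each DFT coefficient by a unit-modulus phase factor.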
Query by String word spotting based on character bi-gram indexing
In this paper we propose a segmentation-free query by string word spotting
method. Both the documents and query strings are encoded using a recently
proposed word representation that projects images and strings into a common
attribute space based on a pyramidal histogram of characters (PHOC). These
attribute models are learned using linear SVMs over the Fisher Vector
representation of the images along with the PHOC labels of the corresponding
strings. In order to search through the whole page, document regions are
indexed per character bi-gram using a similar attribute representation. On top
of that, we propose an integral image representation of the document using a
simplified version of the attribute model for efficient computation. Finally, we
introduce a re-ranking step in order to boost retrieval performance. We show
state-of-the-art results for segmentation-free query by string word spotting in
single-writer and multi-writer standard datasets.
Comment: To be published in ICDAR201
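To make the PHOC string encoding concrete, here is a minimal sketch of how a query string could be turned into a binary pyramid vector. The alphabet, the pyramid levels, and the 50% overlap rule are assumptions based on the commonly described PHOC construction. The bi-gram component and the image-side attribute models from the abstract are omitted.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed character set


def phoc(word, alphabet=ALPHABET, levels=(2, 3, 4, 5)):
    """Binary pyramidal histogram of characters for a string.

    At pyramid level L the word is split into L equal regions. A character
    is marked in a region if at least half of its extent falls inside it.
    """
    word = word.lower()
    n = len(word)
    index = {ch: i for i, ch in enumerate(alphabet)}
    vector = []
    for level in levels:
        for region in range(level):
            bins = [0] * len(alphabet)
            # Positions are measured in units of 1/(n*level), so every
            # comparison below uses exact integers.
            r_start, r_end = region * n, (region + 1) * n
            for k, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                if 2 * overlap >= level:  # at least 50% of the character
                    bins[index[ch]] = 1
            vector.extend(bins)
    return vector
```

With the default settings the vector has 36 × (2 + 3 + 4 + 5) = 504 entries. For the word "beyond" at level 2, the first half marks b, e, y and the second half marks o, n, d.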
Recognition of compound characters in Kannada language
Recognition of degraded printed compound Kannada characters is a challenging research problem. It has been verified experimentally that noise removal is an essential preprocessing step. Two methods are proposed for the degraded Kannada character recognition problem. Method 1 uses the conventional histogram of oriented gradients (HOG) feature extraction. The extracted features are transformed and reduced using principal component analysis (PCA), and classification is then performed with various classifiers. Simple compound character classification is satisfactory (more than 98% accuracy) with this method. However, the method does not perform well on the other two compound types. Method 2 is a deep convolutional neural network (CNN) model for classification. It outperforms HOG features with conventional classification. The highest classification accuracy found is 98.8%, for simple compound character classification. The performance of the deep CNN is far better for the other two compound types. The deep CNN also turns out to be better for pooled character classes.
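The core of the HOG feature used in Method 1 is a magnitude-weighted histogram of gradient orientations. The sketch below computes it for a single cell. It is a simplified illustration: the 9-bin unsigned-orientation layout is an assumption, and bin interpolation, block normalization across cells, the PCA step, and the classifier are all left out.

```python
import math


def orientation_histogram(cell, bins=9):
    """L2-normalized histogram of unsigned gradient orientations.

    `cell` is a 2D list of grey values. Gradients use central differences,
    so only interior pixels contribute.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixels carry no orientation
            # Fold the direction into [0, 180) degrees (unsigned gradient).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist
```

For a cell containing a single vertical edge, all of the weight lands in the first bin (horizontal gradient direction). A horizontal edge puts it in the middle bin around 90 degrees.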