    Kurdish Optical Character Recognition

    Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. It is written from right to left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, affect the segmentation stage in developing a Kurdish OCR. In this article, we introduce an enhanced character-segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR using documents of different fonts, font sizes, and image resolutions. The experiments showed that the proposed method achieved a character recognition accuracy of 90.82% on average.

    Recognizing Degraded Handwritten Characters

    In this paper, Slavonic manuscripts from the 11th century written in Glagolitic script are investigated. State-of-the-art optical character recognition methods produce poor results for degraded handwritten document images. This is largely due to a lack of suitable results from basic pre-processing steps such as binarization and image segmentation. Therefore, a new, binarization-free approach is presented that is independent of pre-processing deficiencies. It additionally incorporates local information so that fragmented or faded characters can also be recognized. The proposed algorithm consists of two steps: character classification and character localization. First, scale-invariant feature transform (SIFT) features are extracted and classified using support vector machines. On this basis, interest points are clustered according to their spatial information. Then, characters are localized and eventually recognized by a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background noise, e.g. stains, tears, and faded characters.
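
The localization and voting stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each local descriptor has already been given a class label and a confidence weight by some classifier, and it replaces the unspecified spatial clustering with a simple fixed grid. The names `Descriptor`, `cluster_by_grid`, `vote`, and `recognize`, and the `cell` parameter, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Descriptor:
    """A pre-classified interest point (all fields are illustrative)."""
    x: float
    y: float
    label: str     # class predicted for this local descriptor
    weight: float  # assumed to be the classifier's confidence


def cluster_by_grid(descriptors, cell=32.0):
    """Group descriptors by grid cell; a stand-in for spatial clustering."""
    clusters = defaultdict(list)
    for d in descriptors:
        clusters[(int(d.x // cell), int(d.y // cell))].append(d)
    return list(clusters.values())


def vote(cluster):
    """Return (label, centroid, score) for one cluster by weighted voting."""
    scores = defaultdict(float)
    for d in cluster:
        scores[d.label] += d.weight
    label = max(scores, key=scores.get)
    cx = sum(d.x for d in cluster) / len(cluster)
    cy = sum(d.y for d in cluster) / len(cluster)
    return label, (cx, cy), scores[label]


def recognize(descriptors, cell=32.0):
    """Localize characters as clusters and label each by weighted vote."""
    return [vote(c) for c in cluster_by_grid(descriptors, cell)]


points = [
    Descriptor(5, 5, "a", 0.9),
    Descriptor(10, 8, "b", 0.5),
    Descriptor(12, 6, "b", 0.3),
    Descriptor(40, 5, "v", 0.6),
    Descriptor(45, 9, "v", 0.7),
]
results = recognize(points)
```

In the first cell, the single high-confidence vote for "a" outweighs the two weaker votes for "b", which is the point of weighting votes rather than counting them. A fixed grid is used here only to keep the sketch short; any spatial clustering of interest points could be substituted.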