
    Recognition of handwritten Chinese characters by combining regularization, Fisher's discriminant and distorted sample generation

    Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009, p. 1026–1030. The problem of offline handwritten Chinese character recognition has been studied extensively, and very high recognition rates have been reported. In this paper, we propose to boost the recognition rate further by incorporating a distortion model that artificially generates a huge number of virtual training samples from existing ones. We achieve a record high recognition rate of 99.46% on the ETL-9B database. Traditionally, when the dimension of the feature vector is high and the number of training samples is insufficient, the remedies are to (i) regularize the class covariance matrices in the discriminant functions, (ii) employ Fisher's dimension reduction technique to reduce the feature dimension, and (iii) generate a huge number of virtual training samples from existing ones. The second contribution of this paper is an investigation of the relative effectiveness of these three methods for boosting the recognition rate. © 2009 IEEE.
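The abstract does not say which distortion model the paper uses. As a minimal sketch of remedy (iii), assuming a random affine distortion (scaling, shear and rotation about the image centre) applied to a binary character image, the code below generates virtual training samples from one real sample. The parameter names and ranges (`max_rot_deg`, `max_shear`, `max_scale`) and the helper `virtual_samples` are hypothetical choices for illustration, not values from the paper.

```python
import math
import random
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def distort(img: Image, rng: random.Random, max_rot_deg: float = 8.0,
            max_shear: float = 0.15, max_scale: float = 0.1) -> Image:
    """Return a randomly distorted copy of a binary character image.

    A random affine map (scale, then shear, then rotation) is applied about
    the image centre. Inverse mapping with nearest-neighbour sampling is used:
    each output pixel looks up the source pixel it came from, so no holes appear.
    The parameter ranges are illustrative guesses, not values from the paper.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    sx = 1 + rng.uniform(-max_scale, max_scale)
    sy = 1 + rng.uniform(-max_scale, max_scale)
    c, s = math.cos(theta), math.sin(theta)
    # Forward matrix A = Rotation @ Shear @ Scale.
    a11, a12 = c * sx, (c * shear - s) * sy
    a21, a22 = s * sx, (s * shear + c) * sy
    det = a11 * a22 - a12 * a21  # equals sx * sy, always > 0 here
    # Inverse of A, used to map output coordinates back to source coordinates.
    i11, i12 = a22 / det, -a12 / det
    i21, i22 = -a21 / det, a11 / det
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            src_x = round(i11 * dx + i12 * dy + cx)
            src_y = round(i21 * dx + i22 * dy + cy)
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = img[src_y][src_x]
    return out


def virtual_samples(img: Image, n: int, seed: int = 0) -> List[Image]:
    """Generate n distorted copies of one training sample (reproducible via seed)."""
    rng = random.Random(seed)
    return [distort(img, rng) for _ in range(n)]
```

Each virtual sample would then go through the same feature extractor as the real samples before training. Inverse mapping is chosen over forward mapping because it guarantees every output pixel receives a value.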

    Adaptive Algorithms for Automated Processing of Document Images

    Large-scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and they operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. First, we propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance-transform-based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach is its determination of the best approximation to the clutter-content boundary with text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts, which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multilingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that combines component separation features with Docstrum [O'Gorman1993] based angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model that uses font models to segment and recognize characters in any complex syllabic or non-syllabic script.
This concept is based on the fact that font files contain all the information necessary to render text, and thus a model for how to decompose rendered text. Instead of relying on script-specific routines, this work is a step towards a generic character segmentation and recognition scheme for both Latin and non-Latin scripts.
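The abstract only states that the clutter detector is distance-transform based. As an illustration of that building block, not the thesis's algorithm, the sketch below computes a city-block distance transform of a binary image with the classic two-pass scan. The `thick_ink_mask` helper and its `max_half_stroke` threshold are hypothetical: they show one plausible cue, namely that thin text strokes have small interior distances while solid clutter blobs have large ones.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def city_block_dt(img: Image) -> List[List[int]]:
    """City-block (4-connected) distance from each ink pixel to the nearest
    background pixel; background pixels get 0.

    Two-pass chamfer scan. Pixels outside the image count as background,
    so ink on the border gets distance 1.
    """
    h, w = len(img), len(img[0])
    big = h + w  # larger than any city-block distance that can occur
    dt = [[big if img[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if dt[y][x]:
                up = dt[y - 1][x] if y > 0 else 0
                left = dt[y][x - 1] if x > 0 else 0
                dt[y][x] = min(dt[y][x], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dt[y][x]:
                down = dt[y + 1][x] if y < h - 1 else 0
                right = dt[y][x + 1] if x < w - 1 else 0
                dt[y][x] = min(dt[y][x], down + 1, right + 1)
    return dt


def thick_ink_mask(img: Image, max_half_stroke: int) -> Image:
    """Hypothetical clutter cue: mark ink pixels lying deeper than a typical
    text stroke half-width. Thin strokes stay 0; solid blobs get flagged."""
    dt = city_block_dt(img)
    return [[1 if d > max_half_stroke else 0 for d in row] for row in dt]
```

Two passes suffice because every shortest 4-connected path to the background can be split into a part reached from the top-left and a part reached from the bottom-right.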

    HANDWRITTEN JAVANESE CHARACTER RECOGNITION USING DIRECTIONAL ELEMENT FEATURE AND MULTI-CLASS SUPPORT VECTOR MACHINE

    Javanese script is an old traditional writing system from Java, Indonesia. Its characters have complicated structures, and many of them are similar in shape. Optical Character Recognition (OCR) is a field of computer vision that attempts to recognize characters within an image. Many studies have applied various methods to build OCR systems that recognize characters accurately. Because of these characteristics of Javanese script, a strong method is needed to build a highly accurate OCR system for it. Directional Element Feature (DEF) is a feature extraction method that has been used in many studies and has proven effective for recognizing Chinese characters, which also have complicated shape structures. DEF builds a feature vector by counting the direction elements of edge-pixel neighborhoods in each character image. Support Vector Machine (SVM) is a classification method that separates two classes by finding the hyperplane with the largest margin. Previous studies have shown that SVM classifies data well, especially data the system has not seen before, and other studies have found SVM to perform better than common artificial neural networks. In this research, a Javanese character recognition system is built using DEF and SVM. The test results show a best recognition accuracy of 93.6% on 250 handwritten Javanese characters (10 samples for each character). Keywords: OCR, handwritten, Javanese character, DEF, SVM
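To make "counting the direction elements of edge-pixel neighborhoods" concrete, here is a simplified sketch, assuming a binary, already size-normalized character image. Contour pixels are ink pixels with a 4-connected background neighbour; each contour pixel adds one count per direction (horizontal, 45°, vertical, 135°) in which a neighbouring contour pixel lies, accumulated over a non-overlapping grid of zones. Full DEF implementations typically also use overlapping, weighted sub-regions, which this sketch omits; the `zones` parameter is a hypothetical choice. The resulting vectors would then be passed to a multi-class SVM, which is not shown.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background

# Neighbour offsets (dy, dx) for the four direction elements:
# 0 = horizontal, 1 = 45 degrees (right-up / left-down),
# 2 = vertical, 3 = 135 degrees (left-up / right-down).
DIRECTIONS = [
    [(0, -1), (0, 1)],
    [(-1, 1), (1, -1)],
    [(-1, 0), (1, 0)],
    [(-1, -1), (1, 1)],
]


def contour(img: Image) -> Image:
    """Ink pixels with at least one 4-connected background (or out-of-image) neighbour."""
    h, w = len(img), len(img[0])

    def ink(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1

    return [[1 if img[y][x] == 1 and not all(
                ink(y + dy, x + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
             else 0 for x in range(w)] for y in range(h)]


def def_features(img: Image, zones: int = 4) -> List[int]:
    """Simplified Directional Element Feature vector of length zones * zones * 4.

    The image is split into a zones x zones grid of non-overlapping cells.
    For each contour pixel, a direction element is counted in its cell
    whenever a neighbouring contour pixel lies along that direction.
    """
    h, w = len(img), len(img[0])
    c = contour(img)
    feats = [0] * (zones * zones * 4)
    for y in range(h):
        for x in range(w):
            if not c[y][x]:
                continue
            base = ((y * zones // h) * zones + (x * zones // w)) * 4
            for d, offsets in enumerate(DIRECTIONS):
                if any(0 <= y + dy < h and 0 <= x + dx < w and c[y + dy][x + dx]
                       for dy, dx in offsets):
                    feats[base + d] += 1
    return feats
```

For example, a one-pixel-thick horizontal stroke contributes only to the horizontal elements, and a vertical stroke only to the vertical ones, which is what makes the feature sensitive to stroke direction.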

    Handwritten signature verification using locally optimized distance-based classification

    Thesis (M.Sc.), University of KwaZulu-Natal, Westville, 2012. Although handwritten signature verification has been researched extensively, it has not yet achieved an optimal accuracy rate. Because signatures are still widely used as a means of personal verification, efficient and accurate signature verification techniques are required. This research presents efficient distance-based classification techniques as an alternative to supervised learning classification techniques (SLTs). Two feature extraction techniques were used, namely the Enhanced Modified Direction Feature (EMDF) and the Local Directional Pattern (LDP) feature. These were used to analyze the effect of several different distance-based classification techniques, including the cosine similarity measure and the Mahalanobis, Canberra, Manhattan, Euclidean, weighted Euclidean and fractional distances. In addition, novel weighted fractional distances, as well as locally optimized resampling of feature vector sizes, were tested. The best accuracy was achieved by applying a combination of the weighted fractional distances and locally optimized resampling to the LDP features. This combination of distance-based classification techniques achieved an accuracy rate of 89.2% with the EMDF features and 90.8% with the LDP features. These results are comparable to those reported in the literature where the same feature extraction techniques were classified with SLTs. The best distance-based classification techniques were found to produce greater accuracy than the SLTs.
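Several of the distances named above can be written compactly. The sketch below implements a weighted Minkowski distance (Manhattan at p = 1, Euclidean at p = 2, fractional for 0 < p < 1), the Canberra distance and cosine similarity, plus a threshold-based `verify` function. The abstract does not describe the thesis's weighting scheme, its locally optimized resampling, or its decision rule, so `weights`, `threshold` and the mean-distance rule in `verify` are hypothetical. Mahalanobis is omitted because it needs a covariance estimate.

```python
import math
from typing import List, Optional, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float,
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Minkowski distance (sum_i w_i * |a_i - b_i|**p) ** (1/p).

    p = 1 gives Manhattan, p = 2 Euclidean, and 0 < p < 1 a 'fractional'
    distance, which no longer satisfies the triangle inequality.
    """
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * abs(x - y) ** p for w, x, y in zip(weights, a, b)) ** (1.0 / p)


def canberra(a: Sequence[float], b: Sequence[float]) -> float:
    """Canberra distance; terms where both coordinates are 0 are skipped."""
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(questioned: Sequence[float], references: List[Sequence[float]],
           threshold: float, p: float = 0.5,
           weights: Optional[Sequence[float]] = None) -> bool:
    """Hypothetical decision rule: accept the questioned signature if its mean
    (weighted) fractional distance to the writer's genuine references is <= threshold."""
    mean_d = sum(minkowski(questioned, r, p, weights) for r in references) / len(references)
    return mean_d <= threshold
```

Fractional exponents shrink the influence of a few large per-feature differences relative to many small ones, which is one common motivation for trying them on high-dimensional feature vectors.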

    Feature Extraction Methods for Character Recognition

    Abstract not included.