
    Optimization of the Gaussian Kernel Extended by Binary Morphology for Text Line Segmentation

    This paper presents an approach to text line segmentation based on an algorithm that applies a Gaussian kernel. The algorithm produces a growing area around the text, which is then exploited for text line segmentation. To improve the segmentation process, the isotropic Gaussian kernel is extended by dilation. The algorithms with the isotropic and the extended Gaussian kernels are examined and evaluated on different text samples, and a comparative analysis of the results is given. Based on the obtained results, an optimization of the parameters that define the dimensions of the extended Gaussian kernel is proposed. The presented algorithm with the extended Gaussian kernel proved robust across different types of text samples.
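
    The idea of growing an area around the text and then extending it by dilation can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the function names, the threshold, and the choice of a purely horizontal structuring element are all assumptions made for the example.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a normalized (2*radius+1) x (2*radius+1) isotropic Gaussian kernel."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def grow_text_area(image, kernel, threshold):
    """Convolve a binary image (1 = ink) with the kernel and threshold the response.

    Pixels outside the image are treated as background (0).
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = 1 if acc >= threshold else 0
    return out


def dilate_horizontal(mask, reach):
    """Binary dilation with a 1 x (2*reach+1) horizontal structuring element.

    Assumption: the extension acts along the writing direction only.
    """
    w = len(mask[0])
    return [
        [1 if any(row[max(0, x - reach):x + reach + 1]) else 0 for x in range(w)]
        for row in mask
    ]
```

    In this sketch, two words on the same row that remain separate regions after the Gaussian step can be merged into one line region by the horizontal dilation, while rows without ink stay empty.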

    How to separate between Machine-Printed/Handwritten and Arabic/Latin Words?

    This paper gathers several contributions to the identification of script and its nature. Different sets of features have been employed successfully to discriminate between handwritten and machine-printed Arabic and Latin scripts. They include well-established features previously used in the literature and new structural features that are intrinsic to Arabic and Latin scripts. The performance of these features is studied throughout the paper. We also compare the performance of five classifiers used to identify the script at word level: Bayes (AODEsr), k-Nearest Neighbor (k-NN), Decision Tree (J48), Support Vector Machine (SVM) and Multilayer Perceptron (MLP). These classifiers were chosen to be sufficiently different from one another to test the contributions of the features. Experiments were conducted with handwritten and machine-printed words covering a wide range of fonts. Experimental results show the capability of the proposed features to capture differences between scripts and the effectiveness of the three classifiers. Average identification precision and recall rates of 98.72% were achieved using a set of 58 features and the AODEsr classifier, which is slightly better than the rates reported in similar works.

    Neighborhood Label Extension for Handwritten/Printed Text Separation in Arabic Documents

    This paper addresses the problem of separating handwritten and printed text in Arabic document images. The objective is to extract handwritten text from the other parts of the document, so that specialized processing can subsequently be applied to the extracted handwritten part or to the printed one. Documents are first preprocessed to remove any noise and to correct the document orientation. The document is then segmented into pseudo-lines, which are in turn segmented into pseudo-words. A local classification step, using a Gaussian kernel SVM, assigns each pseudo-word to the handwritten or the printed class. This label is then propagated in the pseudo-word's neighborhood in order to recover from classification errors. The proposed methodology has been tested on a set of public real Arabic documents, achieving a separation rate of around 90%.
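
    One simple way to picture the propagation of labels in a pseudo-word's neighborhood is a majority vote along the pseudo-line. The sketch below is only an illustration under that assumption: the abstract does not specify the propagation rule, and the function name `propagate_labels`, the `window` parameter, and the tie-breaking choice are hypothetical.

```python
from collections import Counter


def propagate_labels(labels, window=1):
    """Relabel each pseudo-word by majority vote over itself and its neighbours.

    `labels` holds the per-pseudo-word classifier outputs for one pseudo-line,
    for example "H" (handwritten) or "P" (printed). Ties keep the original label.
    """
    smoothed = []
    for i, original in enumerate(labels):
        lo = max(0, i - window)
        hi = min(len(labels), i + window + 1)
        votes = Counter(labels[lo:hi])
        winner, count = votes.most_common(1)[0]
        # If several labels share the top count, do not override the classifier.
        if list(votes.values()).count(count) > 1:
            winner = original
        smoothed.append(winner)
    return smoothed
```

    With this rule, an isolated "H" surrounded by "P" labels on both sides is relabeled as "P", which mimics recovery from a local classification error.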

    Arabic/Latin and Machine-printed/Handwritten Word Discrimination using HOG-based Shape Descriptor

    In this paper, we present an approach for identifying Arabic and Latin script and its type based on Histogram of Oriented Gradients (HOG) descriptors. HOGs are first applied at word level based on writing orientation analysis. They are then extended to word image partitions to capture fine and discriminative details. Pyramid HOGs are also used to study their effects at different observation levels of the image. Finally, co-occurrence matrices of HOG are computed to take into account spatial information between pairs of pixels, which basic HOG does not consider. A genetic algorithm is applied to select the potentially informative feature combinations that maximize classification accuracy. The output is a relatively short descriptor that provides an effective input to a Bayes-based classifier. Experimental results on a set of words extracted from standard databases show that our identification system is robust and provides good word script and type identification: 99.07% of words are correctly classified.
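
    For readers unfamiliar with HOG, the core step is a magnitude-weighted histogram of gradient orientations. The following is a minimal sketch of that step for a single cell, assuming central-difference gradients, unsigned orientations in [0, 180) degrees, and L2 normalization. It does not reproduce the word partitions, pyramid levels, co-occurrence matrices, or genetic feature selection described in the abstract, and the function name `hog_cell` is hypothetical.

```python
import math


def hog_cell(image, bins=9):
    """Return an L2-normalized histogram of unsigned gradient orientations.

    `image` is a 2D list of grey levels; border pixels are skipped because
    central differences need both neighbours.
    """
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist
```

    On a patch containing only a vertical edge, all the weight falls into the first bin (orientation near 0 degrees); on a horizontal edge it falls into the middle bin (near 90 degrees).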
