18 research outputs found

    Performance evaluation methodology for historical document image binarization

    No full text
    Document image binarization is of great importance in the document image analysis and recognition pipeline, since it affects the subsequent stages of the recognition process. Evaluating a binarization method aids in studying its algorithmic behavior, as well as in verifying its effectiveness, by providing qualitative and quantitative indications of its performance. This paper presents a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the recall and precision measures are modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
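The abstract's weighting scheme is not specified here, but the unweighted base case it modifies — plain pixel-based recall and precision between a binarization result and a ground-truth image — can be sketched as follows (both inputs are boolean arrays with True marking text pixels):

```python
import numpy as np

def pixel_recall_precision(result, gt):
    """Unweighted pixel-based recall and precision between a binarized
    result and a ground-truth text mask (boolean arrays, True = text).
    The paper's bias-reducing weighting scheme is not reproduced here."""
    tp = np.logical_and(result, gt).sum()    # text pixels correctly kept
    fp = np.logical_and(result, ~gt).sum()   # background marked as text
    fn = np.logical_and(~result, gt).sum()   # text pixels missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

Metrics such as broken/missed text or false-alarm rates are then derived by categorizing the false-negative and false-positive pixels, which requires the paper's component-level analysis.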

    An Objective Evaluation Methodology for Document Image Binarization Techniques

    No full text
    Evaluation of document image binarization techniques is a tedious task that is mainly performed by a human expert or by involving an OCR engine. This paper presents an objective evaluation methodology for document image binarization techniques that aims to reduce human involvement in ground truth construction and subsequent testing. A skeletonized ground truth image is produced by the user following a semi-automatic procedure. The estimated ground truth image can aid in evaluating the binarization result in terms of recall and precision, as well as in further analyzing the result by calculating broken and missing text, deformations and false alarms. A detailed description of the methodology is presented, along with a benchmarking of the six (6) most promising state-of-the-art binarization algorithms based on the proposed methodology.
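A minimal sketch of scoring against a skeletonized ground truth: recall counts skeleton pixels recovered by the result, and precision here tolerates a one-pixel zone around the skeleton (the tolerance zone and the exact precision formulation are assumptions; the paper's definitions may differ):

```python
import numpy as np

def dilate3(mask):
    """3x3 binary dilation via shifted overlays (one-pixel tolerance zone)."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out |= p[1 + di:1 + di + h, 1 + dj:1 + dj + w]
    return out

def skeleton_scores(result, skeleton):
    """Recall/precision of a binarization result against a skeletonized
    ground truth (boolean arrays, True = text/skeleton pixel)."""
    recall = (result & skeleton).sum() / max(skeleton.sum(), 1)
    tol = dilate3(skeleton)                       # assumed tolerance zone
    precision = (result & tol).sum() / max(result.sum(), 1)
    return recall, precision
```

Result pixels falling outside the tolerance zone would be the candidates for the false-alarm analysis the abstract mentions.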

    A combined approach for the binarization of handwritten document images

    No full text
    There are many challenges in handwritten document image binarization, such as faint characters, bleed-through and large background ink stains, and binarization methods usually cannot deal with all degradation types effectively. Motivated by the low detection rate of faint characters in binarization of handwritten document images, a combination of a global and a local adaptive binarization method at the connected-component level is proposed, aiming at improved overall performance. Initially, background estimation is applied along with image normalization based on background compensation. Afterwards, global binarization is performed on the normalized image. In the binarized image, very small components are discarded, and representative characteristics of a document image, such as stroke width and contrast, are computed. Furthermore, local adaptive binarization is performed on the normalized image taking the aforementioned characteristics into account. Finally, the two binarization outputs are combined at the connected-component level. Our method achieves top performance after extensive testing on the DIBCO (Document Image Binarization Contest) series datasets, which include a variety of degraded handwritten document images.
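The global-plus-local idea can be sketched with a numpy-only Otsu threshold and a windowed-mean local pass. Note the simplifications: the paper combines the two outputs at the connected-component level using stroke width and contrast, whereas this sketch simply takes their union, and the local rule (`k` times the window mean) is an assumed stand-in for the paper's adaptive criterion:

```python
import numpy as np

def otsu_threshold(img):
    """Global Otsu threshold for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    omega = np.cumsum(hist) / total                  # class-0 probability
    mu = np.cumsum(hist * np.arange(256)) / total    # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))    # max between-class var

def local_mean(img, w=15):
    """Mean of each w x w neighborhood, computed via an integral image."""
    pad = w // 2
    p = np.pad(img.astype(np.float64), pad, mode="edge")
    ii = np.cumsum(np.cumsum(p, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))
    h, wd = img.shape
    s = ii[w:w + h, w:w + wd] - ii[:h, w:w + wd] - ii[w:w + h, :wd] + ii[:h, :wd]
    return s / (w * w)

def combined_binarize(img, w=15, k=0.95):
    """Union of a global Otsu pass and a local windowed-mean pass
    (simplified stand-in for the paper's component-level combination)."""
    global_pass = img <= otsu_threshold(img)
    local_pass = img <= k * local_mean(img, w)
    return global_pass | local_pass
```

The global pass anchors well-contrasted strokes; the local pass is what recovers faint characters that fall above the global threshold but below their neighborhood mean.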

    Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering

    No full text
    In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. The text binarization problem presents many challenges, such as noise, faint characters and bleed-through, which must be overcome to increase the correct detection rate. To this end, a preprocessing strategy is first used to enhance the contrast and improve faint characters, and a Gaussian Mixture Model (GMM) is used to ignore the noise and other artifacts in the handwritten historical documents. After that, the enhanced image is normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values into two clusters using k-means clustering with k = 2, assigning each normalized pixel to one of the two clusters via the minimum Euclidean distance between the normalized pixel intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
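The clustering step itself reduces to one-dimensional k-means with k = 2 on normalized intensities, with the darker cluster taken as text. This sketch omits the paper's contrast enhancement and GMM-based noise handling and initializes the centroids at the intensity extremes (an assumption; the initialization strategy is not stated in the abstract):

```python
import numpy as np

def kmeans_binarize(img, iters=20):
    """Binarize a grayscale image by 1-D k-means (k = 2) on normalized
    pixel intensities; the cluster with the darker mean is taken as text.
    Pre/postprocessing from the paper (contrast enhancement, GMM) omitted."""
    x = img.astype(np.float64).ravel() / 255.0   # normalize to [0, 1]
    c = np.array([x.min(), x.max()])             # assumed init: extremes
    for _ in range(iters):
        # assign each pixel to the nearest centroid (Euclidean distance)
        labels = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for j in (0, 1):
            if (labels == j).any():
                c[j] = x[labels == j].mean()     # update cluster means
    text_cluster = int(c.argmin())               # darker mean = text
    return (labels == text_cluster).reshape(img.shape)
```

With only two clusters on a 1-D feature this converges in a handful of iterations, and the decision boundary ends up midway between the two cluster means.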

    Binarization with the Local Otsu Filter

    No full text
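    No abstract is available for this entry, but the technique the title names — applying Otsu's threshold locally rather than once for the whole image — can be sketched in its simplest block-wise form (the block size and tiling scheme here are assumptions, not details from the paper):

```python
import numpy as np

def otsu_threshold(img):
    """Otsu threshold maximizing between-class variance (8-bit input)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    omega = np.cumsum(hist) / total
    mu = np.cumsum(hist * np.arange(256)) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def local_otsu_binarize(img, block=32):
    """Block-wise local Otsu: threshold each tile independently, so the
    threshold adapts to local illumination and contrast."""
    out = np.zeros(img.shape, dtype=bool)
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            tile = img[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = tile <= otsu_threshold(tile)
    return out
```

A sliding-window variant (one Otsu threshold per pixel neighborhood) behaves more smoothly at tile borders but is far more expensive; the hard tiling above is the cheap approximation.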