17 research outputs found

    Adaptive Binarization of Unconstrained Hand-Held Camera-Captured Document Images

    Get PDF
    Abstract: This paper presents a new adaptive binarization technique for degraded hand-held camera-captured document images. State-of-the-art locally adaptive binarization methods are sensitive to the values of their free parameters. This problem is more critical when binarizing degraded camera-captured document images because of distortions like non-uniform illumination, bad shading, blurring, smearing and low resolution. We demonstrate in this paper that local binarization methods are not only sensitive to the selected free parameter values (whether found manually or automatically), but also to using the same constant values for all pixels of a document image: some ranges of free parameter values are better for foreground regions, while other ranges are better for background regions. To overcome this problem, we present an adaptation of a state-of-the-art local binarization method in which two different sets of free parameter values are used for the foreground and background regions, respectively. We use ridge detection for a rough estimation of the foreground regions in a document image. This information is then used to calculate an appropriate threshold using a different set of free parameter values for the foreground and background regions, respectively. Evaluation of the method using an OCR-based measure and a pixel-based measure shows that our method achieves better performance than state-of-the-art global and local binarization methods.
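
    As an illustration of the core idea, the sketch below applies Sauvola-style local thresholding in which the free parameter k is chosen per pixel from two candidate values, depending on whether a ridge (rough foreground evidence) lies in the pixel's neighborhood. This is not the paper's implementation; the function name, window size, k values and the precomputed ridge mask are assumptions made for the sake of a runnable example.

        # Minimal sketch (not the paper's code): Sauvola-style thresholding,
        # T = m * (1 + k * (s / R - 1)), with a per-pixel k picked from two candidate
        # values according to a precomputed ridge mask. All defaults are illustrative.
        import numpy as np
        from scipy.ndimage import uniform_filter, binary_dilation

        def ridge_guided_sauvola(gray, ridge_mask, window=31, k_fg=0.05, k_bg=0.3, R=128.0):
            """gray: grayscale image in [0, 255]; ridge_mask: bool array of detected ridges."""
            gray = gray.astype(np.float64)
            # Local mean and standard deviation over a window x window neighborhood.
            mean = uniform_filter(gray, window)
            sq_mean = uniform_filter(gray * gray, window)
            std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
            # Pixels near a ridge are treated as foreground regions and use k_fg.
            near_ridge = binary_dilation(ridge_mask, iterations=window // 2)
            k = np.where(near_ridge, k_fg, k_bg)
            threshold = mean * (1.0 + k * (std / R - 1.0))
            return gray <= threshold  # True for ink (foreground) pixels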

    Generic Methods for Document Layout Analysis and Preprocessing

    No full text
    Generic layout analysis, the process of decomposing a document image into homogeneous regions for a collection of diverse document images, has many important applications in document image analysis and understanding, such as preprocessing of degraded, warped, camera-captured document images, high-performance layout analysis of document images containing complex cursive scripts, and word spotting in historical document images at page level. Many areas in this field, such as generic text line extraction, are still considered elusive goals, beyond the reach of state-of-the-art methods [NJ07, LSZT07, KB06]. This thesis addresses this problem by presenting generic, domain-independent text line extraction and text and non-text segmentation methods, and then describing some important applications that were developed based on these methods. An overview of the key contributions of this thesis is as follows.

    The first part of this thesis presents a generic text line extraction method using a combination of matched filtering and ridge detection techniques, which are commonly used in computer vision. Unlike the state-of-the-art text line extraction methods in the literature, the generic text line extraction method can be applied equally and robustly to a large variety of document image classes, including scanned and camera-captured documents, binary and grayscale documents, typed-text and handwritten documents, historical and contemporary documents, and documents containing different scripts. Standard datasets belonging to different categories of document images are selected for performance evaluation, such as the UW-III [GHHP97] dataset of scanned documents, the ICDAR 2007 [GAS07] and UMD [LZDJ08] datasets of handwritten documents, the DFKI-I [SB07] dataset of camera-captured documents, an Arabic/Urdu script documents dataset, and a German calligraphic (Fraktur) script historical documents dataset. The generic text line extraction method achieves 86% (n = 23,763 text lines in 650 documents) text line detection accuracy, which is better than the aggregate accuracy of 73% of the best performing domain-specific state-of-the-art methods. To the best of the author's knowledge, it is the first general-purpose text line extraction method that can be used equally for a diverse collection of documents. This thesis also presents an active contour (snake) based curled text line extraction method for warped, camera-captured document images. The presented approach is applied to the DFKI-I [SB07] dataset of camera-captured, Latin script document images for curled text line extraction. It achieves above 95% (n = 3,091 text lines in 102 documents) text line detection accuracy, which is significantly better than the competing state-of-the-art curled text line extraction methods. The presented text line extraction method can also be applied to document images containing different scripts like Chinese, Devanagari, and Arabic after small modifications.

    The second part of this thesis presents an improved version of the state-of-the-art multiresolution morphology (Leptonica) based text and non-text segmentation method [Blo91], which is a domain-independent page segmentation approach and can be applied equally to a diverse collection of binarized document images. It is demonstrated that the presented improvements increase segmentation accuracy from 93% to 99% (n = 113 documents). This thesis also introduces a discriminative learning based approach for page segmentation, where a self-tunable multi-layer perceptron (MLP) classifier [BS10] is trained to distinguish between text and non-text connected components. Unlike other classification based page segmentation approaches in the literature, the connected-component based approach is faster than pixel based classification methods and does not require a prior block segmentation step. A segmentation accuracy of 96% (n = 113 documents) is achieved, compared to 93% for the state-of-the-art multiresolution morphology (Leptonica) based page segmentation method [Blo91]. In addition to text and non-text segmentation of Latin script documents, the presented approach can also be adapted for document images containing other scripts as well as for other specialized layout analysis tasks such as digit and non-digit segmentation [HBSB12], orientation detection [RBSB09], and body-text and side-note segmentation [BAESB12].

    Finally, this thesis presents important applications of the two generic layout analysis techniques discussed above, the ridge-based text line extraction method and the multiresolution morphology based text and non-text segmentation method. First, a complete preprocessing pipeline is described for removing different types of degradations from grayscale, warped, camera-captured document images: removal of grayscale degradations such as non-uniform shadows and blurring through binarization, noise cleanup using page frame detection, and document rectification using monocular dewarping. Each of these preprocessing steps shows significant improvement over the analyzed state-of-the-art methods in the literature. Second, a high-performance layout analysis method is described for complex Arabic script document images written in different languages, such as Arabic, Urdu, and Persian, and different styles, for example Naskh and Nastaliq. The presented layout analysis system is robust against different types of document image degradations and shows better performance for text and non-text segmentation, text line extraction, and reading order determination on a variety of Arabic and Urdu document images compared to the state-of-the-art methods. It can be used in large-scale Arabic and Urdu document digitization processes. These applications demonstrate that the layout analysis methods, ridge-based text line extraction and multiresolution morphology based text and non-text segmentation, are generic and can be applied easily to a large collection of diverse document images.
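
    The multiresolution morphology segmentation described in the second part can be sketched in a few lines. The following is a simplified illustration in the spirit of the threshold-reduction cascade of [Blo91], not the thesis implementation; the reduction thresholds, structuring-element sizes and function names are assumptions.

        # Simplified sketch of multiresolution-morphology text/non-text segmentation:
        # dense image/halftone regions survive aggressive 2x threshold reductions and a
        # morphological closing, while small text components vanish; the surviving seed
        # is expanded back to full resolution and used as a non-text mask.
        import numpy as np
        from scipy.ndimage import binary_closing, binary_dilation, zoom

        def threshold_reduce(binary, thresh):
            """2x reduction: an output pixel is set if at least `thresh` of its 2x2 input pixels are set."""
            h, w = binary.shape
            b = binary[: h - h % 2, : w - w % 2].astype(np.uint8)
            counts = b[0::2, 0::2] + b[0::2, 1::2] + b[1::2, 0::2] + b[1::2, 1::2]
            return counts >= thresh

        def nontext_mask(binary):
            """binary: bool array with True for ink pixels."""
            # Two reductions: the first keeps any ink, the second keeps only dense regions.
            seed = threshold_reduce(threshold_reduce(binary, 1), 4)
            seed = binary_closing(seed, structure=np.ones((5, 5)))
            # Expand the low-resolution seed back to full resolution and grow it slightly.
            mask = zoom(seed.astype(np.uint8), 4, order=0).astype(bool)
            mask = binary_dilation(mask, structure=np.ones((9, 9)))
            out = np.zeros_like(binary, dtype=bool)
            h = min(out.shape[0], mask.shape[0])
            w = min(out.shape[1], mask.shape[1])
            out[:h, :w] = mask[:h, :w]
            return out  # True where image/halftone (non-text) content is estimated

    Text components are then obtained as the ink pixels outside this mask, e.g. binary & ~nontext_mask(binary).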

    Script-independent handwritten textlines segmentation using active contours

    No full text
    Handwritten document images contain textlines with multiple orientations, touching and overlapping characters within consecutive textlines, and small inter-line spacing, which makes textline segmentation a difficult task. In this paper we propose a novel, script-independent textline segmentation approach for handwritten documents which is robust against the above mentioned problems. We model textline extraction as a general image segmentation task. We compute the central lines of parts of textlines using ridges over the smoothed image. Then we adapt state-of-the-art active contours (snakes) over the ridges, which results in textline segmentation. Unlike the “Level Set” and “Mumford-Shah model” based handwritten textline segmentation methods, our method uses a matched filter bank approach for smoothing and does not require heuristic postprocessing steps for merging or splitting segmented textlines. Experimental results prove the effectiveness of the proposed algorithm. We evaluated our algorithm on the ICDAR 2007 handwritten segmentation contest dataset and obtained an accuracy of 96.3%.
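
    The matched filter bank smoothing used before ridge computation can be illustrated as a small bank of oriented anisotropic Gaussian filters whose per-pixel maximum response enhances textlines regardless of their local orientation. The sketch below is an illustration, not the paper's implementation; the number of orientations and the sigma values are placeholder assumptions.

        # Illustrative multi-orientation matched filter bank (not the paper's code):
        # smooth the inverted image with anisotropic Gaussians at several orientations
        # and keep the per-pixel maximum response, so textlines of any local orientation
        # are enhanced into continuous ridges.
        import numpy as np
        from scipy.ndimage import gaussian_filter, rotate

        def matched_filter_bank(gray, sigma_along=12.0, sigma_across=3.0, n_orientations=8):
            inverted = 255.0 - gray.astype(np.float64)  # ink becomes bright
            response = np.zeros_like(inverted)
            for angle in np.linspace(0.0, 180.0, n_orientations, endpoint=False):
                # Rotate the image, smooth along the (now horizontal) filter axis, rotate back.
                rot = rotate(inverted, angle, reshape=False, order=1, mode='nearest')
                smooth = gaussian_filter(rot, sigma=(sigma_across, sigma_along))
                back = rotate(smooth, -angle, reshape=False, order=1, mode='nearest')
                response = np.maximum(response, back)
            return response  # ridges of this map follow the central lines of textlines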

    Ridges Based Curled Textline Region Detection from Grayscale Camera-Captured Document Images

    No full text
    Abstract. As compared to scanners, cameras offer fast, flexible and non-contact document imaging, but with distortions like uneven shading and warped shape. Therefore, camera-captured document images need preprocessing steps like binarization and textline detection for dewarping so that traditional document image processing steps can be applied to them. Previous approaches to binarization and curled textline detection are sensitive to distortions and lose some crucial image information during each step, which badly affects dewarping and further processing. Here we introduce a novel algorithm for curled textline region detection directly from a grayscale camera-captured document image, in which a matched filter bank approach is used for enhancing the textline structure and ridge detection is then applied for finding the central line of curled textlines. The resulting ridges can potentially be used for binarization, dewarping or designing new techniques for camera-captured document image processing. Our approach is robust against bad shading and high degrees of curl. We have achieved around 91% detection accuracy on the dataset of the CBDAR 2007 document image dewarping contest.
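
    A rough sketch of the enhancement-plus-ridge step follows; it is illustrative rather than the paper's implementation. A single anisotropic Gaussian stands in for the matched filter bank, and pixels with strongly negative principal curvature of the smoothed (inverted) image are taken as a crude approximation of ridge points along textline centers; all sigma values and the threshold are assumptions.

        # Illustrative matched-filter smoothing followed by a crude ridge test (not the
        # paper's code): textlines become bright ridges after inversion and anisotropic
        # smoothing; strongly negative principal curvature marks their central lines.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def textline_ridge_mask(gray, sigma_x=12.0, sigma_y=3.0, strength=0.5):
            """gray: grayscale image with dark text on a lighter background."""
            smoothed = gaussian_filter(255.0 - gray.astype(np.float64), sigma=(sigma_y, sigma_x))
            # Hessian components from second-order Gaussian derivatives.
            hyy = gaussian_filter(smoothed, sigma=1.0, order=(2, 0))
            hxy = gaussian_filter(smoothed, sigma=1.0, order=(1, 1))
            hxx = gaussian_filter(smoothed, sigma=1.0, order=(0, 2))
            # Smaller (more negative) eigenvalue of the 2x2 Hessian at every pixel.
            lam_min = (hyy + hxx) / 2.0 - np.sqrt(((hyy - hxx) / 2.0) ** 2 + hxy ** 2)
            # Strongly negative curvature ~ pixel lies on or near a bright ridge.
            return lam_min < -strength * np.abs(lam_min).max()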

    Segmentation of Curled Textlines Using Active Contours

    No full text
    Segmentation of curled textlines from warped document images is one of the major issues in document image dewarping. Most of the curled textline segmentation algorithms in the literature today are sensitive to the degree of curl, direction of curl, and spacing between adjacent lines. We present a new algorithm for curled textline segmentation which is robust to the above mentioned problems at the expense of high execution time. We demonstrate this insensitivity in the performance evaluation section. Our approach is based on the state-of-the-art image segmentation technique of the Active Contour Model (Snake), with the novel idea of several baby snakes and their convergence in a vertical direction only. An experiment on the publicly available CBDAR 2007 document image dewarping contest dataset shows that our textline segmentation algorithm achieves an accuracy of 97.96%.
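
    To make the vertical-only convergence idea concrete, the sketch below evolves an open snake whose control points sit at fixed x positions and may only move up or down, greedily minimizing an external energy plus a smoothness term. It is a toy illustration, not the paper's coupled baby-snakes formulation; the energy map, update rule and parameters are assumptions.

        # Minimal sketch of a snake restricted to vertical movement (in the spirit of
        # "baby snakes", not the paper's implementation). Each control point keeps its
        # x position and greedily moves up or down by one pixel when that lowers the sum
        # of an external energy (low values attract the snake) and a smoothness penalty.
        import numpy as np

        def evolve_vertical_snake(energy, xs, ys, alpha=0.5, iterations=200):
            """energy: 2-D external energy map; xs, ys: integer arrays of initial control points."""
            ys = ys.astype(np.int64).copy()
            h = energy.shape[0]
            for _ in range(iterations):
                moved = False
                for i in range(len(xs)):
                    best_y, best_cost = ys[i], np.inf
                    for dy in (-1, 0, 1):                     # vertical moves only
                        y = int(np.clip(ys[i] + dy, 0, h - 1))
                        # Smoothness: stay close to the vertical position of the neighbours.
                        left = ys[i - 1] if i > 0 else y
                        right = ys[i + 1] if i < len(xs) - 1 else y
                        cost = energy[y, xs[i]] + alpha * ((y - left) ** 2 + (y - right) ** 2)
                        if cost < best_cost:
                            best_y, best_cost = y, cost
                    if best_y != ys[i]:
                        ys[i] = best_y
                        moved = True
                if not moved:
                    break
            return ys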

    Foreground-background regions guided binarization of camera-captured document images

    No full text
    Binarization is an important preprocessing step in several document image processing tasks. Nowadays hand-held camera devices, which allow fast and flexible document image capturing, are in widespread use. But they may produce degraded grayscale images, especially due to bad shading or non-uniform illumination. State-of-the-art binarization techniques, which are designed for scanned images, do not perform well on camera-captured documents. Furthermore, local adaptive binarization methods, like Niblack [1], Sauvola [2], etc., are sensitive to free parameter values, which are fixed for the whole image. In this paper, we describe a novel ridges-guided local binarization method, in which appropriate free parameter value(s) are selected for each pixel depending on the presence or absence of ridge(s) in the local neighborhood of the pixel. Our method gives a novel way of automatically selecting parameter values for a local binarization method, which improves binarization results for both scanned and camera-captured document images relative to previous methods. Experimental results on a subset of the CBDAR 2007 document image dewarping contest dataset show a decrease in OCR error rate using the reported method with respect to other state-of-the-art binarization methods.
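
    Since Niblack's method is cited as a parameter-sensitive baseline, the same per-pixel parameter selection idea can also be sketched with Niblack's threshold T = m + k * s; this is an illustration under assumed names and values, not the paper's implementation, and it only differs from the Sauvola sketch shown earlier in the threshold formula.

        # Illustrative sketch (not the paper's code): Niblack-style local thresholding,
        # T = m + k * s, where the free parameter k is picked per pixel from two candidate
        # values depending on whether a ridge (rough foreground evidence) lies nearby.
        import numpy as np
        from scipy.ndimage import uniform_filter, binary_dilation

        def ridge_guided_niblack(gray, ridge_mask, window=31, k_near=-0.1, k_far=-0.4):
            gray = gray.astype(np.float64)
            mean = uniform_filter(gray, window)
            std = np.sqrt(np.maximum(uniform_filter(gray * gray, window) - mean ** 2, 0.0))
            near_ridge = binary_dilation(ridge_mask, iterations=window // 2)
            k = np.where(near_ridge, k_near, k_far)
            return gray <= mean + k * std   # True for ink pixels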

    Dewarping of document images using coupled-snakes

    No full text
    Traditional OCR systems are designed for planar (dewarped) images, and their accuracy is reduced when they are applied to warped images. Therefore, developing new OCR techniques for warped images or developing dewarping techniques are the possible solutions for improving OCR accuracy on camera-captured documents. Among different types of dewarping techniques, those based on curled textline information are the most popular, but they are sensitive to high degrees of curl and variable line spacing. In this paper we build a novel dewarping approach based on curled textline information, which is extracted using a ridges-based modified active contour model (coupled snakes). Our dewarping approach is less sensitive to different directions of curl and variable line spacing. Experimental results show that the OCR error rate, from warped to dewarped documents, is reduced from 5.15% to 1.92% on the dataset of the CBDAR 2007 document image dewarping contest. We also report the performance of our method in comparison with other state-of-the-art methods.
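
    The following sketch shows one simple way curled textline information can drive dewarping: each detected textline is fitted with a low-order polynomial, the vertical displacement needed to flatten it is computed per column, displacements are interpolated over the rest of the page, and the image is resampled. This is a generic textline-based dewarping illustration, not the coupled-snakes method of the paper; the fitting degree, interpolation scheme and function names are assumptions.

        # Minimal sketch of textline-based dewarping (illustrative, not the paper's code):
        # fitted curled baselines are mapped to straight rows by interpolating per-column
        # vertical displacements and resampling the warped image.
        import numpy as np
        from scipy.ndimage import map_coordinates

        def dewarp(gray, textlines, degree=3):
            """textlines: list of (xs, ys) arrays giving points along each curled textline."""
            h, w = gray.shape
            cols = np.arange(w)
            targets, fitted_all = [], []
            for xs, ys in textlines:
                coeffs = np.polyfit(xs, ys, degree)          # fitted curled baseline y(x)
                fitted_all.append(np.polyval(coeffs, cols))
                targets.append(float(np.mean(ys)))           # straight row it should map to
            targets = np.array(targets)
            fitted_all = np.array(fitted_all)                # shape (n_lines, w)
            order = np.argsort(targets)
            targets, fitted_all = targets[order], fitted_all[order]
            shifts = fitted_all - targets[:, None]           # per-column displacement of each line
            # For every output pixel, interpolate the displacement between textlines in its
            # column and sample the warped image at the displaced source position.
            src_rows = np.empty((h, w), dtype=np.float64)
            for x in range(w):
                src_rows[:, x] = np.arange(h) + np.interp(np.arange(h), targets, shifts[:, x])
            src_cols = np.tile(cols.astype(np.float64), (h, 1))
            return map_coordinates(gray, [src_rows, src_cols], order=1, mode='nearest')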