
    Adaptive Binarization of Unconstrained Hand-Held Camera-Captured Document Images

    This paper presents a new adaptive binarization technique for degraded hand-held camera-captured document images. State-of-the-art locally adaptive binarization methods are sensitive to the values of their free parameters. This problem is more critical when binarizing degraded camera-captured document images because of distortions such as non-uniform illumination, bad shading, blurring, smearing and low resolution. We demonstrate in this paper that local binarization methods are not only sensitive to the selection of free-parameter values (whether found manually or automatically), but also to the use of constant free-parameter values for all pixels of a document image. Some ranges of free-parameter values work better for foreground regions, while other ranges work better for background regions. To overcome this problem, we present an adaptation of a state-of-the-art local binarization method in which two different sets of free-parameter values are used for foreground and background regions respectively. We use ridge detection for a rough estimation of the foreground regions in a document image. This information is then used to calculate an appropriate threshold, using a different set of free-parameter values for the foreground and the background regions respectively. An evaluation of the method using an OCR-based measure and a pixel-based measure shows that our method achieves better performance than state-of-the-art global and local binarization methods.
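    As a rough illustration of the idea, the sketch below applies a Sauvola-style local threshold with two different values of the sensitivity parameter k, switching between them according to a precomputed foreground mask (in the paper, such a mask comes from ridge detection). The function names, parameter values and the choice of Sauvola's formula are illustrative assumptions, not the paper's actual implementation.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def sauvola_threshold(img, w=25, k=0.2, R=128.0):
            """Sauvola local threshold: T = m * (1 + k * (s / R - 1))."""
            f = img.astype(np.float64)
            m = uniform_filter(f, w)                                  # local mean
            s = np.sqrt(np.maximum(uniform_filter(f * f, w) - m * m, 0.0))  # local std
            return m * (1.0 + k * (s / R - 1.0))

        def two_parameter_binarize(img, fg_mask, k_fg=0.05, k_bg=0.3, w=25):
            """Threshold with one k on rough foreground regions and another
            on background regions, switching via the given foreground mask."""
            t = np.where(fg_mask,
                         sauvola_threshold(img, w, k_fg),
                         sauvola_threshold(img, w, k_bg))
            return np.where(img > t, 255, 0).astype(np.uint8)         # 255 = background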

    A general approach for multi-oriented text line extraction of handwritten document

    Multi-orientation occurs frequently in ancient handwritten documents, where writers updated a document by adding annotations in the margins. Because the margins are narrow, this gives rise to lines in different directions and orientations. Document recognition needs to find the lines wherever they are written, whatever their orientation. We therefore propose in this paper a new approach for extracting multi-oriented lines in scanned documents. Because of the multi-orientation of lines and their dispersion across the page, we use an image meshing that allows us to determine the lines progressively and locally. Once the meshing is established, the orientation is determined by applying the Wigner-Ville distribution to the projection histogram profile. This local orientation is then extended to constrain the orientation in neighboring zones. Afterwards, the text lines are extracted locally in each zone, based on following the orientation of the lines and the proximity of connected components. Finally, connected components that overlap or touch across adjacent lines are separated; the morphological analysis of the terminal letters of Arabic words is considered here. The proposed approach has been tested on 100 documents, reaching an accuracy of about 98.6%.
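    To make the orientation step concrete: the paper applies the Wigner-Ville distribution to projection histograms, but a simpler variance-of-projection criterion, shown below as a stand-in, conveys the same idea of scoring candidate orientations of a mesh cell. Everything here (function name, angle range, the binarization threshold of 128) is an assumption for illustration only.

        import numpy as np
        from scipy.ndimage import rotate

        def estimate_orientation(block, angles=range(-45, 46)):
            """Score each candidate angle by the variance of the horizontal
            projection profile of the rotated block: text lines aligned with
            the projection axis yield strongly peaked profiles.
            (Simple stand-in for the Wigner-Ville analysis in the paper.)"""
            ink = (block < 128).astype(float)   # assume dark text on light paper
            best_angle, best_score = 0, -1.0
            for a in angles:
                profile = rotate(ink, a, reshape=False, order=0).sum(axis=1)
                score = profile.var()
                if score > best_score:
                    best_angle, best_score = a, score
            return best_angle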

    Segmentation of ancient Arabic documents

    This chapter addresses the problem of ancient Arabic document segmentation. As ancient documents have neither a real physical structure nor a logical one, segmentation is limited to extracting textual areas, or the lines within those areas. Although this type of segmentation appears quite simple, its implementation remains a challenging task. This is due to the state of the old documents, where the image is of low quality and the lines are not straight but sinuous and connected. Given the failure of traditional methods, we propose a method for line extraction in multi-oriented documents. The method is based on an image meshing that allows it to detect orientations locally and reliably. These orientations are then extended to larger areas. The orientation estimation uses the energy distribution of Cohen's class, which is more accurate than the projection method. The method then exploits the projection peaks to follow the connected components forming text lines. The approach ends with a final separation of connected lines, based on the morphology of terminal letters.
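    The line-following step can be pictured as grouping connected components around the peaks of the projection profile taken along the detected orientation. The sketch below, with hypothetical names and a horizontal orientation assumed for simplicity, assigns components to their nearest peak row; it illustrates the idea, not the chapter's algorithm.

        import numpy as np
        from scipy import ndimage

        def group_components_by_peaks(binary, peak_rows):
            """Assign each connected component of a binary text image to the
            nearest projection-profile peak, giving one group per text line."""
            labels, n = ndimage.label(binary)
            centroids = ndimage.center_of_mass(binary, labels, range(1, n + 1))
            lines = {p: [] for p in peak_rows}
            for comp_id, (cy, cx) in enumerate(centroids, start=1):
                nearest = min(peak_rows, key=lambda p: abs(p - cy))
                lines[nearest].append(comp_id)
            return labels, lines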

    Automatic Heart Segmentation from Cardiac Computed Tomography Images Using a Gradient-Assisted Localized Active Contour Model

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: Yeong-Gil Shin. The heart is one of the most important human organs and is composed of complex structures. Computed tomography angiography (CTA), magnetic resonance imaging (MRI), and single photon emission computed tomography are widely used, non-invasive cardiac imaging modalities. Compared with other modalities, CTA can provide more detailed anatomic information about the heart chambers, vessels, and coronary arteries due to its higher spatial resolution. To obtain important morphological information about the heart, whole heart segmentation is necessary, and it can be used for clinical diagnosis. In this paper, we propose a novel framework to segment the four chambers of the heart automatically. First, the whole heart is coarsely extracted. This is separated into the left and right parts using a geometric analysis based on anatomical information and a subsequent power watershed. Then, the proposed gradient-assisted localized active contour model (GLACM) accurately refines the segmentation of the left and right sides of the heart. Finally, the left and right sides are each separated into atrium and ventricle by minimizing the proposed split energy function, which determines the boundary between atrium and ventricle based on the shape and intensity of the heart. The main challenge of heart segmentation is to extract the four chambers from cardiac CTA, which has weak edges or separators. To enhance the accuracy of the segmentation, we use both region-based and edge-based information for robust accuracy in heterogeneous regions. Model-based methods, which require a large amount of training data and a proper template model, have been widely used for heart segmentation. Such data are difficult to model, since the training data must describe precise heart regions and must be numerous in order to produce accurate segmentation results; moreover, the training data must be represented by salient features, which are generated by manual setting and must correspond to one another. In our proposed method, however, neither training data nor a template model is necessary. Instead, we use edge, intensity and shape information from the cardiac CTA for each chamber segmentation; the intensity information of the CTA can substitute for the shape information of a template model. In addition, we devised an adaptive radius function and a Gaussian-pyramid edge map for GLACM in order to utilize edge information effectively and improve segmentation accuracy compared with the original localizing region-based active contour model (LACM). Since the radius used in LACM affects the overall segmentation performance, we propose an energy function that changes the radius adaptively according to whether the region is homogeneous or heterogeneous. We also propose a split energy function to segment the four chambers of the heart in cardiac CT images and to detect the valve plane between atrium and ventricle. In experiments using twenty clinical datasets, the proposed method identified the four chambers accurately and efficiently.
    We also demonstrated that this approach can assist the cardiologist in clinical investigations and functional analysis.
    Contents:
        Chapter 1. Introduction: Background and Motivation; Dissertation Goal; Main Contributions; Organization of the Dissertation
        Chapter 2. Related Works: Medical Image Segmentation (Classic Methods; Variational Methods; Image Features of the Curve; Combinatorial Methods; Difficulty of Segmentation); Heart Segmentation (Non-Model-Based; Unstatistical Model-Based; Statistical Model-Based)
        Chapter 3. Gradient-Assisted Localized Active Contour Model: LACM; Gaussian-Pyramid Edge Map; Adaptive Radius Function; LACM with Gaussian-Pyramid Edge Map and Adaptive Radius Function
        Chapter 4. Segmentation of Four Chambers of Heart: Overview; Segmentation of Whole Heart; Separation of Left and Right Sides of Heart (Extraction of Candidate Regions of LV and RV; Detection of Left and Right Sides of Heart); Segmentation of Left and Right Sides of Heart; Separation of Atrium and Ventricle from Heart (Calculation of Principal Axes of Left and Right Sides of Heart; Detection of Separation Plane Using Split Energy Function)
        Chapter 5. Experiments: Performance Evaluation; Comparison with Conventional Method; Parametric Study; Computational Performance
        Chapter 6. Conclusion
        Bibliography
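    As a toy picture of the split step described in the abstract above: the boundary between atrium and ventricle is found by scanning candidate planes along the principal axis and keeping the one that minimizes an energy combining shape (cross-sectional area) and intensity. The sketch below uses an arbitrary weighting and axis-aligned slices; it only illustrates the shape-plus-intensity idea, not the dissertation's actual split energy function.

        import numpy as np

        def split_plane_index(mask, intensity, w=0.5):
            """Scan slices along axis 0 of a segmented chamber and return the
            index minimizing a toy split energy: a narrow cross-section with
            low mean intensity suggests the atrioventricular boundary."""
            energies = []
            for z in range(mask.shape[0]):
                area = mask[z].sum()
                mean_int = intensity[z][mask[z] > 0].mean() if area else np.inf
                energies.append(area + w * mean_int)   # weights are arbitrary here
            return int(np.argmin(energies))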

    Document Image Analysis Techniques for Handwritten Text Segmentation, Document Image Rectification and Digital Collation

    Document image analysis comprises all the algorithms and techniques that are used to convert an image of a document into a computer-readable description. In this work we focus on three such techniques: (1) handwritten text segmentation, (2) document image rectification and (3) digital collation.
    Offline handwritten text recognition is a very challenging problem. Aside from the large variation among handwriting styles, neighboring characters within a word are usually connected, and we may need to segment a word into individual characters for accurate character recognition. Many existing methods achieve text segmentation by evaluating the local stroke geometry and imposing constraints on the size of each resulting character, such as the character width, height and aspect ratio. These constraints are well suited to printed text but may not hold for handwritten text. Other methods take a holistic approach, using a set of lexicons to guide and correct the segmentation and recognition; this approach may fail when the domain lexicon is insufficient. In the first part of this work, we present a new global, non-holistic method for handwritten text segmentation which makes no limiting assumptions on the character size or the number of characters in a word. We conduct experiments on real images of handwritten text taken from the IAM handwriting database, compare the presented method against an existing text segmentation algorithm that uses dynamic programming, and achieve a significant performance improvement.
    Digitization of document images using OCR-based systems is adversely affected if the image of the document contains distortion (warping). Often, costly and precisely calibrated special hardware, such as stereo cameras or laser scanners, is used to infer a 3D model of the distorted image, which is then used to remove the distortion. Recent methods focus on creating a 3D shape model based on 2D distortion information obtained from the document image. The performance of these methods depends heavily on estimating an accurate 2D distortion grid. These methods often affix the 2D distortion grid lines to the text lines, and as such may suffer in the presence of unreliable textual cues caused by preprocessing steps such as binarization. In printed document images, the white space between the text lines carries as much information about the 2D distortion as the text lines themselves. Based on this intuitive idea, in the second part of our work we build a 2D distortion grid from white-space lines, which can be used to rectify a printed document image with a dewarping algorithm. We compare our method against a state-of-the-art 2D distortion grid construction method, obtain better results, and present qualitative and quantitative evaluations.
    Collation of texts and images is an indispensable but labor-intensive step in the study of print materials, and a methodology often used by textual scholars when the manuscript of a text does not exist. Although various methods and machines have been designed to assist in this labor, it remains an expensive and time-consuming process, often requiring travel to distant repositories for the painstaking visual examination of multiple original copies. Efforts to digitize collation have so far depended on first transcribing the texts to be compared, thus introducing into the process more labor and expense, and also more potential error. Digital collation instead automates the first stages of collation directly from the document images of the original texts, thereby speeding the process of comparison. We describe such a novel framework for digital collation in the third part of this work and provide qualitative results.
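    For a flavor of the segmentation problem, the sketch below proposes character cut candidates at near-empty columns of a binarized word image, a classic vertical-projection heuristic. It is deliberately simpler than the global method described above (and than the dynamic-programming baseline); all names and thresholds are assumptions.

        import numpy as np

        def candidate_cut_points(word_img, min_gap=2):
            """Return column indices at which a word image could be cut,
            taken from runs of near-empty columns in the ink projection."""
            ink = (word_img < 128).sum(axis=0)      # ink pixels per column
            low = ink <= ink.min() + 1              # near-empty columns
            cuts, run = [], 0
            for x, is_low in enumerate(low):
                run = run + 1 if is_low else 0
                if run == min_gap:                  # one cut per sufficient gap
                    cuts.append(x - min_gap // 2)
            return cuts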

    Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples

    This work focuses on the layout analysis of historical handwritten registers in which local religious ceremonies were recorded. The aim is to delimit each record in these registers. To this end, two approaches are proposed. First, object detection networks are explored, and three state-of-the-art architectures are compared. Further experiments are then conducted on Mask R-CNN, as it yields the best performance. Second, we introduce and investigate Deep Syntax, a hybrid system that takes advantage of recurrent patterns to delimit each record by combining U-shaped networks and logical rules. Finally, these two approaches are evaluated on 3708 French records (16th-18th centuries), as well as on the public Esposalles database, containing 253 Spanish records (17th century). While both systems perform well on homogeneous documents, we observe a significant drop in performance with Mask R-CNN on heterogeneous documents, especially when it is trained on a non-representative subset. By contrast, Deep Syntax relies on steady patterns and is therefore able to process a wider range of documents with less training data. Not only does Deep Syntax produce 15% more match configurations and reduce the ZoneMap surface error metric by 30% when both systems are trained on 120 images, it also outperforms Mask R-CNN when trained on a database three times smaller. As Deep Syntax generalizes better, we believe it can be used for massive document processing, where collecting and annotating a sufficiently large and representative set of training data is not always achievable.
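    The combination of network outputs and logical rules can be pictured with a toy example: suppose a U-shaped network labels each text line as opening a record or continuing one; a simple rule then groups lines into records. The labels and the rule below are invented for illustration and are far simpler than Deep Syntax itself.

        def group_records(line_labels):
            """Group per-line network predictions into records with one rule:
            a 'begin' label opens a new record, anything else extends it."""
            records, current = [], []
            for i, label in enumerate(line_labels):
                if label == "begin" and current:
                    records.append(current)
                    current = []
                current.append(i)
            if current:
                records.append(current)
            return records

        # e.g. group_records(["begin", "inside", "inside", "begin", "inside"])
        # -> [[0, 1, 2], [3, 4]]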

    Geometric correction of historical Arabic documents

    Geometric deformations in historical documents significantly influence the success of both Optical Character Recognition (OCR) techniques and human readability. They may have been introduced at any time during the life cycle of a document, from when it was first printed to when it was digitised by an imaging device. This thesis focuses on the challenging domain of geometric correction of Arabic historical documents, where background research has highlighted that existing approaches for geometric correction of Latin-script historical documents are not sensitive to the characteristics of text in Arabic documents and therefore cannot be applied successfully. Text line segmentation and baseline detection algorithms have been investigated in order to propose a new algorithm better suited to warped Arabic historical document images. Advanced ideas for dewarping and geometric restoration of historical Arabic documents, as dictated by the specific characteristics of the problem, have been implemented. In addition to developing an algorithm to detect accurate baselines in historical printed Arabic documents, the research also contributes a new dataset consisting of historical Arabic documents with different degrees of warping severity. Overall, a new dewarping system, the first for historical Arabic documents, has been developed, taking into account both global and local features of the text image and the patterns of the smooth distortion between text lines. By using the results of the proposed line segmentation and baseline detection methods, it can cope with a variety of distortions, such as page curl, arbitrary warping and folds.
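    A minimal sketch of one ingredient, baseline detection: Arabic script concentrates ink along a strong horizontal baseline stroke, so within a segmented text-line image the baseline can be roughly estimated as the row of maximal ink. This simplification is assumed here for illustration and is not the thesis's detection algorithm.

        import numpy as np

        def detect_baseline_row(line_img):
            """Estimate the baseline of a (roughly horizontal) Arabic text-line
            image as the row containing the most ink pixels."""
            ink = (line_img < 128).sum(axis=1)  # ink pixels per row
            return int(np.argmax(ink))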