
    Document Image Analysis Techniques for Handwritten Text Segmentation, Document Image Rectification and Digital Collation

    Document image analysis comprises all the algorithms and techniques used to convert an image of a document into a computer-readable description. In this work we focus on three such techniques: (1) handwritten text segmentation, (2) document image rectification, and (3) digital collation.

    Offline handwritten text recognition is a very challenging problem. Aside from the large variation among handwriting styles, neighboring characters within a word are usually connected, and a word may need to be segmented into individual characters for accurate character recognition. Many existing methods achieve text segmentation by evaluating local stroke geometry and imposing constraints on the size of each resulting character, such as its width, height, and aspect ratio. These constraints are well suited to printed text but may not hold for handwritten text. Other methods apply a holistic approach, using a set of lexicons to guide and correct the segmentation and recognition; this approach may fail when the domain lexicon is insufficient. In the first part of this work, we present a new global, non-holistic method for handwritten text segmentation that makes no limiting assumptions about character size or the number of characters in a word. We conduct experiments on real images of handwritten text from the IAM handwriting database, compare the presented method against an existing text segmentation algorithm based on dynamic programming, and achieve a significant performance improvement.

    Digitization of document images using OCR-based systems is adversely affected when the document image contains distortion (warping). Often, costly and precisely calibrated special hardware such as stereo cameras or laser scanners is used to infer a 3D model of the distorted page, which is then used to remove the distortion. Recent methods instead build a 3D shape model from 2D distortion information obtained from the document image itself, and their performance depends heavily on estimating an accurate 2D distortion grid. These methods often affix the 2D distortion grid lines to the text lines and, as such, may suffer when textual cues are unreliable due to preprocessing steps such as binarization. In printed document images, the white space between the text lines carries as much information about the 2D distortion as the text lines themselves. Based on this intuition, in the second part of our work we build a 2D distortion grid from white-space lines, which a dewarping algorithm can then use to rectify a printed document image. We compare the presented method against a state-of-the-art 2D distortion grid construction method, obtain better results, and provide qualitative and quantitative evaluations.

    Collation of texts and images is an indispensable but labor-intensive step in the study of print materials, and a methodology textual scholars often rely on when no manuscript of the text survives. Although various methods and machines have been designed to assist in this labor, it remains an expensive and time-consuming process, often requiring travel to distant repositories for the painstaking visual examination of multiple original copies. Efforts to digitize collation have so far depended on first transcribing the texts to be compared, introducing more labor and expense, as well as more potential error. Digital collation instead automates the first stages of collation directly from the document images of the original texts, thereby speeding the process of comparison. We describe such a novel framework for digital collation in the third part of this work and provide qualitative results.
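    The abstract does not spell out how the white-space lines are found, but the underlying intuition of the rectification work can be illustrated with a horizontal projection profile: rows with almost no ink mark the gaps between text lines. A minimal sketch follows; the function name, ink threshold, and minimum run length are assumptions for the example, not the paper's method.

```python
# Illustrative sketch only: finds the white space between text lines via a
# horizontal projection profile. The paper's distortion-grid construction is
# more involved; this just shows that inter-line gaps are easy to recover.
import numpy as np

def whitespace_rows(binary_page, min_run=3):
    """Return the center row of each run of near-empty rows.

    binary_page: 2D array, 1 = ink, 0 = background.
    """
    ink_per_row = binary_page.sum(axis=1)              # projection profile
    empty = ink_per_row < 0.01 * binary_page.shape[1]  # rows with almost no ink
    centers, run_start = [], None
    for y, is_empty in enumerate(empty):
        if is_empty and run_start is None:
            run_start = y
        elif not is_empty and run_start is not None:
            if y - run_start >= min_run:
                centers.append((run_start + y) // 2)
            run_start = None
    if run_start is not None and len(empty) - run_start >= min_run:
        centers.append((run_start + len(empty)) // 2)
    return centers
```

    On a warped page the profile would be computed over narrow vertical bands and the detected gaps traced across bands into curved grid lines, rather than taken globally as here.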

    Document analysis using image processing techniques.

    Image thresholding and page segmentation are necessary components of any image understanding and recognition system. For an OCR engine to function properly, the text in a document image has to be isolated and then fed to the OCR for recognition, which requires a robust and accurate page segmentation technique. Any page segmentation technique in turn needs a preprocessing step consisting of image restoration and thresholding. This thesis therefore concentrates on the development of efficient and robust image thresholding and page segmentation algorithms. Three efficient contrast enhancement techniques are proposed that, in conjunction with the thresholding technique of Ridler and Calvard, constitute the preprocessing step for the image segmentation algorithm. The thesis also surveys the pertinent page segmentation techniques in the literature and proposes a new block labeling technique based on a smearing algorithm. Exhaustive experimentation is conducted to demonstrate the efficiency of the proposed techniques. (M.A.Sc. thesis, Dept. of Electrical and Computer Engineering, University of Windsor, Canada, 2003.)
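    The Ridler-Calvard technique named above is the classic iterative (ISODATA-style) threshold selection, which is compact enough to sketch; the stopping tolerance and the usage line are illustrative assumptions.

```python
import numpy as np

def ridler_calvard_threshold(gray, eps=0.5):
    """Iterative threshold selection of Ridler and Calvard (ISODATA).

    Starts from the global mean, then repeatedly moves the threshold to the
    midpoint of the means of the two classes it induces until it stabilizes.
    """
    t = gray.mean()
    while True:
        fg, bg = gray[gray > t], gray[gray <= t]
        if fg.size == 0 or bg.size == 0:   # degenerate image: one class empty
            return t
        t_new = (fg.mean() + bg.mean()) / 2.0
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# Example use on a gray-scale page array:
# binary = (page > ridler_calvard_threshold(page)).astype(np.uint8)
```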

    LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

    Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in their pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective that learns cross-modal alignment by predicting whether the image patch corresponding to a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.
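    The released checkpoints are commonly loaded through the Hugging Face transformers library. A minimal, unofficial inference sketch for an image-centric task such as document classification is shown below; the checkpoint name is real, but the classification head is randomly initialized here and would need fine-tuning, so this only illustrates the plumbing.

```python
# Assumes: pip install transformers torch pillow pytesseract (plus the
# tesseract binary); the processor runs OCR on the image by default.
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")
model = LayoutLMv3ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv3-base")   # classifier head is untrained at this point

image = Image.open("document.png").convert("RGB")
encoding = processor(image, return_tensors="pt")  # words, boxes, image patches
with torch.no_grad():
    logits = model(**encoding).logits
predicted_class = logits.argmax(-1).item()
```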

    Page layout analysis and classification in complex scanned documents

    Page layout analysis has been studied extensively since the 1980s, particularly after computers began to be used for document storage. For efficient document storage and retrieval from a database, a paper document is transformed into an electronic version, and document image analysis algorithms segment the scanned document into regions such as text, image, or line regions. To contribute a novel approach to page layout analysis and classification, the algorithm in this thesis is developed for both RGB and gray-scale scanned documents without requiring any specific document type or scanning technique. The proposed page classification algorithm mainly applies the wavelet transform, Markov random fields (MRF), and the Hough transform to segment text, photo, and strong-edge/line regions in both color and gray-scale scanned documents, and it is designed to handle both simple and complex page layouts and contents (text-only pages as well as book covers that include text, lines, and/or photos).

    The methodology consists of five modules. In the first, pre-processing, image enhancement techniques such as scaling, filtering, color space conversion, and gamma correction are applied to reduce computation time and enhance the scanned document; classification is then performed on the one-fourth-resolution input image in the CIEL*a*b* color space. In the second module, text detection uses wavelet analysis to generate a text-region candidate map, which is refined with a run-length encoding (RLE) technique for verification. The third module, photo detection, first applies block-wise segmentation based on a basis-vector projection technique, then uses an MRF model with a maximum a-posteriori (MAP) optimization framework to generate the photo map. In the fourth module, the Hough transform locates lines, with edge detection, edge linking, and line-segment fitting used to detect strong edges. Finally, features extracted from the three classification maps are merged into one final page layout map with K-Means clustering, which classifies the intersection regions. The proposed technique is tested on several hundred images and its performance validated with a confusion matrix, achieving an average classification accuracy of 85% over text, photo, and background regions on a variety of scanned documents such as articles, magazines, business cards, dictionaries, and newsletters. More importantly, it performs independently of the scanning process and of whether the input is RGB or gray-scale, with comparable classification quality.
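    Of the five modules, the Hough-transform line detection in module four is the most self-contained to illustrate. A minimal sketch with OpenCV's probabilistic Hough transform follows; the Canny and Hough parameter values are illustrative guesses, not the thesis's tuned settings.

```python
import cv2
import numpy as np

page = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(page, 50, 150)                  # strong-edge map
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=100, minLineLength=100, maxLineGap=10)

line_map = np.zeros_like(page)                    # per-pixel line-region map
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(line_map, (x1, y1), (x2, y2), 255, 2)
```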

    Document Flash Thermography

    This paper presents an extension of flash thermography techniques to the analysis of documents. The motivation for this research is to develop the ability to reveal covered writings in archaeological artifacts such as the Codex Selden or Egyptian cartonnage. Emphasis is placed on evaluating several common signal processing techniques for their effectiveness in enhancing subsurface writings in a set of test documents: contrast stretching, histogram equalization, image filters, contrast images, differential absolute contrast (DAC), thermal signal reconstruction (TSR), principal component thermography (PCT), dynamic thermal tomography (DTT), pulse phase thermography (PPT), and fitting-correlation analysis (FCA). The ability of flash thermography combined with these techniques to reveal subsurface writings and document strikeouts is evaluated, and different flash thermography parameters are compared to determine the most effective imaging of the two document subsets.
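    Among the techniques listed, principal component thermography (PCT) reduces to a PCA over the time axis of the flash sequence. A minimal sketch is given below; the exact normalization used in the PCT literature varies, and per-pixel mean-centering is an assumption here.

```python
import numpy as np

def pct(sequence, n_components=3):
    """Principal component thermography over a (T, H, W) frame sequence.

    Returns the leading spatial eigen-images ("empirical orthogonal
    functions"), which often enhance subsurface contrast.
    """
    T, H, W = sequence.shape
    X = sequence.reshape(T, H * W).astype(np.float64)
    X -= X.mean(axis=0)                    # remove each pixel's mean decay
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_components].reshape(n_components, H, W)
```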

    Efficient MRF approach to document image enhancement

    Markov random field (MRF) based approaches have been shown to perform well in a wide range of applications, but the iterative nature of the algorithm makes their computational cost high. In the context of document image analysis, where numerous documents have to be processed, this cost may become prohibitive. We describe a novel approach to document image enhancement using MRFs and show that, by using domain-specific knowledge, we can improve computational performance by an order of magnitude. Moreover, in contrast to known techniques where patch initialization is arbitrary, in the proposed approach patch initialization is data-consistent, which improves effectiveness. Experimental results comparing the proposed approach to known techniques on historical documents from the Frieder Collection are provided. © 2008 IEEE
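    The paper's own algorithm and initialization are not reproduced in the abstract, but the iterative MRF machinery it refers to can be illustrated generically with iterated conditional modes (ICM) on a binary document image under an Ising-style prior. This stand-in shows where the per-sweep cost comes from; it is not the paper's method.

```python
import numpy as np

def icm_binarize(obs, beta=1.5, n_iters=5):
    """ICM on a noisy binary image with labels in {-1, +1}.

    Each sweep sets every pixel to the label minimizing its local energy
    -obs*x - beta * (agreement with 4-neighbors); iterating such sweeps is
    what makes naive MRF enhancement expensive on large document batches.
    """
    x = obs.copy()
    H, W = x.shape
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                nb = 0.0                       # sum of 4-neighbor labels
                if i > 0:     nb += x[i - 1, j]
                if i < H - 1: nb += x[i + 1, j]
                if j > 0:     nb += x[i, j - 1]
                if j < W - 1: nb += x[i, j + 1]
                x[i, j] = 1 if obs[i, j] + beta * nb >= 0 else -1
    return x
```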

    Bank Form Classification using Document Layout Analysis and Image Processing Techniques

    Every day thousands of forms are filled out and submitted across the world, in banks, post offices, government organizations, educational institutions, and elsewhere; these include electronic as well as physical forms. All of these forms, irrespective of their origin, are at some stage digitized and stored electronically to address issues of physical storage, form degradation, and data accessibility. Document layout analysis is a basic step in converting document images into electronic form. The conversion is laborious and can be made more efficient, in terms of throughput and human resources, by automating most of the process with document layout analysis techniques. Document classification is an important step in office automation, digital libraries, and other document image analysis applications. Physical forms require human supervision for any operation performed on them; digitizing them reduces both the human effort and the redundancy involved. This paper addresses the initial stage of this automation, namely bank form classification and the decipherment of fields: the former recognizes the type of the bank form, and the latter extracts regions of useful data from the classified form. The proposed work aims to provide accurate bank form classification together with noise removal and skew detection and correction, after which layout analysis extracts fields such as name, address, and signature from the classified forms.
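    The skew detection and correction step is not specified in the abstract; a common stand-in is the minimum-area-rectangle heuristic over the ink pixels, sketched below with OpenCV. Note that minAreaRect's angle convention differs across OpenCV versions, so the sign handling may need adjusting on a given build.

```python
import cv2
import numpy as np

def deskew(gray):
    """Estimate global skew from ink pixels and rotate the page upright."""
    binary = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:        # map to (-45, 45]; convention varies by version
        angle -= 90
    elif angle < -45:
        angle += 90
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderValue=255)
```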

    Techniques for image classification, object detection and object segmentation

    In this paper we document the techniques we used to participate in the PASCAL NoE VOC Challenge 2007 image analysis performance evaluation campaign. We took part in three of the competitions: image classification, object detection, and object segmentation. In the classification task our method performed comparatively well, placing 4th of 19 submissions; in contrast, our detection results were quite modest, while our segmentation accuracy was the best of all submissions. Our approach to the classification task fuses classifications from numerous global image features, including histograms of local features. Object detection combines a similar classification of automatically extracted image segments with the previously obtained scene-type classifications, and the object segmentations are then obtained in a straightforward fashion from the detection results.
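    The "histograms of local features" mentioned above can be illustrated with a bag-of-visual-words representation. The sketch below uses ORB descriptors and MiniBatchKMeans as stand-ins for the paper's own local features and codebook; clustering binary ORB descriptors with Euclidean k-means is a simplification.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def bovw_histograms(image_paths, n_words=256):
    """One L1-normalized visual-word histogram per image (e.g. for an SVM)."""
    orb = cv2.ORB_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None
                         else np.zeros((0, 32), np.uint8))
    codebook = MiniBatchKMeans(n_clusters=n_words)
    codebook.fit(np.vstack(per_image).astype(np.float32))
    histograms = []
    for desc in per_image:
        h = np.zeros(n_words)
        if len(desc):
            words = codebook.predict(desc.astype(np.float32))
            h = np.bincount(words, minlength=n_words).astype(float) / len(words)
        histograms.append(h)
    return np.array(histograms)
```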