1,947 research outputs found

    Effective Geometric Restoration of Distorted Historical Document for Large-Scale Digitization

    Because of storage conditions and the non-planar shape of the material, geometric distortion of the 2-D content is widespread in scanned document images. Effective geometric restoration of these distorted images considerably increases the character recognition rate in large-scale digitisation. For large-scale digitisation of historical books, geometric restoration solutions are expected to be accurate, generic, robust, unsupervised and reversible. However, most methods in the literature concentrate on improving restoration accuracy for a specific distortion effect rather than on applicability to large-scale digitisation. This paper proposes an effective mesh-based geometric restoration system (GRLSD) for large-scale digitisation of distorted historical documents. In this system, a dewarping tool based on automatic mesh generation geometrically models and corrects arbitrarily warped historical documents. An XML-based mesh recorder records the mesh of distortion information for reversible use. A graphical user interface toolkit visually displays the mesh and allows manual manipulation to improve geometric restoration accuracy. Experimental results show that the proposed automatic dewarping approach efficiently corrects arbitrarily warped historical documents and outperforms several state-of-the-art geometric restoration methods. With the XML mesh recorder and the GUI toolkit, the GRLSD system helps users monitor and correct ambiguous mesh points flexibly, which prevents damage to undistorted historical document images in large-scale digitisation.
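
The idea of recording a mesh in XML so that a correction can later be reproduced or undone can be illustrated with a minimal sketch. The element and attribute names (`mesh`, `point`, `row`, `col`, `x`, `y`) and both function names are hypothetical; the abstract does not describe the actual GRLSD schema.

```python
import xml.etree.ElementTree as ET


def mesh_to_xml(mesh):
    """Serialize a grid of (x, y) control points to an XML string.

    The tag and attribute names are illustrative assumptions,
    not the schema used by the GRLSD system.
    """
    root = ET.Element("mesh", rows=str(len(mesh)), cols=str(len(mesh[0])))
    for r, row in enumerate(mesh):
        for c, (x, y) in enumerate(row):
            # repr() of a float round-trips exactly through float().
            ET.SubElement(root, "point", row=str(r), col=str(c),
                          x=repr(x), y=repr(y))
    return ET.tostring(root, encoding="unicode")


def mesh_from_xml(text):
    """Rebuild the grid of control points from the XML string."""
    root = ET.fromstring(text)
    rows, cols = int(root.get("rows")), int(root.get("cols"))
    mesh = [[None] * cols for _ in range(rows)]
    for p in root.iter("point"):
        r, c = int(p.get("row")), int(p.get("col"))
        mesh[r][c] = (float(p.get("x")), float(p.get("y")))
    return mesh


# A 2 x 2 mesh of slightly displaced grid corners (made-up values).
mesh = [[(0.0, 0.0), (100.0, 3.5)],
        [(1.2, 80.0), (101.0, 84.25)]]
restored = mesh_from_xml(mesh_to_xml(mesh))
```

Because the stored mesh can be read back without loss, a manually adjusted mesh could be re-applied or discarded later, which is one plausible reading of "reversible use".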

    DocScanner: Robust Document Image Rectification with Progressive Learning

    Compared with flatbed scanners, portable smartphones are much more convenient for digitizing physical documents. However, documents digitized this way are often distorted by uncontrolled physical deformations, camera positions, and illumination variations. To address this, we present DocScanner, a novel framework for document image rectification. Unlike existing methods, DocScanner introduces a progressive learning mechanism. Specifically, DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture. The iterative refinements make DocScanner converge to a robust and superior performance, while the lightweight recurrent architecture keeps it efficient at run time. In addition, having observed the corrupted rectified boundaries produced by prior works, DocScanner applies a document localization module before rectification to explicitly segment the foreground document from the cluttered background. To further improve the rectification quality, a geometric regularization based on the geometric prior between the distorted and the rectified images is introduced during training. Extensive experiments are conducted on the Doc3D dataset and the DocUNet Benchmark dataset, and the quantitative and qualitative results verify the effectiveness of DocScanner, which outperforms previous methods on OCR accuracy, image similarity, and our proposed distortion metric by a considerable margin. Furthermore, DocScanner shows the highest efficiency in runtime latency and model size.

    A Book Reader Design for Persons with Visual Impairment and Blindness

    The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost effective. This approach relies on the geometry of the design setup and provides the mathematical foundation for integrating a 3-D surface map from a low-resolution time-of-flight (ToF) device with a high-resolution image, as a means of improving the reading accuracy of images warped by the page curvature of bound books and magazines. The merits of this low-cost but effective automated book reader design include: (1) a seamless registration process of the two imaging modalities, so that the low-resolution (160 x 120 pixels) height map acquired by an Argos3D-P100 camera accurately covers the entire book spread as captured by the high-resolution image (3072 x 2304 pixels) of a Canon G6 camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as dewarping of the book spread images; and (3) a comparison of image correction performance between the uniform and the full height map to determine which provides the higher Optical Character Recognition (OCR) reading accuracy. The design concept could also be applied to the challenging process of book digitization. The method depends on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). Improvements in character reading accuracy due to the correction steps were quantified by passing the corrected images to an OCR engine and tabulating the number of misrecognized characters. 
Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows), and 96.11% with the full height maps (i.e., each row has its own height values as obtained from the 3-D camera). Under rotational misalignment, the average accuracies were 90.63% and 94.75% for the same respective height maps, demonstrating the added resilience of the full height map method to potential misalignments.
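
The difference between the two height maps compared above can be shown with a minimal sketch. The function name and the sample values are hypothetical; only the rule "replicate the central row to approximate all other rows" comes from the abstract.

```python
def uniform_height_map(full_map):
    """Approximate a height map by copying its central row into every row."""
    central = full_map[len(full_map) // 2]
    return [list(central) for _ in full_map]


# A tiny 3 x 5 "full" height map with made-up values: the page dips
# toward the book spine in the middle column, slightly differently per row.
full = [
    [5.0, 3.0, 1.0, 3.2, 5.1],
    [5.2, 3.1, 1.0, 3.0, 5.0],
    [5.6, 3.4, 1.1, 2.9, 4.8],
]
uniform = uniform_height_map(full)

# Largest height error introduced by the uniform approximation.
max_diff = max(abs(f - u)
               for full_row, uni_row in zip(full, uniform)
               for f, u in zip(full_row, uni_row))
```

When the book is rotated relative to the camera, rows differ more from the central row, so `max_diff` grows. That is consistent with the larger accuracy drop reported for the uniform map, although the sketch itself does not model OCR.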

    Document Image Dewarping Based on the Document Boundary and 3D Reconstruction

    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ์ž์—ฐ๊ณผํ•™๋Œ€ํ•™ ์ˆ˜๋ฆฌ๊ณผํ•™๋ถ€, 2022. 8. ํ˜„๋™ํ›ˆ.In recent days, most of the scanned images are obtained from mobile devices such as cameras, smartphones, and tablets rather than traditional flatbed scanners. Contrary to the scanning process of the traditional scanners, capturing process of mobile devices might be accompanied by distortions in various forms such as perspective distortion, fold distortion, and page curls. In this thesis, we propose robust dewarping methods which correct such distortions based on the document boundary and 3D reconstruction. In the first method, we construct a curvilinear grid on the document image using the document boundary and reconstruct the document surface in the three dimensional space. Then we rectify the image using a family of local homographies computed from the reconstructed document surface. Although some of the steps of the proposed method have been proposed separately in other research, our approach exploited and combined their advantages to propose a robust dewarping process in addition to improving the stability in the overall process. Moreover, we refined the process by correcting the distorted text region boundary and developed this process into an independent dewarping method which is concise, straight-forward, and robust while still producing a well-rectified document image.์ตœ๊ทผ์—๋Š” ๋Œ€๋ถ€๋ถ„์˜ ์Šค์บ”๋œ ์ด๋ฏธ์ง€๋“ค์ด ์ „ํ†ต์ ์ธ ํ‰ํŒ์Šค์บ๋„ˆ๊ฐ€ ์•„๋‹Œ ์นด๋ฉ”๋ผ, ์Šค๋งˆํŠธํฐ, ํƒœ๋ธ”๋ฆฟ PC ๋“ฑ์˜ ํœด๋Œ€๊ธฐ๊ธฐ๋“ค๋กœ๋ถ€ํ„ฐ ์–ป์–ด์ง„๋‹ค. ์ด์ „ ์Šค์บ๋„ˆ๋“ค์˜ ์Šค์บ๋‹ ๊ณผ์ •๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ ํœด๋Œ€๊ธฐ๊ธฐ๋“ค์„ ์ด์šฉํ•œ ์ด๋ฏธ์ง€ ์บก์ณ๋ง ๊ณผ์ •์€ ์›๊ทผ์™œ๊ณก, ์ข…์ด์˜ ์ ‘ํž˜์œผ๋กœ ์ธํ•œ ์™œ๊ณก, ๊ทธ๋ฆฌ๊ณ  ์ข…์ด์˜ ํœ˜์–ด์ง์œผ๋กœ ์ธํ•œ ์™œ๊ณก ๋“ฑ ๋‹ค์–‘ํ•œ ์™œ๊ณก๋“ค์„ ์ˆ˜๋ฐ˜ํ•  ์ˆ˜ ์žˆ๋‹ค. 
์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ์™œ๊ณก๋“ค์„ ์ œ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฌธ์„œ ๊ฒฝ๊ณ„์™€ 3์ฐจ์› ์žฌ๊ตฌ์„ฑ์— ๊ธฐ๋ฐ˜ํ•œ ๊ฐ•๋ ฅํ•œ ๋””์›Œํ•‘ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜๊ณ ์ž ํ•œ๋‹ค. ์ฒซ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์—์„œ๋Š”, ๋ฌธ์„œ ๊ฒฝ๊ณ„๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ฌธ์„œ ์ด๋ฏธ์ง€ ์œ„์— ๊ณก์„ ์œผ๋กœ ์ด๋ฃจ์–ด์ง„ ๊ทธ๋ฆฌ๋“œ๋ฅผ ๋งŒ๋“ค๊ณ , 3์ฐจ์› ๊ณต๊ฐ„ ์ƒ์˜ ๋ฌธ์„œ ๊ณก๋ฉด์„ ์žฌ๊ตฌ์„ฑํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์žฌ๊ตฌ์„ฑ๋œ ๋ฌธ์„œ ๊ณก๋ฉด์œผ๋กœ๋ถ€ํ„ฐ ๊ณ„์‚ฐ๋œ ๊ตญ์†Œ์  ํ˜ธ๋ชจ๊ทธ๋ž˜ํ”ผ๋“ค์„ ์ด์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ˆ˜์ •ํ•œ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์˜ ๋ช‡๋ช‡ ๋‹จ๊ณ„๋Š” ๋‹ค๋ฅธ ์—ฐ๊ตฌ์—์„œ ๊ฐœ๋ณ„์ ์œผ๋กœ ์‚ฌ์šฉ๋œ ๊ฒฝ์šฐ๋„ ์žˆ์ง€๋งŒ, ์šฐ๋ฆฌ๋Š” ์ „์ฒด์ ์ธ ๊ณผ์ •์—์„œ ์•ˆ์ •์„ฑ์„ ๋†’์ด๋Š” ๋™์‹œ์— ๊ฐ ๋ฐฉ๋ฒ•์˜ ์žฅ์ ๋“ค์„ ์ด์šฉํ•˜๊ณ  ์กฐํ•ฉํ•˜์—ฌ ๊ฐ•๋ ฅํ•œ ๋””์›Œํ•‘ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ด์— ๋”ํ•˜์—ฌ, ์šฐ๋ฆฌ๋Š” ์™œ๊ณก๋œ ํ…์ŠคํŠธ ์˜์—ญ์˜ ๊ฒฝ๊ณ„๋ฅผ ์ˆ˜์ •ํ•˜์—ฌ ์ „์ฒด์ ์ธ ๊ณผ์ •์„ ๋ณด์™„ํ•˜์˜€๊ณ , ์ด ์ ˆ์ฐจ๋ฅผ ๊ฐ„๊ฒฐํ•˜๊ณ , ์ง๊ด€์ ์ด๋ฉฐ, ๊ฐ•๋ ฅํ•˜๋ฉด์„œ๋„ ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋Š” ๋…๋ฆฝ์ ์ธ ๋””์›Œํ•‘ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ฐœ๋ฐœํ•˜์˜€๋‹ค.1. Introduction 1 2. Review on Camera Geometry 6 2.1. Basic Camera Model 6 2.2. 3D Reconstruction Problem 8 3. Related Works 10 3.1. Dewarping Methods based on the Text-lines 10 3.2. Dewarping Methods based on the Document Boundary 11 3.3. Dewarping Methods based on the Grid Construction 12 3.4. Dewarping Methods based on the Document Surface Model in 3D Space 13 4. Document Image Dewarping based on the Document Boundary and 3D Reconstruction 15 4.1. Input Document Image Processing 17 4.1.1. Binarization of the Input Document Image 17 4.1.2. Perspective Distortion Removal using the Document Boundary 19 4.2. Grid Construction on the Document Image 21 4.3. 3D Reconstruction of the Document Surface 23 4.3.1. Geometric Model 23 4.3.2. Normalization of the Grid Corners 24 4.3.3. 3D Reconstruction of the Document Surface 26 4.4. Rectification of the Document Image under a Family of Local Homographies 27 4.5. Global Rectification of the Document Image 29 5. 
Document Image Dewarping by Straightening Document Boundary Curves 33 6. Conclusion 37 Appendix A. 38 A.1. 4-point Algorithm 38 A.2. Optimization of the Cost Function 40 Bibliography 42 Abstract (in Korean) 47 Acknowledgement (in Korean) 48์„
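
Rectifying one grid cell with a local homography can be sketched as follows. This is the textbook direct linear solve for a homography from four point correspondences with the last matrix entry fixed to 1. Whether it matches the thesis's own 4-point algorithm is an assumption, and all function names and coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def homography_from_4_points(src, dst):
    """Return the 3 x 3 homography mapping four src points onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with h33 fixed to 1.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map a 2-D point through h, including the projective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Corners of one distorted grid cell and of the flat target cell (made up).
cell = [(10.0, 12.0), (110.0, 20.0), (105.0, 95.0), (5.0, 90.0)]
flat = [(0.0, 0.0), (100.0, 0.0), (100.0, 80.0), (0.0, 80.0)]
H = homography_from_4_points(cell, flat)
mapped = [apply_homography(H, p) for p in cell]
```

Applying one such homography per grid cell gives a piecewise-projective rectification. Keeping neighbouring cells consistent along shared edges needs extra care that this sketch does not attempt.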

    Document and Text Rectification Using Text- and Feature-Point-Based Objective Function Optimization

    Thesis (Doctoral) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2014. 8. 조남익.
    Many techniques and applications detect and recognize text in images, e.g., document retrieval using camera-captured document images, book readers for the visually impaired, and augmented reality based on text recognition. In these applications, the planar surfaces that contain the text are often distorted in the captured image because of the perspective view (e.g., road signs), curvature (e.g., unfolded books), and wrinkles (e.g., old documents). Recovering the original document texture by removing these distortions from camera-captured document images is called document rectification. In this dissertation, new text surface rectification algorithms are proposed to improve text recognition accuracy and visual quality. The proposed methods fall into three types depending on the type of input, and their contributions can be summarized as follows.
    In the first rectification algorithm, the dense text-lines in documents are employed to rectify the images. Unlike conventional approaches, the proposed method does not use the text-lines directly. Instead, it uses a discrete representation of text-lines and text-blocks, which are sets of connected components. The geometric distortions caused by page curl and perspective view are modeled as generalized cylindrical surfaces and camera rotation, respectively. With this distortion model and the discrete representation of the features, a cost function is developed whose minimization yields the parameters of the distortion model. The cost function encodes properties of the pages such as text-block alignment, line spacing, and the straightness of text-lines. Because the text features are described by sets of discrete points, the cost function is easy to define and is solved well by the Levenberg-Marquardt algorithm. Experiments show that the proposed method works well for various layouts and curved surfaces, and compares favorably with conventional methods on the standard dataset.
    The second algorithm is a unified framework that rectifies and stitches multiple document images using visual feature points instead of text-lines. This is similar to the approach of general image stitching algorithms. However, general image stitching usually assumes a fixed camera center, which cannot be taken for granted when capturing documents. To deal with camera motion between images, a new parametric family of motion models is proposed in this dissertation. In addition, to remove the ambiguity in the reference plane, a new cost function is developed that imposes constraints on the reference plane. This enables the estimation of a physically correct reference plane without prior knowledge. The estimated reference plane can also be used to rectify the stitching result. Furthermore, because it employs general features, the proposed method can be applied not only to camera-captured document images but also to other planar objects such as building facades or mural paintings.
    The third rectification method is based on a scene text detection algorithm that is independent of the language model. Conventional methods assume that a character consists of a single connected component (CC), as in the English alphabet. However, this assumption is brittle for Asian scripts such as Korean, Chinese, and Japanese, where a single character consists of several CCs. It is therefore difficult to divide CCs into text-lines without a language model. To alleviate this problem, the proposed method clusters the candidate regions using a similarity measure that considers the inter-character relation. The adjacency measure is trained on a dataset labeled with the bounding boxes of text regions. Non-text regions that remain after clustering are filtered out in a text/non-text classification step. The final text regions are merged or divided into text-lines according to their orientation and location. The detected text is rectified using the orientation of the text-lines and vertical strokes. In extensive experiments, the proposed method outperforms state-of-the-art algorithms on English as well as Asian characters.
    Contents:
        1 Introduction: 1.1 Document rectification via text-line based optimization; 1.2 A unified approach of rectification and stitching for document images; 1.3 Rectification via scene text detection; 1.4 Contents
        2 Related work: 2.1 Document rectification (2.1.1 Document dewarping without text-lines; 2.1.2 Document dewarping with text-lines; 2.1.3 Text-block identification and text-line extraction); 2.2 Document stitching; 2.3 Scene text detection
        3 Document rectification based on text-lines: 3.1 Proposed approach (3.1.1 Image acquisition model; 3.1.2 Proposed approach to document dewarping); 3.2 Proposed cost function and its optimization (3.2.1 Design of E_str(·); 3.2.2 Minimization of E_str(·); 3.2.3 Alignment type classification; 3.2.4 Design of E_align(·); 3.2.5 Design of E_spacing(·)); 3.3 Extension to unfolded book surfaces; 3.4 Experimental result (3.4.1 Experiments on synthetic data; 3.4.2 Experiments on real images; 3.4.3 Comparison with existing methods; 3.4.4 Limitations)
        4 Document rectification based on feature detection: 4.1 Proposed approach; 4.2 Proposed cost function and its optimization (4.2.1 Notations; 4.2.2 Homography between the i-th image and E; 4.2.3 Proposed cost function; 4.2.4 Optimization; 4.2.5 Relation to the model in [17]); 4.3 Post-processing (4.3.1 Classification of two cases; 4.3.2 Skew removal); 4.4 Experimental results (4.4.1 Quantitative evaluation on metric reconstruction performance; 4.4.2 Experiments on real images)
        5 Scene text detection and rectification: 5.1 Introduction (5.1.1 Contribution; 5.1.2 Proposed approach); 5.2 Candidate region detection (5.2.1 CC extraction; 5.2.2 Computation of similarity between CCs; 5.2.3 CC clustering); 5.3 Rectification of candidate region; 5.4 Text/non-text classification; 5.5 Experimental result (5.5.1 Experimental results on ICDAR 2011 dataset; 5.5.2 Experimental results on the Asian character dataset)
        6 Conclusion
        Bibliography; Abstract (Korean)
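
Minimizing a text-line straightness cost with the Levenberg-Marquardt algorithm can be illustrated with a toy sketch. The two-parameter quadratic curl model `y = y0 + k * x**2` is an assumption standing in for the dissertation's generalized cylindrical surface, and every name and value below is hypothetical. The toy model is linear in its parameters, so convergence is much easier here than it would be for the real cost function.

```python
def residuals(params, points):
    """Vertical distance of each text-line point from the curl model."""
    y0, k = params
    return [y - (y0 + k * x * x) for x, y in points]


def jacobian(points):
    """Derivatives of each residual with respect to (y0, k)."""
    return [[-1.0, -x * x] for x, _ in points]


def levenberg_marquardt(points, params, lam=1e-3, iterations=50):
    """Damped Gauss-Newton iterations on the sum of squared residuals."""
    cost = sum(r * r for r in residuals(params, points))
    jac = jacobian(points)  # constant for this toy model
    for _ in range(iterations):
        res = residuals(params, points)
        # Entries of J^T J and of the gradient J^T r.
        a00 = sum(j[0] * j[0] for j in jac)
        a01 = sum(j[0] * j[1] for j in jac)
        a11 = sum(j[1] * j[1] for j in jac)
        g0 = sum(j[0] * r for j, r in zip(jac, res))
        g1 = sum(j[1] * r for j, r in zip(jac, res))
        # Marquardt damping scales the diagonal of J^T J.
        d00, d11 = a00 * (1.0 + lam), a11 * (1.0 + lam)
        det = d00 * d11 - a01 * a01
        step0 = (-g0 * d11 + g1 * a01) / det
        step1 = (-g1 * d00 + g0 * a01) / det
        trial = (params[0] + step0, params[1] + step1)
        trial_cost = sum(r * r for r in residuals(trial, points))
        if trial_cost < cost:
            params, cost, lam = trial, trial_cost, lam / 10.0  # accept step
        else:
            lam *= 10.0  # reject step and damp more strongly
    return params


# Synthetic points sampled along one curled text line (made-up values).
line = [(float(x), 40.0 + 0.002 * x * x) for x in range(-50, 51, 10)]
y0_est, k_est = levenberg_marquardt(line, (0.0, 0.0))
```

Once `k_est` is known, subtracting `k_est * x * x` from each point's `y` straightens the line. In the dissertation, the alignment and line-spacing terms would contribute further residuals to the same minimization.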

    Document image restoration - For document images scanned from bound volumes -

    Ph.D, DOCTOR OF PHILOSOPH