3,845 research outputs found

    DocScanner: Robust Document Image Rectification with Progressive Learning

    Compared with flatbed scanners, portable smartphones are much more convenient for digitizing physical documents. However, such digitized documents are often distorted due to uncontrolled physical deformations, camera positions, and illumination variations. To this end, we present DocScanner, a novel framework for document image rectification. Different from existing methods, DocScanner addresses this issue by introducing a progressive learning mechanism. Specifically, DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture. The iterative refinements make DocScanner converge to a robust and superior performance, while the lightweight recurrent architecture ensures the running efficiency. In addition, having observed the corrupted rectified boundaries in prior works, DocScanner applies a document localization module before the rectification process to explicitly segment the foreground document from the cluttered background. To further improve the rectification quality, a geometric regularization based on the geometric prior between the distorted and the rectified images is introduced during training. Extensive experiments are conducted on the Doc3D dataset and the DocUNet Benchmark dataset, and the quantitative and qualitative evaluation results verify the effectiveness of DocScanner, which outperforms previous methods on OCR accuracy, image similarity, and our proposed distortion metric by a considerable margin. Furthermore, DocScanner shows the highest efficiency in runtime latency and model size.

    Captured open book image de-warping and shading correction using 3D depth information

    Various three-dimensional (3D) measuring and capturing devices have recently been introduced, and there are abundant possibilities for taking advantage of this new technology. In this research, we worked on one useful application: using depth information to correct the distortion in captured images caused by the curved shape of the pages of an open book. This work is relevant to camera-based capture devices that can use a projector to cast structured light patterns to provide depth information. To improve the visual quality of captured documents, we developed our algorithm from two perspectives. First, we deal with the shading in the captured image that results from non-uniform lighting conditions. The shading correction is based either on the shading information in the margin of the document or on the estimated position of each piece of the scanned open book relative to the active illumination. In the corrected images, the open book looks as if it had been captured under a uniform lighting source. Next, we handle the geometric distortion. 3D shape reconstruction methods and geometric rectification are used to flatten the curvature of an open book. The models we used exploit specific prior assumptions about the nature of the printed material that is captured. The warped text lines can be straightened after this rectification. The overall readability improvement that our method achieves in captured open book images can be observed in the experimental results.

    Innovative Techniques for Digitizing and Restoring Deteriorated Historical Documents

    Recent large-scale document digitization initiatives have created new modes of access to modern library collections with the development of new hardware and software technologies. Most commonly, these digitization projects focus on accurately scanning bound texts, some reaching an efficiency of more than one million volumes per year. While vast digital collections are changing the way users access texts, current scanning paradigms cannot handle many non-standard materials. Documentation forms such as manuscripts, scrolls, codices, deteriorated film, epigraphy, and rock art all hold a wealth of human knowledge in physical forms not accessible by standard book scanning technologies. This great omission motivates the development of new technology, presented by this thesis, that is not only effective with deteriorated bound works, damaged manuscripts, and disintegrating photonegatives but also easily utilized by non-technical staff. First, a novel point light source calibration technique is presented that can be performed by library staff. Then, a photometric correction technique, which uses known illumination and surface properties to remove shading distortions in deteriorated document images, can be automatically applied. To complete the restoration process, a geometric correction is applied. Also unique to this work is the development of an image-based uncalibrated document scanner that utilizes the transmissivity of document substrates. This scanner extracts intrinsic document color information from one or both sides of a document. Simultaneously, the document shape is estimated to obtain distortion information. Lastly, this thesis provides a restoration framework for damaged photographic negatives that corrects photometric and geometric distortions. Current restoration techniques for the discussed form of negatives require physical manipulation of the photograph. The novel acquisition and restoration system presented here provides the first known solution to digitize and restore deteriorated photographic negatives without damaging the original negative in any way. This thesis work develops new methods of document scanning and restoration suitable for wide-scale deployment. By creating easy-to-access technologies, library staff can implement their own scanning initiatives and large-scale scanning projects can expand their current document sets.

    Deep Unrestricted Document Image Rectification

    In recent years, tremendous efforts have been made on document image rectification, but existing advanced algorithms are limited to processing restricted document images, i.e., the input images must contain a complete document. Once the captured image covers only a local text region, the rectification quality is degraded and unsatisfactory. Our previously proposed DocTr, a transformer-assisted network for document image rectification, also suffers from this limitation. In this work, we present DocTr++, a novel unified framework for document image rectification, without any restrictions on the input distorted images. Our major technical improvements can be summarized in three aspects. Firstly, we upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. Secondly, we reformulate the pixel-wise mapping relationship between the unrestricted distorted document images and the distortion-free counterparts. The obtained data is used to train our DocTr++ for unrestricted document image rectification. Thirdly, we contribute a real-world test set and metrics applicable for evaluating the rectification quality. To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images. Extensive experiments are conducted, and the results demonstrate the effectiveness and superiority of our method. We hope our DocTr++ will serve as a strong baseline for generic document image rectification, prompting the further advancement and application of learning-based algorithms. The source code and the proposed dataset are publicly available at https://github.com/fh2019ustc/DocTr-Plus

    Reverse-engineering of architectural buildings based on an hybrid modeling approach

    We thank the MENSI and REALVIZ companies for their helpful comments and the following people for providing us images from their works: Francesca De Domenico (Fig. 1), Kyung-Tae Kim (Fig. 9). The CMN (French national center of patrimony buildings) is also acknowledged for the opportunity given to demonstrate our approach on the Hotel de Sully in Paris. We thank Tudor Driscu for his help on the English translation. This article presents a set of theoretical reflections and technical demonstrations that constitute a new methodological base for architectural surveying and representation using computer graphics techniques. The problem we treated relates to three distinct concerns: the surveying of architectural objects, the construction and the semantic enrichment of their geometrical models, and their handling for the extraction of dimensional information. A hybrid approach to 3D reconstruction is described. This new approach combines range-based modeling and image-based modeling techniques; it integrates the concept of architectural feature-based modeling. To develop this concept, a first process of extraction and formalization of architectural knowledge, based on the analysis of architectural treatises, is carried out. Then, the identified features are used to produce a template shape library. Finally, the problem of the overall model structure and organization is addressed.

    DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

    In this work, we propose a new framework, called Document Image Transformer (DocTr), to address the issue of geometry and illumination distortion of document images. Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer. Using a set of learned query embeddings, the geometric unwarping transformer captures the global context of the document image through a self-attention mechanism and decodes the pixel-wise displacement solution to correct the geometric distortion. After geometric unwarping, our illumination correction transformer further removes the shading artifacts to improve the visual quality and OCR accuracy. Extensive evaluations are conducted on several datasets, and superior results are reported against the state-of-the-art methods. Remarkably, our DocTr achieves 20.02% Character Error Rate (CER), a 15% absolute improvement over the state-of-the-art methods. Moreover, it also shows high efficiency on running time and parameter count. The results will be available at https://github.com/fh2019ustc/DocTr for further comparison. Comment: This paper has been accepted by ACM Multimedia 202