Deep Unrestricted Document Image Rectification
In recent years, tremendous efforts have been made on document image
rectification, but existing advanced algorithms are limited to processing
restricted document images, i.e., the input images must incorporate a complete
document. When the captured image contains only a local text region, its
rectification quality degrades and becomes unsatisfactory. Our previously proposed
DocTr, a transformer-assisted network for document image rectification, also
suffers from this limitation. In this work, we present DocTr++, a novel unified
framework for document image rectification, without any restrictions on the
input distorted images. Our major technical improvements can be summarized in
three aspects. Firstly, we upgrade the original architecture by adopting a
hierarchical encoder-decoder structure for multi-scale representation
extraction and parsing. Secondly, we reformulate the pixel-wise mapping
relationship between the unrestricted distorted document images and the
distortion-free counterparts. The obtained data is used to train our DocTr++
for unrestricted document image rectification. Thirdly, we contribute a
real-world test set and metrics suitable for evaluating the rectification
quality. To the best of our knowledge, this is the first learning-based method for the
rectification of unrestricted document images. Extensive experiments are
conducted, and the results demonstrate the effectiveness and superiority of our
method. We hope our DocTr++ will serve as a strong baseline for generic
document image rectification, prompting the further advancement and application
of learning-based algorithms. The source code and the proposed dataset are
publicly available at https://github.com/fh2019ustc/DocTr-Plus
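The pixel-wise mapping mentioned in the abstract above can be illustrated with a small sketch. It assumes a backward map that stores, for every pixel of the rectified output, the (x, y) position to read from the distorted input, and it samples that position bilinearly. The names `bilinear_sample`, `rectify`, and `backward_map` are hypothetical and chosen for illustration; this is not code from the DocTr++ repository.

```python
from typing import List, Tuple

Image = List[List[float]]                       # grayscale image, row-major
BackwardMap = List[List[Tuple[float, float]]]   # per output pixel: (x, y) in the distorted image


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a fractional (x, y) position, clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(distorted: Image, backward_map: BackwardMap) -> Image:
    """Build the output image by reading each mapped source position."""
    return [[bilinear_sample(distorted, x, y) for (x, y) in row]
            for row in backward_map]


# Toy input: a 2x3 "distorted" image and a 1x2 backward map (assumed values).
distorted = [[0.0, 10.0, 20.0],
             [30.0, 40.0, 50.0]]
backward_map = [[(1.0, 0.0), (0.5, 0.5)]]
rectified = rectify(distorted, backward_map)
```

In a learning-based method the backward map would be predicted by the network; here it is written by hand so that the sampling step can be checked in isolation.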
DocScanner: Robust Document Image Rectification with Progressive Learning
Compared with flatbed scanners, portable smartphones are much more convenient
for digitizing physical documents. However, such digitized documents are often
distorted due to uncontrolled physical deformations, camera positions, and
illumination variations. To this end, we present DocScanner, a novel framework
for document image rectification. Different from existing methods, DocScanner
addresses this issue by introducing a progressive learning mechanism.
Specifically, DocScanner maintains a single estimate of the rectified image,
which is progressively corrected with a recurrent architecture. The iterative
refinements let DocScanner converge to robust, superior performance,
while the lightweight recurrent architecture keeps it efficient at run time. In
addition, having observed the corrupted boundaries in the rectified outputs of
prior works, DocScanner applies a document localization module before
rectification to explicitly segment the foreground document from the
cluttered background. To further improve the rectification quality, a
geometric regularization based on the geometric prior between the distorted
and the rectified images is introduced during training.
Extensive experiments are conducted on the Doc3D
dataset and the DocUNet Benchmark dataset, and the quantitative and qualitative
evaluation results verify the effectiveness of DocScanner, which outperforms
previous methods on OCR accuracy, image similarity, and our proposed distortion
metric by a considerable margin. Furthermore, DocScanner shows the highest
efficiency in terms of runtime latency and model size.
MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary
Document dewarping from a distorted camera-captured image is of great value
for OCR and document understanding. In document dewarping, the document boundary
plays an important role and is more evident than the inner region. Current
learning-based methods mainly focus on complete boundary cases, leading to poor
correction performance on documents with incomplete boundaries. In
contrast to these methods, this paper proposes MataDoc, the first method
focusing on arbitrary boundary document dewarping with margin and text aware
regularizations. Specifically, we design the margin regularization by
explicitly considering background consistency to enhance boundary perception.
Moreover, we introduce word position consistency to keep text lines straight in
rectified document images. To produce a comprehensive evaluation of MataDoc, we
propose a novel benchmark ArbDoc, mainly consisting of document images with
arbitrary boundaries in four typical scenarios. Extensive experiments confirm
the superiority of MataDoc with consideration for the incomplete boundary on
ArbDoc and also demonstrate the effectiveness of the proposed method on
DocUNet, DIR300, and WarpDoc datasets.
Comment: 12 page
Document and Scene Text Image Rectification Based on Alignment Properties
Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Nam Ik Cho (조남익).
Optical character recognition (OCR) of text images captured by cameras plays an important role in scene understanding.
However, OCR of camera-captured images is still considered a challenging problem, even after text detection (localization).
This is mainly due to the geometric distortions caused by page curve and perspective view, so rectification has become an essential pre-processing step for recognition.
Accordingly, many text image rectification methods have been proposed that recover the fronto-parallel view from a single distorted image.
Recently, many researchers have focused on the properties of well-rectified text.
In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions.
By minimizing the cost functions, the transformation parameters for rectification are obtained.
Specifically, they are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scenes.
First, a document image dewarping method is proposed based on the alignments of text-lines and line segments.
Conventional text-line based document dewarping methods have problems when handling complex layouts and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics, and/or tables take up a large portion of the input instead.
Hence, for robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines.
Based on the assumption and observation that all transformed line segments remain straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in well-rectified images, the proposed method encodes these properties into the cost function in addition to the text-line based cost.
By minimizing the function, the proposed method obtains transformation parameters for page curve, camera pose, and focal length, which are used for document image rectification. Considering that there are many outliers in line segment directions and misdetected text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information.
Experimental results show that the proposed method is robust to a variety of page layouts.
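The iterative scheme described above (minimize a cost, discard poorly aligned features, minimize again) can be sketched on a deliberately simplified problem. The example assumes a single unknown, a global skew angle, and a cost equal to the sum of squared angular residuals of near-horizontal line segments, whose minimizer is the mean. The dissertation's actual model estimates page curve, camera pose, and focal length, so the function `estimate_skew`, its threshold, and the one-at-a-time removal rule are illustrative assumptions only.

```python
from statistics import mean
from typing import List


def estimate_skew(angles_deg: List[float],
                  threshold_deg: float = 5.0,
                  max_iters: int = 10) -> float:
    """Estimate a global skew angle with iterative outlier removal.

    Each round fits the angle that minimizes the sum of squared residuals
    (the mean), then removes the single worst-aligned segment if its
    residual exceeds threshold_deg, and refits on the remaining segments.
    """
    inliers = list(angles_deg)
    for _ in range(max_iters):
        skew = mean(inliers)
        worst = max(inliers, key=lambda a: abs(a - skew))
        if abs(worst - skew) <= threshold_deg or len(inliers) <= 2:
            break
        inliers.remove(worst)  # drop the segment that is not well aligned
    return mean(inliers)


# Assumed toy data: five segments near 2.6 degrees plus one stray direction.
segment_angles = [2.0, 3.0, 2.5, 3.5, 2.0, 40.0]
skew = estimate_skew(segment_angles)
```

Removing one feature per round is a conservative choice: a gross outlier can pull the first fit far enough that a batch threshold would reject the good segments as well.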
This dissertation also presents a method for scene text rectification. Conventional methods for scene text rectification mainly exploit glyph properties: the characters in many languages have horizontal/vertical strokes and also some symmetric shapes.
However, since they consider only the shape properties of individual characters, without considering the alignments of characters, they work well only for images with a single character, and still yield misaligned results for images with multiple characters.
In order to alleviate this problem, the proposed method explicitly imposes alignment constraints on rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function.
Also, in order to encode the alignments of characters into the cost function, the proposed method separates the text into individual characters using a projection profile method before optimizing the cost function. Then, top and bottom lines are estimated using least squares line fitting with RANSAC. The overall algorithm is designed to perform character segmentation, line fitting, and rectification iteratively.
Since the cost function is non-convex and involves many variables, the proposed method also develops an optimization method using the Augmented Lagrange Multiplier method.
This dissertation evaluates the proposed method on real and synthetic text images, and experimental results show that it achieves higher OCR accuracy than the conventional approach and also yields visually pleasing results.
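The two building blocks named in this part of the abstract, projection-profile character segmentation and least squares line fitting with RANSAC, can be sketched as follows. This is a minimal illustration under stated assumptions: the text is a binary image with 1 for ink, characters are separated by empty columns, and the line is modeled as y = slope * x + intercept. The function names, the tolerance, and the iteration count are hypothetical and are not taken from the dissertation.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Split text into character spans using the vertical projection profile."""
    profile = [sum(col) for col in zip(*binary)]  # ink count per column
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                      # a character begins
        elif ink == 0 and start is not None:
            spans.append((start, x - 1))   # a character ends at the gap
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans


def least_squares_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def ransac_line(points: List[Point], tol: float = 1.0,
                iters: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Pick the two-point model with the most inliers, then refit by least squares."""
    rng = random.Random(seed)
    best: List[Point] = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                       # vertical pair, not representable here
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no usable line model found")
    return least_squares_line(best)


# Assumed toy data: three characters, and character-top points with two outliers.
binary_text = [[1, 1, 0, 0, 1, 0, 1, 1],
               [1, 0, 0, 0, 1, 0, 0, 1]]
char_spans = segment_columns(binary_text)
top_points = [(float(x), 0.1 * x + 5.0) for x in range(10)] + [(3.0, 20.0), (7.0, -4.0)]
top_slope, top_intercept = ransac_line(top_points)
```

In an iterative pipeline of the kind the abstract describes, the spans would supply per-character top and bottom points, the fitted lines would feed the alignment terms of the cost function, and the three steps would repeat after each rectification update.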
Finally, the proposed method can be extended to curved surface dewarping in real scenes.
In real scenes, there are many cylindrical objects such as medicine bottles or beverage cans, and their curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS). These curved surfaces contain much significant text and many figures; however, their text has an irregular structure compared to documents. Therefore, the conventional dewarping methods based on the properties of well-rectified text have problems in their rectification.
Based on the observation that many curved surfaces include well-aligned line segments (boundary lines of objects or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line segment terms.
Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.
1 Introduction
1.1 Document image dewarping
1.2 Scene text rectification
1.3 Curved surface dewarping in real scene
1.4 Contents
2 Related work
2.1 Document image dewarping
2.1.1 Dewarping methods using additional information
2.1.2 Text-line based dewarping methods
2.2 Scene text rectification
2.3 Curved surface dewarping in real scene
3 Document image dewarping
3.1 Proposed cost function
3.1.1 Parametric model of dewarping process
3.1.2 Cost function design
3.1.3 Line segment properties and cost function
3.2 Outlier removal and optimization
3.2.1 Jacobian matrix of the proposed cost function
3.3 Document region detection and dewarping
3.4 Experimental results
3.4.1 Experimental results on text-abundant document images
3.4.2 Experimental results on non conventional document images
3.5 Summary
4 Scene text rectification
4.1 Proposed cost function for rectification
4.1.1 Cost function design
4.1.2 Character alignment properties and alignment terms
4.2 Overall algorithm
4.2.1 Initialization
4.2.2 Character segmentation
4.2.3 Estimation of the alignment parameters
4.2.4 Cost function optimization for rectification
4.3 Experimental results
4.4 Summary
5 Curved surface dewarping in real scene
5.1 Proposed curved surface dewarping method
5.1.1 Pre-processing
5.1 Experimental results
5.2 Summary
6 Conclusions
Bibliography
Abstract (Korean)
NASA Tech Briefs, October 2003
Topics covered include: Cryogenic Temperature-Gradient Foam/Substrate Tensile Tester; Flight Test of an Intelligent Flight-Control System; Slat Heater Boxes for Thermal Vacuum Testing; System for Testing Thermal Insulation of Pipes; Electrical-Impedance-Based Ice-Thickness Gauges; Simulation System for Training in Laparoscopic Surgery; Flasher Powered by Photovoltaic Cells and Ultracapacitors; Improved Autoassociative Neural Networks; Toroidal-Core Microinductors Biased by Permanent Magnets; Using Correlated Photons to Suppress Background Noise; Atmospheric-Fade-Tolerant Tracking and Pointing in Wireless Optical Communication; Curved Focal-Plane Arrays Using Back-Illuminated High-Purity Photodetectors; Software for Displaying Data from Planetary Rovers; Software for Refining or Coarsening Computational Grids; Software for Diagnosis of Multiple Coordinated Spacecraft; Software Helps Retrieve Information Relevant to the User; Software for Simulating a Complex Robot; Software for Planning Scientific Activities on Mars; Software for Training in Pre-College Mathematics; Switching and Rectification in Carbon-Nanotube Junctions; Scandia-and-Yttria-Stabilized Zirconia for Thermal Barriers; Environmentally Safer, Less Toxic Fire-Extinguishing Agents; Multiaxial Temperature- and Time-Dependent Failure Model; Cloverleaf Vibratory Microgyroscope with Integrated Post; Single-Vector Calibration of Wind-Tunnel Force Balances; Microgyroscope with Vibrating Post as Rotation Transducer; Continuous Tuning and Calibration of Vibratory Gyroscopes; Compact, Pneumatically Actuated Filter Shuttle; Improved Bearingless Switched-Reluctance Motor; Fluorescent Quantum Dots for Biological Labeling; Growing Three-Dimensional Corneal Tissue in a Bioreactor; Scanning Tunneling Optical Resonance Microscopy; The Micro-Arcsecond Metrology Testbed; Detecting Moving Targets by Use of Soliton Resonances; and Finite-Element Methods for Real-Time Simulation of Surgery