DocScanner: Robust Document Image Rectification with Progressive Learning
Compared with flatbed scanners, portable smartphones are much more convenient
for digitizing physical documents. However, such digitized documents are often
distorted due to uncontrolled physical deformations, camera positions, and
illumination variations. To address this, we present DocScanner, a novel
framework for document image rectification. Unlike existing methods, DocScanner
tackles this issue by introducing a progressive learning mechanism.
Specifically, DocScanner maintains a single estimate of the rectified image,
which is progressively corrected with a recurrent architecture. The iterative
refinements make DocScanner converge to a robust and superior performance,
while the lightweight recurrent architecture ensures running efficiency. In
addition, before this rectification process, motivated by the corrupted
rectified boundaries observed in prior works, DocScanner employs a document
localization module to explicitly segment the foreground document from
cluttered backgrounds. To further improve rectification quality, a geometric
regularization based on the geometric prior between the distorted and
rectified images is introduced during training. Extensive experiments are conducted on the Doc3D
dataset and the DocUNet Benchmark dataset, and the quantitative and qualitative
evaluation results verify the effectiveness of DocScanner, which outperforms
previous methods on OCR accuracy, image similarity, and our proposed distortion
metric by a considerable margin. Furthermore, our DocScanner shows the highest
efficiency in runtime latency and model size.
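The progressive correction that the abstract describes can be sketched numerically. The learned recurrent update unit of DocScanner is replaced here by a hypothetical fixed-fraction residual correction, purely to illustrate how a single rectification estimate is refined over iterations; none of the names below come from the paper:

```python
import numpy as np

def progressive_rectify(flow_init, n_iters=5, step=0.5):
    """Toy sketch of progressive refinement: a single warping-flow
    estimate is repeatedly corrected toward a target.  In DocScanner
    the update is produced by a learned recurrent unit; here the
    'update' is simply a fixed fraction of the remaining residual
    (an assumption for illustration only)."""
    target = np.zeros_like(flow_init)      # a perfectly flat document
    flow = flow_init.copy()
    history = []
    for _ in range(n_iters):
        residual = target - flow           # what is left to correct
        flow = flow + step * residual      # apply a partial correction
        history.append(np.abs(flow).mean())
    return flow, history

flow0 = np.random.default_rng(0).normal(size=(8, 8, 2))
flow, errs = progressive_rectify(flow0)
```

The mean residual shrinks at every iteration, mirroring the claim that iterative refinement converges toward a stable rectification.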
Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links
Detection and recognition of scene texts of arbitrary shapes remain a grand
challenge due to the super-rich text shape variation in text line orientations,
lengths, curvatures, etc. This paper presents a mask-guided multi-task network
that detects and rectifies scene texts of arbitrary shapes reliably. Three
types of keypoints are detected, which accurately specify the centre line and
hence the shape of each text instance. In addition, four types of keypoint links are
detected of which the horizontal links associate the detected keypoints of each
text instance and the vertical links predict a pair of landmark points (for
each keypoint) along the upper and lower text boundary, respectively. Scene
texts can be located and rectified by linking up the associated landmark points
(giving localization polygon boxes) and transforming the polygon boxes via thin
plate spline, respectively. Extensive experiments over several public datasets
show that the use of text keypoints is tolerant to the variation in text
orientations, lengths, and curvatures, and it achieves superior scene text
detection and rectification performance as compared with state-of-the-art
methods.
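The final transformation step above relies on thin plate spline warping of the localization polygons. A textbook 2-D TPS solve (not the paper's exact formulation) can be sketched as follows; the control points are hypothetical landmark points along a curved text line mapped onto a straight rectangle:

```python
import numpy as np

def _tps_kernel(r2):
    # radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def tps_fit(src, dst):
    """Fit thin plate spline coefficients mapping src control points
    (N, 2) onto dst points (N, 2): solve the standard linear system
    [[K P], [P^T 0]] [w; a] = [dst; 0]."""
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = _tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b), src

def tps_apply(params, pts):
    """Evaluate the fitted spline at query points (M, 2)."""
    coef, ctrl = params
    n = ctrl.shape[0]
    d2 = ((pts[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    U = _tps_kernel(d2)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ coef[:n] + P @ coef[n:]

# hypothetical landmark points on a curved text boundary, mapped
# onto a straight axis-aligned rectangle
src = np.array([[0., 0.], [1., .2], [2., 0.], [0., 1.], [1., 1.2], [2., 1.]])
dst = np.array([[0., 0.], [1., 0.], [2., 0.], [0., 1.], [1., 1.], [2., 1.]])
params = tps_fit(src, dst)
out = tps_apply(params, src)
```

Because TPS interpolates its control points exactly, the curved landmarks land precisely on the straightened rectangle, and interior pixels are warped smoothly between them.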
MANGO: A Mask Attention Guided One-Stage Scene Text Spotter
Recently end-to-end scene text spotting has become a popular research topic
due to its advantages of global optimization and high maintainability in real
applications. Most methods attempt to develop various region of interest (RoI)
operations to concatenate the detection part and the sequence recognition part
into a two-stage text spotting framework. However, in such a framework, the
recognition part is highly sensitive to the detection results (e.g., the
compactness of text contours). To address this problem, in this paper, we
propose a novel Mask AttentioN Guided One-stage text spotting framework named
MANGO, in which character sequences can be directly recognized without RoI
operation. Concretely, a position-aware mask attention module is developed to
generate attention weights on each text instance and its characters. It allows
different text instances in an image to be allocated on different feature map
channels which are further grouped as a batch of instance features. Finally, a
lightweight sequence decoder is applied to generate the character sequences. It
is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting
and can be trained end-to-end with only coarse position information (e.g., a
rectangular bounding box) and text annotations. Experimental results show that
the proposed method achieves competitive and even new state-of-the-art
performance on both regular and irregular text spotting benchmarks, i.e., ICDAR
2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
Comment: Accepted to AAAI 2021. Code is available at
https://davar-lab.github.io/publication.html or
https://github.com/hikopensource/DAVAR-Lab-OC
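The RoI-free pooling described above can be illustrated with a minimal sketch: each text instance gets a spatial attention map over a shared feature map, which pools it into one instance vector without any cropping. The learned mask-attention module of MANGO is replaced by hypothetical fixed arrays; only the pooling mechanics are shown:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mask_attention_pool(features, mask_logits):
    """Toy sketch of mask-guided instance pooling: a per-instance
    spatial attention map (here, random placeholder logits) weights
    the shared (H, W, C) feature map into one C-dim vector per text
    instance, with no RoI cropping step."""
    h, w, c = features.shape
    n = mask_logits.shape[0]
    attn = softmax(mask_logits.reshape(n, h * w), axis=1)  # (N, H*W)
    flat = features.reshape(h * w, c)                      # (H*W, C)
    return attn @ flat                                     # (N, C)

rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 4, 8))   # shared feature map
masks = rng.normal(size=(2, 4, 4))   # two text instances
inst = mask_attention_pool(feats, masks)
```

In the real model these instance features would then feed a lightweight sequence decoder that emits the character string for each instance.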