32 research outputs found

    Deep Unrestricted Document Image Rectification

    In recent years, considerable effort has gone into document image rectification, but existing advanced algorithms are limited to restricted document images, i.e., the input image must contain a complete document. When the captured image covers only a local text region, rectification quality degrades and becomes unsatisfactory. Our previously proposed DocTr, a transformer-assisted network for document image rectification, also suffers from this limitation. In this work, we present DocTr++, a novel unified framework for document image rectification that places no restrictions on the input distorted images. Our major technical improvements can be summarized in three aspects. First, we upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. Second, we reformulate the pixel-wise mapping relationship between unrestricted distorted document images and their distortion-free counterparts. The obtained data is used to train DocTr++ for unrestricted document image rectification. Third, we contribute a real-world test set and metrics applicable for evaluating rectification quality. To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images. Extensive experiments are conducted, and the results demonstrate the effectiveness and superiority of our method. We hope DocTr++ will serve as a strong baseline for generic document image rectification, prompting further advancement and application of learning-based algorithms. The source code and the proposed dataset are publicly available at https://github.com/fh2019ustc/DocTr-Plus
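
To make the idea of a pixel-wise mapping concrete, the following is a minimal sketch of how a rectified image can be resampled from a distorted one once such a mapping is known. It is not the DocTr++ code: the backward-map convention (each output pixel stores the source location it is read from), the list-of-rows grayscale image, and the names `bilinear_sample` and `rectify` are all assumptions made for illustration.

```python
def bilinear_sample(img, y, x):
    """Sample a grayscale image (list of rows) at fractional (y, x).

    Coordinates outside the image are clamped to the border.
    """
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(distorted, backward_map):
    """Build the output image from a backward map (assumed convention).

    backward_map[i][j] is the (y, x) position in `distorted` that
    output pixel (i, j) should be read from.
    """
    return [[bilinear_sample(distorted, y, x) for (y, x) in row]
            for row in backward_map]
```

A learned model would predict `backward_map`; here it is simply an input, so the sketch only covers the resampling step.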

    MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

    Document dewarping from a distorted camera-captured image is of great value for OCR and document understanding. In document dewarping, the document boundary plays an important role and is more evident than the inner region. Current learning-based methods mainly focus on cases with a complete boundary, leading to poor correction performance on documents with incomplete boundaries. In contrast to these methods, this paper proposes MataDoc, the first method focusing on arbitrary boundary document dewarping with margin and text aware regularizations. Specifically, we design the margin regularization by explicitly considering background consistency to enhance boundary perception. Moreover, we introduce word position consistency to keep text lines straight in rectified document images. To produce a comprehensive evaluation of MataDoc, we propose a novel benchmark, ArbDoc, mainly consisting of document images with arbitrary boundaries in four typical scenarios. Extensive experiments confirm the superiority of MataDoc with consideration for the incomplete boundary on ArbDoc and also demonstrate the effectiveness of the proposed method on the DocUNet, DIR300, and WarpDoc datasets. Comment: 12 pages

    DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

    In this work, we propose a new framework, called Document Image Transformer (DocTr), to address geometric and illumination distortion in document images. Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer. Using a set of learned query embeddings, the geometric unwarping transformer captures the global context of the document image through a self-attention mechanism and decodes the pixel-wise displacement solution to correct the geometric distortion. After geometric unwarping, our illumination correction transformer further removes the shading artifacts to improve the visual quality and OCR accuracy. Extensive evaluations are conducted on several datasets, and superior results are reported against the state-of-the-art methods. Remarkably, our DocTr achieves 20.02% Character Error Rate (CER), a 15% absolute improvement over the state-of-the-art methods. Moreover, it also shows high efficiency in running time and parameter count. The results will be available at https://github.com/fh2019ustc/DocTr for further comparison. Comment: This paper has been accepted by ACM Multimedia 202
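
The Character Error Rate quoted above is commonly defined as the character-level edit distance between the OCR output and the reference text, divided by the reference length. The sketch below assumes that common definition; the abstract does not spell out the paper's exact evaluation protocol, and the function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, one wrong character in an eight-character reference gives a CER of 12.5%. An "absolute" improvement of 15% means the CER itself dropped by 15 percentage points, not by 15% of its previous value.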

    Automatic Detection and Rectification of Paper Receipts on Smartphones

    We describe the development of a real-time smartphone app that allows the user to digitize paper receipts in a novel way by "waving" their phone over the receipts and letting the app automatically detect and rectify the receipts for subsequent text recognition. We show that traditional computer vision algorithms for edge and corner detection do not robustly detect the non-linear and discontinuous edges and corners of a typical paper receipt in real-world settings. This is particularly the case when the colors of the receipt and background are similar, or where other interfering rectangular objects are present. Inaccurate detection of a receipt's corner positions then results in distorted images when using an affine projective transformation to rectify the perspective. We propose an innovative solution to receipt corner detection by treating each of the four corners as a unique "object", and training a Single Shot Detection MobileNet object detection model. We use a small amount of real data and a large amount of automatically generated synthetic data that is designed to be similar to real-world imaging scenarios. We show that our proposed method robustly detects the four corners of a receipt, giving a receipt detection accuracy of 85.3% on real-world data, compared to only 36.9% with a traditional edge detection-based approach. Our method works even when the color of the receipt is virtually indistinguishable from the background. Moreover, our method is trained to detect only the corners of the central target receipt and implicitly learns to ignore other receipts, and other rectangular objects. Including synthetic data allows us to train an even better model. These factors are a major advantage over traditional edge detection-based approaches, allowing us to deliver a much better experience to the user
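
Once four corners have been detected, perspective rectification comes down to estimating a projective transformation (homography) from the four corner correspondences. The sketch below is a plain-Python illustration of that step, not the app's implementation: it fixes the last homography entry to 1 and solves the resulting 8x8 linear system, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_corners(src, dst):
    """3x3 homography mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]  # assumption: last entry normalized to 1
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(hm, point):
    """Map one (x, y) point through the homography."""
    x, y = point
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)
```

For example, mapping detected corners such as `[(12, 8), (95, 20), (88, 140), (5, 120)]` onto the rectangle `[(0, 0), (100, 0), (100, 150), (0, 150)]` yields a matrix that sends each detected corner to its rectangle corner. This also shows why corner accuracy matters: any error in the four inputs propagates to every pixel of the rectified image.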

    Rectification of Images of Ruled Sheets (Вирівнювання зображень розлінованих аркушів)

    The work consists of 4 sections and contains 23 illustrations, 6 tables, and 21 references; its length is 65 pages. Modern education increasingly relies on exchanging information through photographs of paper sheets, which need fast processing to be comfortable to read. Some images of lecture notes and written test assignments are taken with an ordinary smartphone camera in poor lighting and at an angle that is awkward to view, so methods for improving their quality are needed. Because image rectification is not a new topic, there are already quite a few commercial applications that let users rectify photographs of documents and produce digital scans of them. However, most of them are oriented toward finding the sheet of paper under special shooting conditions. This work proposes a method for rectifying images of sheets ruled with a grid or with slanted lines, which can complement existing tools. The aim of the work is to develop a method for rectifying images of sheets that have grid or slanted-line ruling. The object of research is images of ruled sheets of paper. The subject of research is the rectification of images of sheets of paper. The research methods are analysis of information sources and recent publications on the topic, and study of image processing methods and the rules of projective geometry. The relevance of the work stems from the fact that the ability to edit document images is popular and needed in everyday life, particularly in modern school and university education, an integral part of which is the exchange of information through photographs of sheets. The scientific novelty of the work is that an algorithm for rectifying sheets with ruling is proposed. We are not aware of other algorithms with this specialization, since other methods rely on sheet boundaries or text. The practical value is that the approach makes it possible to rectify images that do not meet the shooting conditions required by existing methods, making it a useful complement to them. The work has been formatted according to the requirements and submitted for publication to the international scientific and theoretical journal "Cybernetics and Systems Analysis".

    Document and Scene Text Image Rectification Methods Based on Alignment Properties (정렬 특성들 기반의 문서 및 장면 텍스트 영상 평활화 기법)

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Nam Ik Cho (조남익). Optical character recognition (OCR) of text images captured by cameras plays an important role in scene understanding. However, OCR of camera-captured images is still considered a challenging problem, even after text detection (localization). This is mainly due to the geometric distortions caused by page curve and perspective view, so rectification has become an essential pre-processing step for recognition. Accordingly, many text image rectification methods have been proposed that recover the fronto-parallel view from a single distorted image. Recently, many researchers have focused on the properties of well-rectified text. In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions. By minimizing the cost functions, the transformation parameters for rectification are obtained. In detail, they are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scenes. First, a document image dewarping method is proposed based on the alignments of text-lines and line segments. Conventional text-line based document dewarping methods have problems when handling complex layouts and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take up a large portion of the input instead. Hence, for robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines.
Based on the assumption and observation that all transformed line segments remain straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in well-rectified images, the proposed method encodes these properties into the cost function in addition to the text-line based cost. By minimizing the function, the proposed method obtains transformation parameters for page curve, camera pose, and focal length, which are used for document image rectification. Considering that there are many outliers in line segment directions and misdetected text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. This dissertation also presents a method for scene text rectification. Conventional methods for scene text rectification mainly exploit glyph properties: characters in many languages have horizontal/vertical strokes and some symmetric shapes. However, since they consider only the shape properties of individual characters, without considering the alignment of characters, they work well only for images with a single character and still yield misaligned results for images with multiple characters. To alleviate this problem, the proposed method explicitly imposes alignment constraints on the rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function. Also, in order to encode the alignments of characters into the cost function, the proposed method separates the text into individual characters using a projection profile method before optimizing the cost function. 
Then, the top and bottom lines are estimated using least squares line fitting with RANSAC. The overall algorithm performs character segmentation, line fitting, and rectification iteratively. Since the cost function is non-convex and involves many variables, the proposed method also develops an optimization method based on the Augmented Lagrange Multiplier method. This dissertation evaluates the proposed method on real and synthetic text images, and experimental results show that it achieves higher OCR accuracy than the conventional approach and also yields visually pleasing results. Finally, the proposed method can be extended to curved surface dewarping in real scenes. In real scenes, there are many cylindrical objects such as medicine bottles or beverage cans, and their curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS). These curved surfaces include much significant text and many figures; however, their text has an irregular structure compared to documents. Therefore, conventional dewarping methods based on the properties of well-rectified text have problems in rectifying them. Based on the observation that many curved surfaces include well-aligned line segments (boundary lines of objects or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line segment terms. 
Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.
Contents:
1 Introduction: 1.1 Document image dewarping; 1.2 Scene text rectification; 1.3 Curved surface dewarping in real scene; 1.4 Contents
2 Related work: 2.1 Document image dewarping (2.1.1 Dewarping methods using additional information; 2.1.2 Text-line based dewarping methods); 2.2 Scene text rectification; 2.3 Curved surface dewarping in real scene
3 Document image dewarping: 3.1 Proposed cost function (3.1.1 Parametric model of dewarping process; 3.1.2 Cost function design; 3.1.3 Line segment properties and cost function); 3.2 Outlier removal and optimization (3.2.1 Jacobian matrix of the proposed cost function); 3.3 Document region detection and dewarping; 3.4 Experimental results (3.4.1 Experimental results on text-abundant document images; 3.4.2 Experimental results on non conventional document images); 3.5 Summary
4 Scene text rectification: 4.1 Proposed cost function for rectification (4.1.1 Cost function design; 4.1.2 Character alignment properties and alignment terms); 4.2 Overall algorithm (4.2.1 Initialization; 4.2.2 Character segmentation; 4.2.3 Estimation of the alignment parameters; 4.2.4 Cost function optimization for rectification); 4.3 Experimental results; 4.4 Summary
5 Curved surface dewarping in real scene: 5.1 Proposed curved surface dewarping method (5.1.1 Pre-processing); 5.1 Experimental results; 5.2 Summary
6 Conclusions
Bibliography
Abstract (Korean)
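
The abstract above mentions estimating top and bottom text lines by least squares line fitting with RANSAC. A minimal sketch of that general technique follows. It is not the dissertation's implementation: the vertical-residual inlier test (reasonable for near-horizontal text lines), the threshold, the iteration count, and all names are assumptions.

```python
import random


def least_squares_line(points):
    """Fit y = slope * x + intercept to (x, y) points by least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, threshold=1.0, iterations=200, seed=0):
    """Robust line fit: keep the largest consensus set, then refit on it."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate; skipped in this simplified model
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus line found")
    return least_squares_line(best_inliers), best_inliers
```

Points sampled along a character baseline, plus a few stray detections, would be passed as `points`; the stray detections fall outside the threshold and do not influence the final least squares fit.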