35 research outputs found

    Effective Geometric Restoration of Distorted Historical Document for Large-Scale Digitization

    Due to storage conditions and the non-planar shape of the material, geometric distortion of the 2-D content is widely present in scanned document images. Effective geometric restoration of these distorted document images considerably increases the character recognition rate in large-scale digitisation. For large-scale digitisation of historical books, geometric restoration solutions are expected to be accurate, generic, robust, unsupervised and reversible. However, most methods in the literature concentrate on improving restoration accuracy for specific distortion effects, not on their applicability to large-scale digitisation. This paper proposes an effective mesh-based geometric restoration system (GRLSD) for large-scale distorted historical document digitisation. In this system, an automatic mesh-generation-based dewarping tool is proposed to geometrically model and correct arbitrarily warped historical documents. An XML-based mesh recorder is proposed to record the mesh of distortion information for reversible use. A graphical user interface (GUI) toolkit is designed to visually display and manually manipulate the mesh to improve geometric restoration accuracy. Experimental results show that the proposed automatic dewarping approach efficiently corrects arbitrarily warped historical documents, with improved performance over several state-of-the-art geometric restoration methods. By using the XML mesh recorder and GUI toolkit, the GRLSD system greatly aids users in flexibly monitoring and correcting ambiguous mesh points, preventing damage to undistorted historical document images in large-scale digitisation.
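The reversible XML mesh recorder described above can be illustrated with a minimal sketch. The element names (`mesh`, `point`) and the tiny 2x2 grid are hypothetical illustrations, not taken from the paper:

```python
import xml.etree.ElementTree as ET

def record_mesh(points):
    """Serialize a distortion mesh (rows of (x, y) vertices) to XML."""
    root = ET.Element("mesh", rows=str(len(points)), cols=str(len(points[0])))
    for r, row in enumerate(points):
        for c, (x, y) in enumerate(row):
            ET.SubElement(root, "point", row=str(r), col=str(c),
                          x=f"{x:.2f}", y=f"{y:.2f}")
    return ET.tostring(root, encoding="unicode")

def load_mesh(xml_text):
    """Rebuild the mesh from its XML record, enabling reversible use."""
    root = ET.fromstring(xml_text)
    rows, cols = int(root.get("rows")), int(root.get("cols"))
    mesh = [[None] * cols for _ in range(rows)]
    for p in root:
        mesh[int(p.get("row"))][int(p.get("col"))] = (float(p.get("x")),
                                                      float(p.get("y")))
    return mesh

# A 2x2 mesh over a slightly warped page corner.
mesh = [[(0.0, 0.0), (100.0, 3.5)], [(0.0, 98.2), (100.0, 101.0)]]
assert load_mesh(record_mesh(mesh)) == mesh  # round-trip is lossless
```

Storing the mesh beside the scan, as the paper proposes, means a correction can always be re-derived or undone without re-scanning the original.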

    Document and Scene Text Image Rectification Based on Alignment Properties

    Thesis (Ph.D.) -- Seoul National University, Graduate School, College of Engineering, Department of Electrical and Computer Engineering, 2017. 8. Nam Ik Cho.
The optical character recognition (OCR) of text images captured by cameras plays an important role in scene understanding. However, OCR of camera-captured images is still considered a challenging problem, even after text detection (localization). This is mainly due to the geometric distortions caused by page curvature and perspective view, so rectification has become an essential pre-processing step for recognition. Accordingly, many text image rectification methods have been proposed that recover a fronto-parallel view from a single distorted image. Recently, many researchers have focused on the properties of well-rectified text.
In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions. By minimizing the cost functions, the transformation parameters for rectification are obtained. These properties are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scenes. First, a document image dewarping method is proposed based on the alignments of text-lines and line segments. Conventional text-line based document dewarping methods have problems when handling complex layouts and/or very few text-lines; when there are few aligned text-lines in the image, photos, graphics and/or tables usually take up a large portion of the input instead. Hence, for robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that all transformed line segments remain straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in well-rectified images, the proposed method encodes these properties into the cost function in addition to the text-line based cost. By minimizing the function, the proposed method obtains the transformation parameters for page curve, camera pose, and focal length, which are used for document image rectification. Considering that there can be many outliers, such as arbitrarily oriented line segments and mis-detected text-lines, the overall algorithm is designed in an iterative manner. At each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. This dissertation also presents a method for scene text rectification.
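The iterative remove-outliers-then-refit loop of the document dewarping method above can be sketched in a simplified 1-D form. Fitting a straight baseline to candidate text-line points stands in for the full cost function over page curve, pose, and focal length; the function name, iteration count, and threshold are illustrative assumptions:

```python
import numpy as np

def robust_line_fit(xs, ys, n_iters=5, thresh=2.0):
    """Iteratively fit y = a*x + b, dropping points that are badly aligned.

    Mirrors the alternation in the text: minimize the cost on the current
    inliers, then remove features that violate the alignment property.
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    inliers = np.ones(len(xs), dtype=bool)
    for _ in range(n_iters):
        # Least-squares fit on the current inlier set.
        A = np.stack([xs[inliers], np.ones(inliers.sum())], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, ys[inliers], rcond=None)
        # Outlier removal: drop points with residual above the threshold.
        residuals = np.abs(a * xs + b - ys)
        new_inliers = residuals < thresh
        if np.array_equal(new_inliers, inliers):
            break
        inliers = new_inliers
    return a, b, inliers

# A mostly-straight baseline plus one grossly misaligned point.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.0, 2.1, 2.9, 30.0, 5.1]   # index 4 is an outlier
a, b, inliers = robust_line_fit(xs, ys)
assert not inliers[4] and abs(a - 1.0) < 0.1
```

The first fit is pulled toward the outlier, but once it is removed the refit converges to the true slope, which is exactly why the dissertation alternates removal and minimization.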
Conventional methods for scene text rectification mainly exploit glyph properties, i.e., the fact that characters in many languages have horizontal/vertical strokes and some symmetric shapes. However, since they consider only the shape properties of individual characters, without considering the alignments between characters, they work well only for images with a single character and still yield misaligned results for images with multiple characters. To alleviate this problem, the proposed method explicitly imposes alignment constraints on the rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function. Also, in order to encode the character alignments into the cost function, the proposed method separates the text into individual characters using a projection profile method before optimizing the cost function. Then, the top and bottom lines of the text are estimated using least-squares line fitting with RANSAC. The overall algorithm performs character segmentation, line fitting, and rectification iteratively. Since the cost function is non-convex and involves many variables, the proposed method also develops an optimization scheme based on the Augmented Lagrange Multiplier method. This dissertation evaluates the proposed method on real and synthetic text images, and experimental results show that it achieves higher OCR accuracy than conventional approaches while also yielding visually pleasing results. Finally, the proposed method can be extended to curved surface dewarping in real scenes. In real scenes there are many circular objects, such as medicine bottles or drink cans, and their curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS).
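The RANSAC-based estimation of the top and bottom text lines mentioned above can be sketched as follows; the trial count and inlier threshold are illustrative assumptions, not values from the dissertation:

```python
import random
import numpy as np

def ransac_line(points, n_trials=100, thresh=1.0, seed=0):
    """Fit y = a*x + b robustly: sample 2 points, count inliers,
    then refine with least squares on the best consensus set."""
    rng = random.Random(seed)
    pts = np.asarray(points, float)
    best_inliers = None
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(list(map(tuple, pts)), 2)
        if x1 == x2:
            continue  # degenerate sample, no unique line
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(a * pts[:, 0] + b - pts[:, 1]) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refinement on the consensus set.
    A = np.stack([pts[best_inliers, 0], np.ones(best_inliers.sum())], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, pts[best_inliers, 1], rcond=None)
    return a, b

# Top-edge points of segmented characters, with two spurious detections.
top_pts = [(0, 10.0), (2, 10.2), (4, 9.9), (6, 10.1), (3, 25.0), (5, 2.0)]
a, b = ransac_line(top_pts)
assert abs(a) < 0.2 and abs(b - 10.0) < 1.0  # near-horizontal line at y = 10
```

RANSAC is used here for the same reason as in the dissertation: character segmentation occasionally produces spurious boxes, and a plain least-squares fit over all points would be dragged off the true text line.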
These curved surfaces contain much significant text and many figures; however, their text has an irregular structure compared to documents. Therefore, the conventional dewarping methods based on the properties of well-rectified text have problems rectifying them. Based on the observation that many curved surfaces include well-aligned line segments (boundary lines of objects or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line segment terms. Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.
Contents:
1 Introduction
  1.1 Document image dewarping
  1.2 Scene text rectification
  1.3 Curved surface dewarping in real scene
  1.4 Contents
2 Related work
  2.1 Document image dewarping
    2.1.1 Dewarping methods using additional information
    2.1.2 Text-line based dewarping methods
  2.2 Scene text rectification
  2.3 Curved surface dewarping in real scene
3 Document image dewarping
  3.1 Proposed cost function
    3.1.1 Parametric model of dewarping process
    3.1.2 Cost function design
    3.1.3 Line segment properties and cost function
  3.2 Outlier removal and optimization
    3.2.1 Jacobian matrix of the proposed cost function
  3.3 Document region detection and dewarping
  3.4 Experimental results
    3.4.1 Experimental results on text-abundant document images
    3.4.2 Experimental results on non-conventional document images
  3.5 Summary
4 Scene text rectification
  4.1 Proposed cost function for rectification
    4.1.1 Cost function design
    4.1.2 Character alignment properties and alignment terms
  4.2 Overall algorithm
    4.2.1 Initialization
    4.2.2 Character segmentation
    4.2.3 Estimation of the alignment parameters
    4.2.4 Cost function optimization for rectification
  4.3 Experimental results
  4.4 Summary
5 Curved surface dewarping in real scene
  5.1 Proposed curved surface dewarping method
    5.1.1 Pre-processing
  5.2 Experimental results
  5.3 Summary
6 Conclusions
Bibliography
Abstract (Korean)

    A Book Reader Design for Persons with Visual Impairment and Blindness

    The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost-effective. The approach relies on the geometry of the design setup and provides the mathematical foundation for integrating, in a unique way, a 3-D surface map from a low-resolution time-of-flight (ToF) device with a high-resolution image, as a means of enhancing the reading accuracy of images warped by the page curvature of bound books and other magazines. The merits of this low-cost but effective automated book reader design include: (1) a seamless registration process for the two imaging modalities, so that the low-resolution (160 x 120 pixels) height map, acquired by an Argos3D-P100 camera, accurately covers the entire book spread as captured by the high-resolution image (3072 x 2304 pixels) of a Canon G6 camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as dewarping of the book spread images; and (3) an image correction performance comparison between uniform and full height maps to determine which map provides the highest possible Optical Character Recognition (OCR) reading accuracy. The design concept could also be applied to the challenging process of book digitization. The method depends on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset consisting of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). Improvements in character reading accuracy due to the correction steps were quantified by introducing the corrected images to an OCR engine and tabulating the number of mis-recognized characters.
Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows) and 96.11% with the full height map (i.e., each row has its own height values as obtained from the 3-D camera). When the rotational misalignments were introduced, the results produced average accuracies of 90.63% and 94.75% for the same respective height maps, demonstrating the added resilience of the full height map method to potential misalignments.
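The uniform height map described above (the central row replicated to all rows) can be contrasted with the full map in a short sketch; the array shapes and values here are illustrative only, not the 160 x 120 data from the Argos3D-P100:

```python
import numpy as np

def uniform_height_map(full_map):
    """Approximate the 3-D map by replicating its central row,
    as in the 'uniform height map' correction variant."""
    central = full_map[full_map.shape[0] // 2]
    return np.tile(central, (full_map.shape[0], 1))

# Toy 4x6 height map of a curved page: height varies along columns
# (the page curve) and, more weakly, along rows.
rows = np.linspace(0.0, 0.3, 4)[:, None]            # row-to-row variation
curve = np.abs(np.linspace(-1.0, 1.0, 6))[None, :]  # page-curvature profile
full_map = 10.0 + curve + rows

uni = uniform_height_map(full_map)
assert uni.shape == full_map.shape
assert np.allclose(uni, uni[0])             # every row is identical
assert np.allclose(uni[0], full_map[2])     # and equals the central row
```

The uniform map discards the row-to-row variation that the full map retains, which is consistent with the reported accuracy gap under rotational misalignment (90.63% vs. 94.75%).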

    ν…μŠ€νŠΈμ™€ νŠΉμ§•μ  기반의 λͺ©μ ν•¨μˆ˜ μ΅œμ ν™”λ₯Ό μ΄μš©ν•œ λ¬Έμ„œμ™€ ν…μŠ€νŠΈ ν‰ν™œν™” 기법

    Thesis (Ph.D.) -- Seoul National University, Graduate School: Department of Electrical and Computer Engineering, 2014. 8. Nam Ik Cho. There are many techniques and applications that detect and recognize text information in images, e.g., document retrieval using camera-captured document images, book readers for the visually impaired, and augmented reality based on text recognition. In these applications, the planar surfaces that contain the text are often distorted in the captured image due to perspective view (e.g., road signs), curvature (e.g., unfolded books), and wrinkles (e.g., old documents). Recovering the original document texture by removing these distortions from camera-captured document images is called document rectification. In this dissertation, new text surface rectification algorithms are proposed for improving text recognition accuracy and visual quality. The proposed methods are categorized into three types depending on the type of input, and their contributions can be summarized as follows. In the first rectification algorithm, the dense text-lines in the documents are employed to rectify the images. Unlike conventional approaches, the proposed method does not use the text-lines directly. Instead, it uses a discrete representation of text-lines and text-blocks as sets of connected components. Also, the geometric distortions caused by page curl and perspective view are modeled as generalized cylindrical surfaces and camera rotation, respectively. With this distortion model and the discrete feature representation, a cost function is developed whose minimization yields the parameters of the distortion model. The cost function encodes page properties such as text-block alignment, line spacing, and the straightness of text-lines. By describing the text features as sets of discrete points, the cost function can be easily defined and is well solved by the Levenberg-Marquardt algorithm.
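A toy stand-in for the cost minimization over discrete points described above: fitting a quadratic vertical profile to one text-line's connected-component centers. The full model (GCS curve plus camera rotation and focal length) is nonlinear and solved with Levenberg-Marquardt; this single-line slice is linear in its coefficients, so ordinary least squares suffices, and all data here is synthetic:

```python
import numpy as np

def fit_textline_curve(points, degree=2):
    """Fit one text-line's vertical profile y(x) with a polynomial,
    by minimizing the sum of squared residuals over discrete points."""
    pts = np.asarray(points, float)
    # Vandermonde system: columns [x^2, x, 1] for degree 2.
    A = np.vander(pts[:, 0], degree + 1)
    coeffs, *_ = np.linalg.lstsq(A, pts[:, 1], rcond=None)
    return coeffs  # highest power first, as np.polyval expects

# Connected-component centers along one curved text-line: y = 0.05 x^2 + 3.
pts = [(x, 0.05 * x * x + 3.0) for x in range(-4, 5)]
c = fit_textline_curve(pts)
assert np.allclose(c, [0.05, 0.0, 3.0], atol=1e-8)
assert abs(np.polyval(c, 2.0) - 3.2) < 1e-8
```

Representing each text-line as a point set keeps the residuals simple to evaluate, which is what makes the Levenberg-Marquardt iteration on the full nonlinear model tractable.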
Experiments show that the proposed method works well for various layouts and curved surfaces, and compares favorably with conventional methods on a standard dataset. The second algorithm is a unified framework that rectifies and stitches multiple document images using visual feature points instead of text-lines, similar to general image stitching algorithms. However, general image stitching usually assumes a fixed camera center, which cannot be taken for granted when capturing documents. To deal with camera motion between images, a new parametric family of motion models is proposed in this dissertation. Besides, to remove the ambiguity in the reference plane, a new cost function is developed that imposes constraints on the reference plane. This enables estimation of the physically correct reference plane without prior knowledge, and the estimated plane can also be used to rectify the stitching result. Furthermore, since it employs general features, the proposed method can be applied not only to camera-captured document images but also to other planar objects such as building facades or mural paintings. The third rectification method is based on a scene text detection algorithm that is independent of any language model. Conventional methods assume that a character consists of a single connected component (CC), as in the English alphabet. However, this assumption breaks down for Asian scripts such as Korean, Chinese, and Japanese, where a single character may consist of several CCs, so it is difficult to group CCs into text-lines without a language model. To alleviate this problem, the proposed method clusters candidate regions based on a similarity measure that considers inter-character relations. The adjacency measure is trained on a dataset labeled with bounding boxes of text regions. Non-text regions that remain after clustering are filtered out in a text/non-text classification step.
Final text regions are merged or divided into individual text-lines considering their orientation and location. The detected text is rectified using the orientation of the text-line and its vertical strokes. In extensive experiments, the proposed method outperforms state-of-the-art algorithms on English as well as Asian characters.
Contents:
1 Introduction
  1.1 Document rectification via text-line based optimization
  1.2 A unified approach of rectification and stitching for document images
  1.3 Rectification via scene text detection
  1.4 Contents
2 Related work
  2.1 Document rectification
    2.1.1 Document dewarping without text-lines
    2.1.2 Document dewarping with text-lines
    2.1.3 Text-block identification and text-line extraction
  2.2 Document stitching
  2.3 Scene text detection
3 Document rectification based on text-lines
  3.1 Proposed approach
    3.1.1 Image acquisition model
    3.1.2 Proposed approach to document dewarping
  3.2 Proposed cost function and its optimization
    3.2.1 Design of Estr(·)
    3.2.2 Minimization of Estr(·)
    3.2.3 Alignment type classification
    3.2.4 Design of Ealign(·)
    3.2.5 Design of Espacing(·)
  3.3 Extension to unfolded book surfaces
  3.4 Experimental result
    3.4.1 Experiments on synthetic data
    3.4.2 Experiments on real images
    3.4.3 Comparison with existing methods
    3.4.4 Limitations
4 Document rectification based on feature detection
  4.1 Proposed approach
  4.2 Proposed cost function and its optimization
    4.2.1 Notations
    4.2.2 Homography between the i-th image and E
    4.2.3 Proposed cost function
    4.2.4 Optimization
    4.2.5 Relation to the model in [17]
  4.3 Post-processing
    4.3.1 Classification of two cases
    4.3.2 Skew removal
  4.4 Experimental results
    4.4.1 Quantitative evaluation on metric reconstruction performance
    4.4.2 Experiments on real images
5 Scene text detection and rectification
  5.1 Introduction
    5.1.1 Contribution
    5.1.2 Proposed approach
  5.2 Candidate region detection
    5.2.1 CC extraction
    5.2.2 Computation of similarity between CCs
    5.2.3 CC clustering
  5.3 Rectification of candidate region
  5.4 Text/non-text classification
  5.5 Experimental result
    5.5.1 Experimental results on ICDAR 2011 dataset
    5.5.2 Experimental results on the Asian character dataset
6 Conclusion
Bibliography
Abstract (Korean)
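The CC clustering step of the scene text detection method in this dissertation can be sketched with a simplified, hand-written similarity rule; the real adjacency measure is learned from labeled bounding boxes, so the geometric test and the `max_gap` threshold below are illustrative assumptions:

```python
def cluster_components(boxes, max_gap=1.5):
    """Greedy clustering of character boxes (x, y, w, h) into text groups.

    Two CCs are linked when they are horizontally close (relative to their
    heights) and vertically overlapping; linked CCs are merged via union-find.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i, (xi, yi, wi, hi) in enumerate(boxes):
        for j, (xj, yj, wj, hj) in enumerate(boxes[:i]):
            gap = max(xi - (xj + wj), xj - (xi + wi), 0)
            v_overlap = min(yi + hi, yj + hj) - max(yi, yj)
            if gap < max_gap * min(hi, hj) and v_overlap > 0:
                union(i, j)

    clusters = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Three characters on one line, plus one isolated blob far below.
boxes = [(0, 0, 10, 20), (14, 1, 10, 19), (28, 0, 9, 20), (5, 100, 30, 5)]
groups = cluster_components(boxes)
assert sorted(map(sorted, groups)) == [[0, 1, 2], [3]]
```

Because the grouping rule looks only at inter-box geometry, it works for multi-CC characters (Korean, Chinese, Japanese) without any language model, which is the point the dissertation makes; the remaining non-text clusters are then rejected by the text/non-text classifier.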