10 research outputs found

    Assembling convolution neural networks for automatic viewing transformation

    Get PDF
    Images taken under different camera poses appear rotated or distorted, which leads to a poor viewing experience. This paper proposes a new framework that automatically transforms images to a conformable view setting by assembling different convolutional neural networks. Specifically, a referential 3D ground plane is first derived from the RGB image, and a novel projection mapping algorithm is then developed to achieve automatic viewing transformation. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art vanishing-point-based methods by a large margin in terms of accuracy and robustness.

    Video Upright Adjustment and Stabilization

    Get PDF
    Keywords: upright adjustment, video stabilization, camera path.
    We propose a novel video upright adjustment method that reliably corrects the slanted content often found in casual videos. Our approach combines deep learning and Bayesian inference to estimate accurate rotation angles from video frames. We first train a convolutional neural network (CNN) to obtain initial estimates of the rotation angles of the input video frames. These initial estimates are temporally inconsistent and not fully accurate. To resolve this, we use Bayesian inference: we analyze the estimation errors of the network and derive an error model. We then use the error model to formulate video upright adjustment as a maximum a posteriori problem, in which we estimate temporally consistent rotation angles from the initial estimates while respecting the relative rotations between consecutive frames. Finally, we propose a joint approach to video stabilization and upright adjustment, which minimizes the information loss caused by handling stabilization and upright adjustment separately. Experimental results show that our video upright adjustment method effectively corrects slanted video content, and that its combination with video stabilization achieves visually pleasing results from shaky and slanted videos.
    Contents: I. INTRODUCTION; 1.1. Related work; II. ROTATION ESTIMATION NETWORK; III. ERROR ANALYSIS; IV. VIDEO UPRIGHT ADJUSTMENT; 4.1. Initial angle estimation; 4.2. Robust angle estimation; 4.3. Optimization; 4.4. Warping; V. JOINT UPRIGHT ADJUSTMENT AND STABILIZATION; 5.1. Bundled camera paths for video stabilization; 5.2. Joint approach; VI. EXPERIMENTS; VII. CONCLUSION; References
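
    The maximum a posteriori step described in the abstract above can be illustrated with a small sketch. It assumes independent Gaussian errors on both the per-frame estimates and the relative rotations, which turns the MAP problem into a weighted least-squares problem with a tridiagonal system. The Gaussian model, the function name `map_smooth_angles`, and the parameters `weights` and `lam` are assumptions made for illustration; the paper derives its own error model, which is not reproduced here.

```python
def map_smooth_angles(initial, relative, weights=None, lam=10.0):
    """Estimate temporally consistent angles (hypothetical helper).

    Minimizes  sum_i w_i * (theta_i - initial_i)^2
             + lam * sum_i (theta_{i+1} - theta_i - relative_i)^2
    which is the MAP estimate under an assumed Gaussian error model.
    """
    n = len(initial)
    if len(relative) != n - 1:
        raise ValueError("relative must have one entry per consecutive frame pair")
    if weights is None:
        weights = [1.0] * n

    # Build the tridiagonal normal equations: diag on the main diagonal,
    # -lam on both off-diagonals.
    diag = [0.0] * n
    rhs = [0.0] * n
    for i in range(n):
        diag[i] = weights[i]
        rhs[i] = weights[i] * initial[i]
        if i > 0:
            diag[i] += lam
            rhs[i] += lam * relative[i - 1]
        if i < n - 1:
            diag[i] += lam
            rhs[i] -= lam * relative[i]
    off = -lam

    # Thomas algorithm: forward elimination, then back substitution.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    theta = [0.0] * n
    theta[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        theta[i] = d[i] - c[i] * theta[i + 1]
    return theta


# Frame 2 carries an outlier estimate; the relative rotations pull it back.
smoothed = map_smooth_angles([5.0, 6.0, 30.0, 8.0], [1.0, 1.0, 1.0])
print(smoothed)
```

    In this sketch a larger `lam` trusts the relative rotations more, and a smaller weight on a frame expresses lower confidence in its initial estimate.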

    Joint Rectification and Stitching of Images Formulated as Camera Pose Estimation Problems

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2015. Nam Ik Cho.
    This dissertation presents a study of image rectification and stitching problems formulated as camera pose estimation problems. There have been many approaches to the rectification and/or stitching of images because of their importance in image processing and computer vision. This dissertation adds a new approach to these problems: it formulates optimization problems whose solutions give the camera pose parameters for the given problem. Specifically, the contribution of this dissertation is to develop (i) a new optimization problem that can handle image rectification and stitching in a unified framework through the pose estimation formulation, and (ii) a new approach to the planar object rectification problem, which is also formulated as an optimal homography estimation problem. First, a unified framework for the image rectification and stitching problem is studied, which can handle both of the conditions that (i) the optical center of the camera is fixed or (ii) the camera captures a planar target. For this, the camera pose is modeled with six parameters (three for the rotation and three for the translation), and a cost function is developed that reflects the registration errors on a reference plane (the image stitching results). The designed cost function is minimized via the Levenberg-Marquardt algorithm. From the estimated camera poses, the relative camera motion is computed: when the optical center is moved (i.e., the camera motion is large), metric rectification is possible, and thus rectified composites as well as camera poses are obtained. Second, this dissertation presents a rectification method for planar objects using line segments, which can be added to the previous problem for further rectification or applied independently to single images that contain planar objects such as building facades or name cards. 
Based on the 2D Manhattan world assumption (i.e., the majority of line segments are aligned with the principal axes), rectification is formulated as an optimal homography estimation problem whose cost function aligns the line segments horizontally or vertically. Since line segment detection produces outliers, an iterative optimization scheme for robust estimation is also developed. One application of the proposed methods is the stitching of many images of the same scene into a high-resolution image along with its rectification. The methods can also be applied to the rectification of building facades, documents, name cards, etc., which improves the optical character recognition (OCR) rates of text in the scene as well as the recognition of buildings and the visual quality of scenery images. In addition, this dissertation presents an application of the proposed method to finding document boundaries in videos for mobile-device-based applications. This is a challenging problem due to perspective distortion, focus and motion blur, partial occlusion, and so on. 
For this, a cost function is formulated that comprises a data term (color distributions of the document and background), a boundary term (alignment and contrast errors after the contour of the document is rectified), and a temporal term (temporal coherence in consecutive frames).
    Contents: 1 Introduction; 1.1 Background; 1.2 Contributions; 1.3 Homography between the i-th image and pi_E; 1.4 Structure of the dissertation; 2 A unified framework for automatic image stitching and rectification; 2.1 Related works; 2.2 Proposed cost function and its optimization; 2.2.1 Proposed cost function; 2.2.2 Optimization; 2.2.3 Relation to the model in [1]; 2.3 Post-processing; 2.3.1 Classification of the conditions; 2.3.2 Skew removal; 2.4 Experimental results; 2.4.1 Quantitative evaluation on metric reconstruction performance; 2.4.2 Determining the capturing environment; 2.4.3 Experiments on real images; 2.4.4 Applications to document image stitching and more results; 2.5 Summary; 3 Rectification of planar targets based on line segments; 3.1 Related works; 3.1.1 Rectification of planar objects; 3.1.2 Rectification based on self calibration; 3.2 Proposed rectification model; 3.2.1 Optimization-based framework; 3.2.2 Cost function based on line segment alignments; 3.2.3 Optimization; 3.3 Experimental results; 3.3.1 Evaluation metrics; 3.3.2 Quantitative evaluation; 3.3.3 Computation complexity; 3.3.4 Qualitative comparisons and limitations; 3.4 Summary; 4 Application: Document capture system for mobile devices; 4.1 Related works; 4.2 The proposed method; 4.2.1 Notation; 4.2.2 Optimization-based framework; 4.3 Experimental results; 4.3.1 Initialization; 4.3.2 Quantitative evaluation; 4.3.3 Qualitative evaluation and limitations; 4.4 Summary; 5 Conclusions and future works; Bibliography; Abstract (Korean)

    Document and Scene Text Image Rectification Based on Alignment Properties

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Nam Ik Cho.
    The optical character recognition (OCR) of text images captured by cameras plays an important role in scene understanding. However, the OCR of camera-captured images is still considered a challenging problem, even after text detection (localization). This is mainly due to the geometric distortions caused by page curve and perspective view; rectification has therefore been an essential pre-processing step for recognition. Accordingly, there have been many text image rectification methods that recover the fronto-parallel view from a single distorted image. Recently, many researchers have focused on the properties of well-rectified text. In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions. By minimizing the cost functions, the transformation parameters for rectification are obtained. In detail, they are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scenes. First, a document image dewarping method is proposed based on the alignments of text-lines and line segments. Conventional text-line based document dewarping methods have problems when handling complex layouts and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics, and/or tables take up a large portion of the input instead. Hence, for robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines. 
Based on the assumption and observation that all the transformed line segments are still straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in well-rectified images, the proposed method encodes these properties into the cost function in addition to the text-line based cost. By minimizing the function, the proposed method obtains transformation parameters for page curve, camera pose, and focal length, which are used for document image rectification. Considering that there are many outliers in line segment directions, and misdetected text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. This dissertation also presents a method for scene text rectification. Conventional methods for scene text rectification mainly exploit the glyph property, meaning that the characters in many languages have horizontal/vertical strokes and some symmetric shapes. However, since they consider only the shape properties of individual characters, without considering the alignment of characters, they work well only for images with a single character and still yield misaligned results for images with multiple characters. To alleviate this problem, the proposed method explicitly imposes alignment constraints on the rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function. Also, in order to encode the alignments of characters into the cost function, the proposed method separates the text into individual characters using a projection profile method before optimizing the cost function. 
Then, the top and bottom lines are estimated using least squares line fitting with RANSAC. The overall algorithm is designed to perform character segmentation, line fitting, and rectification iteratively. Since the cost function is non-convex and involves many variables, the proposed method also develops an optimization method using the Augmented Lagrange Multiplier method. This dissertation evaluates the proposed method on real and synthetic text images, and experimental results show that the proposed method achieves higher OCR accuracy than the conventional approach and also yields visually pleasing results. Finally, the proposed method can be extended to curved surface dewarping in real scenes. In real scenes, there are many circular objects such as medicine bottles or drink cans, and their curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS). These curved surfaces include many significant texts and figures. However, their text has an irregular structure compared to documents. Therefore, the conventional dewarping methods based on the properties of well-rectified text have problems in their rectification. Based on the observation that many curved surfaces include well-aligned line segments (boundary lines of objects or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line segment terms. 
Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.
    Contents: 1 Introduction; 1.1 Document image dewarping; 1.2 Scene text rectification; 1.3 Curved surface dewarping in real scene; 1.4 Contents; 2 Related work; 2.1 Document image dewarping; 2.1.1 Dewarping methods using additional information; 2.1.2 Text-line based dewarping methods; 2.2 Scene text rectification; 2.3 Curved surface dewarping in real scene; 3 Document image dewarping; 3.1 Proposed cost function; 3.1.1 Parametric model of dewarping process; 3.1.2 Cost function design; 3.1.3 Line segment properties and cost function; 3.2 Outlier removal and optimization; 3.2.1 Jacobian matrix of the proposed cost function; 3.3 Document region detection and dewarping; 3.4 Experimental results; 3.4.1 Experimental results on text-abundant document images; 3.4.2 Experimental results on non conventional document images; 3.5 Summary; 4 Scene text rectification; 4.1 Proposed cost function for rectification; 4.1.1 Cost function design; 4.1.2 Character alignment properties and alignment terms; 4.2 Overall algorithm; 4.2.1 Initialization; 4.2.2 Character segmentation; 4.2.3 Estimation of the alignment parameters; 4.2.4 Cost function optimization for rectification; 4.3 Experimental results; 4.4 Summary; 5 Curved surface dewarping in real scene; 5.1 Proposed curved surface dewarping method; 5.1.1 Pre-processing; 5.1 Experimental results; 5.2 Summary; 6 Conclusions; Bibliography; Abstract (Korean)
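
    The abstract above estimates the top and bottom lines of a text with least squares line fitting combined with RANSAC. The following is a minimal sketch of that general technique, not the dissertation's implementation: the function names, the vertical-residual inlier test (reasonable only for near-horizontal lines), and the `threshold`, `iterations`, and `seed` parameters are all assumptions chosen for illustration.

```python
import random


def fit_line_least_squares(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, threshold=1.0, iterations=200, seed=0):
    """Fit a line robustly: sample point pairs, keep the largest
    consensus set, then refit that set with least squares."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate; skipped in this simplified model
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no valid line candidate found")
    return fit_line_least_squares(best_inliers), best_inliers


# Character baseline points on y = 0.5x + 2, plus two misdetected points.
baseline = [(float(x), 0.5 * x + 2.0) for x in range(10)]
outliers = [(3.0, 40.0), (7.0, -25.0)]
(slope, intercept), inliers = ransac_line(baseline + outliers)
print(slope, intercept, len(inliers))
```

    Refitting on the consensus set, rather than keeping the two-point candidate, is what makes the final estimate a least squares fit over the inliers.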

    Learning geometric and lighting priors from natural images

    Get PDF
    Understanding images is needed for a plethora of tasks, from compositing to image relighting to 3D object reconstruction. These tasks allow artists to realize masterpieces or help operators make decisions safely based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, only one of which is generally reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually left to an artist or an expert, who draws on experience to carry out the work. This is because humans are able to reason globally about the scene in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or totally automate these tasks? This is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image, and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation of deep machine learning algorithms for opacity, we offer studies of the visual cues captured by our methods.

    Automatic Upright Adjustment of Photographs With Robust Camera Calibration

    No full text

    Automatic Upright Adjustment of Photographs with Robust Camera Calibration

    No full text
    Man-made structures often appear distorted in photos captured by casual photographers, as the scene layout often conflicts with what human perception expects. In this paper, we propose an automatic approach for straightening up slanted man-made structures in an input image to improve its perceptual quality. We call this type of correction upright adjustment. We propose a set of criteria for upright adjustment based on human perception studies, and develop an optimization framework that yields an optimal homography for adjustment. We also develop a new optimization-based camera calibration method that compares favorably with previous methods and allows the proposed system to work reliably for a wide range of images. The effectiveness of our system is demonstrated by both quantitative comparisons and a qualitative user study.
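
    To make the role of the homography concrete, the sketch below builds the homography induced by a pure camera rotation, H = K R K^-1, for the simplest case of a roll about the optical axis, and applies it to pixel coordinates. It does not reproduce the paper's optimization or calibration method; the function names, the single focal length `f`, and the principal point `(cx, cy)` are assumptions made for illustration.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def roll_homography(f, cx, cy, roll_deg):
    """Homography H = K R K^-1 for a rotation about the optical axis.

    f is the focal length in pixels and (cx, cy) the principal point;
    both are assumed known here, whereas a real system must calibrate them.
    """
    t = math.radians(roll_deg)
    c, s = math.cos(t), math.sin(t)
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(K, R), K_inv)


def apply_homography(H, x, y):
    """Map pixel (x, y) through H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


# Undo a 5 degree slant in a 640x480 image by rotating back by -5 degrees.
H = roll_homography(f=500.0, cx=320.0, cy=240.0, roll_deg=-5.0)
print(apply_homography(H, 420.0, 240.0))
```

    A full upright adjustment would also involve pitch and yaw, so R would be a general 3D rotation rather than the in-plane roll used in this sketch.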