303 research outputs found

    Geometric correction of historical Arabic documents

    Geometric deformations in historical documents significantly affect both Optical Character Recognition (OCR) accuracy and human readability. They may be introduced at any point in a document's life cycle, from when it was first printed to when it was digitised by an imaging device. This thesis focuses on the challenging domain of geometric correction of Arabic historical documents, where background research has shown that existing approaches for geometric correction of Latin-script historical documents are not sensitive to the characteristics of Arabic text and therefore cannot be applied successfully. Text line segmentation and baseline detection algorithms have been investigated in order to propose a new, more suitable one for warped Arabic historical document images. Advanced ideas for dewarping and geometric restoration of historical Arabic documents, as dictated by the specific characteristics of the problem, have been implemented. In addition to developing an algorithm that detects accurate baselines in historical printed Arabic documents, the research contributes a new dataset of historical Arabic documents with different degrees of warping severity. Overall, a new dewarping system, the first for historical Arabic documents, has been developed, taking into account both global and local features of the text image and the patterns of the smooth distortion between text lines. By using the results of the proposed line segmentation and baseline detection methods, it can cope with a variety of distortions, such as page curl, arbitrary warping and fold

    Deep Unrestricted Document Image Rectification

    In recent years, tremendous effort has gone into document image rectification, but existing advanced algorithms are limited to restricted document images, i.e., the input image must contain a complete document. Once the captured image covers only a local text region, rectification quality degrades and becomes unsatisfactory. Our previously proposed DocTr, a transformer-assisted network for document image rectification, also suffers from this limitation. In this work, we present DocTr++, a novel unified framework for document image rectification without any restrictions on the input distorted images. Our major technical improvements can be summarized in three aspects. Firstly, we upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. Secondly, we reformulate the pixel-wise mapping relationship between the unrestricted distorted document images and their distortion-free counterparts. The obtained data is used to train DocTr++ for unrestricted document image rectification. Thirdly, we contribute a real-world test set and metrics applicable for evaluating the rectification quality. To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images. Extensive experiments are conducted, and the results demonstrate the effectiveness and superiority of our method. We hope DocTr++ will serve as a strong baseline for generic document image rectification, prompting the further advancement and application of learning-based algorithms. The source code and the proposed dataset are publicly available at https://github.com/fh2019ustc/DocTr-Plus
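
The pixel-wise mapping mentioned in this abstract can be illustrated with a small resampling sketch. It assumes a dense backward map has already been predicted by some model: for each output pixel, `map_x` and `map_y` give the source coordinates in the distorted image. The function name and the map layout are hypothetical choices for illustration, not taken from the DocTr++ code.

```python
import numpy as np

def rectify_with_backward_map(image, map_x, map_y):
    """Bilinear resampling: output[i, j] is read from image at (map_y[i, j], map_x[i, j])."""
    h, w = image.shape[:2]
    # Clamp sampling coordinates to the valid image area.
    x = np.clip(map_x, 0, w - 1)
    y = np.clip(map_y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = x - x0
    fy = y - y0
    if image.ndim == 3:
        # Broadcast the interpolation weights over the colour channels.
        fx = fx[..., None]
        fy = fy[..., None]
    top = image[y0, x0] * (1 - fx) + image[y0, x1] * fx
    bottom = image[y1, x0] * (1 - fx) + image[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

# Toy check: an identity map returns the input unchanged.
img = np.arange(12.0).reshape(3, 4)
ys, xs = np.mgrid[0:3, 0:4].astype(float)
same = rectify_with_backward_map(img, xs, ys)
# Sampling half a pixel to the right averages horizontal neighbours.
shifted = rectify_with_backward_map(img, xs + 0.5, ys)
```

A backward map is used here because every output pixel then receives exactly one value; a forward map would leave holes that need a separate filling step.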

    Document and Scene Text Image Rectification Based on Alignment Properties

    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Nam Ik Cho (조남익).
The optical character recognition (OCR) of text images captured by cameras plays an important role in scene understanding. However, OCR of camera-captured images is still considered a challenging problem, even after text detection (localization). This is mainly due to the geometric distortions caused by page curve and perspective view, so rectification has become an essential pre-processing step for recognition. Accordingly, many text image rectification methods have been developed that recover the fronto-parallel view from a single distorted image. Recently, many researchers have focused on the properties of well-rectified text. In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions. By minimizing the cost functions, the transformation parameters for rectification are obtained. In detail, they are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scenes. First, a document image dewarping method is proposed based on the alignments of text-lines and line segments. Conventional text-line based document dewarping methods have problems when handling complex layouts and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take up a large portion of the input instead. Hence, for robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines. 
Based on the assumption and observation that all transformed line segments remain straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in well-rectified images, the proposed method encodes these properties into the cost function in addition to the text-line based cost. By minimizing the function, the proposed method obtains transformation parameters for page curve, camera pose, and focal length, which are used for document image rectification. Considering that there are many outliers in line segment directions and mis-detected text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. This dissertation also presents a method for scene text rectification. Conventional methods for scene text rectification mainly exploit the glyph property, meaning that the characters in many languages have horizontal/vertical strokes and some symmetric shapes. However, since they consider only the shape properties of individual characters, without considering the alignment of characters, they work well only for images with a single character and still yield mis-aligned results for images with multiple characters. To alleviate this problem, the proposed method explicitly imposes alignment constraints on the rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function. Also, in order to encode the alignments of characters into the cost function, the proposed method separates the text into individual characters using a projection profile method before optimizing the cost function. 
Then, top and bottom lines are estimated using least squares line fitting with RANSAC. The overall algorithm is designed to perform character segmentation, line fitting, and rectification iteratively. Since the cost function is non-convex and involves many variables, the proposed method also develops an optimization method using the Augmented Lagrange Multiplier method. This dissertation evaluates the proposed method on real and synthetic text images, and experimental results show that the proposed method achieves higher OCR accuracy than the conventional approach and also yields visually pleasing results. Finally, the proposed method can be extended to curved surface dewarping in real scenes. In real scenes, there are many circular objects such as medicine bottles or drink cans, and their curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS). These curved surfaces include much significant text and many figures; however, their text has an irregular structure compared to documents. Therefore, conventional dewarping methods based on the properties of well-rectified text have problems rectifying them. Based on the observation that many curved surfaces include well-aligned line segments (boundary lines of objects or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line segment terms. 
Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.

Contents:
1 Introduction
  1.1 Document image dewarping
  1.2 Scene text rectification
  1.3 Curved surface dewarping in real scene
  1.4 Contents
2 Related work
  2.1 Document image dewarping
    2.1.1 Dewarping methods using additional information
    2.1.2 Text-line based dewarping methods
  2.2 Scene text rectification
  2.3 Curved surface dewarping in real scene
3 Document image dewarping
  3.1 Proposed cost function
    3.1.1 Parametric model of dewarping process
    3.1.2 Cost function design
    3.1.3 Line segment properties and cost function
  3.2 Outlier removal and optimization
    3.2.1 Jacobian matrix of the proposed cost function
  3.3 Document region detection and dewarping
  3.4 Experimental results
    3.4.1 Experimental results on text-abundant document images
    3.4.2 Experimental results on non conventional document images
  3.5 Summary
4 Scene text rectification
  4.1 Proposed cost function for rectification
    4.1.1 Cost function design
    4.1.2 Character alignment properties and alignment terms
  4.2 Overall algorithm
    4.2.1 Initialization
    4.2.2 Character segmentation
    4.2.3 Estimation of the alignment parameters
    4.2.4 Cost function optimization for rectification
  4.3 Experimental results
  4.4 Summary
5 Curved surface dewarping in real scene
  5.1 Proposed curved surface dewarping method
    5.1.1 Pre-processing
  5.1 Experimental results
  5.2 Summary
6 Conclusions
Bibliography
Abstract (Korean)
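
The abstract above estimates the top and bottom lines of a text region with least squares line fitting under RANSAC. The following is a minimal sketch of that general technique, not the thesis implementation: the function name, the tolerance, the iteration count and the synthetic points are all assumptions made for illustration.

```python
import numpy as np

def ransac_line(points, n_iter=200, inlier_tol=1.0, seed=0):
    """Fit y = m*x + c robustly: RANSAC selects inliers, least squares refines."""
    rng = np.random.default_rng(seed)
    x, y = points[:, 0], points[:, 1]
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        # Hypothesise a line through two randomly chosen points.
        i, j = rng.choice(len(points), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical pair cannot be written as y = m*x + c
        m = (y[j] - y[i]) / (x[j] - x[i])
        c = y[i] - m * x[i]
        # Points within the vertical tolerance count as inliers.
        inliers = np.abs(y - (m * x + c)) < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least squares fit on the largest consensus set.
    m, c = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return m, c, best_inliers

# Synthetic "bottom of character" points on a slightly tilted line,
# plus a few descender-like outliers well below it (image y grows downward).
rng = np.random.default_rng(1)
xs = np.linspace(0.0, 100.0, 40)
ys = 0.1 * xs + 5.0 + rng.normal(0.0, 0.2, xs.size)
out_x = rng.uniform(0.0, 100.0, 10)
out_y = 0.1 * out_x + 5.0 + rng.uniform(8.0, 20.0, 10)
pts = np.column_stack([np.concatenate([xs, out_x]), np.concatenate([ys, out_y])])
slope, intercept, mask = ransac_line(pts)
```

Plain least squares would be pulled toward the outliers; restricting the final fit to the consensus set is what keeps the estimated line on the bulk of the characters.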

    The rectification and recognition of document images with perspective and geometric distortions

    Ph.D. (Doctor of Philosophy)

    A self-supervised, physics-aware, Bayesian neural network architecture for modelling galaxy emission-line kinematics

    In the upcoming decades large facilities, such as the SKA, will provide resolved observations of the kinematics of millions of galaxies. In order to assist in the timely exploitation of these vast datasets we explore the use of a self-supervised, physics-aware neural network capable of Bayesian kinematic modelling of galaxies. We demonstrate the network's ability to model the kinematics of cold gas in galaxies with an emphasis on recovering physical parameters and accompanying modelling errors. The model is able to recover rotation curves, inclinations and disc scale lengths for both CO and H I data which match well with those found in the literature. The model is also able to provide modelling errors over learned parameters thanks to the application of quasi-Bayesian Monte-Carlo dropout. This work shows the promising use of machine learning, and in particular self-supervised neural networks, in the context of kinematically modelling galaxies. This work represents the first steps in applying such models for kinematic fitting and we propose that variants of our model would seem especially suitable for enabling emission-line science from upcoming surveys with e.g. the SKA, allowing fast exploitation of these large datasets
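
Monte-Carlo dropout, which this abstract uses to obtain modelling errors, can be sketched in a few lines: dropout is left switched on at prediction time, the network is run many times, and the spread of the outputs serves as an uncertainty estimate. The tiny two-layer network, its random untrained weights and all names below are placeholders for illustration only, not the architecture from the paper.

```python
import numpy as np

def mc_dropout_predict(x, weights, n_samples=200, drop_prob=0.2, seed=0):
    """Return the mean and standard deviation over stochastic forward passes."""
    rng = np.random.default_rng(seed)
    w1, b1, w2, b2 = weights
    outputs = []
    for _ in range(n_samples):
        hidden = np.maximum(0.0, x @ w1 + b1)         # ReLU layer
        keep = rng.random(hidden.shape) >= drop_prob  # dropout stays ON at inference
        hidden = hidden * keep / (1.0 - drop_prob)    # inverted-dropout rescaling
        outputs.append(hidden @ w2 + b2)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)

# Placeholder network: 3 inputs, 16 hidden units, 2 outputs, random weights.
init = np.random.default_rng(42)
weights = (init.normal(size=(3, 16)), init.normal(size=16),
           init.normal(size=(16, 2)), init.normal(size=2))
x = init.normal(size=(5, 3))
mean, std = mc_dropout_predict(x, weights)
```

In a trained model the per-output `std` would be reported alongside each recovered parameter; here it only demonstrates the mechanics.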

    Document and Text Rectification Using Optimization of Text- and Feature-Point-Based Objective Functions

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2014. Nam Ik Cho (조남익).
There are many techniques and applications that detect and recognize text information in images, e.g., document retrieval using camera-captured document images, book readers for the visually impaired, and augmented reality based on text recognition. In these applications, the planar surfaces which contain the text are often distorted in the captured image due to perspective view (e.g., road signs), curvature (e.g., unfolded books), and wrinkles (e.g., old documents). Specifically, recovering the original document texture by removing these distortions from camera-captured document images is called document rectification. In this dissertation, new text surface rectification algorithms are proposed for improving text recognition accuracy and visual quality. The proposed methods are categorized into 3 types depending on the type of input. The contributions of the proposed methods can be summarized as follows. In the first rectification algorithm, the dense text-lines in the documents are employed to rectify the images. Unlike the conventional approaches, the proposed method does not directly use the text-lines. Instead, the proposed method uses a discrete representation of text-lines and text-blocks, which are sets of connected components. Also, the geometric distortions caused by page curl and perspective view are modeled as generalized cylindrical surfaces and camera rotation, respectively. With this distortion model and the discrete representation of the features, a cost function whose minimization yields the parameters of the distortion model is developed. In the cost function, properties of the pages such as text-block alignment, line-spacing, and the straightness of text-lines are encoded. By describing the text features using sets of discrete points, the cost function can be easily defined and well solved by the Levenberg-Marquardt algorithm. 
Experiments show that the proposed method works well for various layouts and curved surfaces, and compares favorably with the conventional methods on the standard dataset. The second algorithm is a unified framework to rectify and stitch multiple document images using visual feature points instead of text lines. This is similar to the method employed in general image stitching algorithms. However, general image stitching algorithms usually assume a fixed camera center, which cannot be taken for granted when capturing a document. To deal with the camera motion between images, a new parametric family of motion models is proposed in this dissertation. In addition, to remove the ambiguity in the reference plane, a new cost function is developed to impose constraints on the reference plane. This enables the estimation of a physically correct reference plane without prior knowledge. The estimated reference plane can also be used to rectify the stitching result. Furthermore, since it employs general features, the proposed method can be applied to any other planar object, such as building facades or mural paintings, as well as to camera-captured document images. The third rectification method is based on a scene text detection algorithm that is independent of the language model. The conventional methods assume that a character consists of a single connected component (CC), as in the English alphabet. However, this assumption is brittle for Asian characters such as Korean, Chinese, and Japanese, where a single character consists of several CCs. Therefore, it is difficult to divide CCs into text lines without a language model. To alleviate this problem, the proposed method clusters the candidate regions based on a similarity measure that considers inter-character relations. The adjacency measure is trained on a data set labeled with the bounding boxes of text regions. Non-text regions that remain after clustering are filtered out in a text/non-text classification step. 
Final text regions are merged or divided into text lines considering their orientation and location. The detected text is rectified using the orientation of the text-lines and vertical strokes. In extensive experiments, the proposed method outperforms state-of-the-art algorithms on English as well as Asian characters.

Contents:
1 Introduction
  1.1 Document rectification via text-line based optimization
  1.2 A unified approach of rectification and stitching for document images
  1.3 Rectification via scene text detection
  1.4 Contents
2 Related work
  2.1 Document rectification
    2.1.1 Document dewarping without text-lines
    2.1.2 Document dewarping with text-lines
    2.1.3 Text-block identification and text-line extraction
  2.2 Document stitching
  2.3 Scene text detection
3 Document rectification based on text-lines
  3.1 Proposed approach
    3.1.1 Image acquisition model
    3.1.2 Proposed approach to document dewarping
  3.2 Proposed cost function and its optimization
    3.2.1 Design of E_str(·)
    3.2.2 Minimization of E_str(·)
    3.2.3 Alignment type classification
    3.2.4 Design of E_align(·)
    3.2.5 Design of E_spacing(·)
  3.3 Extension to unfolded book surfaces
  3.4 Experimental result
    3.4.1 Experiments on synthetic data
    3.4.2 Experiments on real images
    3.4.3 Comparison with existing methods
    3.4.4 Limitations
4 Document rectification based on feature detection
  4.1 Proposed approach
  4.2 Proposed cost function and its optimization
    4.2.1 Notations
    4.2.2 Homography between the i-th image and E
    4.2.3 Proposed cost function
    4.2.4 Optimization
    4.2.5 Relation to the model in [17]
  4.3 Post-processing
    4.3.1 Classification of two cases
    4.3.2 Skew removal
  4.4 Experimental results
    4.4.1 Quantitative evaluation on metric reconstruction performance
    4.4.2 Experiments on real images
5 Scene text detection and rectification
  5.1 Introduction
    5.1.1 Contribution
    5.1.2 Proposed approach
  5.2 Candidate region detection
    5.2.1 CC extraction
    5.2.2 Computation of similarity between CCs
    5.2.3 CC clustering
  5.3 Rectification of candidate region
  5.4 Text/non-text classification
  5.5 Experimental result
    5.5.1 Experimental results on ICDAR 2011 dataset
    5.5.2 Experimental results on the Asian character dataset
6 Conclusion
Bibliography
Abstract (Korean)
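
The first algorithm in the abstract above describes text features as sets of discrete points and minimizes a cost over distortion parameters with the Levenberg-Marquardt algorithm. The sketch below shows that pattern on a deliberately simplified toy: the warp is a hypothetical polynomial `a*x**2 + b*x**3` rather than the generalized cylindrical surface model of the thesis, and the residual only measures how far each dewarped text-line deviates from being straight and horizontal.

```python
import numpy as np

def levenberg_marquardt(residual_fn, p0, n_iter=50, lam=1e-2, eps=1e-6):
    """Minimise sum(residual_fn(p)**2) with a damped Gauss-Newton iteration."""
    p = np.asarray(p0, dtype=float)
    r = residual_fn(p)
    cost = float(r @ r)
    for _ in range(n_iter):
        # Forward-difference Jacobian, one column per parameter.
        J = np.empty((r.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = eps
            J[:, j] = (residual_fn(p + dp) - r) / eps
        A = J.T @ J
        g = J.T @ r
        # Marquardt damping scales the diagonal of the normal matrix.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        r_new = residual_fn(p + step)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:
            p, r, cost = p + step, r_new, cost_new
            lam *= 0.1   # accept the step and trust the model more
        else:
            lam *= 10.0  # reject the step and damp harder
    return p

# Toy data: three text-lines sampled as discrete points, bent by a known warp.
a_true, b_true = 8.0, -3.0
x = np.linspace(-1.0, 1.0, 21)
observed = [y0 + a_true * x**2 + b_true * x**3 for y0 in (0.0, 20.0, 40.0)]

def straightness_residuals(p):
    a, b = p
    res = []
    for y_obs in observed:
        flat = y_obs - a * x**2 - b * x**3  # undo the hypothesised warp
        res.append(flat - flat.mean())      # a straight line has zero deviation
    return np.concatenate(res)

params = levenberg_marquardt(straightness_residuals, p0=[0.0, 0.0])
```

Because the features are plain point sets, the residual vector is easy to write down and differentiate numerically, which is the practical benefit the abstract attributes to the discrete representation.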

    Stability and Seakeeping of Marine Vessels

    This book presents the papers accepted into the Special Issue "Stability and Seakeeping of Marine Vessels" and includes nine contributions to this Special Issue published in 2020. The overall aim of the collection is to improve knowledge about the most relevant and recent topics in ship stability and seakeeping. Specifically, the articles cover a wide range of topics and reflect the recent scientific efforts in the 2nd generation intact stability criteria evaluation and modelling of the ship dynamics assessment in intact or damaged conditions. These topics were investigated mainly through direct assessments performed both via numerical methods and tools, and experimental approaches. The book is addressed to individuals from universities, research organizations, industry, government agencies and certifying authorities, as well as designers, operators and owners who contribute to improved knowledge about "stability and seakeeping"

    Exploring the use of Machine Learning with extragalactic emission-line surveys, in preparation for the Square Kilometre Array

    This thesis investigates the use of machine learning for analysing the kinematics of galaxies in a time-efficient manner. The application of machine learning in astronomy is arguably nascent, and very much so in the case of galaxy kinematics. Being able to extract kinematic information at speed will be important with the advent of next generation telescopes such as the Square Kilometre Array. Such instruments will collect raw data on scales too large to store. Therefore, the use of on-the-fly modelling techniques, harnessing the power of machine learning, is crucial. I will show that it is possible and beneficial to use machine learning algorithms to tackle scientific questions in extragalactic astronomy in this way. This thesis starts by investigating the use of machine learning algorithms for rapidly discriminating between disturbed and orderly rotating gas discs in galaxies. Specifically, cold dense molecular gas discs are embedded onto a latent manifold using convolutional autoencoders (CAE), which boast powerful automated feature embedding capabilities. Using hydrodynamical simulations to create mock observational data, the CAE is trained on millions of naturally augmented moment one maps before testing on observational HI data from the Local Volume HI Survey (Koribalski et al. 2018), as well as CO observational data from various surveys using ALMA. Using a simple binary classifier on the embeddings, it can be shown that disturbed and orderly rotating discs are separately classified with high accuracy even in the presence of injected noise. Such models may be useful as fast filtering tools for identifying mergers or relaxed discs for further kinematic modelling. Bearing in mind that transfer learning for next generation survey datasets holds great risk, a new approach to kinematically characterising gas in galaxies is studied next. 
Using self-supervised physics-aware neural networks, the need for a throw-away training set is removed entirely and replaced with a model which can learn physical parameterisations of galaxy rotation curves rapidly. With the introduction of Monte Carlo dropout, it is also possible to recover modelling errors for kinematic parameters, which will be useful in gauging the validity of learned parameters. These models are tested on simulated data as well as observational CO data from the WISDOM survey and HI data from THINGS (Walter et al. 2008). Learned rotation curves match well with those derived from more analytically motivated modelling tools (e.g. BBarolo, Di Teodoro & Fraternali 2015), but the parameterisations are computed in a fraction of the time. Finally, I study the use of the aforementioned self-supervised physics-aware neural networks to recover the H-alpha Tully-Fisher relation (TFR) from the largest IFU dataset to date. To do so, moment maps from both the SAMI and MaNGA IFU surveys are used to derive the rotational velocities of low redshift galaxies. These are then fit against mass to derive both the forward and reverse TFR. The fits are in agreement with those found in the wider literature, except that my fits have shallower gradients because a correction for asymmetric drift is applied in this work but not in the comparison fits from the literature. Here, I identify and quantify trends between position along (and perpendicular to) the TFR and galaxy properties, namely age and mass-to-light ratio. A clear relation between velocity turnover radius, r-turn/r-e, and stellar mass is also discussed. The application of models originally designed for use with millimetre and radio interferometric data shows the benefits of using self-supervised physics-aware approaches to circumvent the problems often associated with transfer learning. Such methods will be useful when applied to next generation IFU survey data releases, with instruments such as HECTOR. 
In summary, in this thesis, I explore the different machine learning approaches to kinematically characterise galaxies in a time-efficient manner. I conclude with some remaining questions and avenues for future research
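
The forward and reverse Tully-Fisher fits mentioned in this abstract can be illustrated with two ordinary least squares regressions in log space. This is a sketch under stated assumptions: "forward" is taken here to mean regressing log mass on log velocity and "reverse" the opposite (conventions differ between papers), the data are synthetic, and the slope, intercept and scatter are invented for the demonstration rather than taken from the thesis.

```python
import numpy as np

def fit_tully_fisher(log_v, log_m):
    """Return (slope, intercept) of log M versus log V for both fit directions."""
    # Forward fit: minimise residuals in log mass.
    fwd_slope, fwd_icpt = np.polyfit(log_v, log_m, deg=1)
    # Reverse fit: minimise residuals in log velocity, then invert the
    # relation so that both fits are expressed on the same axes.
    rev_slope, rev_icpt = np.polyfit(log_m, log_v, deg=1)
    return (fwd_slope, fwd_icpt), (1.0 / rev_slope, -rev_icpt / rev_slope)

# Synthetic sample: log M = 4 * log V + 1.5 with Gaussian scatter in mass.
rng = np.random.default_rng(0)
log_v = rng.uniform(1.8, 2.5, 2000)
log_m = 4.0 * log_v + 1.5 + rng.normal(0.0, 0.2, log_v.size)
forward, reverse = fit_tully_fisher(log_v, log_m)
```

With scatter only in mass, the forward slope recovers the input value while the inverted reverse slope comes out steeper, which is why studies usually report both directions.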