
    HoughNet: neural network architecture for vanishing points detection

    In this paper we introduce a novel neural network architecture based on a Fast Hough Transform layer. A layer of this type allows the network to accumulate features from linear areas across the entire image rather than from local neighbourhoods. We demonstrate its potential by solving the problem of vanishing point detection in document images. This problem arises when documents are photographed with a camera under uncontrolled conditions; in that case the document image can suffer several specific distortions, including projective transform. We train our model on the MIDV-500 dataset and provide testing results. The strong generalization ability of the suggested method is demonstrated by applying it to the completely different ICDAR 2011 dewarping contest dataset. In previously published papers considering this dataset, authors measured the quality of vanishing point detection by counting the words correctly recognized by the open-source OCR engine Tesseract. For comparison, we reproduce this experiment and show that our method outperforms the state-of-the-art result.
    Comment: 6 pages, 6 figures, 2 tables, 28 references, conference
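The core idea of such a layer can be sketched outside any deep learning framework: a Hough transform layer pools responses along whole lines rather than local windows. The function below is a naive (not "fast") illustrative accumulator, not the paper's implementation; the bin counts and line parameterisation are arbitrary choices.

```python
import numpy as np

def hough_accumulate(img, n_theta=8):
    """Accumulate image values along straight lines (a dense Hough transform).

    Each accumulator cell (theta, rho) sums the pixels lying near the line
    x*cos(theta) + y*sin(theta) = rho, so evidence from an entire linear
    region is pooled into one response, unlike a local convolution.
    """
    h, w = img.shape
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((n_theta, 2 * diag))
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    ys, xs = np.nonzero(img)          # treat nonzero pixels as features
    vals = img[ys, xs]
    for i, t in enumerate(thetas):
        rho = xs * np.cos(t) + ys * np.sin(t)   # signed distance per pixel
        idx = np.round(rho + diag).astype(int)  # shift to non-negative bins
        np.add.at(acc[i], idx, vals)            # scatter-add into the row
    return acc, thetas

# A vertical stroke of ones should produce a sharp peak at theta = 0,
# whose value equals the full column sum.
img = np.zeros((32, 32))
img[:, 10] = 1.0
acc, thetas = hough_accumulate(img)
```

In a network layer this accumulation is differentiable (it is a fixed linear map), which is what lets gradients flow through the line-pooling step during training.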

    Generic Document Image Dewarping by Probabilistic Discretization of Vanishing Points

    Document image dewarping is still a challenge, especially when documents are captured with a single camera in an uncontrolled environment. In this paper we propose a generic approach based on vanishing points (VPs) to reconstruct the 3D shape of document pages. Unlike previous methods, we do not need to segment the text included in the documents; therefore, our approach is less sensitive to pre-processing and segmentation errors. The computation of the VPs is robust and relies on the a-contrario framework, which has only one parameter, whose setting is based on probabilistic reasoning instead of experimental tuning. Thus, our method can be applied to any kind of document, including text and non-text blocks, and extended to other kinds of images. Experimental results show that the proposed method is robust to a variety of distortions.
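The a-contrario principle mentioned here can be illustrated with a generic Number of False Alarms (NFA) computation. This is a textbook sketch of the framework, not the paper's specific detection test; the binomial noise model and the threshold epsilon are assumptions.

```python
from math import comb

def nfa(n_tests, n, k, p):
    """Number of False Alarms: the expected count, among n_tests candidates,
    of events at least as structured as observing k aligned elements out of
    n under a noise model where alignment occurs with probability p.
    A candidate is declared 'meaningful' when NFA <= epsilon; epsilon is
    the framework's single parameter and directly bounds the expected
    number of detections in pure noise."""
    # Binomial tail P(X >= k) for X ~ Binomial(n, p).
    tail = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))
    return n_tests * tail

strong = nfa(1000, 20, 20, 0.1)  # all 20 elements aligned: essentially never by chance
weak = nfa(1000, 20, 2, 0.5)     # 2 of 20 under p = 0.5: expected in noise
```

Setting epsilon = 1 means at most one false detection is expected over the whole image, which is why the single parameter needs no experimental tuning.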

    ์ •๋ ฌ ํŠน์„ฑ๋“ค ๊ธฐ๋ฐ˜์˜ ๋ฌธ์„œ ๋ฐ ์žฅ๋ฉด ํ…์ŠคํŠธ ์˜์ƒ ํ‰ํ™œํ™” ๊ธฐ๋ฒ•

    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. ์กฐ๋‚จ์ต.์นด๋ฉ”๋ผ๋กœ ์ดฌ์˜ํ•œ ํ…์ŠคํŠธ ์˜์ƒ์— ๋Œ€ํ•ด์„œ, ๊ด‘ํ•™ ๋ฌธ์ž ์ธ์‹(OCR)์€ ์ดฌ์˜๋œ ์žฅ๋ฉด์„ ๋ถ„์„ํ•˜๋Š”๋ฐ ์žˆ์–ด์„œ ๋งค์šฐ ์ค‘์š”ํ•˜๋‹ค. ํ•˜์ง€๋งŒ ์˜ฌ๋ฐ”๋ฅธ ํ…์ŠคํŠธ ์˜์—ญ ๊ฒ€์ถœ ํ›„์—๋„, ์ดฌ์˜ํ•œ ์˜์ƒ์— ๋Œ€ํ•œ ๋ฌธ์ž ์ธ์‹์€ ์—ฌ์ „ํžˆ ์–ด๋ ค์šด ๋ฌธ์ œ๋กœ ์—ฌ๊ฒจ์ง„๋‹ค. ์ด๋Š” ์ข…์ด์˜ ๊ตฌ๋ถ€๋Ÿฌ์ง๊ณผ ์นด๋ฉ”๋ผ ์‹œ์ ์— ์˜ํ•œ ๊ธฐํ•˜ํ•™์ ์ธ ์™œ๊ณก ๋•Œ๋ฌธ์ด๊ณ , ๋”ฐ๋ผ์„œ ์ด๋Ÿฌํ•œ ํ…์ŠคํŠธ ์˜์ƒ์— ๋Œ€ํ•œ ํ‰ํ™œํ™”๋Š” ๋ฌธ์ž ์ธ์‹์— ์žˆ์–ด์„œ ํ•„์ˆ˜์ ์ธ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์œผ๋กœ ์—ฌ๊ฒจ์ง„๋‹ค. ์ด๋ฅผ ์œ„ํ•œ ์™œ๊ณก๋œ ์ดฌ์˜ ์˜์ƒ์„ ์ •๋ฉด ์‹œ์ ์œผ๋กœ ๋ณต์›ํ•˜๋Š” ํ…์ŠคํŠธ ์˜์ƒ ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•๋“ค์€ ํ™œ๋ฐœํžˆ ์—ฐ๊ตฌ๋˜์–ด์ง€๊ณ  ์žˆ๋‹ค. ์ตœ๊ทผ์—๋Š”, ํ‰ํ™œํ™”๊ฐ€ ์ž˜ ๋œ ํ…์ŠคํŠธ์˜ ์„ฑ์งˆ์— ์ดˆ์ ์„ ๋งž์ถ˜ ์—ฐ๊ตฌ๋“ค์ด ์ฃผ๋กœ ์ง„ํ–‰๋˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ด€์ ์—์„œ, ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์€ ํ…์ŠคํŠธ ์˜์ƒ ํ‰ํ™œํ™”๋ฅผ ์œ„ํ•˜์—ฌ ์ƒˆ๋กœ์šด ์ •๋ ฌ ํŠน์„ฑ๋“ค์„ ๋‹ค๋ฃฌ๋‹ค. ์ด๋Ÿฌํ•œ ์ •๋ ฌ ํŠน์„ฑ๋“ค์€ ๋น„์šฉ ํ•จ์ˆ˜๋กœ ์„ค๊ณ„๋˜์–ด์ง€๊ณ , ๋น„์šฉ ํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด์„œ ํ‰ํ™œํ™”์— ์‚ฌ์šฉ๋˜์–ด์ง€๋Š” ํ‰ํ™œํ™” ๋ณ€์ˆ˜๋“ค์ด ๊ตฌํ•ด์ง„๋‹ค. ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์€ ๋ฌธ์„œ ์˜์ƒ ํ‰ํ™œํ™”, ์žฅ๋ฉด ํ…์ŠคํŠธ ํ‰ํ™œํ™”, ์ผ๋ฐ˜ ๋ฐฐ๊ฒฝ ์†์˜ ํœ˜์–ด์ง„ ํ‘œ๋ฉด ํ‰ํ™œํ™”์™€ ๊ฐ™์ด 3๊ฐ€์ง€ ์„ธ๋ถ€ ์ฃผ์ œ๋กœ ๋‚˜๋ˆ ์ง„๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ, ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์€ ํ…์ŠคํŠธ ๋ผ์ธ๋“ค๊ณผ ์„ ๋ถ„๋“ค์˜ ์ •๋ ฌ ํŠน์„ฑ์— ๊ธฐ๋ฐ˜์˜ ๋ฌธ์„œ ์˜์ƒ ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด์˜ ํ…์ŠคํŠธ ๋ผ์ธ ๊ธฐ๋ฐ˜์˜ ๋ฌธ์„œ ์˜์ƒ ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•๋“ค์˜ ๊ฒฝ์šฐ, ๋ฌธ์„œ๊ฐ€ ๋ณต์žกํ•œ ๋ ˆ์ด์•„์›ƒ ํ˜•ํƒœ์ด๊ฑฐ๋‚˜ ์ ์€ ์ˆ˜์˜ ํ…์ŠคํŠธ ๋ผ์ธ์„ ํฌํ•จํ•˜๊ณ  ์žˆ์„ ๋•Œ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค. ์ด๋Š” ๋ฌธ์„œ์— ํ…์ŠคํŠธ ๋Œ€์‹  ๊ทธ๋ฆผ, ๊ทธ๋ž˜ํ”„ ํ˜น์€ ํ‘œ์™€ ๊ฐ™์€ ์˜์—ญ์ด ๋งŽ์€ ๊ฒฝ์šฐ์ด๋‹ค. 
๋”ฐ๋ผ์„œ ๋ ˆ์ด์•„์›ƒ์— ๊ฐ•์ธํ•œ ๋ฌธ์„œ ์˜์ƒ ํ‰ํ™œํ™”๋ฅผ ์œ„ํ•˜์—ฌ ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ •๋ ฌ๋œ ํ…์ŠคํŠธ ๋ผ์ธ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์„ ๋ถ„๋“ค๋„ ์ด์šฉํ•œ๋‹ค. ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ํ‰ํ™œํ™” ๋œ ์„ ๋ถ„๋“ค์€ ์—ฌ์ „ํžˆ ์ผ์ง์„ ์˜ ํ˜•ํƒœ์ด๊ณ , ๋Œ€๋ถ€๋ถ„ ๊ฐ€๋กœ ํ˜น์€ ์„ธ๋กœ ๋ฐฉํ–ฅ์œผ๋กœ ์ •๋ ฌ๋˜์–ด ์žˆ๋‹ค๋Š” ๊ฐ€์ • ๋ฐ ๊ด€์ธก์— ๊ทผ๊ฑฐํ•˜์—ฌ, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ด๋Ÿฌํ•œ ์„ฑ์งˆ๋“ค์„ ์ˆ˜์‹ํ™”ํ•˜๊ณ  ์ด๋ฅผ ํ…์ŠคํŠธ ๋ผ์ธ ๊ธฐ๋ฐ˜์˜ ๋น„์šฉ ํ•จ์ˆ˜์™€ ๊ฒฐํ•ฉํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋น„์šฉ ํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™” ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ข…์ด์˜ ๊ตฌ๋ถ€๋Ÿฌ์ง, ์นด๋ฉ”๋ผ ์‹œ์ , ์ดˆ์  ๊ฑฐ๋ฆฌ์™€ ๊ฐ™์€ ํ‰ํ™œํ™” ๋ณ€์ˆ˜๋“ค์„ ์ถ”์ •ํ•œ๋‹ค. ๋˜ํ•œ, ์˜ค๊ฒ€์ถœ๋œ ํ…์ŠคํŠธ ๋ผ์ธ๋“ค๊ณผ ์ž„์˜์˜ ๋ฐฉํ–ฅ์„ ๊ฐ€์ง€๋Š” ์„ ๋ถ„๋“ค๊ณผ ๊ฐ™์€ ์ด์ƒ์ (outlier)์„ ๊ณ ๋ คํ•˜์—ฌ, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฐ˜๋ณต์ ์ธ ๋‹จ๊ณ„๋กœ ์„ค๊ณ„๋œ๋‹ค. ๊ฐ ๋‹จ๊ณ„์—์„œ, ์ •๋ ฌ ํŠน์„ฑ์„ ๋งŒ์กฑํ•˜์ง€ ์•Š๋Š” ์ด์ƒ์ ๋“ค์€ ์ œ๊ฑฐ๋˜๊ณ , ์ œ๊ฑฐ๋˜์ง€ ์•Š์€ ํ…์ŠคํŠธ ๋ผ์ธ ๋ฐ ์„ ๋ถ„๋“ค๋งŒ์ด ๋น„์šฉํ•จ์ˆ˜ ์ตœ์ ํ™”์— ์ด์šฉ๋œ๋‹ค. ์ˆ˜ํ–‰ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋“ค์€ ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ๋‹ค์–‘ํ•œ ๋ ˆ์ด์•„์›ƒ์— ๋Œ€ํ•˜์—ฌ ๊ฐ•์ธํ•จ์„ ๋ณด์—ฌ์ค€๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ๋Š”, ๋ณธ ๋…ผ๋ฌธ์€ ์žฅ๋ฉด ํ…์ŠคํŠธ ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด ์žฅ๋ฉด ํ…์ŠคํŠธ ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•๋“ค์˜ ๊ฒฝ์šฐ, ๊ฐ€๋กœ/์„ธ๋กœ ๋ฐฉํ–ฅ์˜ ํš, ๋Œ€์นญ ํ˜•ํƒœ์™€ ๊ฐ™์€ ๋ฌธ์ž๊ฐ€ ๊ฐ€์ง€๋Š” ๊ณ ์œ ์˜ ์ƒ๊น€์ƒˆ์— ๊ด€๋ จ๋œ ํŠน์„ฑ์„ ์ด์šฉํ•œ๋‹ค. ํ•˜์ง€๋งŒ, ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ๋ฌธ์ž๋“ค์˜ ์ •๋ ฌ ํ˜•ํƒœ๋Š” ๊ณ ๋ คํ•˜์ง€ ์•Š๊ณ , ๊ฐ๊ฐ ๊ฐœ๋ณ„ ๋ฌธ์ž์— ๋Œ€ํ•œ ํŠน์„ฑ๋“ค๋งŒ์„ ์ด์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์—ฌ๋Ÿฌ ๋ฌธ์ž๋“ค๋กœ ๊ตฌ์„ฑ๋œ ํ…์ŠคํŠธ์— ๋Œ€ํ•ด์„œ ์ž˜ ์ •๋ ฌ๋˜์ง€ ์•Š์€ ๊ฒฐ๊ณผ๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฌธ์ž๋“ค์˜ ์ •๋ ฌ ์ •๋ณด๋ฅผ ์ด์šฉํ•œ๋‹ค. ์ •ํ™•ํ•˜๊ฒŒ๋Š”, ๋ฌธ์ž ๊ณ ์œ ์˜ ๋ชจ์–‘๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ •๋ ฌ ํŠน์„ฑ๋“ค๋„ ํ•จ๊ป˜ ๋น„์šฉํ•จ์ˆ˜๋กœ ์ˆ˜์‹ํ™”๋˜๊ณ , ๋น„์šฉํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด์„œ ํ‰ํ™œํ™”๊ฐ€ ์ง„ํ–‰๋œ๋‹ค. 
๋˜ํ•œ, ๋ฌธ์ž๋“ค์˜ ์ •๋ ฌ ํŠน์„ฑ์„ ์ˆ˜์‹ํ™”ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํ…์ŠคํŠธ๋ฅผ ๊ฐ๊ฐ ๊ฐœ๋ณ„ ๋ฌธ์ž๋“ค๋กœ ๋ถ„๋ฆฌํ•˜๋Š” ๋ฌธ์ž ๋ถ„๋ฆฌ ๋˜ํ•œ ์ˆ˜ํ–‰ํ•œ๋‹ค. ๊ทธ ๋’ค, ํ…์ŠคํŠธ์˜ ์œ„, ์•„๋ž˜ ์„ ๋“ค์„ RANSAC ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ด์šฉํ•œ ์ตœ์†Œ ์ œ๊ณฑ๋ฒ•์„ ํ†ตํ•ด ์ถ”์ •ํ•œ๋‹ค. ์ฆ‰, ์ „์ฒด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฌธ์ž ๋ถ„๋ฆฌ์™€ ์„  ์ถ”์ •, ํ‰ํ™œํ™”๊ฐ€ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ˆ˜ํ–‰๋œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋น„์šฉํ•จ์ˆ˜๋Š” ๋ณผ๋ก(convex)ํ˜•ํƒœ๊ฐ€ ์•„๋‹ˆ๊ณ  ๋˜ํ•œ ๋งŽ์€ ๋ณ€์ˆ˜๋“ค์„ ํฌํ•จํ•˜๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ์ด๋ฅผ ์ตœ์ ํ™”ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ Augmented Lagrange Multiplier ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ผ๋ฐ˜ ์ดฌ์˜ ์˜์ƒ๊ณผ ํ•ฉ์„ฑ๋œ ํ…์ŠคํŠธ ์˜์ƒ์„ ํ†ตํ•ด ์‹คํ—˜์ด ์ง„ํ–‰๋˜์—ˆ๊ณ , ์‹คํ—˜ ๊ฒฐ๊ณผ๋“ค์€ ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์— ๋น„ํ•˜์—ฌ ๋†’์€ ์ธ์‹ ์„ฑ๋Šฅ์„ ๋ณด์ด๋ฉด์„œ ๋™์‹œ์— ์‹œ๊ฐ์ ์œผ๋กœ๋„ ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ž„์„ ๋ณด์—ฌ์ค€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ผ๋ฐ˜ ๋ฐฐ๊ฒฝ ์†์˜ ํœ˜์–ด์ง„ ํ‘œ๋ฉด ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•์œผ๋กœ๋„ ํ™•์žฅ๋œ๋‹ค. ์ผ๋ฐ˜ ๋ฐฐ๊ฒฝ์— ๋Œ€ํ•ด์„œ, ์•ฝ๋ณ‘์ด๋‚˜ ์Œ๋ฃŒ์ˆ˜ ์บ”๊ณผ ๊ฐ™์ด ์›ํ†ต ํ˜•ํƒœ์˜ ๋ฌผ์ฒด๋Š” ๋งŽ์ด ์กด์žฌํ•œ๋‹ค. ๊ทธ๋“ค์˜ ํ‘œ๋ฉด์€ ์ผ๋ฐ˜ ์›ํ†ต ํ‘œ๋ฉด(GCS)์œผ๋กœ ๋ชจ๋ธ๋ง์ด ๊ฐ€๋Šฅํ•˜๋‹ค. ์ด๋Ÿฌํ•œ ํœ˜์–ด์ง„ ํ‘œ๋ฉด๋“ค์€ ๋งŽ์€ ๋ฌธ์ž์™€ ๊ทธ๋ฆผ๋“ค์„ ํฌํ•จํ•˜๊ณ  ์žˆ์ง€๋งŒ, ํฌํ•จ๋œ ๋ฌธ์ž๋Š” ๋ฌธ์„œ์— ๋น„ํ•ด์„œ ๋งค์šฐ ๋ถˆ๊ทœ์น™์ ์ธ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ธฐ์กด์˜ ๋ฌธ์„œ ์˜์ƒ ํ‰ํ™œํ™” ๋ฐฉ๋ฒ•๋“ค๋กœ๋Š” ์ผ๋ฐ˜ ๋ฐฐ๊ฒฝ ์† ํœ˜์–ด์ง„ ํ‘œ๋ฉด ์˜์ƒ์„ ํ‰ํ™œํ™”ํ•˜๊ธฐ ํž˜๋“ค๋‹ค. ๋งŽ์€ ํœ˜์–ด์ง„ ํ‘œ๋ฉด์€ ์ž˜ ์ •๋ ฌ๋œ ์„ ๋ถ„๋“ค (ํ…Œ๋‘๋ฆฌ ์„  ํ˜น์€ ๋ฐ”์ฝ”๋“œ)์„ ํฌํ•จํ•˜๊ณ  ์žˆ๋‹ค๋Š” ๊ด€์ธก์— ๊ทผ๊ฑฐํ•˜์—ฌ, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์•ž์„œ ์ œ์•ˆํ•œ ์„ ๋ถ„๋“ค์— ๋Œ€ํ•œ ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜์—ฌ ํœ˜์–ด์ง„ ํ‘œ๋ฉด์„ ํ‰ํ™œํ™”ํ•œ๋‹ค. 
๋‹ค์–‘ํ•œ ๋‘ฅ๊ทผ ๋ฌผ์ฒด์˜ ํœ˜์–ด์ง„ ํ‘œ๋ฉด ์˜์ƒ๋“ค์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋“ค์€ ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ํ‰ํ™œํ™”๋ฅผ ์ •ํ™•ํ•˜๊ฒŒ ์ˆ˜ํ–‰ํ•จ์„ ๋ณด์—ฌ์ค€๋‹ค.The optical character recognition (OCR) of text images captured by cameras plays an important role for scene understanding. However, the OCR of camera-captured image is still considered a challenging problem, even after the text detection (localization). It is mainly due to the geometric distortions caused by page curve and perspective view, therefore their rectification has been an essential pre-processing step for their recognition. Thus, there have been many text image rectification methods which recover the fronto-parallel view image from a single distorted image. Recently, many researchers have focused on the properties of the well-rectified text. In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions. By minimizing the cost functions, the transformation parameters for rectification are obtained. In detail, they are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scene. First, a document image dewarping method is proposed based on the alignments of text-lines and line segments. Conventional text-line based document dewarping methods have problems when handling complex layout and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take large portion of the input instead. Hence, for the robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines. 
Based on the assumption and observation that all the transformed line segments are still straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in the well-rectified image, the proposed method encodes these properties into the cost function in addition to the text-line based cost. By minimizing the function, the proposed method obtains transformation parameters for page curve, camera pose, and focal length, which are used for document image rectification. Considering that in some cases there are many outliers among the line segment directions as well as mis-detected text-lines, the overall algorithm is designed in an iterative manner. At each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. This dissertation also presents a method for scene text rectification. Conventional methods for scene text rectification mainly exploit the glyph property, i.e., that characters in many languages have horizontal/vertical strokes and some symmetric shapes. However, since they consider only the shape properties of individual characters, without considering the alignments of characters, they work well only for images with a single character, and still yield mis-aligned results for images with multiple characters. In order to alleviate this problem, the proposed method explicitly imposes alignment constraints on the rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function. Also, in order to encode the alignments of characters into the cost function, the proposed method separates the text into individual characters using a projection profile method before optimizing the cost function. 
Then, top and bottom lines are estimated using least squares line fitting with RANSAC. The overall algorithm performs character segmentation, line fitting, and rectification iteratively. Since the cost function is non-convex and involves many variables, an optimization scheme based on the Augmented Lagrange Multiplier method is also developed. The proposed method is evaluated on real and synthetic text images, and experimental results show that it achieves higher OCR accuracy than conventional approaches while also yielding visually pleasing results. Finally, the proposed method is extended to curved surface dewarping in real scenes. In real scenes there are many circular objects, such as medicine bottles or beverage cans, whose curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS). These curved surfaces carry significant text and figures; however, their text has an irregular structure compared to documents, so conventional dewarping methods based on the properties of well-rectified text have trouble rectifying them. Based on the observation that many curved surfaces include well-aligned line segments (boundary lines of objects or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line segment terms. 
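The robust line fitting step described above can be sketched as follows. This is a generic RANSAC-plus-least-squares routine with assumed thresholds and iteration counts, not the dissertation's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line_ransac(xs, ys, n_iter=200, tol=1.0):
    """Fit y = a*x + b robustly: sample point pairs, count inliers within
    tol, then least-squares refit on the best consensus set. The iteration
    count and tolerance are illustrative assumptions."""
    best_inliers = None
    for _ in range(n_iter):
        i, j = rng.choice(len(xs), size=2, replace=False)
        if xs[i] == xs[j]:
            continue  # degenerate pair, cannot define a slope
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        inliers = np.abs(ys - (a * xs + b)) < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine with ordinary least squares on the consensus set only.
    A = np.vstack([xs[best_inliers], np.ones(best_inliers.sum())]).T
    a, b = np.linalg.lstsq(A, ys[best_inliers], rcond=None)[0]
    return a, b

# Noisy baseline points with a few gross outliers (e.g. mis-segmented characters).
xs = np.arange(0, 50, dtype=float)
ys = 0.2 * xs + 3.0 + rng.normal(0, 0.1, 50)
ys[[5, 17, 30]] += 25.0  # outliers far above the true line
a, b = fit_line_ransac(xs, ys)
```

The consensus step is what keeps mis-detected characters from dragging the fitted top/bottom lines away from the true text baseline.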
Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.

Contents:
1 Introduction
  1.1 Document image dewarping
  1.2 Scene text rectification
  1.3 Curved surface dewarping in real scene
  1.4 Contents
2 Related work
  2.1 Document image dewarping
    2.1.1 Dewarping methods using additional information
    2.1.2 Text-line based dewarping methods
  2.2 Scene text rectification
  2.3 Curved surface dewarping in real scene
3 Document image dewarping
  3.1 Proposed cost function
    3.1.1 Parametric model of dewarping process
    3.1.2 Cost function design
    3.1.3 Line segment properties and cost function
  3.2 Outlier removal and optimization
    3.2.1 Jacobian matrix of the proposed cost function
  3.3 Document region detection and dewarping
  3.4 Experimental results
    3.4.1 Experimental results on text-abundant document images
    3.4.2 Experimental results on non-conventional document images
  3.5 Summary
4 Scene text rectification
  4.1 Proposed cost function for rectification
    4.1.1 Cost function design
    4.1.2 Character alignment properties and alignment terms
  4.2 Overall algorithm
    4.2.1 Initialization
    4.2.2 Character segmentation
    4.2.3 Estimation of the alignment parameters
    4.2.4 Cost function optimization for rectification
  4.3 Experimental results
  4.4 Summary
5 Curved surface dewarping in real scene
  5.1 Proposed curved surface dewarping method
    5.1.1 Pre-processing
  5.2 Experimental results
  5.3 Summary
6 Conclusions
Bibliography
Abstract (Korean)

    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on the screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise, and ultimately predict and prevent, abnormal behaviour, or even to reliably recognise objects, for example in order to detect left luggage or suspicious objects. This thesis first aims to study the state of the art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. It then presents the application of a suite of 2D and highly novel 3D methodologies that go some way towards overcoming current limitations. The methods presented here are based on the analysis of object features directly extracted from the geometry of the scene. They start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes; an investigation is then presented into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine 2D and 3D data to obtain a better understanding of the scene, ultimately capturing what is happening within it in order to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, the railway station environment is used as an example case representing typical real-world challenges; the principles readily extend elsewhere, such as to airports, motorways, homes and shopping malls. 
The context of this research work, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1. Common computer vision techniques such as VP detection, camera calibration, 3D reconstruction and segmentation can be applied in an effort to extract meaning in video surveillance applications. These methods have been well researched in the literature, and their use is assessed against current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in the general video surveillance context. This is largely due to issues such as variable lighting, weather conditions and shadows, and to the general complexity of real-world environments. Chapter 3 presents the research and contribution on those topics, namely methods to extract optimal features for a specific CCTV application, together with their strengths and weaknesses, and shows that the proposed algorithm obtains better results than most due to its specific design. The comparison of current surveillance systems and methods from the literature shows that 2D data are nevertheless used almost universally. Indeed, both industry and the research community have been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest, and this constant progress makes 2D feature extraction almost effortless nowadays, thanks to a large variety of techniques. Moreover, even if 2D data do not solve all the challenges in video surveillance or other applications, they are still used as a starting stage towards scene understanding and image analysis. 
Chapter 4 then explores 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach is proposed to extract vanishing points from video surveillance environments. Moreover, segmentation techniques are explored with the aim of determining how they can complement vanishing point detection and lead towards 3D data extraction and analysis. In spite of the contribution above, 2D data are insufficient for all but the simplest applications aimed at understanding a scene, where the goal is robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. Therefore, more information is required in order to design a more automated and intelligent algorithm that obtains richer information from the scene geometry, and so a better understanding of what is happening within it. This can be achieved by the use of 3D data (in addition to 2D data), opening the opportunity for object "classification" and, from this, the inference of a map of functionality describing feasible and unfeasible object functionality in a given environment. Chapter 5 presents how 3D data can be beneficial for this task, the various solutions investigated to recover 3D data, and some preliminary work towards plane extraction. It is apparent that VPs and planes give useful information about a scene's perspective and can assist in 3D data recovery within a scene. However, neither VPs nor plane detection techniques alone allow the recovery of more complex generic object shapes (composed of spheres, cylinders, etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by the presence of an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it. 
Chapter 6 describes how photometric stereo allows recovery of 3D information in order to obtain a better understanding of a scene, while partially overcoming some current surveillance challenges, such as the difficulty of resolving fine detail, particularly at large standoff distances, and of isolating and recognising more complex objects in real scenes. Here, items of interest may be obscured by complex environmental factors that are subject to rapid change, making, for example, the detection of suspicious objects and behaviour highly problematic. Innovative use is made of an untapped latent capability within modern surveillance environments, a form of environmental structuring, to achieve a richer form of data acquisition. This chapter also explores the novel application of photometric stereo across such diverse applications, shows how the algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application. One of the most important aspects of this research work is its application. While most of the research literature has been based on relatively simple structured environments, the approach here has been designed to be applied to real surveillance environments, such as railway stations, airports and waiting rooms, where surveillance cameras may be fixed or may in the future form part of a mobile, free-roaming robotic surveillance device that must continually reinterpret its changing environment. So, while the main focus has been to apply this algorithm to railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments. All of these applications require a better understanding of the scene for security or safety purposes. 
Finally, chapter 7 presents a global conclusion and directions for future work.
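As an illustration of the vanishing point geometry used throughout this work: parallel scene lines imaged under perspective meet at a common vanishing point, recoverable as the least-squares intersection of their image segments. The sketch below is a standard homogeneous-coordinate formulation, not code from the thesis.

```python
import numpy as np

def vanishing_point(segments):
    """Estimate a vanishing point as the least-squares intersection of line
    segments. Each segment is ((x1, y1), (x2, y2)); its supporting line is
    the cross product of its homogeneous endpoints, and the VP is the null
    vector of the stacked line equations, found via SVD."""
    lines = []
    for p, q in segments:
        l = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
        lines.append(l / np.linalg.norm(l))  # normalise for conditioning
    _, _, vt = np.linalg.svd(np.array(lines))
    vp = vt[-1]                   # right singular vector of smallest value
    return vp[:2] / vp[2]         # back to inhomogeneous image coordinates

# Three segments whose supporting lines all pass through (100, 50).
segs = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((0, 25), (40, 35))]
vp = vanishing_point(segs)
```

The homogeneous formulation also handles near-parallel segments gracefully: the recovered point simply moves towards infinity instead of the solve failing.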

    Automatic Perspective Adjustment Applied to Supermarket Shelf Images (Ajuste de Perspectiva Automático Aplicado em Imagens de Gôndolas de Supermercado)

    Undergraduate thesis (TCC), Universidade Federal de Santa Catarina, Campus Araranguá, Computer Engineering. Most of the products present on the shelves of commercial establishments are not noticed by consumers. For this reason, the industry needs to create and monitor the strategies used at points of sale to increase the reach of marketing actions. This can be planned using photos of the shelves, obtained by sales promoters and processed a posteriori by digital image processing and pattern recognition methods. However, if it is not possible to capture an image perpendicular to the frontal plane, there may be a loss of information about the products presented on the shelf. In this context, this study presents the modeling of a fully automatic computational solution whose goal is to correct the perspective present in supermarket shelf images. Using computational methods already consolidated in the literature, based on vanishing points, it was possible to obtain proper alignment in 94% of the adjusted images; on a validation dataset, the corrected photos were verified to be more similar to their frontal counterparts in all tested cases.
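A perspective correction of this kind typically reduces to estimating a 3x3 homography once the shelf's vanishing points, or equivalently its corner quadrilateral, are known. The sketch below uses the standard Direct Linear Transform with hand-picked corner coordinates as an assumption; the study's actual pipeline is vanishing-point driven.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: the homography H mapping four src points to
    four dst points. With exactly four correspondences the stacked system
    has a one-dimensional null space, recovered as the last right singular
    vector of the 8x9 design matrix."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the projective scale

# Corners of a skewed shelf photo mapped onto a fronto-parallel rectangle
# (hypothetical coordinates for illustration).
src = [(10, 5), (210, 30), (220, 180), (5, 150)]
dst = [(0, 0), (200, 0), (200, 150), (0, 150)]
H = homography_dlt(src, dst)
```

Applying H to every pixel (image warping) then yields the rectified, front-facing view of the shelf.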

    Reversing ShopView analysis for planogram creation

    With the increasing concern of retail shop owners with improving sales and customer experience, there is a need for technology that helps optimize these goals. It has been shown that planned product placement can boost sales and improve customer experience [1]. With this in mind, Fraunhofer Portugal developed the ShopView solution [2] to help retail shops extract, validate and manipulate planograms from high-resolution images of the real shelves in the store. In this sense, this thesis focused on the creation of an algorithm, using computer vision techniques, to extract information from high-resolution images of retail shelves taken with the ShopView solution. In particular, pre-processing steps were implemented to improve the efficiency and accuracy of an OCR (Optical Character Recognition) engine in recognizing the text on shelf products. These pre-processing algorithms comprise denoising and segmentation techniques. The OCR engine yields additional information about products and shelf separations; this information is later used in clustering algorithms to automatically extract an accurate planogram from shelf photos. The presented algorithm is capable of extracting relevant information from the shelf images, identifying the existing products and creating valid metadata about them and their location. With this metadata it is possible to create, validate and modify the planogram in ShopView. The use of OCR in this algorithm has advantages over other available approaches due to its capability to differentiate products with minimal visual differences between them and its immunity to changes in the appearance of product packaging. Moreover, the proposed methodology does not require any prior user interaction to work properly.
[1] Chanjin Chung, Todd M. Schmit, Diansheng Dong, and Harry M. Kaiser. Economic evaluation of shelf-space management in grocery stores. Agribusiness, 23(4):583-597, September 2007.
[2] L. Rosado, J. Gonçalves, J. Costa, D. Ribeiro, and F. Soares. Supervised learning for out-of-stock detection in panoramas of retail shelves. In 2016 IEEE International Conference on Imaging Systems and Techniques (IST), pages 406-411.
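The thesis does not name its exact pre-processing techniques, so as one hedged illustration: a common segmentation step before running an OCR engine is global Otsu binarisation, which separates dark text from a light package background.

```python
import numpy as np

def otsu_threshold(gray):
    """Global Otsu binarisation threshold for an 8-bit grayscale image:
    pick the level maximising the between-class variance of the grey-level
    histogram. This is an illustrative choice of segmentation step, not
    necessarily the one used in the thesis."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty: no split at this level
        m0 = (np.arange(t) * prob[:t]).sum() / w0           # dark-class mean
        m1 = (np.arange(t, 256) * prob[t:]).sum() / w1      # light-class mean
        var = w0 * w1 * (m0 - m1) ** 2                      # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Bimodal toy image: dark text (~30) on a light background (~220).
img = np.full((20, 20), 220, dtype=np.uint8)
img[5:15, 5:15] = 30
t = otsu_threshold(img)
```

Binarising with the returned threshold (pixels below t become text, the rest background) gives the OCR engine a clean two-level input.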

    Airborne vision-based attitude estimation and localisation

    Vision plays an integral part in a pilot's ability to navigate and control an aircraft, and Visual Flight Rules have therefore been developed around the pilot's ability to see the environment outside the cockpit in order to control the attitude of the aircraft, to navigate, and to avoid obstacles. Automating these processes with a vision system could greatly increase the reliability and autonomy of unmanned aircraft and flight automation systems. This thesis investigates the development and implementation of a robust vision system which fuses inertial information with visual information in a probabilistic framework, with the aim of aircraft navigation. The appearance of the horizon is a strong visual indicator of the attitude of the aircraft. This leads to the first research area of this thesis, visual horizon attitude determination. An image processing method was developed to provide high-performance horizon detection and extraction from camera imagery, and a number of horizon models were developed to link the detected horizon to the attitude of the aircraft with varying degrees of accuracy. The second area investigated in this thesis was visual localisation of the aircraft. A terrain-aided horizon model was developed to estimate the position and altitude as well as the attitude of the aircraft; this gives rough position estimates with highly accurate attitude information. The localisation accuracy was improved by incorporating ground-feature-based, map-aided navigation: road intersections were detected using a purpose-built image processing algorithm and matched to a database to provide positional information. The developed vision system shows performance comparable to other, non-vision-based systems while removing the dependence on external systems for navigation. The vision system and techniques developed in this thesis help to increase the autonomy of unmanned aircraft and of flight automation systems for manned flight.
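The link between a detected horizon line and aircraft attitude can be sketched with a flat-Earth, pinhole-camera approximation. The sign conventions and field-of-view scaling below are assumptions for illustration; the thesis develops considerably more accurate horizon models.

```python
import math

def attitude_from_horizon(p1, p2, img_h, fov_v_deg=60.0):
    """Rough bank and pitch from a detected horizon line.

    Bank is the horizon's tilt in the image. Pitch follows from the
    horizon's vertical offset from the optical centre, scaled by the
    vertical field of view (a small-angle, flat-Earth sketch; the sign
    convention for pitch is an assumption)."""
    (x1, y1), (x2, y2) = p1, p2
    bank = math.degrees(math.atan2(y2 - y1, x2 - x1))
    y_mid = (y1 + y2) / 2.0
    pitch = (y_mid - img_h / 2.0) / img_h * fov_v_deg
    return bank, pitch

# Level horizon detected 60 px below the centre of a 480-px-tall frame.
bank, pitch = attitude_from_horizon((0, 300), (640, 300), 480)
```

In the thesis's probabilistic framework, estimates like these would be fused with inertial measurements rather than used raw.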

    Multi-scale data fusion for surface metrology

    The major trends in manufacturing are miniaturization, convergence of the traditional research fields, and the creation of interdisciplinary research areas. These trends have resulted in the development of multi-scale models and multi-scale surfaces to optimize performance. Multi-scale surfaces that exhibit specific properties at different scales for a specific purpose require multi-scale measurement and characterization. Researchers and instrument developers have built instruments that are able to perform measurements at multiple scales, but these instruments lack the much-needed multi-scale characterization capability. The primary focus of this research was to explore possible multi-scale data fusion strategies and options for the surface metrology domain and to develop enabling software tools, in order to obtain effective multi-scale surface characterization that maximizes fidelity while minimizing measurement cost and time. This research effort explored fusion strategies for the surface metrology domain and narrowed the focus to Discrete Wavelet Frame (DWF) based multi-scale decomposition. An optimized multi-scale data fusion strategy, the 'FWR method', was developed and successfully demonstrated on both high-aspect-ratio surfaces and non-planar surfaces. It was demonstrated that datum features can be effectively characterized at a lower resolution using one system (Vision CMM) while the actual features of interest are characterized at a higher resolution using another, more capable system (Coherence Scanning Interferometer), minimizing measurement time.
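The DWF-style decomposition behind such a fusion strategy can be illustrated with an undecimated "à trous" scheme on a 1-D profile. This is a generic sketch with an assumed triangular kernel and circular boundaries, not the thesis's implementation, but it shows the property fusion relies on: the detail levels plus the residual approximation sum back to the original data exactly, so levels originating from different instruments can be recombined.

```python
import numpy as np

def multiscale_decompose(signal, n_levels=3):
    """Undecimated ('a trous') multi-scale decomposition of a 1-D profile.

    Each level stores the detail removed by progressively wider smoothing
    (the kernel's 'holes' double per level); the boundary is treated as
    circular via np.roll. By construction the levels telescope, so
    sum(details) + approx reconstructs the input exactly."""
    details, approx = [], signal.astype(float)
    for level in range(n_levels):
        step = 2 ** level                    # kernel spacing widens per level
        smoothed = np.zeros_like(approx)
        for off, w in zip((-step, 0, step), (0.25, 0.5, 0.25)):
            smoothed += w * np.roll(approx, off)
        details.append(approx - smoothed)    # detail at this scale
        approx = smoothed                    # coarser approximation
    return details, approx

# Profile with a coarse waviness plus fine roughness.
x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.3 * np.sin(np.linspace(0, 40 * np.pi, 64))
details, approx = multiscale_decompose(x)
recon = approx + sum(details)
```

In a fusion setting, fine-scale detail levels from a high-resolution instrument would replace or weight the corresponding levels of the low-resolution measurement before reconstruction.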

    Geometric Inference with Microlens Arrays

    This dissertation explores an alternative to traditional fiducial markers, where geometric information is inferred from the observed positions of 3D points seen in an image. We offer an alternative approach which enables geometric inference based on the relative orientation of markers in an image, presenting markers fabricated from microlenses whose appearance changes depending on the marker's orientation relative to the camera. First, we show how to manufacture and calibrate chromo-coding lenticular arrays to create a known relationship between the observed hue and the orientation of the array. Second, we use two small chromo-coding lenticular arrays to estimate the pose of an object. Third, we use three large chromo-coding lenticular arrays to calibrate a camera from a single image. Finally, we create another type of fiducial marker from lenslet arrays that encode orientation with discrete black-and-white appearances. Collectively, these approaches offer new opportunities for pose estimation and camera calibration that are relevant for robotics, virtual reality, and augmented reality.
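The calibrated hue-to-orientation relationship can be inverted very simply at run time. The calibration samples below are hypothetical numbers for illustration only, and a monotone hue response over the working range is assumed.

```python
import numpy as np

# Hypothetical calibration table: observed hue (degrees) versus the angle
# between the lenticular array's normal and the camera ray. Real arrays
# are calibrated empirically, as described for chromo-coding arrays.
cal_angles = np.array([-30.0, -15.0, 0.0, 15.0, 30.0])
cal_hues = np.array([20.0, 70.0, 120.0, 170.0, 220.0])

def angle_from_hue(hue):
    """Invert the hue -> orientation relationship by piecewise-linear
    interpolation over the calibration samples (assumed monotone)."""
    return float(np.interp(hue, cal_hues, cal_angles))

angle = angle_from_hue(145.0)
```

With two such markers observed in one image, each reported angle constrains the object's pose, which is the basis of the pose estimation approach described above.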
    • โ€ฆ
    corecore