7 research outputs found

    Feature Match for Medical Images

    This paper presents FeatureMatch, a generalized approximate nearest-neighbor field (ANNF) computation framework between a source and a target image. The proposed algorithm can estimate ANNF maps between any pair of images, not necessarily related ones. This generalization is achieved through appropriate spatial-range transforms. To compute ANNF maps, global color adaptation is applied as a range transform on the source image. Image patches from the image pair are approximated using low-dimensional features, which are used along with a KD-tree to estimate the ANNF map. This ANNF map is further improved based on image coherency and spatial transforms. The proposed generalization enables handling a wider range of vision applications that have not previously been tackled within the ANNF framework. One such application is illustrated here: optic disk detection. This application concerns medical imaging, where optic disks are located in retinal images using a healthy optic disk image as a common target image. ANNF mapping is used in this application, and it is shown experimentally that the proposed approach is faster and more accurate than state-of-the-art techniques.
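    The filter step this abstract describes can be sketched roughly as follows: represent every image patch by a few cheap low-dimensional features and index the target image's features with a KD-tree, so each source patch gets an approximate nearest target patch. The descriptor here (per-patch mean plus sub-block means) is an illustrative assumption, not the paper's exact feature set.

```python
# Hedged sketch of ANNF estimation via low-dimensional patch features + KD-tree.
import numpy as np
from scipy.spatial import cKDTree

def patch_features(img, p=8):
    """Cheap 4-D features for every p x p patch (top-left anchored)."""
    h, w = img.shape[:2]
    feats, coords = [], []
    for y in range(h - p + 1):
        for x in range(w - p + 1):
            patch = img[y:y+p, x:x+p]
            # illustrative descriptor: overall mean + three sub-block means
            feats.append([patch.mean(),
                          patch[:p//2].mean(),
                          patch[p//2:].mean(),
                          patch[:, :p//2].mean()])
            coords.append((y, x))
    return np.asarray(feats), np.asarray(coords)

def annf(source, target, p=8):
    src_f, src_c = patch_features(source, p)
    tgt_f, tgt_c = patch_features(target, p)
    tree = cKDTree(tgt_f)           # index target patches once
    _, idx = tree.query(src_f)      # nearest target patch per source patch
    return src_c, tgt_c[idx]        # ANNF: source coord -> matched target coord
```

    The coherency and spatial-transform refinements the paper applies on top of this map are omitted here.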

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality-sensitive hashing. We divide hashing algorithms into two main categories: locality-sensitive hashing, which designs hash functions without exploiting the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
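    A minimal example from the first category the survey names, as a sketch: random-hyperplane (sign) hashing, which is data-independent. Each bit is the sign of a random projection, and similar vectors tend to agree in more bits, so search ranks items by Hamming distance in code space.

```python
# Minimal locality-sensitive hashing sketch (random hyperplanes, sign bits).
import numpy as np

def lsh_codes(X, n_bits=16, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))  # one hyperplane per bit
    return (X @ planes > 0).astype(np.uint8)            # sign of each projection

def hamming(a, b):
    return int(np.count_nonzero(a != b))

def search(query_code, db_codes, k=1):
    # rank database items by Hamming distance in the hash coding space
    d = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(d)[:k]
```

    Learning-to-hash methods keep the same search scheme but fit the hyperplanes (or more general hash functions) to the data distribution instead of drawing them at random.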

    NNMap: A method to construct a good embedding for nearest neighbor classification

    This paper aims to address the practical shortcomings of the nearest neighbor classifier. We define a quantitative criterion for assessing embedding quality for nearest neighbor classification, and present a method called NNMap to construct a good embedding. Furthermore, an efficient distance is obtained in the embedded vector space, which can speed up nearest neighbor classification. The quantitative quality criterion is proposed as a local structure descriptor of the sample data distribution; embedding quality corresponds to the quality of the local structure. In the framework of NNMap, one-dimensional embeddings act as weak classifiers with pseudo-losses defined on the amount of local structure preserved by the embedding. Based on this property, the NNMap method reduces the problem of embedding construction to the classical boosting problem. An important property of NNMap is that the embedding optimization criterion is appropriate for both vector and non-vector data, and equally valid in both metric and non-metric spaces. The effectiveness of the new method is demonstrated by experiments on the MNIST handwritten digits dataset, the CMU PIE face images dataset, and datasets from the UCI machine learning repository.
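    The speed-up mechanism can be illustrated with a simplified stand-in (not the paper's NNMap algorithm): build an embedding from several one-dimensional maps, here distances to chosen reference objects, and run nearest-neighbor classification with a cheap L1 distance on the short embedded vectors. NNMap instead selects its 1-D embeddings by boosting, but the principle of comparing short vectors rather than raw objects is the same.

```python
# Illustrative embedding-based 1-NN classification (assumed, simplified scheme).
import numpy as np

def embed(X, refs, dist):
    # each column is one 1-D embedding: distance to one reference object;
    # this works for non-vector data too, as long as `dist` is defined
    return np.array([[dist(x, r) for r in refs] for x in X])

def nn_classify(q_emb, db_emb, labels):
    # cheap L1 distance in the low-dimensional embedded space
    d = np.abs(db_emb - q_emb).sum(axis=1)
    return labels[int(np.argmin(d))]
```

    Because `dist` only needs to be computable between objects, the same sketch applies in metric and non-metric settings, mirroring the flexibility the abstract claims.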

    ํšจ์œจ์  ์˜์ƒ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ์ฃผ์˜์ง‘์ค‘ ์ƒ˜ํ”Œ๋ง

    Ph.D. thesis, Seoul National University, Department of Electrical and Computer Engineering, February 2013. Advisor: Jin Young Choi.
In many practical computer vision scenarios it is possible to use information gleaned from previous observations through the sampling process. In order to achieve good performance with little computation, it is desirable that the samples cover the domain of the target distribution with as few samples as possible, via a concept of active or adaptive sampling. Based on the active sampling strategy, sampling can be concentrated on attentional portions, which improves not only sampling efficiency but also the performance of the algorithms. In this thesis, we define three attentional sampling concepts: structured attentional sampling, empirical attentional sampling, and selective attentional sampling. The proposed attentional sampling methods are successfully applied to computer vision problems, achieving dramatic improvements in both performance and computational load. The structured attentional sampling scheme uses an inherent structure to sample an interesting region densely instead of sampling the entire region uniformly. This scheme is applied to a tracking failure detection method that imitates the human visual system. 
In this scheme, we adopt a sampling structure based on the log-polar transformation, simulating the structure of the retina. Since the log-polar structure is invariant to rotational changes and intensifies translational changes, it helps to reduce false alarms arising from rotational pose variations and to increase true alarms on abrupt translational changes. In addition, the foveal predominance property of the log-polar structure helps to detect the moment of tracking failure by amplifying the resolution around the focus (tracking box center) and blurring the peripheries. Each ganglion cell corresponds to a pixel of the log-polar image, and its adaptation is modeled with a Gaussian mixture model. The validity of the structured attentional sampling method is illustrated through various experiments. The empirical attentional sampling scheme uses previously obtained empirical knowledge when sampling at the current time step. The empirical knowledge is modeled as a probability distribution through an empirical learning process. This scheme is applied to mask generation to speed up conventional background subtraction algorithms for moving object detection. The proposed sampling strategy is designed to focus on attentional regions such as foreground regions. The attentional region is estimated from the detection results of the previous frame in a recursive probabilistic way. We generate a foreground probability map using the temporal, spatial, and frequency properties of the foreground. Based on this foreground probability map, randomly scattered sampling, spatially expanding importance sampling, and surprise pixel sampling are performed sequentially to build the attentional sampling mask. The efficiency of the proposed empirical attentional sampling method is shown through various experiments. The proposed masking method successfully speeds up pixel-wise background subtraction methods by approximately 6.6 times without deteriorating detection performance. 
Also, real-time detection on Full HD video (1920x1080) is successfully achieved by various conventional background subtraction algorithms together with the proposed sampling scheme. The selective attentional sampling scheme does not use the whole data but selects only the data important enough to achieve a given classification objective. This scheme is applied to the recognition of pop dances. Pop dances are action streams consisting of diverse actions which cannot be simply annotated. For such "unannotatable" action streams, conventional methods cannot be applied directly due to their complexity and length. In order to describe an unannotatable action stream effectively, the proposed method employs a novel mid-level "feature flow" with a low-dimensional embedding. For the purpose of recognition, "attentional motion spots" holding important information about the sequence are automatically selected. The feature values and temporal locations of each attentional motion spot are modeled with Gaussian mixtures as "Action Charts." An Action Chart describes the characteristics of an action stream in the spatio-temporal domain. Using the abstract information in the Action Charts, the proposed method efficiently recognizes pop dance sequences. To demonstrate the validity of the proposed method, we compare it against state-of-the-art methods on a newly built SNU Pop-Dance dataset containing long action streams composed of diverse actions.
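    The structured (log-polar) sampling the thesis describes can be sketched in a few lines: pixels are sampled densely near the focus, the "fovea," and sparsely toward the periphery, because the radii grow logarithmically. Grid sizes and the nearest-pixel lookup are illustrative assumptions, not the thesis's exact implementation.

```python
# Rough log-polar sampling sketch: dense near the center, sparse far away.
import numpy as np

def log_polar_sample(img, center, n_r=16, n_theta=32, r_max=None):
    h, w = img.shape[:2]
    cy, cx = center
    if r_max is None:
        r_max = min(h, w) / 2
    # logarithmically spaced radii: most rings fall close to the fovea
    radii = np.exp(np.linspace(0, np.log(r_max), n_r))
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    out = np.zeros((n_r, n_theta), dtype=img.dtype)
    for i, r in enumerate(radii):
        for j, t in enumerate(thetas):
            y = int(round(cy + r * np.sin(t))) % h   # wrap at borders for simplicity
            x = int(round(cx + r * np.cos(t))) % w
            out[i, j] = img[y, x]
    return out
```

    In this representation a rotation of the image about the center becomes a column shift of `out`, while a translation of the target distorts it strongly, which is the property the thesis exploits for tracking-failure detection.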

    A Fast Nearest Neighbor Search Algorithm by Nonlinear Embedding

    We propose an efficient algorithm to find the exact nearest neighbor under the Euclidean distance for large-scale computer vision problems. We embed data points nonlinearly into a low-dimensional space with simple computations and prove that the distance between two points in the embedded space is bounded by the distance in the original space. Instead of computing distances in the high-dimensional original space to find the nearest neighbor, many candidates can be rejected based on their distances in the low-dimensional embedded space; due to this property, our algorithm is well-suited for high-dimensional and large-scale problems. We also show that our algorithm can be improved further by partitioning input vectors recursively. Contrary to most existing fast nearest neighbor search algorithms, our technique reports the exact nearest neighbor, not an approximate one, and requires only very simple preprocessing with no sophisticated data structures. We provide a theoretical analysis of our algorithm and evaluate its performance on synthetic and real data.
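    The filter-and-refine idea behind such exact search can be illustrated with the simplest possible lower-bounding embedding: by the reverse triangle inequality, |&#124;&#124;x&#124;&#124; - &#124;&#124;q&#124;&#124;| <= &#124;&#124;x - q&#124;&#124;, so the vector norm is a 1-D embedding whose distance never exceeds the true Euclidean distance. The paper's nonlinear embedding is different and tighter; this sketch only shows the pruning principle that makes the result exact.

```python
# Exact nearest neighbor via a provable lower bound (norm embedding, assumed
# stand-in for the paper's embedding). Full distances are computed only for
# candidates whose lower bound could still beat the current best.
import numpy as np

def exact_nn(query, db):
    norms = np.linalg.norm(db, axis=1)
    qn = np.linalg.norm(query)
    order = np.argsort(np.abs(norms - qn))   # most promising candidates first
    best_i, best_d = -1, np.inf
    for i in order:
        if abs(norms[i] - qn) >= best_d:     # lower bound can't improve: stop
            break
        d = np.linalg.norm(db[i] - query)    # expensive distance, only here
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

    Because the scan stops as soon as the lower bound of the next candidate exceeds the best true distance found so far, the reported neighbor is exact, never approximate.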