76 research outputs found

    Content-Based Image Retrieval under Various Deep Learning Environments

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022.2. Nam Ik Cho.
Content-based image retrieval, which finds images relevant to a query in a huge database, is one of the fundamental tasks in computer vision. For fast and accurate retrieval in particular, Approximate Nearest Neighbor (ANN) search approaches, represented by Hashing and Product Quantization (PQ), have been proposed in the image retrieval community. Since neural-network-based deep learning has shown excellent performance in many computer vision tasks, both Hashing-based and PQ-based image retrieval systems have adopted deep learning for improvement. This dissertation investigates image retrieval methods under various deep learning conditions in order to suggest appropriate retrieval systems. Specifically, considering the purpose of image retrieval, supervised learning methods are proposed to develop deep Hashing systems that retrieve semantically similar images, and semi-supervised and unsupervised learning methods are proposed to build deep product quantization systems that retrieve both semantically and visually similar images. Moreover, considering the characteristics of image retrieval databases, face image sets with numerous class categories and general image sets whose images carry one or more labels are explored separately when building a retrieval system. First, supervised learning with the semantic labels given to images is introduced to build a Hashing-based retrieval system.
To address the difficulties of distinguishing face images, such as inter-class similarities (similar appearance between different persons) and intra-class variations (the same person under different poses, facial expressions, and illumination), the identity label of each image is employed to derive discriminative binary codes. To further improve face image retrieval quality, the Similarity Guided Hashing (SGH) scheme is proposed, in which self-similarity learning with multiple data augmentation results is employed during training. For Hashing-based general image retrieval, the Deep Hash Distillation (DHD) scheme is proposed, in which a trainable hash proxy that serves as a class-wise representative is introduced to take advantage of supervised signals. Moreover, a self-distillation scheme adapted for Hashing is used to improve general image retrieval performance by appropriately exploiting the potential of augmented data. Second, semi-supervised learning that uses both labeled and unlabeled image data is investigated to build a PQ-based retrieval system. Although supervised deep methods show excellent performance, they fall short of expectations unless sufficient label information, which is expensive to obtain, is available. In addition, a large amount of unlabeled image data is excluded from training. To resolve these issues, a vector quantization-based semi-supervised image retrieval scheme, the Generalized Product Quantization (GPQ) network, is proposed. A novel metric learning strategy that preserves semantic similarity between labeled data and an entropy regularization term that fully exploits the inherent potential of unlabeled data are employed to improve the retrieval system. This solution increases the generalization capacity of the quantization network, which makes it possible to overcome the previous limitations.
Lastly, to enable the network to retrieve visually similar images on its own without any human supervision, an unsupervised learning algorithm is explored. Although deep supervised Hashing and PQ methods achieve outstanding retrieval performance compared to conventional methods by fully exploiting label annotations, assigning labels precisely to a vast amount of training data is painstaking, and the annotation process is error-prone. To tackle these issues, a deep unsupervised image retrieval method dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner, is proposed. A newly designed Cross Quantized Contrastive learning strategy is applied to jointly learn the PQ codewords and the deep visual representations by comparing individually transformed images (views). This allows the network to understand the image content and extract descriptive features so that visually accurate retrieval can be performed. Extensive image retrieval experiments on benchmark datasets confirm that the proposed methods yield outstanding results under various evaluation protocols. For supervised face image retrieval, SGH achieves the best retrieval performance on both low-resolution and high-resolution face images, and DHD demonstrates its efficiency in general image retrieval experiments with state-of-the-art retrieval performance. For semi-supervised general image retrieval, GPQ shows the best search results for protocols that use both labeled and unlabeled image data.
Finally, for unsupervised general image retrieval, the best retrieval scores are achieved with SPQ even without supervised pre-training, and it can be observed that visually similar images are successfully retrieved as search results.

Abstract
Contents
List of Tables
List of Figures
1 Introduction
    1.1 Contribution
    1.2 Contents
2 Supervised Learning for Deep Hashing: Similarity Guided Hashing for Face Image Retrieval / Deep Hash Distillation for General Image Retrieval
    2.1 Motivation and Overview for Face Image Retrieval
        2.1.1 Related Works
    2.2 Similarity Guided Hashing
    2.3 Experiments
        2.3.1 Datasets and Setup
        2.3.2 Results on Small Face Images
        2.3.3 Results on Large Face Images
    2.4 Motivation and Overview for General Image Retrieval
    2.5 Related Works
    2.6 Deep Hash Distillation
        2.6.1 Self-distilled Hashing
        2.6.2 Teacher loss
        2.6.3 Training
        2.6.4 Hamming Distance Analysis
    2.7 Experiments
        2.7.1 Setup
        2.7.2 Implementation Details
        2.7.3 Results
        2.7.4 Analysis
3 Semi-supervised Learning for Product Quantization: Generalized Product Quantization Network for Semi-supervised Image Retrieval
    3.1 Motivation and Overview
        3.1.1 Related Work
    3.2 Generalized Product Quantization
        3.2.1 Semi-Supervised Learning
        3.2.2 Retrieval
    3.3 Experiments
        3.3.1 Setup
        3.3.2 Results and Analysis
4 Unsupervised Learning for Product Quantization: Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
    4.1 Motivation and Overview
        4.1.1 Related Works
    4.2 Self-supervised Product Quantization
        4.2.1 Overall Framework
        4.2.2 Self-supervised Training
    4.3 Experiments
        4.3.1 Datasets
        4.3.2 Experimental Settings
        4.3.3 Results
        4.3.4 Empirical Analysis
5 Conclusion
Abstract (In Korean)
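
As a rough illustration of the product quantization idea that the abstract above builds on, the following minimal sketch encodes a vector as one codeword index per subspace and scores a query against that code with a per-subspace lookup table (asymmetric distance). It is not the GPQ or SPQ network: the codebooks here are fixed, hand-picked values rather than learned codewords, and all function names are hypothetical.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # the codewords of one subspace


def split(vec: Sequence[float], num_subspaces: int) -> List[List[float]]:
    """Split a vector into equally sized sub-vectors, one per subspace."""
    if len(vec) % num_subspaces != 0:
        raise ValueError("dimension must be divisible by num_subspaces")
    step = len(vec) // num_subspaces
    return [list(vec[i:i + step]) for i in range(0, len(vec), step)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Replace each sub-vector by the index of its nearest codeword."""
    code = []
    for sub, codebook in zip(split(vec, len(codebooks)), codebooks):
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(sub, codebook[k]))
        code.append(nearest)
    return code


def pq_distance_table(query: Sequence[float], codebooks: List[Codebook]) -> List[List[float]]:
    """Precompute query-to-codeword squared distances for every subspace."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, cw) for cw in cb] for sub, cb in zip(subs, codebooks)]


def pq_asymmetric_distance(table: List[List[float]], code: List[int]) -> float:
    """Approximate query-to-item distance by summing table lookups."""
    return sum(table[m][k] for m, k in enumerate(code))


# Assumed toy setup: 4-D vectors, 2 subspaces, 2 codewords per subspace.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
item_code = pq_encode([0.9, 1.1, 0.1, 0.8], codebooks)
table = pq_distance_table([1.0, 1.0, 0.0, 1.0], codebooks)
approx = pq_asymmetric_distance(table, item_code)
```

The lookup table is computed once per query, so each database item costs only one table lookup per subspace. That is the usual reason PQ is attractive for large-scale retrieval.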

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
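
To make the first category concrete, here is a minimal sketch of one well-known data-independent scheme, random-hyperplane (sign random projection) hashing, followed by ranking in Hamming space. It is only an illustration under assumed names and parameters (`make_hyperplanes`, `lsh_code`, 16 bits, a fixed seed) and is not taken from the survey itself.

```python
import random
from typing import List, Sequence


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw random Gaussian hyperplane normals; no training data is consulted."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_code(vec: Sequence[float], hyperplanes: List[List[float]]) -> int:
    """Pack one bit per hyperplane: 1 if the vector lies on its non-negative side."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, db_codes: List[int]) -> List[int]:
    """Return database indices ordered by Hamming distance to the query code."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))


planes = make_hyperplanes(num_bits=16, dim=4, seed=0)
v = [0.3, -1.2, 0.5, 2.0]
code_v = lsh_code(v, planes)
code_scaled = lsh_code([2 * x for x in v], planes)   # same direction as v
code_opposite = lsh_code([-x for x in v], planes)    # opposite direction
```

Because each bit depends only on the sign of a projection, vectors pointing in the same direction share a code, while opposite vectors differ in every bit. Learning-to-hash methods replace the random hyperplanes with functions fitted to the data.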

    Distributing Recognition in Computational Paralinguistics


    Complex queries and complex data

    With the widespread availability of wearable computers equipped with sensors such as GPS or cameras, and with the ubiquitous presence of micro-blogging platforms, social media sites and digital marketplaces such as Amazon and eBay, data can be collected and shared on a massive scale. Efficient and effective similarity search algorithms, which find objects in a database that are similar to a query object, are a necessary building block for taking advantage of this vast amount of information. Because similarity search applies generally across data types and applications, the formalization of this concept and the development of strategies for evaluating similarity queries have evolved into an important field of research in the database community, the spatio-temporal database community, and others, such as information retrieval and computer vision. This thesis concentrates on a special instance of similarity queries, namely k-Nearest Neighbor (kNN) queries and their close relative, Reverse k-Nearest Neighbor (RkNN) queries. As a first contribution we provide an in-depth analysis of the RkNN join. While the problem of reverse nearest neighbor queries has received a vast amount of research interest, the problem of performing such queries in bulk has not seen an in-depth analysis so far. We first formalize the RkNN join, identifying its monochromatic and bichromatic versions and their self-join variants. After pinpointing the monochromatic RkNN join as an important and interesting instance, we develop solutions for this class, including a self-pruning and a mutual pruning algorithm. We then evaluate these algorithms extensively on a variety of synthetic and real datasets. From this starting point of similarity queries on certain data we shift our focus to uncertain data, addressing nearest neighbor queries in uncertain spatio-temporal databases.
Starting from the traditional definition of nearest neighbor queries and a data model for uncertain spatio-temporal data, we develop efficient query mechanisms that consider temporal dependencies during query evaluation. We define intuitive query semantics that return not only the objects closest to the query but also their probability of being a nearest neighbor. After evaluating these query predicates theoretically, we develop efficient querying algorithms for them. Given the findings of this research on nearest neighbor queries, we extend these results to reverse nearest neighbor queries. Finally, we address the problem of querying large datasets containing set-based objects, namely image databases, where images are represented by (multi-)sets of vectors and additional metadata describing the position of features in the image. We aim to reduce the number of kNN queries performed during query processing and evaluate a modified pipeline that optimizes query accuracy at a small number of kNN queries. Additionally, as feature representations in object recognition are moving more and more from the real-valued domain to the binary domain, which offers lower memory requirements and higher runtime efficiency, we evaluate efficient indexing techniques for binary feature vectors.
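
The difference between kNN and monochromatic RkNN queries can be illustrated with a brute-force baseline. The sketch below is an assumption-laden toy (2-D points, Euclidean distance, ties resolved in favor of the query, hypothetical function names) and is not the self-pruning or mutual pruning algorithm developed in the thesis.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn(query: Point, points: List[Point], k: int) -> List[int]:
    """Indices of the k points closest to the query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def rknn(query: Point, points: List[Point], k: int) -> List[int]:
    """Indices of points that would count the query among their k nearest neighbors.

    A point p qualifies when fewer than k other database points are strictly
    closer to p than the query is.
    """
    result = []
    for i, p in enumerate(points):
        d_query = math.dist(p, query)
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(p, other) < d_query
        )
        if closer < k:
            result.append(i)
    return result


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
nearest_to_query = knn((3.5, 0.0), points, k=1)
reverse_for_query = rknn((3.5, 0.0), points, k=1)
```

In this toy dataset the query at (3.5, 0) has a nearest neighbor, yet no point has the query as its own nearest neighbor, which shows that the two query types are not symmetric. The brute-force version costs quadratic time per query. Avoiding that cost is the usual motivation for pruning-based algorithms.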