174 research outputs found

    HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval

    Existing unsupervised deep product quantization methods primarily aim to increase the similarity between different views of the same image, whereas the delicate multi-level semantic similarities preserved between images are overlooked. Moreover, these methods predominantly work in Euclidean space for computational convenience, which compromises their ability to map the multi-level semantic relationships between images effectively. To mitigate these shortcomings, we propose a novel unsupervised product quantization method dubbed Hierarchical Hyperbolic Product Quantization (HiHPQ), which learns quantized representations by incorporating hierarchical semantic similarity within hyperbolic geometry. Specifically, we propose a hyperbolic product quantizer, where a hyperbolic codebook attention mechanism and quantized contrastive learning on the hyperbolic product manifold are introduced to expedite quantization. Furthermore, we propose a hierarchical semantics learning module, designed to enhance the distinction between similar and non-matching images for a query by using the extracted hierarchical semantics as additional training supervision. Experiments on benchmarks show that our proposed method outperforms state-of-the-art baselines. Comment: Accepted by AAAI 202

    Hyperbolic Hierarchical Contrastive Hashing

    Hierarchical semantic structures, which exist naturally in real-world datasets, can assist in capturing the latent distribution of data to learn robust hash codes for retrieval systems. Although hierarchical semantic structures can be expressed simply by integrating semantically relevant data into a high-level taxon with coarser-grained semantics, the construction, embedding, and exploitation of these structures remain tricky for unsupervised hash learning. To tackle these problems, we propose a novel unsupervised hashing method named Hyperbolic Hierarchical Contrastive Hashing (HHCH). We propose to embed continuous hash codes into hyperbolic space for accurate semantic expression, since embedding hierarchies in hyperbolic space generates less distortion than in hyper-sphere space or Euclidean space. In addition, we extend the K-Means algorithm to hyperbolic space and apply the proposed hierarchical hyperbolic K-Means algorithm to construct hierarchical semantic structures adaptively. To exploit the hierarchical semantic structures in hyperbolic space, we design a hierarchical contrastive learning algorithm that includes hierarchical instance-wise and hierarchical prototype-wise contrastive learning. Extensive experiments on four benchmark datasets demonstrate that the proposed method outperforms state-of-the-art unsupervised hashing methods. Codes will be released. Comment: 12 pages, 8 figures
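
    As a rough illustration of why hyperbolic space suits hierarchies, the sketch below computes geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model or curvature HHCH uses, so the unit ball with curvature -1 and the function names here are assumptions for illustration, not the authors' implementation.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumes curvature -1; this choice is an illustration, not taken from the paper.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two pairs with the same Euclidean gap (about 0.1): one near the origin,
# one near the boundary of the ball.
near_origin = ([0.0, 0.0], [0.1, 0.0])
near_boundary = ([0.85, 0.0], [0.95, 0.0])

if __name__ == "__main__":
    for name, (p, q) in (("origin", near_origin), ("boundary", near_boundary)):
        print(name, euclidean_distance(p, q), poincare_distance(p, q))
```

    The same Euclidean gap is several times longer near the boundary than near the origin, which is the extra room that tree-like structures can occupy.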

    Unsupervised Hashing via Similarity Distribution Calibration

    Existing unsupervised hashing methods typically adopt a feature similarity preservation paradigm. As a result, they overlook the intrinsic similarity capacity discrepancy between the continuous feature and discrete hash code spaces. Specifically, since the feature similarity distribution is intrinsically biased (e.g., moderately positive similarity scores on negative pairs), the hash code similarities of positive and negative pairs often become inseparable (i.e., the similarity collapse problem). To solve this problem, in this paper a novel Similarity Distribution Calibration (SDC) method is introduced. Instead of matching individual pairwise similarity scores, SDC aligns the hash code similarity distribution towards a calibration distribution (e.g., beta distribution) with sufficient spread across the entire similarity capacity/range, to alleviate the similarity collapse problem. Extensive experiments show that our SDC outperforms the state-of-the-art alternatives on both coarse category-level and instance-level image retrieval tasks, often by a large margin. Code is available at https://github.com/kamwoh/sdc
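
    To make the "similarity capacity" point concrete, the sketch below binarizes features with a sign function and shows that the similarity between two b-bit codes can take only b + 1 distinct values, whereas feature similarity is continuous. It is illustrative only and is not the SDC loss, which the abstract does not spell out; the helper names are hypothetical.

```python
from itertools import product


def sign_code(features):
    """Binarize a real-valued feature vector into a +1/-1 hash code."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def code_similarity(a, b):
    """Cosine similarity of two +1/-1 codes; equals 1 - 2 * hamming / bits."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def similarity_levels(bits):
    """All similarity values that a pair of `bits`-bit codes can take."""
    reference = [1] * bits
    return sorted({code_similarity(reference, list(code))
                   for code in product((-1, 1), repeat=bits)})


if __name__ == "__main__":
    print(similarity_levels(4))
```

    With only a handful of attainable levels, moderately positive feature similarities on negative pairs can land on the same level as positive pairs, which is the collapse the abstract describes.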

    Content-Based Image Retrieval under Various Deep Learning Conditions

    Doctoral dissertation -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, 2022.2. Nam Ik Cho (조남익).

    Content-based image retrieval, which finds images relevant to a query in a huge database, is one of the fundamental tasks in computer vision. For fast and accurate retrieval in particular, Approximate Nearest Neighbor (ANN) search approaches, represented by Hashing and Product Quantization (PQ), have been proposed to the image retrieval community. Since neural-network-based deep learning has shown excellent performance in many computer vision tasks, both Hashing and PQ-based image retrieval systems have also adopted deep learning for improvement. In this dissertation, image retrieval methods under various deep learning conditions are investigated to suggest appropriate retrieval systems. Specifically, considering the purpose of image retrieval, supervised learning methods are proposed to develop deep Hashing systems that retrieve semantically similar images, and semi-supervised and unsupervised learning methods are proposed to establish deep product quantization systems that retrieve both semantically and visually similar images. Moreover, considering the characteristics of image retrieval databases, face image sets with numerous class categories and general image sets whose images carry one or more labels are explored separately when building a retrieval system.

    First, supervised learning with the semantic labels given to images is introduced to build a Hashing-based retrieval system. To address the difficulties of distinguishing face images, such as inter-class similarities (similar appearance between different persons) and intra-class variations (the same person with different poses, facial expressions, and illumination), the identity label of each image is employed to derive discriminative binary codes. To further improve face image retrieval quality, the Similarity Guided Hashing (SGH) scheme is proposed, where self-similarity learning with multiple data augmentation results is employed during training. For Hashing-based general image retrieval systems, the Deep Hash Distillation (DHD) scheme is proposed, where a trainable hash proxy that represents each class is introduced to take advantage of supervised signals. Moreover, a self-distillation scheme adapted for Hashing is used to improve general image retrieval performance by appropriately exploiting the potential of augmented data.

    Second, semi-supervised learning that utilizes both labeled and unlabeled image data is investigated to build a PQ-based retrieval system. Although supervised deep methods show excellent performance, they fall short of expectations unless sufficient expensive label information is available. In addition, large amounts of unlabeled image data are excluded from training. To resolve this issue, a vector quantization-based semi-supervised image retrieval scheme, the Generalized Product Quantization (GPQ) network, is proposed. A novel metric learning strategy that preserves semantic similarity between labeled data and an entropy regularization term that fully exploits the inherent potential of unlabeled data are employed to improve the retrieval system. This solution increases the generalization capacity of the quantization network, which makes it possible to overcome the previous limitations.

    Lastly, to enable the network to retrieve visually similar images on its own without any human supervision, an unsupervised learning algorithm is explored. Although deep supervised Hashing and PQ methods achieve outstanding retrieval performance compared with conventional methods by fully exploiting label annotations, assigning labels precisely to a vast amount of training data is painstaking, and the annotation process is error-prone. To tackle these issues, a deep unsupervised image retrieval method dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner, is proposed. A newly designed Cross Quantized Contrastive learning strategy is applied to jointly learn the PQ codewords and the deep visual representations by comparing individually transformed images (views). This allows the network to understand the image content and extract descriptive features so that visually accurate retrieval can be performed.

    Extensive image retrieval experiments on benchmark datasets confirm that the proposed methods yield outstanding results under various evaluation protocols. For supervised face image retrieval, SGH achieves the best retrieval performance on both low- and high-resolution face images, and DHD demonstrates its efficiency in general image retrieval experiments with state-of-the-art retrieval performance. For semi-supervised general image retrieval, GPQ shows the best search results for protocols that use both labeled and unlabeled image data. Finally, for unsupervised general image retrieval, the best retrieval scores are achieved with SPQ even without supervised pre-training, and visually similar images are successfully retrieved as search results.

    Contents:
    Abstract; Contents; List of Tables; List of Figures
    1 Introduction
        1.1 Contribution
        1.2 Contents
    2 Supervised Learning for Deep Hashing: Similarity Guided Hashing for Face Image Retrieval / Deep Hash Distillation for General Image Retrieval
        2.1 Motivation and Overview for Face Image Retrieval (2.1.1 Related Works)
        2.2 Similarity Guided Hashing
        2.3 Experiments (2.3.1 Datasets and Setup; 2.3.2 Results on Small Face Images; 2.3.3 Results on Large Face Images)
        2.4 Motivation and Overview for General Image Retrieval
        2.5 Related Works
        2.6 Deep Hash Distillation (2.6.1 Self-distilled Hashing; 2.6.2 Teacher loss; 2.6.3 Training; 2.6.4 Hamming Distance Analysis)
        2.7 Experiments (2.7.1 Setup; 2.7.2 Implementation Details; 2.7.3 Results; 2.7.4 Analysis)
    3 Semi-supervised Learning for Product Quantization: Generalized Product Quantization Network for Semi-supervised Image Retrieval
        3.1 Motivation and Overview (3.1.1 Related Work)
        3.2 Generalized Product Quantization (3.2.1 Semi-Supervised Learning; 3.2.2 Retrieval)
        3.3 Experiments (3.3.1 Setup; 3.3.2 Results and Analysis)
    4 Unsupervised Learning for Product Quantization: Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
        4.1 Motivation and Overview (4.1.1 Related Works)
        4.2 Self-supervised Product Quantization (4.2.1 Overall Framework; 4.2.2 Self-supervised Training)
        4.3 Experiments (4.3.1 Datasets; 4.3.2 Experimental Settings; 4.3.3 Results; 4.3.4 Empirical Analysis)
    5 Conclusion
    Abstract (In Korean)
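
    For readers unfamiliar with product quantization, the following is a minimal sketch of classical PQ encoding and asymmetric distance computation with fixed toy codebooks. It is not the SGH, GPQ, or SPQ training procedure described in the dissertation; the codebook values and function names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector, num_subspaces):
    """Cut a vector into equally sized contiguous sub-vectors."""
    size = len(vector) // num_subspaces
    return [vector[i * size:(i + 1) * size] for i in range(num_subspaces)]


def pq_encode(vector, codebooks):
    """Return one codeword index per subspace (the compact PQ code)."""
    code = []
    for sub, codebook in zip(split(vector, len(codebooks)), codebooks):
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_l2(sub, codebook[k]))
        code.append(nearest)
    return code


def pq_decode(code, codebooks):
    """Rebuild the approximate vector by concatenating the chosen codewords."""
    approx = []
    for index, codebook in zip(code, codebooks):
        approx.extend(codebook[index])
    return approx


def asymmetric_distance(query, code, codebooks):
    """Squared distance from a raw query to a PQ-coded item via lookup tables."""
    subs = split(query, len(codebooks))
    # One table per subspace: distance from the query sub-vector to each codeword.
    tables = [[squared_l2(sub, word) for word in codebook]
              for sub, codebook in zip(subs, codebooks)]
    return sum(table[index] for table, index in zip(tables, code))


# Hypothetical toy codebooks: 2 subspaces, 2 codewords each, 2-D sub-vectors.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

item = [0.9, 1.2, 0.1, 0.8]
code = pq_encode(item, CODEBOOKS)

if __name__ == "__main__":
    print(code, pq_decode(code, CODEBOOKS))
```

    Each database item is stored as a short list of indices, and a query is compared against it by summing precomputed table entries rather than touching the original vector.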

    Information Retrieval: Recent Advances and Beyond

    In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including term-based, semantic retrieval, and neural methods. Additionally, we delve into the key topics related to the learning process of these models. In this way, the survey offers a comprehensive understanding of the field and is of interest to researchers and practitioners entering or working in the information retrieval domain.

    Torsion Graph Neural Networks

    Geometric deep learning (GDL) models have demonstrated great potential for the analysis of non-Euclidean data. They are developed to incorporate the geometric and topological information of non-Euclidean data into end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural networks (GNNs), we propose TorGNN, an analytic torsion enhanced graph neural network model. The essential idea is to characterize local graph structures with an analytic-torsion-based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In TorGNN, a local simplicial complex is identified for each edge, the analytic torsion of this local simplicial complex is calculated, and the result is used as the weight of that edge in the message-passing process. TorGNN is validated on link prediction tasks from sixteen different types of networks and node classification tasks from three types of networks. We find that TorGNN achieves superior performance on both tasks and outperforms various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant for characterizing graph structures and can significantly boost the performance of GNNs.

    Deep Image Retrieval: A Survey

    In recent years a vast amount of visual content has been generated and shared in various fields, such as social media platforms, medical imaging, and robotics. This abundance of content creation and sharing has introduced new challenges. In particular, searching databases for similar content, i.e., content-based image retrieval (CBIR), is a long-established research area, and more efficient and accurate methods are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated the process of intelligent search. In this survey we organize and review recent CBIR works that are developed based on deep learning algorithms and techniques, including insights and techniques from recent papers. We identify and present the benchmarks and evaluation methods commonly used in the field. We collect common challenges and propose promising future directions. More specifically, we focus on image retrieval with deep learning and organize the state-of-the-art methods according to the types of deep network structure, deep features, feature enhancement methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, aiming to promote a global view of the field of instance-based CBIR. Comment: 20 pages, 11 figures

    Exploring deep learning powered person re-identification

    With increased security demands, more and more video surveillance systems are installed in public places, such as schools, stations, and shopping malls. Such large-scale monitoring requires 24/7 video analytics, which cannot be achieved purely by manual operations. Thanks to recent advances in artificial intelligence (AI), deep learning algorithms enable automatic video analytics via smart devices, which interpret people/vehicle behaviours in real time to avoid anomalies effectively. Among various video analytical tasks, people search is one of the most critical use cases due to its wide application scenarios, such as searching for missing people, detecting intruders, and tracking suspects. However, current AI-powered people search is generally built upon facial recognition techniques, which are effective yet may invade privacy. To address the problem, person re-identification (ReID), which aims to identify a person of interest without facial information, has become an effective remedy. Despite considerable achievements in recent years, person ReID still faces some tough challenges, such as 1) the strong reliance on identity labels during feature learning, 2) the tradeoff between searching speed and identification accuracy, and 3) the huge modality discrepancy between data from different sources, e.g., RGB images and infrared (IR) images. Therefore, this thesis focuses on the above challenges in person ReID, analyzes the advantages and limitations of existing solutions, and proposes improved solutions for each challenge. Specifically, to alleviate the reliance on identity labels during feature learning, an improved unsupervised person ReID framework is proposed in Chapter 3, which refines not only imperfect cluster results but also the optimisation directions of samples. Based on the unsupervised setting, we further focus on the tradeoff between searching speed and identification accuracy. To this end, an improved unsupervised binary feature learning scheme for person ReID is proposed in Chapter 4, which derives binary identity representations that are not only robust to transformations but also have low bit correlations. Apart from person ReID conducted within a single modality, where both query and gallery are RGB images, cross-modality retrieval is more challenging yet more common in real-world scenarios. To handle the problem, a two-stream framework that facilitates person ReID with on-the-fly keypoint-aware features is proposed in Chapter 5. Furthermore, the thesis identifies several promising research topics in Chapter 6, which are instructive for future work in person ReID.