174 research outputs found
HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval
Existing unsupervised deep product quantization methods primarily aim to
increase the similarity between different views of the same image, whereas
the delicate multi-level semantic similarities preserved between images are
overlooked. Moreover, these methods predominantly focus on the Euclidean space
for computational convenience, compromising their ability to map the
multi-level semantic relationships between images effectively. To mitigate
these shortcomings, we propose a novel unsupervised product quantization method
dubbed Hierarchical Hyperbolic Product
Quantization (HiHPQ), which learns quantized representations by
incorporating hierarchical semantic similarity within hyperbolic geometry.
Specifically, we propose a hyperbolic product quantizer, where the hyperbolic
codebook attention mechanism and the quantized contrastive learning on the
hyperbolic product manifold are introduced to expedite quantization.
Furthermore, we propose a hierarchical semantics learning module, designed to
enhance the distinction between similar and non-matching images for a query by
utilizing the extracted hierarchical semantics as an additional training
supervision. Experiments on benchmarks show that our proposed method
outperforms state-of-the-art baselines.
Comment: Accepted by AAAI 202
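To make the phrase "hyperbolic product manifold" concrete, here is a minimal sketch of a distance on a product of Poincare balls. The abstract does not say which hyperbolic model or curvature HiHPQ uses, so the unit Poincare ball (curvature -1), the function names, and the rule for combining per-subspace distances are all assumptions made for illustration.

```python
import numpy as np


def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance on the unit Poincare ball (curvature -1 assumed)."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def product_manifold_distance(x, y, num_subspaces):
    """Split both vectors into equal subvectors, one Poincare ball each,
    and combine the per-ball distances as on a Riemannian product manifold."""
    xs = np.split(x, num_subspaces)
    ys = np.split(y, num_subspaces)
    per_ball = np.array([poincare_distance(a, b) for a, b in zip(xs, ys)])
    return float(np.sqrt(np.sum(per_ball ** 2)))
```

Each subvector must have norm below 1 to lie inside its ball. Splitting the embedding this way mirrors how product quantization assigns one codebook per subspace.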
Hyperbolic Hierarchical Contrastive Hashing
Hierarchical semantic structures, naturally existing in real-world datasets,
can assist in capturing the latent distribution of data to learn robust hash
codes for retrieval systems. Although hierarchical semantic structures can be
simply expressed by integrating semantically relevant data into a high-level
taxon with coarser-grained semantics, the construction, embedding, and
exploitation of the structures remain tricky for unsupervised hash learning. To
tackle these problems, we propose a novel unsupervised hashing method named
Hyperbolic Hierarchical Contrastive Hashing (HHCH). We propose to embed
continuous hash codes into hyperbolic space for accurate semantic expression
since embedding hierarchies in hyperbolic space generates less distortion than
in hyper-sphere space and Euclidean space. In addition, we extend the K-Means
algorithm to hyperbolic space and perform the proposed hierarchical hyperbolic
K-Means algorithm to construct hierarchical semantic structures adaptively. To
exploit the hierarchical semantic structures in hyperbolic space, we design
the hierarchical contrastive learning algorithm, including hierarchical
instance-wise and hierarchical prototype-wise contrastive learning. Extensive
experiments on four benchmark datasets demonstrate that the proposed method
outperforms the state-of-the-art unsupervised hashing methods. Code will be
released.
Comment: 12 pages, 8 figures
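One way to picture K-Means in hyperbolic space is a single assign-and-update iteration on the Poincare ball. This is only a sketch under stated assumptions: the abstract does not give the distance or the centroid rule, so the Poincare distance, the Einstein midpoint (computed in the Klein model) as the cluster mean, and all function names are illustrative choices rather than the HHCH algorithm itself.

```python
import numpy as np


def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance on the unit Poincare ball (curvature -1 assumed)."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def einstein_midpoint(points):
    """Hyperbolic mean of Poincare-ball points, computed in the Klein model."""
    sq_norm = np.sum(points ** 2, axis=-1, keepdims=True)
    klein = 2.0 * points / (1.0 + sq_norm)                       # Poincare -> Klein
    gamma = 1.0 / np.sqrt(1.0 - np.sum(klein ** 2, axis=-1, keepdims=True))
    mid = np.sum(gamma * klein, axis=0) / np.sum(gamma)          # Lorentz-factor weighted mean
    return mid / (1.0 + np.sqrt(1.0 - np.sum(mid ** 2)))         # Klein -> Poincare


def hyperbolic_kmeans_step(points, centroids):
    """One iteration: assign each point to its nearest centroid, then re-center."""
    dists = poincare_distance(points[:, None, :], centroids[None, :, :])
    labels = np.argmin(dists, axis=1)
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = points[labels == k]
        if len(members) > 0:            # keep the old centroid for empty clusters
            new_centroids[k] = einstein_midpoint(members)
    return labels, new_centroids
```

Running the step again on the resulting centroids, treated as points, would give a coarser level of a hierarchy. That is one plausible reading of "hierarchical" here, not a detail stated in the abstract.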
Unsupervised Hashing via Similarity Distribution Calibration
Existing unsupervised hashing methods typically adopt a feature similarity
preservation paradigm. As a result, they overlook the intrinsic similarity
capacity discrepancy between the continuous feature and discrete hash code
spaces. Specifically, since the feature similarity distribution is
intrinsically biased (e.g., moderately positive similarity scores on negative
pairs), the hash code similarities of positive and negative pairs often become
inseparable (i.e., the similarity collapse problem). To solve this problem, in
this paper a novel Similarity Distribution Calibration (SDC) method is
introduced. Instead of matching individual pairwise similarity scores, SDC
aligns the hash code similarity distribution towards a calibration distribution
(e.g., beta distribution) with sufficient spread across the entire similarity
capacity/range, to alleviate the similarity collapse problem. Extensive
experiments show that our SDC outperforms the state-of-the-art alternatives on
both coarse category-level and instance-level image retrieval tasks, often by a
large margin. Code is available at https://github.com/kamwoh/sdc
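The idea of aligning a similarity distribution with a calibration distribution, rather than matching individual pairs, can be sketched as follows. The abstract does not specify the discrepancy measure, so the sorted-sample (one-dimensional Wasserstein-style) comparison, the Beta(2, 2) target, the mapping of similarities from [-1, 1] to [0, 1], and the function name are assumptions for illustration and are not taken from the linked repository.

```python
import numpy as np


def calibration_loss(code_similarities, alpha=2.0, beta=2.0, seed=0):
    """Mean squared gap between sorted code similarities and sorted Beta samples.

    Sorting both sets compares the two empirical distributions as a whole,
    not individual pairwise similarity scores.
    """
    sims = np.sort(np.asarray(code_similarities, dtype=float).ravel())
    sims01 = (sims + 1.0) / 2.0                  # map [-1, 1] onto [0, 1]
    rng = np.random.default_rng(seed)
    target = np.sort(rng.beta(alpha, beta, size=sims01.size))
    return float(np.mean((sims01 - target) ** 2))
```

Under this sketch, a collapsed set of similarities (all pairs near the same value) yields a large loss, while similarities spread across the full range yield a small one.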
Content-Based Image Retrieval under Various Deep Learning Conditions
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022.2. Nam Ik Cho.
Content-based image retrieval, which finds images relevant to a query in a huge database, is one of the fundamental tasks in computer vision. For fast and accurate retrieval in particular, Approximate Nearest Neighbor (ANN) search approaches, represented by Hashing and Product Quantization (PQ), have been proposed in the image retrieval community. Ever since neural-network-based deep learning showed excellent performance in many computer vision tasks, both Hashing- and PQ-based image retrieval systems have adopted deep learning for improvement. In this dissertation, image retrieval methods under various deep learning conditions are investigated to suggest appropriate retrieval systems. Specifically, considering the purpose of image retrieval, supervised learning methods are proposed to develop deep Hashing systems that retrieve semantically similar images, and semi-supervised and unsupervised learning methods are proposed to establish deep product quantization systems that retrieve both semantically and visually similar images. Moreover, considering the characteristics of image retrieval databases, face image sets with numerous class categories and general image sets whose images carry one or more labels are treated separately when building a retrieval system.
First, supervised learning with the semantic labels given to images is introduced to build a Hashing-based retrieval system. To address the difficulties of distinguishing face images, such as inter-class similarities (similar appearance between different persons) and intra-class variations (the same person with different poses, facial expressions, and illumination), the identity label of each image is employed to derive discriminative binary codes. To further improve face image retrieval quality, the Similarity Guided Hashing (SGH) scheme is proposed, in which self-similarity learning with multiple data augmentation results is employed during training. For Hashing-based general image retrieval, the Deep Hash Distillation (DHD) scheme is proposed, in which a trainable hash proxy that represents each class is introduced to take advantage of supervised signals. Moreover, a self-distillation scheme adapted for Hashing is used to improve general image retrieval performance by appropriately exploiting the potential of augmented data.
Second, semi-supervised learning that utilizes both labeled and unlabeled image data is investigated to build a PQ-based retrieval system. Although supervised deep methods show excellent performance, they fall short of expectations unless sufficient expensive label information is available. Besides, large amounts of unlabeled image data are excluded from training. To resolve this issue, a vector quantization-based semi-supervised image retrieval scheme, the Generalized Product Quantization (GPQ) network, is proposed. A novel metric learning strategy that preserves semantic similarity between labeled data and an entropy regularization term that fully exploits the inherent potential of unlabeled data are employed to improve the retrieval system. This solution increases the generalization capacity of the quantization network, which makes it possible to overcome the previous limitations.
Lastly, to enable the network to perform visually similar image retrieval on its own without any human supervision, an unsupervised learning algorithm is explored. Although deep supervised Hashing and PQ methods achieve outstanding retrieval performance compared to conventional methods by fully exploiting label annotations, it is painstaking to assign labels precisely for a vast amount of training data, and the annotation process is error-prone. To tackle these issues, a deep unsupervised image retrieval method dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner, is proposed. A newly designed Cross Quantized Contrastive learning strategy is applied to jointly learn the PQ codewords and the deep visual representations by comparing individually transformed images (views). This allows the network to understand the image content and extract descriptive features so that visually accurate retrieval can be performed.
By conducting extensive image retrieval experiments on the benchmark datasets, the proposed methods are confirmed to yield outstanding results under various evaluation protocols. For supervised face image retrieval, SGH achieves the best retrieval performance for both low- and high-resolution face images, and DHD also demonstrates its efficiency in general image retrieval experiments with state-of-the-art retrieval performance. For semi-supervised general image retrieval, GPQ shows the best search results for protocols that use both labeled and unlabeled image data. Finally, for unsupervised general image retrieval, the best retrieval scores are achieved with SPQ even without supervised pre-training, and it can be observed that visually similar images are successfully retrieved as search results.
Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Contribution
1.2 Contents
2 Supervised Learning for Deep Hashing: Similarity Guided Hashing for Face Image Retrieval / Deep Hash Distillation for General Image Retrieval
2.1 Motivation and Overview for Face Image Retrieval
2.1.1 Related Works
2.2 Similarity Guided Hashing
2.3 Experiments
2.3.1 Datasets and Setup
2.3.2 Results on Small Face Images
2.3.3 Results on Large Face Images
2.4 Motivation and Overview for General Image Retrieval
2.5 Related Works
2.6 Deep Hash Distillation
2.6.1 Self-distilled Hashing
2.6.2 Teacher loss
2.6.3 Training
2.6.4 Hamming Distance Analysis
2.7 Experiments
2.7.1 Setup
2.7.2 Implementation Details
2.7.3 Results
2.7.4 Analysis
3 Semi-supervised Learning for Product Quantization: Generalized Product Quantization Network for Semi-supervised Image Retrieval
3.1 Motivation and Overview
3.1.1 Related Work
3.2 Generalized Product Quantization
3.2.1 Semi-Supervised Learning
3.2.2 Retrieval
3.3 Experiments
3.3.1 Setup
3.3.2 Results and Analysis
4 Unsupervised Learning for Product Quantization: Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
4.1 Motivation and Overview
4.1.1 Related Works
4.2 Self-supervised Product Quantization
4.2.1 Overall Framework
4.2.2 Self-supervised Training
4.3 Experiments
4.3.1 Datasets
4.3.2 Experimental Settings
4.3.3 Results
4.3.4 Empirical Analysis
5 Conclusion
Abstract (In Korean)
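As background for the PQ-based systems in this dissertation, the generic product quantization pipeline (encode a vector as one codeword index per subspace, then score a query against stored codes through a lookup table) can be sketched as below. This is the textbook technique, not the GPQ or SPQ networks. The function names and the `(M, K, d_sub)` codebook layout are assumptions for illustration.

```python
import numpy as np


def pq_encode(x, codebooks):
    """Return one codeword index per subspace; codebooks has shape (M, K, d_sub)."""
    M, K, d_sub = codebooks.shape
    subs = x.reshape(M, d_sub)
    dists = np.sum((codebooks - subs[:, None, :]) ** 2, axis=-1)   # shape (M, K)
    return np.argmin(dists, axis=1)


def pq_decode(codes, codebooks):
    """Reconstruct the quantized vector by concatenating the chosen codewords."""
    return np.concatenate([codebooks[m, c] for m, c in enumerate(codes)])


def adc_distance(query, codes, codebooks):
    """Asymmetric distance: the query stays real-valued, the database item is a code.

    The (M, K) table is built once per query and reused for every database code.
    """
    M, K, d_sub = codebooks.shape
    q_subs = query.reshape(M, d_sub)
    table = np.sum((codebooks - q_subs[:, None, :]) ** 2, axis=-1)
    return float(np.sum(table[np.arange(M), codes]))
```

With M subspaces and K codewords each, a database item costs M * log2(K) bits to store, and each distance evaluation reduces to M table lookups.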
Information Retrieval: Recent Advances and Beyond
In this paper, we provide a detailed overview of the models used for
information retrieval in the first and second stages of the typical processing
chain. We discuss the current state-of-the-art models, including term-based,
semantic retrieval, and neural methods. Additionally, we delve into the key
topics related to the learning process of these models. This way, this survey
offers a comprehensive understanding of the field and is of interest for
researchers and practitioners entering/working in the information retrieval
domain.
Torsion Graph Neural Networks
Geometric deep learning (GDL) models have demonstrated a great potential for
the analysis of non-Euclidean data. They are developed to incorporate the
geometric and topological information of non-Euclidean data into the end-to-end
deep learning architectures. Motivated by the recent success of discrete Ricci
curvature in graph neural networks (GNNs), we propose TorGNN, an analytic
Torsion enhanced Graph Neural Network model. The essential idea is to
characterize graph local structures with an analytic torsion based weight
formula. Mathematically, analytic torsion is a topological invariant that can
distinguish spaces which are homotopy equivalent but not homeomorphic. In our
TorGNN, for each edge, a corresponding local simplicial complex is identified,
then the analytic torsion (for this local simplicial complex) is calculated,
and further used as a weight (for this edge) in the message-passing process. Our
TorGNN model is validated on link prediction tasks from sixteen different types
of networks and node classification tasks from three types of networks. It has
been found that our TorGNN can achieve superior performance on both tasks, and
outperform various state-of-the-art models. This demonstrates that analytic
torsion is a highly efficient topological invariant in the characterization of
graph structures and can significantly boost the performance of GNNs.
Deep Image Retrieval: A Survey
In recent years a vast amount of visual content has been generated and shared
from various fields, such as social media platforms, medical images, and
robotics. This abundance of content creation and sharing has introduced new
challenges. In particular, searching databases for similar content, i.e.,
content-based image retrieval (CBIR), is a long-established research area, and more
efficient and accurate methods are needed for real time retrieval. Artificial
intelligence has made progress in CBIR and has significantly facilitated the
process of intelligent search. In this survey we organize and review recent
CBIR works that are developed based on deep learning algorithms and techniques,
including insights and techniques from recent papers. We identify and present
the commonly-used benchmarks and evaluation methods used in the field. We
collect common challenges and propose promising future directions. More
specifically, we focus on image retrieval with deep learning and organize the
state of the art methods according to the types of deep network structure, deep
features, feature enhancement methods, and network fine-tuning strategies. Our
survey considers a wide variety of recent methods, aiming to promote a global
view of the field of instance-based CBIR.
Comment: 20 pages, 11 figures
Exploring deep learning powered person re-identification
With increased security demands, more and more video surveillance systems are installed in public places, such as schools, stations, and shopping malls. Such large-scale monitoring requires 24/7 video analytics, which cannot be achieved purely by manual operations. Thanks to recent advances in artificial intelligence (AI), deep learning algorithms enable automatic video analytics via smart devices, which interpret people/vehicle behaviours in real time to avoid anomalies effectively. Among various video analytical tasks, people search is one of the most critical use cases due to its wide application scenarios, such as searching for missing people, detecting intruders, and tracking suspects. However, current AI-powered people search is generally built upon facial recognition techniques, which are effective yet may be privacy-invasive. To address the problem, person re-identification (ReID), which aims to identify a person of interest without facial information, has become an effective panacea. Despite considerable achievements in recent years, person ReID still faces some tough challenges, such as 1) the strong reliance on identity labels during feature learning, 2) the tradeoff between searching speed and identification accuracy, and 3) the huge modality discrepancy between data from different sources, e.g., RGB images and infrared (IR) images. Therefore, this thesis focuses on the above challenges in person ReID, analyzes the advantages and limitations of existing solutions, and proposes improved solutions for each challenge. Specifically, to alleviate the identity label reliance during feature learning, an improved unsupervised person ReID framework is proposed in Chapter 3, which refines not only imperfect cluster results but also the optimisation directions of samples. Based on the unsupervised setting, we further focus on the tradeoff between searching speed and identification accuracy.
To this end, an improved unsupervised binary feature learning scheme for person ReID is proposed in Chapter 4, which derives binary identity representations that not only are robust to transformations but also have low bit correlations. Apart from person ReID conducted within a single modality where both query and gallery are RGB images, cross-modality retrieval is more challenging yet more common in real-world scenarios. To handle the problem, a two-stream framework, facilitating person ReID with on-the-fly keypoint-aware features, is proposed in Chapter 5. Furthermore, the thesis spots several promising research topics in Chapter 6, which are instructive for future works in person ReID.
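The speed side of the speed/accuracy tradeoff mentioned above comes from comparing binary codes with the Hamming distance instead of comparing real-valued features. A minimal sketch follows, assuming sign thresholding as the binarization rule and hypothetical function names; the thesis's learned binary representations are not reproduced here.

```python
import numpy as np


def binarize(features):
    """Threshold real-valued features at zero to obtain 0/1 codes (an assumed rule)."""
    return (np.asarray(features) > 0).astype(np.uint8)


def hamming_rank(query_code, gallery_codes):
    """Rank gallery items by the number of bits that differ from the query code."""
    dists = np.count_nonzero(gallery_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")     # stable sort keeps ties in gallery order
    return order, dists
```

In practice the codes would be bit-packed and compared with XOR plus popcount. The array form above is kept for readability.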