1,442 research outputs found
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions enabling translation between
modalities while assuming an underlying semantic relationship. To endow the
hash codes with the semantics of each input-output pair, a cycle consistency loss is
further imposed on top of the adversarial training to strengthen the correlations
between inputs and corresponding outputs. Our approach is generative: it learns
hash functions such that the learned hash codes maximally correlate each
input-output correspondence while also regenerating the inputs so as to
minimize the information loss. The hash embedding is thus learned
by jointly optimizing the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than state-of-the-art methods.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other author
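The cycle consistency idea in the abstract above can be sketched in a few lines. This is a minimal illustration only, not the paper's model: the mappings `f` and `g`, the L1 penalty, the sign-based binarization in `to_hash_code`, and the toy feature vectors are all assumptions chosen for readability.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Mapping = Callable[[Sequence[float]], Vector]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cycle_consistency_loss(x: Sequence[float], y: Sequence[float],
                           f: Mapping, g: Mapping) -> float:
    """L1 cycle loss: g(f(x)) should return to x, and f(g(y)) to y."""
    return l1_distance(g(f(x)), x) + l1_distance(f(g(y)), y)


def to_hash_code(v: Sequence[float]) -> List[int]:
    """Binarize a real-valued embedding by sign (assumed hashing step)."""
    return [1 if c >= 0 else -1 for c in v]


# Toy stand-ins for learned networks (hypothetical):
def double(v: Sequence[float]) -> Vector:
    return [2.0 * c for c in v]


def halve(v: Sequence[float]) -> Vector:
    return [0.5 * c for c in v]


def identity(v: Sequence[float]) -> Vector:
    return list(v)


image_feature = [1.0, -2.0]
text_feature = [0.5, 4.0]

# halve inverts double, so both cycles reconstruct their inputs exactly.
print(cycle_consistency_loss(image_feature, text_feature, double, halve))     # 0.0
# identity does not invert double, so the cycles drift and the loss is positive.
print(cycle_consistency_loss(image_feature, text_feature, double, identity))  # 7.5
print(to_hash_code(double(image_feature)))                                    # [1, -1]
```

In a trained system the two mappings would be neural networks and this term would be added to the adversarial objective; the sketch only shows what the cycle term measures.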
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. Next, we explore representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potential impact.
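The count-based category described above can be illustrated with a small sketch. The abstract does not specify the hash function or the bonus formula, so the random-projection sign hash and the `beta / sqrt(count)` bonus used here are assumptions, and `HashingCounter` is a hypothetical name.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


class HashingCounter:
    """Counts visits per hash code and returns a bonus that shrinks with the count."""

    def __init__(self, state_dim: int, num_bits: int = 16,
                 beta: float = 1.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Fixed random Gaussian projection; each row yields one bit of the code.
        self.projection: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(num_bits)
        ]
        self.beta = beta
        self.counts: Counter = Counter()

    def hash_code(self, state: Sequence[float]) -> Tuple[int, ...]:
        """Map a continuous state to a binary code via the sign of each projection."""
        return tuple(
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else 0
            for row in self.projection
        )

    def bonus(self, state: Sequence[float]) -> float:
        """Record a visit and return beta / sqrt(visit count of the state's code)."""
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])


counter = HashingCounter(state_dim=3, num_bits=8, beta=1.0, seed=0)
state = [1.0, 0.0, 0.0]
first = counter.bonus(state)   # first visit: bonus equals beta
second = counter.bonus(state)  # repeat visit: bonus shrinks to beta / sqrt(2)
print(first, second)
```

An agent would add this bonus to the environment reward, so rarely visited hash buckets look more attractive than frequently visited ones.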
Towards efficient deep neural networks with applications to visual recognition
The thesis focuses on the following two topics: designing energy-efficient neural
networks and hashing approaches to make deep learning more feasible for real applications;
deep convolutional neural networks for visual recognition.
Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Computer Science, 201
Content-Based Image Retrieval under Various Deep Learning Conditions
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022.2. Nam Ik Cho.
Content-based image retrieval, which finds images relevant to a query in a huge database, is one of the fundamental tasks in the field of computer vision. For fast and accurate retrieval in particular, Approximate Nearest Neighbor (ANN) search approaches, represented by Hashing and Product Quantization (PQ), have been proposed to the image retrieval community. Ever since neural network-based deep learning showed excellent performance in many computer vision tasks, both Hashing-based and PQ-based image retrieval systems have been adopting deep learning for improvement. In this dissertation, image retrieval methods under various deep learning conditions are investigated in order to suggest appropriate retrieval systems. Specifically, considering the purpose of image retrieval, supervised learning methods are proposed to develop deep Hashing systems that retrieve semantically similar images, and semi-supervised and unsupervised learning methods are proposed to establish deep product quantization systems that retrieve both semantically and visually similar images. Moreover, considering the characteristics of image retrieval databases, face image sets with numerous class categories and general image sets annotated with one or more labels are treated separately when building a retrieval system.
First, supervised learning with the semantic labels assigned to images is introduced to build a Hashing-based retrieval system. To address the difficulties of distinguishing face images, such as inter-class similarities (similar appearance between different persons) and intra-class variations (the same person with different poses, facial expressions, and illumination), the identity label of each image is employed to derive discriminative binary codes. To further improve face image retrieval quality, the Similarity Guided Hashing (SGH) scheme is proposed, in which self-similarity learning over multiple data augmentation results is employed during training. For Hashing-based general image retrieval systems, the Deep Hash Distillation (DHD) scheme is proposed, in which a trainable hash proxy that serves as a class-wise representative is introduced to take advantage of supervised signals. Moreover, a self-distillation scheme adapted for Hashing is utilized to improve general image retrieval performance by appropriately exploiting the potential of augmented data.
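To make the retrieval side of Hashing concrete, the sketch below ranks database items by Hamming distance between binary codes. It does not reproduce SGH or DHD; the sign-threshold `binarize` step and the toy codes are assumptions for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Turn real-valued network outputs into a binary code by thresholding at zero."""
    return [1 if v >= 0 else 0 for v in features]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code: Sequence[int],
                    database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


database = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
query = binarize([0.3, -0.2, -0.9, -0.1])  # -> [1, 0, 0, 0]
print(rank_by_hamming(query, database))     # [2, 0, 1]
```

Because the comparison only counts differing bits, ranking stays cheap even for large databases, which is the usual motivation for learning compact binary codes.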
Second, semi-supervised learning that utilizes both labeled and unlabeled image data is investigated to build a PQ-based retrieval system. Although supervised deep methods show excellent performance, they do not meet expectations unless sufficient, expensive label information is available. Moreover, they are limited in that large amounts of unlabeled image data are excluded from training. To resolve these issues, a vector quantization-based semi-supervised image retrieval scheme, the Generalized Product Quantization (GPQ) network, is proposed. A novel metric learning strategy that preserves semantic similarity between labeled data and an entropy regularization term that fully exploits the inherent potential of unlabeled data are employed to improve the retrieval system. This solution increases the generalization capacity of the quantization network, which makes it possible to overcome the previous limitations.
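For readers unfamiliar with product quantization, the following sketch shows plain PQ encoding and decoding: a vector is split into sub-vectors and each is replaced by the index of its nearest codeword. It is not the GPQ network; the hand-written `codebooks` and the function names are assumptions, and in classical PQ the codebooks would be learned (for example with k-means) rather than fixed by hand.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # one list of codewords per sub-space


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def pq_encode(vector: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Split the vector into len(codebooks) parts and pick the nearest codeword per part."""
    num_subspaces = len(codebooks)
    if len(vector) % num_subspaces != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    sub_dim = len(vector) // num_subspaces
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        nearest = min(range(len(codebook)), key=lambda k: squared_l2(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct an approximation by concatenating the selected codewords."""
    approx: List[float] = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx


# Two sub-spaces with two 2-D codewords each (toy values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes = pq_encode([0.9, 0.8, 0.1, 0.7], codebooks)
print(codes)                        # [1, 0]
print(pq_decode(codes, codebooks))  # [1.0, 1.0, 0.0, 1.0]
```

Storing only the code indices instead of the full vector is what makes PQ-based retrieval compact; deep variants learn the codewords jointly with the feature extractor.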
Lastly, to enable the network to retrieve visually similar images on its own without any human supervision, an unsupervised learning algorithm is explored. Although deep supervised Hashing and PQ methods achieve outstanding retrieval performance compared to conventional methods by fully exploiting label annotations, it is painstaking to assign labels precisely to a vast amount of training data, and the annotation process is error-prone. To tackle these issues, a deep unsupervised image retrieval method dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner, is proposed. A newly designed Cross Quantized Contrastive learning strategy is applied to jointly learn the PQ codewords and the deep visual representations by comparing individually transformed images (views). This allows the network to understand the image content and extract descriptive features so that visually accurate retrieval can be performed.
By conducting extensive image retrieval experiments on benchmark datasets, the proposed methods are confirmed to yield outstanding results under various evaluation protocols. For supervised face image retrieval, SGH achieves the best retrieval performance on both low- and high-resolution face images, and DHD demonstrates its efficiency in general image retrieval experiments with state-of-the-art retrieval performance. For semi-supervised general image retrieval, GPQ shows the best search results for protocols that use both labeled and unlabeled image data. Finally, for unsupervised general image retrieval, the best retrieval scores are achieved with SPQ even without supervised pre-training, and it can be observed that visually similar images are successfully retrieved as search results.
Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Contribution
1.2 Contents
2 Supervised Learning for Deep Hashing: Similarity Guided Hashing for Face Image Retrieval / Deep Hash Distillation for General Image Retrieval
2.1 Motivation and Overview for Face Image Retrieval
2.1.1 Related Works
2.2 Similarity Guided Hashing
2.3 Experiments
2.3.1 Datasets and Setup
2.3.2 Results on Small Face Images
2.3.3 Results on Large Face Images
2.4 Motivation and Overview for General Image Retrieval
2.5 Related Works
2.6 Deep Hash Distillation
2.6.1 Self-distilled Hashing
2.6.2 Teacher loss
2.6.3 Training
2.6.4 Hamming Distance Analysis
2.7 Experiments
2.7.1 Setup
2.7.2 Implementation Details
2.7.3 Results
2.7.4 Analysis
3 Semi-supervised Learning for Product Quantization: Generalized Product Quantization Network for Semi-supervised Image Retrieval
3.1 Motivation and Overview
3.1.1 Related Work
3.2 Generalized Product Quantization
3.2.1 Semi-Supervised Learning
3.2.2 Retrieval
3.3 Experiments
3.3.1 Setup
3.3.2 Results and Analysis
4 Unsupervised Learning for Product Quantization: Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
4.1 Motivation and Overview
4.1.1 Related Works
4.2 Self-supervised Product Quantization
4.2.1 Overall Framework
4.2.2 Self-supervised Training
4.3 Experiments
4.3.1 Datasets
4.3.2 Experimental Settings
4.3.3 Results
4.3.4 Empirical Analysis
5 Conclusion
Abstract (In Korean)