157 research outputs found

    Vector Quantization Techniques for Approximate Nearest Neighbor Search on Large-Scale Datasets

    Get PDF
The technological developments of the last twenty years are leading the world into a new era. The invention of the internet, mobile phones, and smart devices has resulted in an exponential increase in data. As data grows every day, finding similar patterns or matching samples to a query is no longer a simple task because of the computational costs and storage limitations involved. Special signal processing techniques are required to handle this growth, as simply adding more and more computers cannot keep up.

    Nearest neighbor search, also called similarity search, proximity search, or near item search, is the problem of finding the item that is nearest or most similar to a query according to a distance or similarity measure. When the reference set is very large, or the distance or similarity calculation is complex, performing the nearest neighbor search can be computationally demanding. Considering today's ever-growing datasets, where the cardinality of samples keeps increasing, a growing interest in approximate methods has emerged in the research community.

    Vector Quantization for Approximate Nearest Neighbor Search (VQ for ANN) has proven to be one of the most efficient and successful families of methods targeting this problem. It compresses vectors into short binary strings and approximates the distances between vectors using look-up tables. With this approach, distance approximation is very fast, while the storage requirement of the dataset is minimized thanks to the extreme compression levels. The distance approximation performance of VQ for ANN has been shown to be sufficient for retrieval and classification tasks, demonstrating that VQ for ANN techniques can be a good replacement for exact distance calculation methods.

    This thesis contributes to the VQ for ANN literature by proposing five advanced techniques, which aim to provide fast and efficient approximate nearest neighbor search on very large-scale datasets. The proposed methods can be divided into two groups. The first group consists of two techniques that introduce subspace clustering to VQ for ANN; these methods are shown to give state-of-the-art performance on prevalent large-scale benchmarks. The second group consists of three methods that improve on residual vector quantization; these are also shown to outperform their predecessors. A sixth contribution of this thesis is a demonstration of VQ for ANN in an image classification application on large-scale datasets: a k-NN classifier based on VQ for ANN is shown to perform on par with exact k-NN classifiers while requiring much less storage space and computation.
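
    As a concrete illustration of the look-up-table idea described above, the following minimal product quantization sketch (Python/NumPy, with illustrative sizes not taken from the thesis) compresses vectors into short codes and approximates query-to-database distances by summing precomputed table entries.

# Minimal product quantization sketch (NumPy): vectors are compressed into
# short codes and query-to-database distances are approximated with look-up
# tables. All sizes are illustrative, not taken from the thesis.
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 64, 8, 256               # vector dim, subspaces, centroids per subspace
d = D // M                         # dimension of each subspace

train = rng.normal(size=(5_000, D)).astype(np.float32)

# Train one small codebook per subspace with a few k-means refinement steps.
codebooks = np.empty((M, K, d), dtype=np.float32)
for m in range(M):
    sub = train[:, m * d:(m + 1) * d]
    cent = sub[rng.choice(len(sub), K, replace=False)]
    for _ in range(10):
        assign = np.argmin(((sub[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            pts = sub[assign == k]
            if len(pts):
                cent[k] = pts.mean(axis=0)
    codebooks[m] = cent

def encode(x):
    """Compress vectors to M one-byte codes each (M bytes instead of 4 * D)."""
    codes = np.empty((len(x), M), dtype=np.uint8)
    for m in range(M):
        sub = x[:, m * d:(m + 1) * d]
        codes[:, m] = np.argmin(((sub[:, None] - codebooks[m][None]) ** 2).sum(-1), axis=1)
    return codes

def adc_search(query, codes, topk=5):
    """Asymmetric distance computation: one LUT per subspace, then table sums."""
    lut = np.stack([((codebooks[m] - query[m * d:(m + 1) * d]) ** 2).sum(-1)
                    for m in range(M)])               # shape (M, K)
    dists = lut[np.arange(M), codes].sum(axis=1)      # gathers + additions only
    return np.argsort(dists)[:topk]

db = rng.normal(size=(10_000, D)).astype(np.float32)
db_codes = encode(db)
print(adc_search(db[0], db_codes))                    # vector 0 should rank first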

    Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation

    Full text link
Two-tower models are the prevalent matching framework for recommendation and have been widely deployed in industrial applications. The success of two-tower matching is attributable to its efficiency in retrieval among a large number of items, since the item tower can be precomputed and used for fast Approximate Nearest Neighbor (ANN) search. However, it suffers from two main challenges: limited feature interaction capability and reduced accuracy in online serving. Existing approaches attempt to design novel late interactions instead of dot products, but they still either fail to support complex feature interactions or lose retrieval efficiency. To address these challenges, we propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval. Specifically, SparCode introduces an all-to-all interaction module to model fine-grained query-item interactions. Besides, we design a discrete code-based sparse inverted index, jointly trained with the model, to achieve effective and efficient model inference. Extensive experiments on open benchmark datasets demonstrate the superiority of our framework. The results show that SparCode significantly improves the accuracy of candidate item matching while retaining the same level of retrieval efficiency as two-tower models. Our source code will be available at MindSpore/models.

    Comment: Accepted by SIGIR 2023. Code will be available at https://reczoo.github.io/SparCod
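
    The discrete code-based sparse inverted index is the part of SparCode that restores retrieval efficiency, so a toy sketch may help: items are quantized to discrete codes, an inverted index maps each code to its items, and an arbitrary (non-dot-product) scorer is applied only to candidates sharing the query's code. All names, sizes, and the L1 stand-in scorer below are illustrative, not SparCode's trained components.

# Toy sketch of code-based sparse retrieval in the spirit the abstract
# describes: discrete codes -> inverted index -> cross scoring of candidates.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
D, K = 32, 64                                  # embedding dim, codebook size
items = rng.normal(size=(5_000, D)).astype(np.float32)
codebook = items[rng.choice(len(items), K, replace=False)]

def code_of(v):
    """Quantize a vector to the id of its nearest codeword."""
    return int(np.argmin(((codebook - v) ** 2).sum(-1)))

# Build the inverted index once, offline.
index = defaultdict(list)
for i, v in enumerate(items):
    index[code_of(v)].append(i)

def cross_score(q, v):
    """Stand-in for a learned all-to-all interaction (anything beyond a dot product)."""
    return float(-np.abs(q - v).sum())         # e.g. negative L1 distance

def retrieve(q, topk=10):
    candidates = index[code_of(q)]             # only items sharing the query's code
    scored = sorted(candidates, key=lambda i: cross_score(q, items[i]), reverse=True)
    return scored[:topk]

print(retrieve(items[42]))                     # item 42 should appear near the top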

Content-Based Image Retrieval under Various Deep Learning Training Environments

    Get PDF
Ph.D. dissertation, Seoul National University, Department of Electrical and Computer Engineering, 2022. Content-based image retrieval, which finds relevant images to a query from a huge database, is one of the fundamental tasks in the field of computer vision. Especially for conducting fast and accurate retrieval, Approximate Nearest Neighbor (ANN) search approaches represented by Hashing and Product Quantization (PQ) have been proposed to the image retrieval community. Ever since neural-network-based deep learning showed excellent performance in many computer vision tasks, both Hashing- and product-quantization-based image retrieval systems have adopted deep learning for improvement. In this dissertation, image retrieval methods under various deep learning conditions are investigated to suggest appropriate retrieval systems.
Specifically, considering the purpose of image retrieval, supervised learning methods are proposed to develop deep Hashing systems that retrieve semantically similar images, and semi-supervised and unsupervised learning methods are proposed to establish deep product quantization systems that retrieve images that are both semantically and visually similar. Moreover, considering the characteristics of image retrieval databases, face image sets with numerous class categories and general image sets of one or more labeled images are explored separately when building a retrieval system. First, supervised learning with the semantic labels given to images is introduced to build a Hashing-based retrieval system. To address the difficulties of distinguishing face images, such as inter-class similarities (similar appearance between different persons) and intra-class variations (the same person with different poses, facial expressions, or illumination), the identity label of each image is employed to derive discriminative binary codes. To further improve face image retrieval quality, the Similarity Guided Hashing (SGH) scheme is proposed, in which self-similarity learning with multiple data augmentation results is employed during training. For Hashing-based general image retrieval systems, the Deep Hash Distillation (DHD) scheme is proposed, in which a trainable hash proxy that presents class-wise representatives is introduced to take advantage of supervised signals. Moreover, a self-distillation scheme adapted for Hashing is utilized to improve general image retrieval performance by appropriately exploiting the potential of augmented data. Second, semi-supervised learning that utilizes both labeled and unlabeled image data is investigated to build a PQ-based retrieval system. Even though supervised deep methods show excellent performance, they do not meet expectations unless expensive label information is sufficient, and a vast amount of unlabeled image data is excluded from training. To resolve this issue, a vector quantization-based semi-supervised image retrieval scheme, the Generalized Product Quantization (GPQ) network, is proposed. A novel metric learning strategy that preserves semantic similarity between labeled data and an entropy regularization term that fully exploits the inherent potential of unlabeled data are employed to improve the retrieval system. This solution increases the generalization capacity of the quantization network, which allows it to overcome previous limitations. Lastly, to enable the network to perform visually similar image retrieval on its own without any human supervision, an unsupervised learning algorithm is explored. Although deep supervised Hashing and PQ methods achieve outstanding retrieval performance compared to conventional methods by fully exploiting label annotations, it is painstaking to assign labels precisely for a vast amount of training data, and the annotation process is error-prone. To tackle these issues, a deep unsupervised image retrieval method dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner, is proposed. A newly designed Cross Quantized Contrastive learning strategy is applied to jointly learn the PQ codewords and deep visual representations by comparing individually transformed images (views).
This allows the network to understand the image content and extract descriptive features so that visually accurate retrieval can be performed. Extensive image retrieval experiments on benchmark datasets confirm that the proposed methods yield outstanding results under various evaluation protocols. For supervised face image retrieval, SGH achieves the best retrieval performance for both low- and high-resolution face images, and DHD demonstrates its efficiency in general image retrieval experiments with state-of-the-art retrieval accuracy. For semi-supervised general image retrieval, GPQ shows the best search results for protocols that use both labeled and unlabeled image data. Finally, for unsupervised general image retrieval, the best retrieval scores are achieved with SPQ even without supervised pre-training, and visually similar images are successfully retrieved as search results.
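
    To make the retrieval side of these Hashing schemes concrete, the following minimal sketch binarizes embeddings with a sign function, packs them into bits, and ranks by Hamming distance; the random embeddings are stand-ins for the outputs of a trained model such as SGH or DHD.

# Minimal Hashing-retrieval sketch: sign-binarize deep features, pack to
# bits, rank by Hamming distance. Embeddings here are random placeholders.
import numpy as np

rng = np.random.default_rng(2)
N, B = 10_000, 64                              # database size, code length in bits

emb = rng.normal(size=(N, B)).astype(np.float32)   # pretend deep features
bits = emb > 0                                     # sign binarization
packed = np.packbits(bits, axis=1)                 # 64 bits -> 8 bytes per image

def hamming_rank(query_bits, packed_db, topk=5):
    """Rank database codes by Hamming distance to the query code."""
    q = np.packbits(query_bits)
    diff = np.bitwise_xor(packed_db, q)            # XOR marks differing bits
    dist = np.unpackbits(diff, axis=1).sum(axis=1) # count them per image
    return np.argsort(dist)[:topk]

print(hamming_rank(bits[7], packed))               # image 7 is its own nearest code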

    Orthonormal Product Quantization Network for Scalable Face Image Retrieval

    Full text link
Recently, deep hashing with the Hamming distance metric has drawn increasing attention for face image retrieval tasks. However, its counterpart, deep quantization methods, which learn binary code representations with dictionary-related distance metrics, have seldom been explored for the task. This paper makes the first attempt to integrate product quantization into an end-to-end deep learning framework for face image retrieval. Unlike prior deep quantization methods, where the codewords for quantization are learned from data, we propose a novel scheme using predefined orthonormal vectors as codewords, which aims to enhance quantization informativeness and reduce the codewords' redundancy. To make the most of the discriminative information, we design a tailored loss function that maximizes identity discriminability in each quantization subspace for both the quantized and the original features. Furthermore, an entropy-based regularization term is imposed to reduce the quantization error. We conduct experiments on three commonly used datasets under both single-domain and cross-domain retrieval settings. The proposed method outperforms all compared deep hashing/quantization methods under both settings by a significant margin. The proposed codeword scheme consistently improves both regular model performance and model generalization ability, verifying the importance of the codewords' distribution for quantization quality. Besides, our model's better generalization ability compared to deep hashing models indicates that it is more suitable for scalable face image retrieval tasks.
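
    A small sketch of the codeword idea, under the assumption that the predefined codewords are simply an orthonormal set per subspace (here obtained from a QR decomposition; the paper's loss terms are not reproduced and all sizes are illustrative):

# Predefined orthonormal codewords for one quantization subspace.
import numpy as np

rng = np.random.default_rng(3)
d, K = 16, 16                          # subspace dim, codewords per subspace (K <= d)

A = rng.normal(size=(d, d))
Q, _ = np.linalg.qr(A)                 # columns of Q are orthonormal
codewords = Q[:, :K].T                 # (K, d): mutually orthogonal unit vectors

# Orthonormality check: the Gram matrix is the identity, i.e. no redundancy.
assert np.allclose(codewords @ codewords.T, np.eye(K), atol=1e-6)

def quantize(x):
    """Assign a feature to the codeword with maximal cosine similarity."""
    x = x / np.linalg.norm(x)
    k = int(np.argmax(codewords @ x))
    return k, codewords[k]

idx, cw = quantize(rng.normal(size=d))
print(idx)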

    MEMORY-VQ: Compression for Tractable Internet-Scale Memory

    Full text link
Retrieval augmentation is a powerful but expensive method for making language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements for storing the pre-computed representations. We propose MEMORY-VQ, a new method to reduce the storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
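
    The storage saving comes from replacing float vectors with small codebook indices. A hedged sketch, with a random codebook standing in for the trained VQ-VAE and sizes chosen for illustration (nothing here reproduces LUMEN itself):

# Codebook compression of token representations: float32 sub-vectors are
# replaced by 1-byte codes; decompression is a table lookup.
import numpy as np

rng = np.random.default_rng(4)
D, G, K = 128, 16, 256                 # token dim, groups, codebook entries
d = D // G

codebook = rng.normal(size=(G, K, d)).astype(np.float32)  # stand-in for VQ-VAE codes

def compress(tokens):
    """float32 tokens (N, D) -> uint8 codes (N, G): a 32x smaller memory."""
    x = tokens.reshape(len(tokens), G, d)
    dists = ((x[:, :, None, :] - codebook[None]) ** 2).sum(-1)   # (N, G, K)
    return dists.argmin(-1).astype(np.uint8)

def decompress(codes):
    """uint8 codes (N, G) -> approximate float32 tokens (N, D)."""
    return codebook[np.arange(G), codes].reshape(len(codes), -1)

tokens = rng.normal(size=(256, D)).astype(np.float32)
codes = compress(tokens)
approx = decompress(codes)
print(tokens.nbytes / codes.nbytes)    # compression rate: 32.0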

    Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

    Full text link
From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) to index into a look-up table (LUT). Stella Nera is the first Maddness accelerator, achieving 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators implemented in the same technology. The hash function is a decision tree, which allows for an efficient hardware implementation, as the multiply-accumulate operations are replaced by decision tree passes and LUT lookups. The entire Maddness MatMul can be broken down into parts that allow an effective implementation with small computing units and memories, letting it reach extreme efficiency while remaining generically applicable to MatMul tasks. In a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency of 161 TOp/s/W with a Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9.

    Comment: 6 pages, 7 figures, preprint under review
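
    A rough software sketch of the Maddness-style pipeline the abstract summarizes, with fixed median-threshold decision stumps standing in for the learned trees: input rows are hashed to buckets using comparisons only, and the matrix product is recovered by gathering and summing precomputed LUT entries. Everything below is illustrative, not the paper's implementation.

# Multiplier-free approximate MatMul: decision-tree hashing + LUT additions.
import numpy as np

rng = np.random.default_rng(5)
N, D, M = 2_000, 32, 10                # rows of A, inner dim, columns of B
C, depth = 4, 4                        # subspaces, tree depth -> 16 buckets each
d = D // C

A_train = rng.normal(size=(N, D)).astype(np.float32)
B = rng.normal(size=(D, M)).astype(np.float32)

# One tiny tree per subspace: at level l, compare dim l to its median.
thresholds = np.stack([np.median(A_train[:, c * d:c * d + depth], axis=0)
                       for c in range(C)])               # (C, depth)

def bucket_ids(A):
    """Hash rows to bucket ids with comparisons only (no multiplications)."""
    ids = np.zeros((len(A), C), dtype=np.int64)
    for c in range(C):
        for l in range(depth):
            bit = A[:, c * d + l] > thresholds[c, l]
            ids[:, c] = (ids[:, c] << 1) | bit
    return ids

# Per-bucket prototypes (bucket means) and the LUT of prototype . B products.
ids = bucket_ids(A_train)
protos = np.zeros((C, 2 ** depth, d), dtype=np.float32)
for c in range(C):
    for b in range(2 ** depth):
        rows = A_train[ids[:, c] == b, c * d:(c + 1) * d]
        if len(rows):
            protos[c, b] = rows.mean(axis=0)
lut = np.einsum('cbd,cdm->cbm', protos, B.reshape(C, d, M))  # (C, buckets, M)

def approx_matmul(A):
    """Approximate A @ B using only LUT gathers and additions."""
    return lut[np.arange(C), bucket_ids(A)].sum(axis=1)      # (len(A), M)

exact = A_train @ B
approx = approx_matmul(A_train)
print(np.corrcoef(exact.ravel(), approx.ravel())[0, 1])      # rough agreement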

    Multi-image classification and compression using vector quantization

    Get PDF
Vector Quantization (VQ) is an image processing technique based on statistical clustering, designed originally for image compression. In this dissertation, several methods for multi-image classification and compression based on a VQ design are presented. It is demonstrated that VQ can perform joint multi-image classification and compression by associating a class identifier with each multi-spectral signature codevector. We extend the Weighted Bayes Risk VQ (WBRVQ) method, previously used for single-component images, which explicitly incorporates a Bayes risk component into the distortion measure used in the VQ quantizer design and thereby permits a flexible trade-off between classification and compression priorities. In the specific case of multi-spectral images, we investigate the application of the Multi-scale Retinex algorithm as a preprocessing stage, before classification and compression, that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The goals of this research are four-fold: (1) to study the interrelationship between statistical clustering, classification, and compression in a multi-image VQ context; (2) to study mixed-pixel classification and combined classification and compression for simulated and actual multispectral and hyperspectral multi-images; (3) to study the effects of multi-image enhancement on class spectral signatures; and (4) to study the preservation of scientific data integrity as a function of compression. In this research, a key issue is not just the subjective quality of the resulting images after classification and compression but also the effect of multi-image dimensionality on the complexity of the optimal coder design.
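
    A toy sketch of the joint classification-and-compression idea, with a simple 0/1 misclassification cost weighted by lam standing in for the full Bayes risk term of WBRVQ; the data and the weight are illustrative:

# Classified VQ: each codevector carries a class id, and codebook training
# trades distortion against a weighted misclassification cost.
import numpy as np

rng = np.random.default_rng(6)
K, d, lam = 8, 4, 2.0                     # codevectors, spectral bands, risk weight

# Two synthetic "spectral signature" classes.
X = np.vstack([rng.normal(0, 1, (500, d)),
               rng.normal(2, 1, (500, d))]).astype(np.float32)
y = np.repeat([0, 1], 500)

cent = X[rng.choice(len(X), K, replace=False)]
labels = rng.integers(0, 2, K)            # class id attached to each codevector

for _ in range(15):
    # Assignment minimizes squared error plus lam * (0/1 misclassification risk).
    se = ((X[:, None] - cent[None]) ** 2).sum(-1)          # (N, K)
    risk = (y[:, None] != labels[None]).astype(np.float32) # (N, K)
    assign = np.argmin(se + lam * risk, axis=1)
    for k in range(K):
        pts = assign == k
        if pts.any():
            cent[k] = X[pts].mean(axis=0)
            labels[k] = np.bincount(y[pts], minlength=2).argmax()  # majority class

# Encoding now yields both a compressed index and a class decision per pixel.
codes = np.argmin(((X[:, None] - cent[None]) ** 2).sum(-1), axis=1)
print("accuracy:", (labels[codes] == y).mean())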

    Dual Hybrid Coder for High Quality Image Compression

    Get PDF
    Electrical Engineering