The larger the better: Analysis of a scalable spectral clustering algorithm with cosine similarity
Chen (2018) proposed a scalable spectral clustering algorithm for cosine similarity to handle the task of clustering large data sets. It runs extremely fast, with linear complexity in the size of the data, and achieves state-of-the-art accuracy. This paper conducts a perturbation analysis of the algorithm to understand the effect of discarding a perturbation term in an eigendecomposition step. Our results show that the accuracy of the approximation by the scalable algorithm depends on the connectivity, separation, and sizes of the clusters, and that the algorithm is especially accurate for large data sets.
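The reason the algorithm scales linearly is that, under cosine similarity, the affinity matrix factors as A = X̃X̃ᵀ for row-normalized data X̃, so the leading eigenvectors of the normalized similarity can be read off a thin SVD without ever forming the n×n matrix. A minimal NumPy sketch of this idea (the function name is ours, nonnegative features are assumed so that degrees stay positive, and details of the published algorithm may differ):

```python
import numpy as np

def cosine_spectral_embedding(X, k):
    """Spectral embedding under cosine similarity without forming the
    n-by-n affinity matrix (illustrative sketch, not the published code)."""
    # Row-normalize so that A = Xn @ Xn.T holds the cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Degrees d = A @ 1 = Xn @ (Xn.T @ 1), computed without forming A.
    d = Xn @ Xn.sum(axis=0)
    # Rows of D^{-1/2} Xn; its top left singular vectors are the leading
    # eigenvectors of the normalized similarity D^{-1/2} A D^{-1/2}.
    Y = Xn / np.sqrt(d)[:, None]
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k]  # feed these rows to k-means to obtain clusters
```

The returned n×k embedding is then clustered with ordinary k-means; every step touches only an n×d matrix, hence the linear cost in n.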
On The Memory Scalability of Spectral Clustering Algorithms
Spectral clustering has many advantages over more traditional clustering methods, such as k-means and Gaussian Mixture Models (GMM), and has been popular since its introduction. However, two major challenges, speed scalability and memory scalability, impede its wide application. The first challenge has been addressed recently by Chen [1] [2] in the special setting of sparse or low-dimensional data sets. In this work, we first review the recent study by Chen that speeds up spectral clustering. We then propose three new computational methods for the same special setting of sparse or low-dimensional data to address the memory challenge that arises when the data sets are too large to be fully loaded into computer memory, or when the data are collected sequentially. Numerical experiments demonstrate the improvements from these methods: the proposed methods show effective results on both simulated and real-world data.
Weighted adjacent matrix for K-means clustering
Scalable Image Retrieval by Sparse Product Quantization
Fast Approximate Nearest Neighbor (ANN) search for high-dimensional feature indexing and retrieval is the crux of large-scale image retrieval. A recent promising technique is Product Quantization, which indexes high-dimensional image features by decomposing the feature space into a Cartesian product of low-dimensional subspaces and quantizing each of them separately. Despite the promising results reported, this quantization approach follows the typical hard assignment of traditional quantization methods, which may result in large quantization errors and thus inferior search performance. Unlike existing approaches, in this paper we propose a novel approach called Sparse Product Quantization (SPQ) that encodes high-dimensional feature vectors into sparse representations. We optimize the sparse representations of the feature vectors by minimizing their quantization errors, making the resulting representations essentially close to the original data in practice. Experiments show that the proposed SPQ technique is not only able to compress data but is also an effective encoding technique. We obtain state-of-the-art results for ANN search on four public image datasets, and the promising results on content-based image retrieval further validate the efficacy of our proposed method.
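The hard-assignment baseline that SPQ improves on works by splitting each vector into m sub-vectors and snapping each sub-vector to its nearest centroid in a per-subspace codebook. A toy sketch of this classic product-quantization encoder (our own names and toy random codebooks, not the paper's method):

```python
import numpy as np

def pq_encode(X, m, ks, seed=0):
    """Classic hard-assignment product quantization (illustrative sketch).
    Splits each row of X into m sub-vectors and assigns each to the
    nearest of ks centroids in that subspace's codebook."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    sub = dim // m  # dimensionality of each subspace (assumes m divides dim)
    codebooks, codes = [], []
    for j in range(m):
        Xj = X[:, j * sub:(j + 1) * sub]
        # Toy codebook: ks randomly chosen data points as centroids
        # (a real implementation would run k-means per subspace).
        C = Xj[rng.choice(n, ks, replace=False)]
        # Hard assignment: index of the nearest centroid for each row.
        dists = ((Xj[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codebooks.append(C)
        codes.append(dists.argmin(axis=1))
    return codebooks, np.stack(codes, axis=1)  # codes: shape (n, m)
```

Each vector is thus compressed to m small integers; the quantization error this hard snapping incurs is exactly what SPQ's sparse soft assignment is designed to reduce.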