Coresets for Fuzzy K-Means with Applications
The fuzzy K-means problem is a popular generalization of the well-known K-means problem to soft clusterings. We present the first coresets for fuzzy K-means whose size is linear in the dimension, polynomial in the number of clusters, and poly-logarithmic in the number of points. We show that these coresets can be employed in the computation of a (1+epsilon)-approximation for fuzzy K-means, improving on previously presented results. We further show that our coresets can be maintained in an insertion-only streaming setting, where data points arrive one by one.
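To fix notation for the objective this abstract refers to, here is a minimal sketch of one alternating update of fuzzy k-means (the classic fuzzy c-means step with fuzzifier m). It only illustrates the soft-clustering objective; it is not the paper's coreset construction, and all names are illustrative.

```python
import numpy as np

def fuzzy_kmeans_step(X, centers, m=2.0, eps=1e-12):
    """One alternating update of fuzzy k-means (fuzzifier m > 1).

    Memberships: u[i, j] proportional to d(x_i, c_j)^(-2/(m-1)), normalized over j.
    Centers:     c_j = sum_i u[i, j]^m x_i / sum_i u[i, j]^m.
    Objective:   sum_{i,j} u[i, j]^m ||x_i - c_j||^2.
    """
    # Squared distances from every point to every center (n x k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)      # soft memberships, rows sum to 1
    w = u ** m
    new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
    cost = float((w * d2).sum())                  # fuzzy k-means objective
    return u, new_centers, cost
```

Iterating this step decreases the objective monotonically, just as Lloyd's algorithm does for hard k-means.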
Fast Color Quantization Using Weighted Sort-Means Clustering
Color quantization is an important operation with numerous applications in
graphics and image processing. Most quantization methods are essentially based
on data clustering algorithms. However, despite its popularity as a general
purpose clustering algorithm, k-means has not received much respect in the
color quantization literature because of its high computational requirements
and sensitivity to initialization. In this paper, a fast color quantization
method based on k-means is presented. The method involves several modifications
to the conventional (batch) k-means algorithm including data reduction, sample
weighting, and the use of triangle inequality to speed up the nearest neighbor
search. Experiments on a diverse set of images demonstrate that, with the
proposed modifications, k-means becomes very competitive with state-of-the-art
color quantization methods in terms of both effectiveness and efficiency.
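The triangle-inequality speedup mentioned above can be illustrated with a minimal Elkan-style pruning sketch (the paper's exact implementation may differ; all names here are illustrative). The key fact: if d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(c_best, c_j) - d(x, c_best) >= d(x, c_best), so center j cannot be nearer and its distance need not be computed.

```python
import numpy as np

def assign_with_pruning(X, centers):
    """Nearest-center assignment that skips distance computations
    using the triangle inequality on precomputed center-to-center
    distances (an Elkan-style bound)."""
    # Pairwise distances between centers, computed once per assignment pass.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(X), dtype=int)
    skipped = 0
    for i, x in enumerate(X):
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            if cc[best, j] >= 2.0 * best_d:
                skipped += 1          # pruned: c_j provably no closer than c_best
                continue
            d = np.linalg.norm(x - centers[j])
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels, skipped
```

The assignment is exact: pruned centers are provably not nearer, so the labels match a brute-force nearest-center search while fewer distances are evaluated.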
Coresets for minimum enclosing balls over sliding windows
\emph{Coresets} are important tools to generate concise summaries of massive
datasets for approximate analysis. A coreset is a small subset of points
extracted from the original point set such that certain geometric properties
are preserved with provable guarantees. This paper investigates the problem of
maintaining a coreset to preserve the minimum enclosing ball (MEB) for a
sliding window of points that are continuously updated in a data stream.
Although the problem has been extensively studied in batch and append-only
streaming settings, no efficient sliding-window solution is available yet. In
this work, we first introduce an algorithm, called AOMEB, to build a coreset
for MEB in an append-only stream. AOMEB improves the practical performance of
the state-of-the-art algorithm while having the same approximation ratio.
Furthermore, using AOMEB as a building block, we propose two novel algorithms,
namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window
with constant approximation ratios. The proposed algorithms also support
coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally,
extensive experiments on real-world and synthetic datasets demonstrate that
SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the
state-of-the-art batch algorithm while providing coresets for MEB with rather
small errors compared to the optimal ones. (To appear in The 25th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, KDD '19.)
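The MEB coreset idea underlying AOMEB and SWMEB can be illustrated in the batch setting with the classic Badoiu-Clarkson iteration, sketched below. This is not the paper's algorithm, only a textbook illustration of how a few farthest points certify a (1+eps)-approximate enclosing ball.

```python
import numpy as np

def meb_coreset(X, eps=0.1):
    """Badoiu-Clarkson sketch for a (1+eps)-approximate minimum
    enclosing ball: repeatedly pull the center toward the current
    farthest point; the farthest points collected along the way
    form a coreset of size O(1/eps^2)."""
    c = X[0].astype(float)
    coreset = []
    iters = int(np.ceil(1.0 / eps ** 2))
    for t in range(1, iters + 1):
        far = int(np.argmax(np.linalg.norm(X - c, axis=1)))
        coreset.append(far)
        c = c + (X[far] - c) / (t + 1)   # shrinking step toward farthest point
    radius = float(np.linalg.norm(X - c, axis=1).max())
    return c, radius, sorted(set(coreset))
```

Running the MEB computation on only the returned coreset indices yields a ball whose radius is within a (1+eps) factor of the optimum for the full point set.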
Coresets for Time Series Clustering
We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors for real-time measurement and the rapid drop in storage costs. In particular, we consider the setting where the time series data on N entities is generated from a Gaussian mixture model with autocorrelations over k clusters in R^d. Our main contribution is an algorithm to construct coresets for the maximum likelihood objective for this mixture model. Our algorithm is efficient, and, under a mild assumption on the covariance matrices of the Gaussians, the size of the coreset is independent of the number of entities N and the number of observations for each entity, and depends only polynomially on k, d, and 1/ε, where ε is the error parameter. We empirically assess the performance of our coresets with synthetic data.
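The defining property of a coreset, that a small weighted sample preserves a clustering objective, can be illustrated with a deliberately simple uniform-sampling sketch on a k-means-style cost. The paper's construction targets the maximum-likelihood objective of the autocorrelated mixture model and uses a more careful sampling scheme; the code below is only a generic illustration of the weighted-approximation property, with all names illustrative.

```python
import numpy as np

def kmeans_cost(X, centers, weights=None):
    """Weighted k-means cost: sum_i w_i * min_j ||x_i - c_j||^2."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    w = np.ones(len(X)) if weights is None else weights
    return float(w @ d2)

def uniform_sample_coreset(X, m, rng):
    """Uniform-sampling sketch: m sampled points, each weighted n/m,
    so the weighted cost is an unbiased estimate of the full cost.
    (Coresets with worst-case guarantees use importance/sensitivity
    sampling instead of uniform sampling.)"""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], np.full(m, len(X) / m)
```

On well-clustered data the weighted cost of the sample tracks the full cost closely, which is exactly the approximation a coreset is required to provide for every candidate set of centers.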
- …