42 research outputs found

The Power of Uniform Sampling for Coresets

Author: Braverman Vladimir
Cohen-Addad Vincent
Jiang Shaofeng H. -C.
Krauthgamer Robert
Schwiegelshohn Chris
Toftrup Mads Bech
Wu Xuan
Publication date: 17/09/2022

Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely-used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of $n$, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graphs, and polygon clustering under Fréchet and Hausdorff distance. Finally, our technique also yields smaller coresets for $1$-median in low-dimensional Euclidean spaces, specifically of size $\tilde{O}(\varepsilon^{-1.5})$ in $\mathbb{R}^2$ and $\tilde{O}(\varepsilon^{-1.6})$ in $\mathbb{R}^3$.
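
As a rough illustration of the uniform-sampling idea in this abstract, the sketch below draws a uniform sample from a single ring-shaped instance, gives each sampled point weight n/m, and compares the weighted $k$-median cost with the full cost. It is not the paper's construction: the ring decomposition and the error analysis are omitted, and the data, the sample size, and the helper names `kmedian_cost` and `uniform_sample` are assumptions made for this example.

```python
import math
import random

def kmedian_cost(points, centers, weights=None):
    """Weighted sum of distances from each point to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def uniform_sample(points, m, rng):
    """Draw m points uniformly without replacement; each gets weight n/m."""
    sample = rng.sample(points, m)
    weight = len(points) / m
    return sample, [weight] * m

rng = random.Random(0)

# Hypothetical data: 2000 points on a ring, at distance 1 to 2 from the origin.
points = []
for _ in range(2000):
    r = rng.uniform(1.0, 2.0)
    t = rng.uniform(0.0, 2.0 * math.pi)
    points.append((r * math.cos(t), r * math.sin(t)))

centers = [(0.0, 0.0)]  # a single candidate center (1-median style query)
sample, weights = uniform_sample(points, 200, rng)

full = kmedian_cost(points, centers)
approx = kmedian_cost(sample, centers, weights)
relative_error = abs(full - approx) / full
```

Because every sampled point carries the same weight, the sample preserves the total weight of the input exactly, which is one reason uniform samples tend to be convenient for size-constrained objectives.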

arXiv.org e-Print Archive

Spectral Clustering with Imbalanced Data

Author: Qian Jing
Saligrama Venkatesh
Publication date: 09/09/2013

Spectral clustering is sensitive to how graphs are constructed from data, particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic and real data sets. Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1302.513
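
For reference, the RCut and NCut objectives named in this abstract are commonly written as below for a two-way partition of a weighted graph. The notation ($w_{ij}$ for edge weights, $d_i$ for node degrees, $\mathrm{vol}$ for the sum of degrees) is the usual textbook convention and an assumption here, since the abstract does not state it; some references also include a constant factor.

```latex
\[
\mathrm{cut}(A,\bar{A}) = \sum_{i \in A,\; j \in \bar{A}} w_{ij},
\qquad
\mathrm{vol}(A) = \sum_{i \in A} d_i,
\quad d_i = \sum_{j} w_{ij},
\]
\[
\mathrm{RCut}(A,\bar{A}) = \mathrm{cut}(A,\bar{A})
  \left( \frac{1}{|A|} + \frac{1}{|\bar{A}|} \right),
\qquad
\mathrm{NCut}(A,\bar{A}) = \mathrm{cut}(A,\bar{A})
  \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(\bar{A})} \right).
\]
```

Both objectives multiply the cut value by a term that grows when either side is small, so a partition with one small side is penalized even when its cut value is low.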

arXiv.org e-Print Archive

Crossref

Clustering and Community Detection with Imbalanced Clusters

Author: Aksoylar Cem
Qian Jing
Saligrama Venkatesh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/08/2016

Spectral clustering methods, which are frequently used in clustering and community detection applications, are sensitive to the specific graph constructions, particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection. Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Clustering with diversity

Author: Li Jian
Yi Ke
Zhang Qin
Publication date: 01/01/2010

We consider the {\em clustering with diversity} problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least $\ell$ points, all of which have distinct colors. We give a 2-approximation to this problem for any $\ell$ when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless $\mathbf{P=NP}$, by providing a matching lower bound. Several extensions to our algorithm have also been developed for handling outliers. This problem is mainly motivated by applications in privacy-preserving data publication. Comment: Extended abstract accepted in ICALP 2010. Keywords: Approximation algorithm, k-center, k-anonymity, l-diversit
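
To make the problem statement in this abstract concrete, the sketch below checks whether a given partition satisfies the diversity constraint and evaluates the maximum-radius objective. It only verifies a candidate solution and is not the 2-approximation algorithm from the paper. The function names, the toy data, and the choice to restrict each cluster's center to one of its own points are assumptions made for this example.

```python
import math

def is_diverse(clusters, colors, ell):
    """True if every cluster has at least ell points and no repeated color."""
    for cluster in clusters:
        cluster_colors = [colors[i] for i in cluster]
        if len(cluster) < ell:
            return False
        if len(set(cluster_colors)) != len(cluster_colors):
            return False
    return True

def max_radius(clusters, points):
    """Largest cluster radius, with each center chosen among the cluster's own points."""
    def radius(cluster):
        return min(
            max(math.dist(points[c], points[i]) for i in cluster)
            for c in cluster
        )
    return max(radius(cluster) for cluster in clusters)

# Hypothetical instance: six colored points in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
colors = ["r", "g", "b", "r", "g", "b"]

good = [[0, 1, 2], [3, 4, 5]]  # each cluster holds three distinct colors
bad = [[0, 3, 1], [2, 4, 5]]   # first cluster repeats color "r"

good_ok = is_diverse(good, colors, ell=3)
bad_ok = is_diverse(bad, colors, ell=3)
good_radius = max_radius(good, points)
```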

arXiv.org e-Print Archive

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Graph Cuts with Arbitrary Size Constraints Through Optimal Transport

Author: Fettal Chakib
Labiod Lazhar
Nadif Mohamed
Publication date: 07/02/2024

A common way of partitioning graphs is through minimum cuts. One drawback of classical minimum cut methods is that they tend to produce small groups, which is why more balanced variants such as normalized and ratio cuts have seen more success. However, we believe that with these variants, the balance constraints can be too restrictive for some applications, such as clustering of imbalanced datasets, while not being restrictive enough when searching for perfectly balanced partitions. Here, we propose a new graph cut algorithm for partitioning graphs under arbitrary size constraints. We formulate the graph cut problem as a regularized Gromov-Wasserstein problem. We then propose to solve it using an accelerated proximal gradient descent (GD) algorithm, which has global convergence guarantees, results in sparse solutions, and only incurs an additional ratio of $\mathcal{O}(\log(n))$ compared to the classical spectral clustering algorithm but was seen to be more efficient

arXiv.org e-Print Archive

Data clustering with cluster size constraints using a modified k-means algorithm

Author: Cheng CT
Ganganath N
Tse CK
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/06/2016

2014-2015 > Academic research: refereed > Refereed conference paper. Accepted Manuscript. Publishe

PolyU Institutional Repository