Clustering with diversity
We consider the {\em clustering with diversity} problem: given a set of
colored points in a metric space, partition them into clusters such that each
cluster has at least $\ell$ points, all of which have distinct colors.
We give a 2-approximation to this problem for any $\ell$ when the objective
is to minimize the maximum radius of any cluster. We show that the
approximation ratio is optimal unless P = NP, by providing a matching
lower bound. Several extensions to our algorithm have also been developed for
handling outliers. This problem is mainly motivated by applications in
privacy-preserving data publication.

Comment: Extended abstract accepted in ICALP 2010. Keywords: Approximation
algorithm, k-center, k-anonymity, l-diversity
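As a small illustration of the problem's feasibility constraints (not of the 2-approximation algorithm itself), the sketch below checks a candidate partition, assuming Euclidean points, a color label per point, and $\ell$ as the diversity parameter; all function names are my own.

```python
import math

def cluster_radius(cluster, center):
    """Maximum distance from the chosen center to any point in the cluster."""
    return max(math.dist(p, center) for (p, _color) in cluster)

def is_diverse_clustering(clusters, ell):
    """Feasibility check for clustering with diversity: every cluster has
    at least ell points, all of which carry distinct colors."""
    for cluster in clusters:
        colors = [color for (_p, color) in cluster]
        if len(colors) < ell or len(set(colors)) != len(colors):
            return False
    return True

# Two clusters of colored 2-D points; colors are distinct within each cluster.
clusters = [
    [((0, 0), "red"), ((1, 0), "blue"), ((0, 1), "green")],
    [((5, 5), "red"), ((6, 5), "blue"), ((5, 6), "yellow")],
]
print(is_diverse_clustering(clusters, ell=3))             # True
# The objective to minimize is the maximum radius over all clusters
# (here using each cluster's first point as an arbitrary center).
print(max(cluster_radius(c, c[0][0]) for c in clusters))
```

Note that colors may repeat across clusters; the distinctness constraint is only within each cluster, which is what makes the model useful for l-diversity-style privacy guarantees.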
Capacitated Center Problems with Two-Sided Bounds and Outliers
In recent years, capacitated center problems have attracted a lot of
research interest. Given a set of vertices $V$, we want to find a subset of
vertices $S$, called centers, such that the maximum cluster radius is
minimized. Moreover, each center in $S$ should satisfy some capacity
constraint, which could be an upper or lower bound on the number of vertices it
can serve. Capacitated $k$-center problems with one-sided bounds (upper or
lower) have been well studied in previous work, and a constant factor
approximation was obtained.
We are the first to study the capacitated center problem with both capacity
lower and upper bounds (with or without outliers). We assume each vertex has a
uniform lower bound and a non-uniform upper bound. For the case of opening
exactly $k$ centers, we note that a generalization of a recent LP approach can
achieve constant factor approximation algorithms for our problems. Our main
contribution is a simple combinatorial algorithm for the case where there is no
cardinality constraint on the number of open centers. Our combinatorial
algorithm is simpler and achieves a better constant approximation factor than
the LP approach.
Matroid and Knapsack Center Problems
In the classic $k$-center problem, we are given a metric graph, and the
objective is to open $k$ nodes as centers such that the maximum distance from
any vertex to its closest center is minimized. In this paper, we consider two
important generalizations of $k$-center, the matroid center problem and the
knapsack center problem. Both problems are motivated by recent content
distribution network applications. Our contributions can be summarized as
follows:
1. We consider the matroid center problem in which the centers are required
to form an independent set of a given matroid. We show this problem is NP-hard
even on a line. We present a 3-approximation algorithm for the problem on
general metrics. We also consider the outlier version of the problem where a
given number of vertices can be excluded as the outliers from the solution. We
present a 7-approximation for the outlier version.
2. We consider the (multi-)knapsack center problem in which the centers are
required to satisfy one (or more) knapsack constraint(s). It is known that the
knapsack center problem with a single knapsack constraint admits a
3-approximation. However, when there are at least two knapsack constraints, we
show this problem is not approximable at all. To complement the hardness
result, we present a polynomial time algorithm that gives a 3-approximate
solution such that one knapsack constraint is satisfied and the others may be
violated by at most a factor of $1+\epsilon$. We also obtain a 3-approximation
for the outlier version that may violate the knapsack constraint by
$1+\epsilon$.

Comment: A preliminary version of this paper is accepted to IPCO 2013.
On the Cost of Essentially Fair Clusterings
Clustering is a fundamental tool in data mining. It partitions points into
groups (clusters) and may be used to make decisions for each point based on its
group. However, this process may harm protected (minority) classes if the
clustering algorithm does not adequately represent them in desirable clusters
-- especially if the data is already biased.
At NIPS 2017, Chierichetti et al. proposed a model for fair clustering
requiring the representation in each cluster to (approximately) preserve the
global fraction of each protected class. Restricting to two protected classes,
they developed both a 4-approximation for the fair $k$-center problem and a
$O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter
for the fairness model. For multiple protected classes, the best known result
is a 14-approximation for fair $k$-center.
We extend and improve the known results. Firstly, we give a 5-approximation
for the fair $k$-center problem with multiple protected classes. Secondly, we
propose a relaxed fairness notion under which we can give bicriteria
constant-factor approximations for all of the classical clustering objectives:
$k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The
latter approximations are achieved by a framework that takes an arbitrary
existing unfair (integral) solution and a fair (fractional) LP solution and
combines them into an essentially fair clustering with a weakly supervised
rounding scheme. In this way, a fair clustering can be established belatedly,
in a situation where the centers are already fixed.
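The fairness requirement described above — each cluster approximately preserving the global fraction of each protected class — can be checked directly. Below is a minimal sketch assuming an additive tolerance; the paper's exact relaxation may be parameterized differently, and the function name is illustrative.

```python
from collections import Counter

def is_essentially_fair(clusters, tolerance):
    """Check whether each cluster's per-class fraction stays within an
    additive `tolerance` of that class's global fraction."""
    points = [label for cluster in clusters for label in cluster]
    global_frac = {c: n / len(points) for c, n in Counter(points).items()}
    for cluster in clusters:
        counts = Counter(cluster)
        for cls, frac in global_frac.items():
            if abs(counts[cls] / len(cluster) - frac) > tolerance:
                return False
    return True

# Globally half class "A", half class "B"; both clusters mirror that ratio.
clusters = [["A", "A", "B", "B"], ["A", "B"]]
print(is_essentially_fair(clusters, tolerance=0.1))  # True

# Fully segregated clusters badly violate the fairness requirement.
segregated = [["A", "A", "A"], ["B", "B", "B"]]
print(is_essentially_fair(segregated, tolerance=0.1))  # False
```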
Diversity-based Attribute Weighting for K-modes Clustering
Categorical data is a kind of data commonly processed in computer science. Extracting information from categorical data requires a clustering algorithm, and many clustering algorithms have been proposed by researchers. One clustering algorithm for categorical data is k-modes, which uses a simple matching approach based on similarity values: in k-modes, two matching attribute values have similarity 1, and 0 otherwise. In practice, each attribute takes several distinct values, and each value occurs with a different frequency, so a similarity value of 0 or 1 is not enough to represent the real semantic distance between a data object and a cluster. In this paper, we therefore generalize the k-modes algorithm for categorical data by adding a weight and diversity value for each attribute value, in order to optimize categorical data clustering.
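To make the idea concrete, here is a minimal sketch of a weighted simple-matching dissimilarity: plain k-modes counts a flat 1 per mismatched attribute, while the weighted variant lets each mismatch contribute an attribute-specific weight. The diversity weight used here (distinct values observed, normalized) is only an illustrative assumption; the paper's exact weighting scheme may differ.

```python
def attribute_weights(data):
    """Illustrative diversity weight per attribute: the number of distinct
    values observed in that attribute, normalized by the number of rows.
    This stands in for the paper's diversity-based weighting."""
    n_attrs = len(data[0])
    weights = []
    for j in range(n_attrs):
        distinct = len(set(row[j] for row in data))
        weights.append(distinct / len(data))
    return weights

def weighted_matching_dissimilarity(x, y, weights):
    """Plain k-modes: sum of 0/1 mismatches over attributes.
    Weighted variant: each mismatch on attribute j contributes weights[j]."""
    return sum(w for xj, yj, w in zip(x, y, weights) if xj != yj)

data = [("red", "small"), ("blue", "small"), ("green", "large")]
w = attribute_weights(data)  # attribute 0 is more diverse than attribute 1
print(weighted_matching_dissimilarity(data[0], data[2], w))
```

Under this weighting, a mismatch on a highly diverse attribute counts for more than a mismatch on a near-constant one, which is exactly the kind of graded distance the flat 0/1 matching cannot express.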
Resource-efficient fast prediction in healthcare data analytics: A pruned Random Forest regression approach
In predictive healthcare data analytics, high accuracy is paramount, as low accuracy can lead to misdiagnosis, which is known to cause serious health consequences or even death. Fast prediction is also an important desideratum, particularly for machines and mobile devices with limited memory and processing power. For real-time healthcare analytics applications, particularly those that run on mobile devices, both traits (high accuracy and fast prediction) are highly desirable. In this paper, we propose to use an ensemble regression technique based on CLUB-DRF, a pruned Random Forest that possesses these features. The speed and accuracy of the method are demonstrated by an experimental study on three medical data sets covering three different diseases.
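The core idea of a pruned forest is to discard trees that add little beyond those already kept, so prediction needs fewer tree evaluations. CLUB-DRF itself clusters trees by their predictions and keeps representatives; the greedy pass below is only a rough stand-in for that idea, with hypothetical function names, operating on precomputed per-tree prediction vectors.

```python
def prune_ensemble(tree_preds, threshold):
    """Greedy sketch of ensemble pruning: keep a tree only if its
    prediction vector differs (mean absolute difference) by more than
    `threshold` from every tree kept so far. This is an illustration
    of redundancy removal, not the actual CLUB-DRF procedure."""
    kept = []
    for preds in tree_preds:
        if all(
            sum(abs(a - b) for a, b in zip(preds, k)) / len(preds) > threshold
            for k in kept
        ):
            kept.append(preds)
    return kept

def ensemble_predict(tree_preds, i):
    """Average the (pruned) ensemble's predictions for sample i."""
    return sum(p[i] for p in tree_preds) / len(tree_preds)

# Five "trees", two of which are near-duplicates of the first.
trees = [
    [1.0, 2.0, 3.0],
    [1.0, 2.1, 3.0],   # redundant: nearly identical to tree 0
    [1.1, 2.0, 3.0],   # redundant
    [2.0, 3.0, 4.0],
    [0.0, 1.0, 2.0],
]
pruned = prune_ensemble(trees, threshold=0.2)
print(len(pruned))                  # 3 trees survive the pruning
print(ensemble_predict(pruned, 0))  # average of 1.0, 2.0, 0.0 -> 1.0
```

Evaluating 3 trees instead of 5 at prediction time is what buys the speed-up on memory- and compute-limited devices, provided accuracy is preserved.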
Privacy Preserving Clustering with Constraints
The k-center problem is a classical combinatorial optimization problem which asks to find k centers such that the maximum distance of any input point in a set P to its assigned center is minimized. The problem allows for elegant 2-approximations. However, the situation becomes significantly more difficult when constraints are added to the problem. We raise the question whether general methods can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that respects one additional constraint. Our constraint of choice is privacy: here, we are only allowed to open a center when at least l clients will be assigned to it. We show how to combine privacy with several other constraints.
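One of the "elegant 2-approximations" for the unconstrained k-center problem mentioned above is Gonzalez's farthest-first traversal; a minimal Euclidean sketch follows (it does not enforce the privacy lower bound of l clients per center).

```python
import math

def farthest_first_centers(points, k):
    """Gonzalez's farthest-first traversal, a classic 2-approximation for
    k-center: start from an arbitrary point, then repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers

def kcenter_radius(points, centers):
    """The k-center objective: maximum distance from any point to its
    nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)

# Two well-separated groups of points; k = 2 picks one center per group.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
centers = farthest_first_centers(points, k=2)
print(kcenter_radius(points, centers))  # 1.0
```

A privacy-respecting variant would additionally have to ensure every opened center serves at least l clients, which is exactly the kind of extra constraint the abstract studies.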