4 research outputs found

    Global optimality in k-means clustering

    Abstract: We study the problem of finding an optimum clustering, a problem known to be NP-hard. The existing literature contains algorithms running in time proportional to the number of points raised to a power that depends on the dimensionality and on the number of clusters. Published validations of some of these algorithms are unfortunately incomplete; moreover, the constant factors (with respect to the number of points) in their running-time bounds, despite several important published improvements, remain huge, exponential in the dimension and in the number of clusters, making the corresponding algorithms fully impractical. We provide a new algorithm, with its corresponding complexity-theoretic analysis. It reduces both the exponent and the constant factor, to the extent that it becomes feasible for relevant particular cases. Additionally, it parallelizes extremely well, so that its implementation on current high-performance hardware is quite straightforward. Our proposal opens the door to potential improvements along a research line that had no practical significance so far; besides, a long but single-shot run of our algorithm allows one to identify absolutely optimum solutions for benchmark problems, whereby alternative heuristic proposals can evaluate the goodness of their solutions and the precise price paid for their faster running times.
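
    The abstract does not give the authors' algorithm, but the hardness it refers to is easy to illustrate: the only generic way to guarantee a globally optimal k-means clustering of n points is to search over assignments, of which there are k^n. The sketch below (function names `exact_kmeans` and `sse` are my own, not from the paper) does exactly that, which is why it is only feasible for tiny instances.

    ```python
    from itertools import product

    def sse(points, assign, k):
        # within-cluster sum of squared distances to each cluster's centroid
        total = 0.0
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if not members:
                continue
            dim = len(members[0])
            centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
            total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
        return total

    def exact_kmeans(points, k):
        # exhaustive search over all k**n assignments: globally optimal,
        # but exponential in the number of points
        best, best_cost = None, float("inf")
        for assign in product(range(k), repeat=len(points)):
            cost = sse(points, assign, k)
            if cost < best_cost:
                best, best_cost = assign, cost
        return best, best_cost
    ```

    On four points and k = 2 this already checks 16 assignments; heuristics such as Lloyd's algorithm avoid this blow-up but give up the optimality guarantee, which is precisely the trade-off the abstract's benchmark proposal is meant to quantify.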

    U-Control Chart Based Differential Evolution Clustering for Determining the Number of Cluster in k-Means

    The automatic clustering differential evolution (ACDE) is one of the clustering methods able to determine the number of clusters automatically. However, ACDE still relies on a manual strategy to determine the k activation threshold, which limits its performance. In this study, this ACDE problem is ameliorated using the u-control chart (UCC), and the cluster number generated by ACDE is then fed to k-means. The performance of the proposed method was tested using six public datasets from the UCI repository about academic efficiency (AE) and evaluated with the Davies-Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that the proposed method yields excellent performance compared with prior research.
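
    The Davies-Bouldin Index used for evaluation here is a standard internal validity measure: for each cluster it takes the worst-case ratio of combined intra-cluster scatter to inter-centroid distance, and averages those ratios, so lower values mean more compact, better-separated clusters. A minimal pure-Python sketch (the function name `davies_bouldin` is my own; libraries such as scikit-learn provide an equivalent `davies_bouldin_score`):

    ```python
    import math

    def davies_bouldin(points, labels):
        # Davies-Bouldin index: mean over clusters i of
        # max_{j != i} (s_i + s_j) / d(c_i, c_j), where s is the average
        # member-to-centroid distance and c the centroid; lower is better
        clusters = sorted(set(labels))
        cents, scatter = {}, {}
        for c in clusters:
            members = [p for p, l in zip(points, labels) if l == c]
            dim = len(members[0])
            cent = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            cents[c] = cent
            scatter[c] = sum(math.dist(m, cent) for m in members) / len(members)
        total = 0.0
        for i in clusters:
            total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                         for j in clusters if j != i)
        return total / len(clusters)
    ```

    For two tight clusters far apart (e.g. pairs of points near x = 0 and x = 10) the index is small, matching the intuition that a good automatically chosen k should minimize DBI.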
