
A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Author: Al Hasan
Al-Daoud
Aloise
Aloise
Anderberg
Babu
Babu
Ball
Bei
Bergmann
Bottou
Breunig
Cao
Celebi
Chen
Chen
Daniel
Forgy
Friedman
Garcia
Garcia
Gonzalez
Hartigan
Hassan A. Kingravi
Hotelling
Huang
Huang
Hubert
Hyvärinen
Iman
Jain
Jain
Jancey
Kanungo
Katsavounidis
Kaufman
Lance
Likas
Linde
Lloyd
Lu
Luengo
M. Emre Celebi
Maitra
Mao
Matsumoto
Meilă
Milligan
Milligan
Norušis
Onoda
Ordonez
Pal
Patricio A. Vela
Pena
Redmond
Selim
Späth
Su
Tarsitano
Tou
Wu
Zhang
Publication venue: Elsevier BV
Publication date: 10/09/2012

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
Comment: 17 pages, 1 figure, 7 tables

arXiv.org e-Print Archive

Crossref
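To make the initialization problem described in the k-means comparison above concrete, here is a minimal sketch of one well-known linear-time seeding rule, k-means++ (centers drawn with probability proportional to squared distance), followed by Lloyd's batch k-means. This is an illustrative assumption chosen by the editor, not a claim about which of the paper's eight methods performs best; the helper names (`kmeanspp_init`, `lloyd`, `assign`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first center is a uniformly random point; each
    later center is drawn with probability proportional to its squared
    distance from the nearest center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(points)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:  # strict '>' so zero-weight points are never picked
                centers.append(list(p))
                break
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points, centers, max_iter=100):
    """Lloyd's batch k-means, refining the given initial centers."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(c)  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse
```

For example, `lloyd(pts, kmeanspp_init(pts, 2, random.Random(0)))` on two well-separated groups of three points recovers the two groups. Swapping `kmeanspp_init` for a different seeding function is how one would compare initialization methods on the same data.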

An overview of clustering methods with guidelines for application in mental health research

Author: Gao Caroline X.
Publication venue: Universidad de Granada
Publication date: 27/05/2023

Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and their increasing popularity, there is little guidance on model choice, analytical framework, and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages and disadvantages, and implementation of the major algorithms that are particularly relevant to mental health research. Extensions of the basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently introduced. We then discuss how to choose algorithms that address common issues, as well as methods for pre-clustering data processing and for clustering evaluation and validation. Importantly, we also provide general guidance on the clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.

Repositorio Institucional Universidad de Granada

Document Clustering

Author: Aggarwal
Ailon
Anastasiu
Andoni
Baudat
Beeferman
Bengio
Bengio
Blei
Blei
Businger
Businger
Collobert
Deerwester
Du
Dy
Fisher
Goodfellow
Hearst
Hinton
Hofmann
Hornik
Hotelling
Jain
Jolliffe
Luxburg
MacQueen
Manning
Martínez
Porter
Rosen-Zvi
Salton
Vincent
Zahn
Zhao
Zhong
Zipf
Zuo
Publication venue: SJSU ScholarWorks
Publication date: 15/11/2017

In a world flooded with information, document clustering is an important tool that can help categorize and extract insight from text collections. It works by grouping similar documents while simultaneously discriminating between groups. In this article, we provide a brief overview of the principal techniques used to cluster documents and introduce a series of novel deep-learning-based methods recently designed for the document clustering task. In our overview, we point the reader to salient works that can provide a deeper understanding of the topics discussed.

Crossref

Scholar Commons - Santa Clara University

SJSU ScholarWorks
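The document clustering overview above rests on the idea of grouping "similar documents". A common classical way to measure that similarity is TF-IDF weighting plus cosine similarity; the sketch below shows that representation only. The whitespace tokenizer and the plain `log(N / df)` IDF are simplifying assumptions (many variants exist), and this is not one of the deep-learning methods the article introduces.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a sparse {term: weight} TF-IDF vector.
    Assumes lowercase whitespace tokenization, raw term counts as TF,
    and idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Vectors like these can then be passed to a similarity-based clustering algorithm; documents that share weighted terms score higher than documents with no terms in common.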

Median topographic maps for biomedical data sets

Author: Biehl M.
Hammer B.
Hammer Barbara
Hasenfuss A.
Rossi F.
Verleysen M.
Villmann T.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Median clustering extends popular neural data analysis methods, such as the self-organizing map and neural gas, to general data structures given only by a dissimilarity matrix. This offers flexible and robust global data inspection methods that are particularly suited to the variety of data that occurs in biomedical domains. In this chapter, we give an overview of median clustering, its properties, and its extensions, with a particular focus on efficient implementations adapted to large-scale data analysis.

arXiv.org e-Print Archive

CiteSeerX

Publications at Bielefeld University
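The key idea in the median clustering chapter above is that prototypes are restricted to data points, so only a dissimilarity matrix is needed. A minimal sketch of the simplest such scheme, batch median k-means, is shown below under that assumption. It omits the neighborhood cooperation that median SOM and median neural gas add, and it does not reproduce the chapter's efficient implementations; `median_kmeans` is a hypothetical name.

```python
def median_kmeans(D, k, init, max_iter=100):
    """Batch median k-means on an n x n dissimilarity matrix D.
    Prototypes are data indices (generalized medians), never vectors."""
    n = len(D)
    medoids = list(init)

    def nearest(i):
        # index of the prototype closest to data point i
        return min(range(k), key=lambda j: D[i][medoids[j]])

    for _ in range(max_iter):
        labels = [nearest(i) for i in range(n)]
        new = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                new.append(medoids[j])  # keep an empty cluster's prototype
                continue
            # generalized median: the data point with the smallest summed
            # dissimilarity to the cluster's members
            new.append(min(range(n), key=lambda c: sum(D[c][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    labels = [nearest(i) for i in range(n)]
    return medoids, labels
```

Because each prototype update scans every candidate against every member, this naive version costs O(n²) per iteration, which is exactly why efficient implementations matter for large data sets.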

A Comparative Study Of Fuzzy C-Means And K-Means Clustering Techniques

Author: Sharifah Sakinah Syed Ahmad
Publication date: 01/11/2014

Clustering analysis is considered a useful means of identifying patterns in datasets. The aim of this paper is to present a comparative study of two well-known clustering algorithms, namely fuzzy c-means (FCM) and k-means. First, we present an overview of both methods with an emphasis on their implementation. Then, we use six datasets to measure the quality of the clustering results, based on the similarity measure used in each algorithm and its representation of the clustering result. Next, we optimize the fuzzification parameter m in the FCM algorithm to improve clustering performance. Finally, we compare the experimental performance of both methods.

Universiti Teknikal Malaysia Melaka (UTeM) Repository
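The FCM versus k-means comparison above hinges on the fuzzification parameter m. The sketch below shows the standard alternating FCM updates (weighted centers, then memberships from distance ratios) so the role of m is visible. Random initial memberships, the stopping tolerance, and the name `fuzzy_c_means` are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means with fuzzifier m > 1; returns (centers, memberships).
    As m approaches 1 the memberships become nearly crisp (k-means-like)."""
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships, each row normalized to sum to 1
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    for _ in range(max_iter):
        # center update: mean of the points weighted by u_ij ** m
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        new_U = []
        for p in points:
            dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, cj)))
                     for cj in centers]
            zeros = [j for j, d in enumerate(dists) if d == 0.0]
            if zeros:  # point sits exactly on a center: share membership there
                row = [1.0 / len(zeros) if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                       for dj in dists]
            new_U.append(row)
        shift = max(abs(a - b) for ra, rb in zip(U, new_U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:  # memberships have stabilized
            break
    return centers, U
```

Taking the largest membership in each row gives a crisp labelling that can be compared directly against k-means output; repeating the run over several values of m is one simple way to tune the fuzzifier.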

Multidimensional clustering approaches for Pareto frontiers

Author: Endres Markus
Kastner Johannes
Publication date: 06/06/2017

In data mining, large and ever-growing data sets are becoming more and more common. To avoid losing the overview of such data sets, preference queries are a popular method for reducing large quantities of data to highly relevant information. Combined with clustering methods such as k-means, confusing sets of objects can be organized and presented more clearly, giving a better overview. In this report, we first present Pareto dominance as a suitable and promising approach for clustering objects based on better-than relationships. To meet a user's preferences, the final results can be tipped toward the more favored dimension whenever no decision on allocating objects is otherwise possible. Second, building on Pareto dominance, we introduce an advanced clustering approach that exploits the Borda social choice voting rule to handle distances from different domains with equal weights during the clustering process.

OPUS Augsburg
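The Pareto-frontier report above groups objects by better-than relationships. A minimal sketch of the underlying notion is shown below: a dominance test and the "peeling" of successive Pareto frontiers, where each layer acts as a group of mutually incomparable objects. It assumes larger values are better in every dimension, and it is a generic illustration only, not the report's Borda-based clustering approach.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (larger is better in every dimension):
    a is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(items):
    """Objects not dominated by any other object in the collection."""
    return [a for a in items if not any(dominates(b, a) for b in items)]


def pareto_layers(items):
    """Repeatedly remove the current frontier; each removed frontier is one
    better-than level. A finite set always has a non-empty frontier."""
    remaining = list(items)
    layers = []
    while remaining:
        front = pareto_front(remaining)
        layers.append(front)
        remaining = [x for x in remaining if x not in front]
    return layers
```

For instance, `pareto_layers([(3, 1), (1, 3), (2, 2), (1, 1), (0, 0)])` yields three levels: `[(3, 1), (1, 3), (2, 2)]`, then `[(1, 1)]`, then `[(0, 0)]`. Objects inside the first level cannot be ordered by dominance alone, which is where a user preference for one dimension would be needed to break the tie.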

Utility-driven assessment of anonymized data via clustering

Author: Fazendeiro Paulo
Ferrão Maria Eugénia
Prata Paula
Publication venue: Springer Science and Business Media LLC
Publication date: 30/07/2022

In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset concerning an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess the utility of anonymized data by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved, or not, after data anonymization. The results suggest that for low dimensionality/cardinality datasets the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.

UBibliorum repositorio digital da ubi

PubMed Central
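One of the privacy models named in the anonymization study above is k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The sketch below only checks that property. The field names (`age_band`, `region`, `grade`) are hypothetical and not taken from the study's dataset; it also does not implement the generalization or suppression steps that produce anonymized data, nor differential privacy.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at
    least k records (records are dicts keyed by attribute name)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# hypothetical, already-generalized records
records = [
    {"age_band": "20-24", "region": "North", "grade": 14},
    {"age_band": "20-24", "region": "North", "grade": 16},
    {"age_band": "25-29", "region": "South", "grade": 12},
    {"age_band": "25-29", "region": "South", "grade": 15},
]
```

Here `is_k_anonymous(records, ["age_band", "region"], 2)` holds, while k = 3 fails because each equivalence class has only two members. Coarser generalization raises k but also blurs the attributes a clustering algorithm relies on, which is the utility trade-off the study measures.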