
K

Author: Cheng Lu
Cheng Wu
Shiji Song
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015

The Affinity Propagation (AP) algorithm is effective for clustering analysis, but it cannot be applied directly to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function is then modified to handle interval data, so that the improved AP algorithm can be applied to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.
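The interval estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a missing attribute is represented by the [min, max] range of that attribute over the k nearest samples, and that nearness is measured by a rescaled distance over the attributes both samples have observed. The function names `partial_distance` and `knn_intervals` are hypothetical.

```python
import numpy as np

def partial_distance(x, y):
    """Squared Euclidean distance over attributes observed in both x and y,
    rescaled by (total attributes / attributes used). Assumed strategy."""
    both = ~np.isnan(x) & ~np.isnan(y)
    if not both.any():
        return np.inf  # no shared observed attributes
    diff = x[both] - y[both]
    return float(diff @ diff) * x.size / both.sum()

def knn_intervals(data, k):
    """Return (lower, upper) arrays of interval bounds.

    Observed values give degenerate intervals (lower == upper).
    Missing values (NaN) get the min/max of that attribute over the
    k nearest rows in which the attribute is observed.
    """
    lower, upper = data.copy(), data.copy()
    n = data.shape[0]
    for i, j in zip(*np.where(np.isnan(data))):
        # Candidate neighbors: other rows where attribute j is observed.
        cands = [r for r in range(n) if r != i and not np.isnan(data[r, j])]
        if not cands:
            continue  # nothing to estimate from; bounds stay NaN
        cands.sort(key=lambda r: partial_distance(data[i], data[r]))
        vals = data[cands[:k], j]
        lower[i, j], upper[i, j] = vals.min(), vals.max()
    return lower, upper

data = np.array([[1.0, 2.0],
                 [1.1, np.nan],
                 [5.0, 9.0],
                 [1.2, 2.4]])
lower, upper = knn_intervals(data, k=2)
```

Here the second row's missing attribute is bounded by the values of its two nearest neighbors (the first and fourth rows). A similarity function for AP would then have to compare such interval-valued samples, which this sketch does not cover.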

Crossref

Directory of Open Access Journals

Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

Author: Changhao Huang
Ikou Kaku
Jiaoying Huang
Yiyong Xiao
Yuchun Xu
Publication venue: 'Elsevier BV'
Publication date: 01/06/2019

The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can become trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless of the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for k-modes clustering, which is independent of the initial solution and can directly obtain the optimal results for small-sized datasets. We also developed a heuristic algorithm that implements iterative partial optimization in the ILP approach based on a framework of variable neighborhood search, known as IPO-ILP-VNS, to search for near-optimal results on medium and large-sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The proposed methods outperformed the conventional and other existing enhanced k-modes algorithms in the literature, and produced new and improved results for 9 of the UCI benchmark datasets.
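To make the k-modes objective concrete, the following sketch computes the clustering cost (total Hamming distance of each record to its cluster mode) and finds a globally optimal assignment for a tiny dataset by exhaustive enumeration. Exhaustive search is swapped in here for the ILP formulation, which the abstract does not spell out. It shares only the property of not depending on an initial solution, and it is feasible only for a handful of records. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(rows):
    """Attribute-wise most frequent category of a non-empty group."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def kmodes_cost(rows, labels, k):
    """Sum of Hamming distances from each record to its cluster mode."""
    total = 0
    for c in range(k):
        members = [r for r, l in zip(rows, labels) if l == c]
        if members:
            m = mode_of(members)
            total += sum(hamming(r, m) for r in members)
    return total

def brute_force_kmodes(rows, k):
    """Try every assignment of records to k clusters; return (cost, labels).

    Runs in O(k ** len(rows)) time, so it is only a reference for tiny inputs.
    """
    best = None
    for labels in product(range(k), repeat=len(rows)):
        cost = kmodes_cost(rows, labels, k)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best

rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
best_cost, best_labels = brute_force_kmodes(rows, k=2)
```

For this toy input the two identical pairs end up in separate clusters with zero cost. An exact method such as ILP aims at the same global optimum without enumerating every assignment.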

Crossref

Aston Publications Explorer

Clustering Partially Observed Graphs via Convex Optimization

Author: Chen Yudong
Jalali Ali
Sanghavi Sujay
Xu Huan
Publication date: 01/01/2014

This paper considers the problem of clustering a partially observed unweighted graph, i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining pairs we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse connectivity across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements", i.e., the sum of the number of (observed) missing edges within clusters and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.
Comment: This is the final version published in Journal of Machine Learning Research (JMLR). Partial results appeared in International Conference on Machine Learning (ICML) 201
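The "disagreements" objective from the abstract can be written down directly. The sketch below only evaluates that objective for a given clustering. It does not implement the convex (low-rank plus sparse) recovery algorithm. The input encoding is an assumption: a symmetric 0/1 adjacency matrix `adj` and a boolean matrix `observed` marking which node pairs are known.

```python
import numpy as np

def disagreements(adj, observed, labels):
    """Count observed missing edges within clusters plus
    observed present edges across clusters.

    adj:      (n, n) symmetric 0/1 array; entries for unobserved pairs are ignored
    observed: (n, n) symmetric boolean array; True where the pair is known
    labels:   length-n sequence of cluster ids
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    # Count each unordered node pair once and skip the diagonal.
    upper = np.triu(np.ones_like(observed, dtype=bool), k=1)
    mask = observed & upper
    missing_within = mask & same & (adj == 0)
    present_across = mask & ~same & (adj == 1)
    return int(missing_within.sum() + present_across.sum())

adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (2, 3), (0, 2)]:
    adj[u, v] = adj[v, u] = 1

observed = np.ones((4, 4), dtype=bool)
observed[1, 3] = observed[3, 1] = False  # this pair is unknown

d_split = disagreements(adj, observed, [0, 0, 1, 1])
d_single = disagreements(adj, observed, [0, 0, 0, 0])
```

With clusters {0, 1} and {2, 3}, the only disagreement is the observed edge between nodes 0 and 2. Putting all nodes in one cluster instead incurs two disagreements, from the observed non-edges (0, 3) and (1, 2).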

arXiv.org e-Print Archive

CiteSeerX