Search CORE

2,675 research outputs found

Efficient Information Theoretic Clustering on Discrete Lattices

Author: Bauckhage Christian
Kersting Kristian
Publication venue
Publication date: 26/10/2013
Field of study

We consider the problem of clustering data that reside on discrete, low dimensional lattices. Canonical examples for this setting are found in image segmentation and key point extraction. Our solution is based on a recent approach to information theoretic clustering where clusters result from an iterative procedure that minimizes a divergence measure. We replace costly processing steps in the original algorithm by means of convolutions. These allow for highly efficient implementations and thus significantly reduce runtime. This paper therefore bridges a gap between machine learning and signal processing.Comment: This paper has been presented at the workshop LWA 201

arXiv.org e-Print Archive

CiteSeerX

Fraunhofer-ePrints

An Enhanced Initialization Method to Find an Initial Center for K-modes Clustering

Author: S. Saranya, Dr.P.Jayanthi
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 26/02/2017
Field of study

Data mining is a technique which extracts the information from the large amount of data. To group the objects having similar characteristics, clustering method is used. K-means clustering algorithm is very efficient for large data sets deals with numerical quantities however it not works well for real world data sets which contain categorical values for most of the attributes. K-modes algorithm is used in the place of K-means algorithm. In the existing system, the initialization of K- modes clustering from the view of outlier detection is considered. It avoids that various initial cluster centers come from the same cluster. To overcome the above said limitation, it uses Initial_Distance and Initial_Entropy algorithms which use a new weightage formula to calculate the degree of outlierness of each object. K-modes algorithm can guarantee that the chosen initial cluster centers are not outliers. To improve the performance further, a new modified distance metric -weighted matching distance is used to calculate the distance between two objects during the process of initialization. As well as, one of the data pre-processing methods is used to improve the quality of data. Experiments are carried out on several data sets from UCI repository and the results demonstrated the effectiveness of the initialization method in the proposed algorithm

International Journal on Recent and Innovation Trends in Computing and Communication

Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

Author: Alguwaizani
Bai
Bai
Bai
Barbara
Bradley
Cao
Cao
Cao
Changhao Huang
Chen
Chen
Franceschi
Frossyniotis
Gan
Ganti
Gilpin
Guha
Gupta
Hansen
Hansen
Hansen
He
Helber
Huang
Ikou Kaku
Jain
Jiang
Jiaoying Huang
Kao
Kaufman
Khan
Khan
Kim
MacQueen
Mladenovic
Mladenović
Mueller
Myhre
Ng
Parmar
Qin
Ralambondrainy
Saha
Sun
Wu
Xiao
Xiao
Xiao
Xiao
Xiao
Xiao
Xiao
Yiyong Xiao
Yuchun Xu
Zhao
Publication venue: 'Elsevier BV'
Publication date: 01/06/2019
Field of study

The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can be trapped into local optima and sensitive to initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for the k-modes clustering, which is independent to the initial solution and can obtain directly the optimal results for small-sized datasets. We also developed a heuristic algorithm that implements iterative partial optimization in the ILP approach based on a framework of variable neighborhood search, known as IPO-ILP-VNS, to search for near-optimal results of medium and large sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The experimental results outperformed the conventional and other existing enhanced k-modes algorithms in literature, updated 9 of the UCI benchmark datasets with new and improved results

Crossref

Aston Publications Explorer