Soft cluster ensembles

Abstract

Cluster Ensembles is a framework for combining multiple partitionings obtained from separate clustering runs into a final consensus clustering, without accessing the original features of the data or the algorithms that determined these partitions. The framework was first proposed by Strehl and Ghosh [31], who also provided three techniques to solve the problem. Since then there have been numerous attempts to solve the cluster ensemble problem using approaches such as Maximum Likelihood using EM, Bipartite Graph Partitioning, Genetic algorithms, and Voting-Merging. Most of this work has focused on devising approaches that accept hard clusterings as input, and there has been no comparison of combining accuracy on soft versus hard cluster ensembles. In this thesis we show, experimentally as well as intuitively, that using soft clusterings as input offers significant advantages, especially when dealing with vertically partitioned data. We modify many of the above-mentioned algorithms to accept soft clusterings and experiment over multiple real-life datasets.

Electrical and Computer Engineerin
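To make the soft versus hard distinction concrete, the following is a minimal sketch and not one of the algorithms evaluated in the thesis. It assumes each base clustering is given as an (n, k) matrix of posterior membership probabilities, and it uses a simple averaged co-association similarity as a stand-in for a consensus function. The toy membership values and the helper names (`harden`, `ensemble_matrix`, `co_association`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def harden(soft):
    """Convert an (n, k) soft clustering into one-hot hard assignments."""
    hard = np.zeros_like(soft)
    hard[np.arange(soft.shape[0]), soft.argmax(axis=1)] = 1.0
    return hard

def ensemble_matrix(clusterings):
    """Concatenate r clusterings, each (n, k_i), into one (n, sum of k_i) matrix."""
    return np.hstack(clusterings)

def co_association(clusterings):
    """Average of P @ P.T over the ensemble: how strongly each pair co-clusters."""
    return sum(p @ p.T for p in clusterings) / len(clusterings)

# Hypothetical toy ensemble: 3 objects, 2 base clusterings with 2 clusters each.
soft_1 = np.array([[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]])
soft_2 = np.array([[0.8, 0.2], [0.45, 0.55], [0.2, 0.8]])
soft_ensemble = [soft_1, soft_2]
hard_ensemble = [harden(p) for p in soft_ensemble]

soft_sim = co_association(soft_ensemble)
hard_sim = co_association(hard_ensemble)

# Object 1 sits between objects 0 and 2. With hard labels the two base
# clusterings disagree and the similarities tie; the soft memberships
# keep enough information to break the tie.
print("hard:", hard_sim[0, 1], hard_sim[1, 2])
print("soft:", soft_sim[0, 1], soft_sim[1, 2])
```

In this toy case the hardened ensemble cannot say whether the middle object belongs with the first or the last object, while the soft ensemble retains a slight preference. This is the kind of information that is discarded when only hard labels are passed to the combiner.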
