Finding Consistent Clusters in Data Partitions
Abstract. Given an arbitrary data set, to which no particular parametric, statistical or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. In fact, several partitions can also be obtained with a single clustering algorithm, due to dependence on initialization or on the choice of the value of some design parameter. This paper addresses the problem of finding consistent clusters in data partitions, proposing the analysis of the most common associations performed in a majority voting scheme. Combination of clustering results is performed by transforming data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples, using both simulated and real data, show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependency on initialization. Furthermore, the resulting clusters are not constrained to be hyper-spherically shaped.
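The central construction here is the co-association matrix. The following is a minimal sketch of that step only, not the paper's voting-k-means algorithm itself; it assumes scikit-learn's KMeans, and the values of k, n_runs and seed are illustrative placeholders:

    import numpy as np
    from sklearn.cluster import KMeans

    def co_association_matrix(X, k=10, n_runs=30, seed=0):
        # Entry (i, j) is the fraction of runs in which patterns i and j
        # were assigned to the same cluster: a majority-voting similarity.
        n = X.shape[0]
        votes = np.zeros((n, n))
        rng = np.random.RandomState(seed)
        for _ in range(n_runs):
            labels = KMeans(n_clusters=k, n_init=1,
                            random_state=rng.randint(2**31 - 1)).fit_predict(X)
            # One vote for every pair landing in the same cluster this run.
            votes += labels[:, None] == labels[None, :]
        return votes / n_runs

Consistent clusters can then be read off this matrix, e.g. by linking pattern pairs whose co-association exceeds a majority threshold of 0.5.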
Evidence Accumulation Clustering based on the K-Means Algorithm
The idea of evidence accumulation for the combination of multiple clusterings was recently proposed [7]. Taking K-means as the basic algorithm for the decomposition of the data into a large number, k, of compact clusters, evidence on pattern association is accumulated, by a voting mechanism, over multiple clusterings obtained from random initializations of the K-means algorithm. This produces a mapping of the clusterings into a new similarity measure between patterns. The final data partition is obtained by applying the single-link method to this similarity matrix. In this paper we further explore and extend this idea by proposing: (a) the combination of multiple K-means clusterings using variable k; (b) the use of cluster lifetime as the criterion for extracting the final clusters; and (c) the adaptation of this approach to string patterns.
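A compact sketch of this pipeline, under assumed tooling rather than the paper's reference implementation, might combine scikit-learn's KMeans with SciPy's single-link hierarchy; the k range, the number of runs, and the gap-based reading of cluster lifetime below are simplified placeholders:

    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.cluster.hierarchy import linkage, fcluster

    def evidence_accumulation(X, k_range=(10, 30), n_runs=50, seed=0):
        # Accumulate co-association evidence over k-means runs with variable k.
        n = X.shape[0]
        votes = np.zeros((n, n))
        rng = np.random.default_rng(seed)
        for _ in range(n_runs):
            k = int(rng.integers(k_range[0], k_range[1] + 1))
            labels = KMeans(n_clusters=k, n_init=1,
                            random_state=int(rng.integers(2**31 - 1))).fit_predict(X)
            votes += labels[:, None] == labels[None, :]
        sim = votes / n_runs

        # Single-link hierarchy over the co-association dissimilarity.
        d = 1.0 - sim
        condensed = d[np.triu_indices(n, k=1)]
        Z = linkage(condensed, method='single')

        # Cluster-lifetime criterion, read here as the largest gap between
        # consecutive merge heights: the partition that persists over the
        # widest range of thresholds is kept.
        heights = Z[:, 2]
        gaps = np.diff(heights)
        cut = heights[np.argmax(gaps)] + gaps.max() / 2
        return fcluster(Z, t=cut, criterion='distance')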
Piecewise linear classifiers based on nonsmooth optimization approaches
Nonsmooth optimization provides efficient algorithms for solving many machine learning problems. In particular, nonsmooth optimization approaches to supervised data classification problems lead to the design of very efficient algorithms for their solution. In this chapter, we demonstrate how nonsmooth optimization algorithms can be applied to design efficient piecewise linear classifiers for supervised data classification problems. Such classifiers are developed using max–min and polyhedral conic separabilities, as well as an incremental approach. We report results of numerical experiments and compare the piecewise linear classifiers with a number of other mainstream classifiers.
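To make the max–min separability idea concrete, here is a hypothetical evaluation of an already-fitted classifier; the chapter's contribution is the nonsmooth optimization used to find the hyperplanes, which is not reproduced here. The weight groups W and offsets b are assumed inputs:

    import numpy as np

    def maxmin_decision(x, W, b):
        # f(x) = max over groups i of min over hyperplanes j in group i
        # of (w_ij . x + b_ij).  The sign of f(x) gives the class:
        # max-min separability places one class where f > 0 and the
        # other where f < 0.  W and b are lists of groups of weight
        # vectors and offsets, assumed fitted by a nonsmooth solver.
        return max(min(w @ x + beta for w, beta in zip(Wg, bg))
                   for Wg, bg in zip(W, b))

    # With a single group holding a single hyperplane this reduces to an
    # ordinary linear classifier; adding groups and hyperplanes yields
    # increasingly complex piecewise linear decision boundaries.
    W = [[np.array([1.0, -1.0])]]
    b = [[0.0]]
    print(maxmin_decision(np.array([2.0, 0.5]), W, b) > 0)  # True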