Global Optimization strategies for two-mode clustering

Author: Castilli W.
Groenen P.J.F.
Rosmalen J.M. van
Trejos J.

Two-mode clustering is a relatively new form of clustering that clusters both the rows and the columns of a data matrix. To do so, a criterion similar to that of k-means is optimized. However, it is still unclear which optimization method should be used to perform two-mode clustering, as various methods may lead to non-global optima. This paper reviews and compares several optimization methods for two-mode clustering. Several known algorithms are discussed and a new fuzzy algorithm is introduced. The meta-heuristics Multistart, Simulated Annealing, and Tabu Search are used in combination with these algorithms. The new fuzzy algorithm is based on the fuzzy c-means algorithm of Bezdek (1981) and on the Fuzzy Steps approach of Heiser and Groenen (1997) and Groenen and Jajuga (2001) for avoiding local minima. The performance of all methods is compared in a large simulation study. It is found that using a Multistart meta-heuristic in combination with a two-mode k-means algorithm or the fuzzy algorithm often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode clustering.

Keywords: algorithms; fuzzy clustering; multistart; simulated annealing; simulation; tabu search; two-mode clustering
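
To make the combination of Multistart and two-mode k-means in this abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the criterion is the sum of squared deviations of each matrix entry from the mean of its (row cluster, column cluster) block. The function names (block_means, loss, two_mode_kmeans, multistart), the alternating update order, and the treatment of empty blocks (mean set to 0) are all illustrative assumptions.

```python
import random

def block_means(X, rows, cols, K, L):
    """Mean of X within each (row cluster, column cluster) block.
    Assumption: an empty block gets mean 0.0."""
    sums = [[0.0] * L for _ in range(K)]
    counts = [[0] * L for _ in range(K)]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            sums[r][c] += X[i][j]
            counts[r][c] += 1
    return [[sums[k][l] / counts[k][l] if counts[k][l] else 0.0
             for l in range(L)] for k in range(K)]

def loss(X, rows, cols, V):
    """Sum of squared deviations of each entry from its block mean."""
    return sum((X[i][j] - V[rows[i]][cols[j]]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))

def two_mode_kmeans(X, K, L, rng, max_iter=100):
    """One run from a random start; may stop in a local optimum."""
    n, m = len(X), len(X[0])
    rows = [rng.randrange(K) for _ in range(n)]
    cols = [rng.randrange(L) for _ in range(m)]
    for _ in range(max_iter):
        # Reassign each row to the row cluster with the closest block means.
        V = block_means(X, rows, cols, K, L)
        new_rows = [min(range(K),
                        key=lambda k: sum((X[i][j] - V[k][cols[j]]) ** 2
                                          for j in range(m)))
                    for i in range(n)]
        # Recompute the means, then reassign each column the same way.
        V = block_means(X, new_rows, cols, K, L)
        new_cols = [min(range(L),
                        key=lambda l: sum((X[i][j] - V[new_rows[i]][l]) ** 2
                                          for i in range(n)))
                    for j in range(m)]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    V = block_means(X, rows, cols, K, L)
    return loss(X, rows, cols, V), rows, cols

def multistart(X, K, L, n_starts=20, seed=0):
    """Multistart: repeat from random starts and keep the lowest loss."""
    rng = random.Random(seed)
    return min((two_mode_kmeans(X, K, L, rng) for _ in range(n_starts)),
               key=lambda result: result[0])
```

Calling multistart(X, K, L) on a list-of-lists matrix returns a tuple (loss, row labels, column labels). Because a single run can stall in a poor partition, keeping the best of several random starts is the simplest guard against non-global optima.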

Research Papers in Economics

Meta Clustering

Author: Caruana Rich
Elhawary Mohamed
Nguyen Nam
Smith Casey
Publication venue: 'SAGE Publications'
Publication date: 29/09/2006

Clustering is ill-defined. Unlike supervised learning, where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for one optimal clustering based on a pre-specified clustering criterion. Once that clustering has been determined, no further clusterings are examined. Our approach differs in that we search for many alternate reasonable clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Any reasonable partitioning of the data is potentially useful for some purpose, regardless of whether or not it is optimal according to a specific clustering criterion. Our approach first finds a variety of reasonable clusterings. It then clusters this diverse set of clusterings so that users need only examine a small number of qualitatively different clusterings. In this paper, we present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems, and then apply meta clustering to two case studies. Surprisingly, the clusterings that would be of most interest to users are often not very compact clusterings.
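
The step of clustering a set of clusterings can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: the distance between two clusterings is taken to be one minus the Rand index, and clusterings are grouped by single linkage at a fixed threshold. The names rand_distance and group_clusterings and the threshold parameter are hypothetical.

```python
from itertools import combinations

def rand_distance(a, b):
    """1 - Rand index: the fraction of point pairs that one labeling puts
    together and the other puts apart."""
    pairs = list(combinations(range(len(a)), 2))
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)

def group_clusterings(clusterings, threshold):
    """Single-linkage grouping (an assumption of this sketch): clusterings
    closer than the threshold end up in the same meta cluster."""
    parent = list(range(len(clusterings)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(clusterings)), 2):
        if rand_distance(clusterings[i], clusterings[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(clusterings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each input clustering is a list of cluster labels, one per data point, and the result is a list of meta clusters given as lists of clustering indices. A pair-counting distance is used here because it ignores how the clusters happen to be numbered.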

eCommons@Cornell

Inferring meta-covariates in classification

Author: B. Hanczar
C. Fraley
C.M. Bishop
K. Bae
K.E. Lee
M.Y. Park
S. Dudoit
T.R. Golub
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper develops an alternative method for gene selection that combines model-based clustering and binary classification. By averaging the covariates within the clusters obtained from model-based clustering, we define “meta-covariates” and use them to build a probit regression model, thereby selecting clusters of similarly behaving genes and aiding interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function which rewards good performance at both classification and clustering. We explore the performance of our methodology on a well-known leukaemia dataset and use the Gene Ontology to interpret our results.
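
The averaging step that defines the meta-covariates can be illustrated in isolation. The sketch below assumes the cluster assignment of each gene is already given, whereas the paper learns the clustering and the classifier jointly by EM, which is not reproduced here. The function name meta_covariates and its arguments are hypothetical.

```python
def meta_covariates(X, gene_cluster, n_clusters):
    """X is a samples-by-genes matrix (list of lists) and gene_cluster[g]
    is the cluster index of gene g. Returns a samples-by-clusters matrix
    holding, for each sample, the mean expression within each gene cluster.
    Assumption: an empty cluster gets the value 0.0."""
    result = []
    for sample in X:
        sums = [0.0] * n_clusters
        counts = [0] * n_clusters
        for value, c in zip(sample, gene_cluster):
            sums[c] += value
            counts[c] += 1
        result.append([s / n if n else 0.0 for s, n in zip(sums, counts)])
    return result
```

The reduced matrix has one column per gene cluster and could then be passed to any binary classifier in place of the original gene-level covariates.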

Crossref

UCL Discovery

Enlighten

On Graph Stream Clustering with Side Information

Author: Yu Philip S.
Zhao Yuchen
Publication date: 28/01/2013

Graph clustering has become an important problem due to emerging applications involving the web, social networks, and bio-informatics. Recently, many such applications generate data in the form of streams. Clustering massive, dynamic graph streams is challenging because of the complex structure of graphs and the computational difficulty of handling continuous data. Meanwhile, a large volume of side information of various types is associated with graphs. Examples include the properties of users in social network activities, the meta attributes associated with web click graph streams, and the location information in mobile communication networks. Such attributes contain extremely useful information and have the potential to improve the clustering process, but they are neglected by most recent graph stream mining techniques. In this paper, we define a unified distance measure on both link structures and side attributes for clustering. In addition, we propose a novel optimization framework, DMO, which can dynamically optimize the distance metric and adapt it to newly received stream data. We further introduce carefully designed statistics, SGS(C), which consume constant storage space as the stream progresses. We demonstrate that the statistics maintained are sufficient for both the clustering process and the distance optimization, and that they scale to massive graphs with side attributes. We present experimental results that show the advantages of the approach over the baselines in graph stream clustering with both links and side information.

Comment: Full version of SIAM SDM 2013 paper
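
The idea of a single distance over both link structure and side attributes can be sketched as a weighted sum. This is not the measure defined in the paper: the Jaccard distance on edge sets, the Euclidean distance on attribute vectors, the fixed weight alpha, and the name unified_distance are all assumptions made for illustration. The paper's DMO framework adapts the metric as the stream arrives, which this fixed-weight sketch does not attempt.

```python
import math

def unified_distance(edges_a, edges_b, attrs_a, attrs_b, alpha=0.5):
    """Hypothetical combined distance between two graph objects.
    edges_*: sets of hashable edge identifiers (link structure).
    attrs_*: equal-length numeric sequences (side attributes).
    alpha:   weight on the structural part, between 0 and 1."""
    union = edges_a | edges_b
    # Jaccard distance on the edge sets; two empty graphs are at distance 0.
    structural = 1.0 - len(edges_a & edges_b) / len(union) if union else 0.0
    # Euclidean distance on the side attributes.
    attribute = math.sqrt(sum((x - y) ** 2 for x, y in zip(attrs_a, attrs_b)))
    return alpha * structural + (1.0 - alpha) * attribute
```

With alpha set to 1 the measure uses links only, and with alpha set to 0 it uses side attributes only. Intermediate values trade the two off, and that trade-off is the quantity a dynamic scheme would tune.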

arXiv.org e-Print Archive

Crossref