Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
potentially better and more robust clustering, and it has been receiving
increasing attention in recent years. Existing clustering ensemble approaches
have two main limitations. First, many of them cannot weight the base
clusterings without access to the original data, and they can be significantly
affected by low-quality or even very poor clusterings. Second, they generally
focus on either the instance level or the cluster level of the ensemble and
fail to integrate multi-granularity cues into a unified model. To address these
two limitations, this paper solves the clustering ensemble problem via crowd
agreement estimation and multi-granularity link analysis. We present the
normalized crowd agreement index (NCAI) to evaluate the quality of base
clusterings in an unsupervised manner and thus weight each base clustering
according to its clustering validity. To explore the relationships between
clusters, we introduce the source-aware connected triple (SACT) similarity,
which accounts for the clusters' common neighbors and the reliability of their
sources. Based on NCAI and the multi-granularity information collected among
base clusterings, clusters, and data instances, we further propose two novel
consensus functions, termed weighted evidence accumulation clustering (WEAC)
and graph partitioning with multi-granularity link analysis (GP-MGLA).
Experiments on eight real-world datasets demonstrate the effectiveness and
robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
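The weighted evidence accumulation idea can be illustrated with a minimal sketch, assuming each base clustering is given as a list of integer labels and its weight is supplied by the caller. The sketch does not compute NCAI and is not the authors' WEAC implementation. It builds a weighted co-association matrix, whose entry (i, j) is the weighted fraction of base clusterings that place objects i and j in the same cluster, then links every pair whose co-association exceeds 0.5 and returns the connected components. That threshold cut is a simplification swapped in for the agglomerative step usually run on such a matrix, and the helper names `weighted_coassociation` and `consensus_components` are hypothetical.

```python
from typing import List, Sequence


def weighted_coassociation(clusterings: Sequence[Sequence[int]],
                           weights: Sequence[float]) -> List[List[float]]:
    """Weighted fraction of base clusterings that put objects i and j together."""
    if not clusterings or len(clusterings) != len(weights):
        raise ValueError("need at least one clustering and one weight per clustering")
    n = len(clusterings[0])
    total = float(sum(weights))
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        for i in range(n):
            for j in range(n):
                # Only co-membership matters, so label names are irrelevant.
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co


def consensus_components(co: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Link i and j when co[i][j] > threshold; return connected-component labels."""
    n = len(co)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    base = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
    weights = [1.0, 0.5, 1.0]
    co = weighted_coassociation(base, weights)
    print(consensus_components(co))  # [0, 0, 0, 1, 1, 1]
```

Here the second clustering (weight 0.5) is the only one that groups objects 2 and 3, so their co-association is 0.5 / 2.5 = 0.2 and the pair is not linked; the third clustering uses swapped label names but contributes exactly like the first.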
A Novel Method to Control the Diversity in Cluster Ensembles
Clustering is fundamental to understanding the structure of data. In the past decade, the cluster ensemble problem has been introduced: it combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. Although disagreement among ensemble partitions (diversity) has been found to be fundamental for success, the literature has arrived at conflicting conclusions: some authors suggest that high diversity benefits the final performance, whereas others indicate that medium diversity is better. While there are several ways to measure diversity, there is no method to control it. This paper introduces a new ensemble generation strategy and a method to smoothly change the ensemble diversity.
Experimental results on three datasets suggest that this is an important step towards a more systematic approach to analyzing the impact of ensemble diversity on overall consensus performance.
Sociedad Argentina de Informática e Investigación Operativa
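Diversity is often quantified with pair-counting agreement between partitions. The sketch below is a minimal illustration under our own assumptions, not the paper's generation strategy or control method: it scores ensemble diversity as one minus the mean pairwise Rand index, and it uses a hypothetical generator, `perturbed_ensemble`, whose `noise` parameter reassigns roughly that fraction of objects to random clusters, giving a simple knob on diversity.

```python
import itertools
import random
from typing import List, Sequence


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of object pairs on which partitions a and b agree
    (both together or both apart); invariant to label names."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ensemble_diversity(ensemble: Sequence[Sequence[int]]) -> float:
    """1 - mean pairwise Rand index over all pairs of ensemble members."""
    scores = [rand_index(p, q) for p, q in itertools.combinations(ensemble, 2)]
    return 1.0 - sum(scores) / len(scores)


def perturbed_ensemble(reference: Sequence[int], size: int, noise: float,
                       k: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical generator: each member copies `reference` and reassigns
    each object, with probability `noise`, to a random cluster in range(k)."""
    rng = random.Random(seed)
    members = []
    for _ in range(size):
        member = list(reference)
        for i in range(len(member)):
            if rng.random() < noise:
                member[i] = rng.randrange(k)
        members.append(member)
    return members


if __name__ == "__main__":
    reference = [0] * 10 + [1] * 10 + [2] * 10
    for noise in (0.0, 0.2, 0.5):
        ens = perturbed_ensemble(reference, size=5, noise=noise, k=3)
        print(f"noise={noise}: diversity={ensemble_diversity(ens):.3f}")
```

With `noise=0.0` every member equals the reference, so the diversity is exactly 0; larger values tend to yield more diverse ensembles, although with a random generator and a fixed seed the relationship is not guaranteed to be strictly monotonic.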
Denise: Deep Learning based Robust PCA for Positive Semidefinite Matrices
The robust PCA of high-dimensional matrices plays an essential role in
isolating key explanatory features. The currently available methods for
performing such a low-rank plus sparse decomposition are matrix-specific,
meaning that the algorithm must be re-run each time a new matrix is to be
decomposed. Since these algorithms are computationally expensive, it is
preferable to learn and store a function that instantaneously performs this
decomposition when evaluated. Therefore, we introduce Denise, a deep
learning-based algorithm for robust PCA of symmetric positive semidefinite
matrices, which learns precisely such a function. We obtain theoretical
guarantees that Denise's architecture can approximate the decomposition
function to arbitrary precision and with arbitrarily high probability. The
training scheme is also shown to converge to a stationary point of the robust
PCA loss function. We train Denise on a randomly generated dataset and evaluate
the performance of the deep neural network (DNN) on synthetic and real-world
covariance matrices. Denise achieves results comparable to several
state-of-the-art algorithms in terms of decomposition quality, but because only
one evaluation of the learned DNN is needed, Denise outperforms all existing
algorithms in terms of computation time.
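The decomposition that Denise learns can be written as an optimization problem. This is a sketch of one common formulation for positive semidefinite inputs, which we assume here because the abstract does not state the exact objective: the low-rank part is parameterized as $UU^{\top}$ with a target rank $k$, so it is symmetric positive semidefinite by construction, and the sparse part is the residual, whose entrywise $\ell_1$ norm is minimized. The network symbol $f_{\theta}$ is our own notation.

```latex
\[
  \min_{U \in \mathbb{R}^{n \times k}} \; \bigl\lVert M - U U^{\top} \bigr\rVert_{1},
  \qquad L = U U^{\top}, \qquad S = M - L,
\]
where $M \in \mathbb{R}^{n \times n}$ is symmetric positive semidefinite and
$\lVert A \rVert_{1} = \sum_{i,j} \lvert a_{ij} \rvert$.
A learned network $f_{\theta}$ replaces the per-matrix optimization:
\[
  U = f_{\theta}(M), \qquad L = f_{\theta}(M)\, f_{\theta}(M)^{\top}, \qquad S = M - L,
\]
so decomposing a new matrix costs a single forward evaluation of $f_{\theta}$.
```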
HIERARCHICAL CLUSTERING ENSEMBLE WITH DIFFERENT LINKAGE METHODS
Clustering ensembles have become a preferred technique in recent years because they provide high clustering performance. This study proposes a new approach called the Linkage-based Hierarchical Clustering Ensemble (Bağlantı-tabanlı Hiyerarşik Kümeleme Topluluğu, BHKT). In the proposed approach, the ensemble members perform hierarchical clustering using different linkage methods and then produce a consensus decision by majority voting. The linkage methods used in the study are single linkage, complete linkage, average linkage, centroid linkage, Ward's method, the neighbor-joining method, and adjusted complete linkage. In addition, hierarchical clustering ensembles of different sizes are examined and compared with one another. In the experiments, the hierarchical clustering ensembles were applied to 8 different datasets and achieved better results than a single clustering algorithm.
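The voting scheme can be sketched as follows, under several assumptions not stated in the abstract. Only three of the seven linkage methods (single, complete, and average) are implemented, using naive agglomerative clustering on Euclidean distances; each member's labels are aligned to the first member's by brute-force search over label permutations before voting; and ties are broken in favour of the earlier linkage. The function names are hypothetical, and the brute-force alignment is practical only for a small number of clusters.

```python
import itertools
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

# Cluster-to-cluster distance from the list of all cross-cluster point distances.
LINKAGES: Dict[str, Callable[[List[float]], float]] = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(points: Sequence[Point], k: int, linkage: str) -> List[int]:
    """Naive agglomerative clustering: merge the closest pair until k clusters remain."""
    link = LINKAGES[linkage]
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > k:
        _, a, b = min(
            (link([math.dist(points[i], points[j])
                   for i in clusters[p] for j in clusters[q]]), p, q)
            for p, q in itertools.combinations(range(len(clusters)), 2)
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def align(labels: List[int], reference: List[int], k: int) -> List[int]:
    """Relabel `labels` with the permutation of range(k) that best matches `reference`."""
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(perm[l] == r for l, r in zip(labels, reference)))
    return [best[l] for l in labels]


def linkage_ensemble(points: Sequence[Point], k: int,
                     linkages: Sequence[str] = ("single", "complete", "average")) -> List[int]:
    """Cluster with each linkage, align labels to the first member, majority-vote per object."""
    members = [agglomerate(points, k, m) for m in linkages]
    aligned = [align(m, members[0], k) for m in members]
    # Counter.most_common keeps first-seen order on ties, so the earlier linkage wins.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    print(linkage_ensemble(pts, k=2))  # [0, 0, 0, 1, 1, 1]
```

Label alignment is needed because each hierarchical clustering names its clusters arbitrarily; without it, a majority vote over raw labels would compare unrelated cluster identifiers.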