Search CORE

76,255 research outputs found

Neuroengineering of Clustering Algorithms

Author: Brito da Silva Leonardo Enzo
Publication venue: Scholars\u27 Mine
Publication date: 01/01/2019
Field of study

Cluster analysis can be broadly divided into multivariate data visualization, clustering algorithms, and cluster validation. This dissertation contributes neural network-based techniques to perform all three unsupervised learning tasks. Particularly, the first paper provides a comprehensive review on adaptive resonance theory (ART) models for engineering applications and provides context for the four subsequent papers. These papers are devoted to enhancements of ART-based clustering algorithms from (a) a practical perspective by exploiting the visual assessment of cluster tendency (VAT) sorting algorithm as a preprocessor for ART offline training, thus mitigating ordering effects; and (b) an engineering perspective by designing a family of multi-criteria ART models: dual vigilance fuzzy ART and distributed dual vigilance fuzzy ART (both of which are capable of detecting complex cluster structures), merge ART (aggregates partitions and lessens ordering effects in online learning), and cluster validity index vigilance in fuzzy ART (features a robust vigilance parameter selection and alleviates ordering effects in offline learning). The sixth paper consists of enhancements to data visualization using self-organizing maps (SOMs) by depicting in the reduced dimension and topology-preserving SOM grid information-theoretic similarity measures between neighboring neurons. This visualization\u27s parameters are estimated using samples selected via a single-linkage procedure, thereby generating heatmaps that portray more homogeneous within-cluster similarities and crisper between-cluster boundaries. The seventh paper presents incremental cluster validity indices (iCVIs) realized by (a) incorporating existing formulations of online computations for clusters\u27 descriptors, or (b) modifying an existing ART-based model and incrementally updating local density counts between prototypes. Moreover, this last paper provides the first comprehensive comparison of iCVIs in the computational intelligence literature --Abstract, page iv

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

Author: Akhanli Serhat Emre
Hennig Christian
Publication venue
Publication date: 23/06/2020
Field of study

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.Comment: 42 pages, 11 figure

arXiv.org e-Print Archive

UCL Discovery

SMART: Unique splitting-while-merging framework for gene clustering

Author: A Thalamuthu
AD Lanterman
AE Teschendorff
AK Jain
Asoke K. Nandi
B Abu-Jamous
B Fritzke
B Fritzke
CR Lin
CS Wallace
D Dembele
D Jiang
David J. Roberts
G Celeux
H Akaike
J Qin
J Rissanen
KY Yeung
L Hubert
L Mavridis
L Zhao
MAT Figueiredo
P Tamayo
PT Spellman
R Xu
R Xu
RJ Cho
Rui Fa
S Bandyopadhyay
S Monti
S Wu
Sergio Gómez
T Kohonen
T Pramila
TR Golub
WM Rand
YJ Zhang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/04/2014
Field of study

Copyright @ 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to compared existing self-splitting algorithms and traditional algorithms. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms.National Institute for Health Researc

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

Recommended from our members

Integrative analysis of the inter-tumoral heterogeneity of triple-negative breast cancer.

Author: Boymoushakian Lari
Chiu Alec M
Coller Hilary A
Mitra Mithun
Publication venue: eScholarship, University of California
Publication date: 01/01/2018
Field of study

Triple-negative breast cancers (TNBC) lack estrogen and progesterone receptors and HER2 amplification, and are resistant to therapies that target these receptors. Tumors from TNBC patients are heterogeneous based on genetic variations, tumor histology, and clinical outcomes. We used high throughput genomic data for TNBC patients (n = 137) from TCGA to characterize inter-tumor heterogeneity. Similarity network fusion (SNF)-based integrative clustering combining gene expression, miRNA expression, and copy number variation, revealed three distinct patient clusters. Integrating multiple types of data resulted in more distinct clusters than analyses with a single datatype. Whereas most TNBCs are classified by PAM50 as basal subtype, one of the clusters was enriched in the non-basal PAM50 subtypes, exhibited more aggressive clinical features and had a distinctive signature of oncogenic mutations, miRNAs and expressed genes. Our analyses provide a new classification scheme for TNBC based on multiple omics datasets and provide insight into molecular features that underlie TNBC heterogeneity

eScholarship - University of California

Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

Author: A Strehl
A Weingessel
Asoke K. Nandi
B Abu-Jamous
B Abu-Jamous
B Fischer
Basel Abu-Jamous
D Greene
D Liu
D Stuart
David J. Roberts
E Dimitriadou
E Dimitriadou
FD Gibbons
HG Ayad
JM Pena
K Tumer
KY Yeung
LP Zhao
MBH Rhouma
N Slonim
O Nwamadi
PT Spellman
R Avogadri
R BabusÏka
R Baumgartner
R Fa
R Nilsson
RJ Cho
Rui Fa
S Dudoit
S Haykin
S Vega-Pons
S Vega-Pons
SA Salem
Shyamal D. Peddada
T Pramila
TE Kohonen
X Zhou
Z Yu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Field of study

Copyright @ 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.National Institute for Health Researc

Public Library of Science (PLOS)

CiteSeerX

Crossref

Jyväskylä University Digital Archive

Directory of Open Access Journals

UCL Discovery

PubMed Central

Brunel University Research Archive

FigShare