576 research outputs found

OptCluster : an R package for determining the optimal clustering algorithm and optimal number of clusters.

Author: Sekula Michael N.
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/05/2015

Determining the best clustering algorithm and the ideal number of clusters for a particular dataset is a fundamental difficulty in unsupervised clustering analysis. In biological research, data generated by Next Generation Sequencing technology and microarray gene expression data are becoming increasingly common, so new tools and resources are needed to group such high-dimensional data using clustering analysis. Different clustering algorithms can group data very differently. There is therefore a need to determine the best groupings in a given dataset using the clustering algorithm most suitable for that data. This paper presents the R package optCluster as an efficient way for users to evaluate up to ten clustering algorithms, ultimately determining the optimal algorithm and optimal number of clusters for a given set of data. The selected clustering algorithms are evaluated by as many as nine validation measures classified as “biological”, “internal”, or “stability”, and the final result is obtained through a weighted rank aggregation algorithm based on the calculated validation scores. Two examples using this package are presented, one with a microarray dataset and the other with an RNA-Seq dataset. These two examples highlight the capabilities of the optCluster package and demonstrate its usefulness as a tool in cluster analysis.
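To make the idea of combining rankings from several validation measures concrete, here is a minimal sketch using a weighted Borda count. This is a deliberately simpler stand-in, not the aggregation algorithm implemented in optCluster; the function name, candidate labels, rankings, and weights are all hypothetical.

```python
from collections import defaultdict

def weighted_borda(rankings, weights=None):
    """Aggregate ranked lists (best first) into one consensus order.

    rankings: dict mapping a measure name to a list of candidates, best first.
    weights:  optional dict mapping a measure name to a non-negative weight.
    NOTE: a Borda count is an illustrative assumption, not optCluster's method.
    """
    weights = weights or {name: 1.0 for name in rankings}
    points = defaultdict(float)
    for name, ranked in rankings.items():
        n = len(ranked)
        for position, candidate in enumerate(ranked):
            # The best candidate earns n - 1 points, the worst earns 0.
            points[candidate] += weights[name] * (n - 1 - position)
    # Sort by descending points; break ties alphabetically for determinism.
    return sorted(points, key=lambda c: (-points[c], c))

# Hypothetical rankings of (algorithm, number of clusters) candidates.
rankings = {
    "internal": ["kmeans-3", "pam-3", "hierarchical-4"],
    "stability": ["pam-3", "kmeans-3", "hierarchical-4"],
    "biological": ["pam-3", "hierarchical-4", "kmeans-3"],
}
consensus = weighted_borda(rankings)
print(consensus)
```

Raising the weight of one measure shifts the consensus toward the candidates that measure prefers, which is the basic role a weight plays in any weighted aggregation scheme.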

University of Louisville

Decloud: an unsupervised deconvolution tool for building gene expression profiles

Author: Kjørsvik Øystein
Publication venue: The University of Bergen
Publication date: 28/05/2018

Deconvolution is the process of decomposing a mixed signal into its originating elements. For my thesis I created a clustering application, named DeCloud, with the intent of replacing the unsupervised clustering step in the deconvolution tool Deblender. Using clustering packages in R such as optCluster, the application was built to allow for a range of new clustering algorithms. In this thesis the scope has been limited to testing hierarchical clustering and two variations of PAM. A novel filtering function was created, providing a different approach to handling clusters. The novel approach has been implemented for use with the PAM clustering method, but could be applied to other algorithms as well. We have tested the resulting pipeline on the datasets used to benchmark Deblender and other tools. Comparing the results obtained by Deblender and by DeCloud shows that DeCloud obtains markedly better results on two of the three datasets used for testing. The last dataset is a complicated case that none of the applications is able to cluster and deconvolve effectively. The novel filter function applied to the PAM algorithm was the best performer on each of the two successfully deconvolved datasets.

Master's Thesis in Informatics INF39

University of Bergen

Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data

Author: Han Danfu
Pan Haiyan
Zhu Jun
Publication venue: Beijing Institute of Genomics, the Chinese Academy of Sciences and the Genetics Society of China. Production and hosting by Elsevier B.V.
Publication date: 30/11/2003

A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining the merits of simulated annealing, is described for finding an optimal or near-optimal set of medoids. This schema maximizes clustering success by achieving internal cluster cohesion and external cluster isolation. The performance of HGACLUS and other methods was compared using simulated data and open microarray gene-expression datasets. Under the exact validation strategy and with an explicit cluster number, HGACLUS was generally found to be more accurate and robust than the other methods discussed in this paper.

Elsevier - Publisher Connector

Evolutionary framework for DNA Microarray Cluster Analysis

Author: Castellanos Garzón José Antonio
Publication venue: Universidad de Valladolid
Publication date: 01/01/2013

This research proposes an evolutionary framework that brings together a hierarchical clustering method based on an evolutionary model, a set of cluster validation measures, and a clustering visualization tool. The goal is to create a suitable framework for extracting knowledge from DNA microarray data. On the one hand, the framework's evolutionary clustering model is a novel alternative that attempts to solve some of the problems found in existing clustering methods. On the other hand, our clustering visualization approach, implemented as a tool, incorporates new properties and new visualization components, making it possible to validate and analyze the results of the clustering task. In this way, the integration of the evolutionary clustering model with the visual clustering model makes our evolutionary framework a novel data mining application compared with conventional methods.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Documental de la Universidad de Valladolid

Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes

Author: Anirban Mukhopadhyay
Sanghamitra Bandyopadhyay
Ujjwal Maulik
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Evaluation of statistical correlation and validation methods for construction of gene co-expression networks

Author: Duvvuru Suman
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2008

High-throughput technologies such as microarrays have led to the rapid accumulation of large-scale genomic data, providing opportunities to systematically infer gene function and co-expression networks. Typical steps of co-expression network analysis using microarray data consist of estimation of pair-wise gene co-expression using some similarity measure, construction of co-expression networks, identification of clusters of co-expressed genes, and post-cluster analyses such as cluster validation. This dissertation is primarily concerned with the development and evaluation of approaches for the first and the last steps: estimation of gene co-expression matrices and validation of network clusters. Since clustering methods are not a focus, only a paraclique clustering algorithm is used in this evaluation. First, a novel Bayesian approach is presented for combining the Pearson correlation with prior biological information from Gene Ontology, yielding a biologically relevant estimate of gene co-expression. The addition of biological information by the Bayesian approach reduced noise in the paraclique gene clusters, as indicated by high silhouette values and increased homogeneity of clusters in terms of molecular function. Standard similarity measures, including correlation coefficients from Pearson, Spearman, Kendall’s Tau, Shrinkage, Partial, and Mutual information, and Euclidean and Manhattan distance measures, were evaluated. Based on quality metrics such as cluster homogeneity and stability with respect to ontological categories, clusters resulting from partial correlation and mutual information were more biologically relevant than those from any other correlation measure. Second, the statistical quality of clusters was evaluated using approaches based on permutation tests and Mantel correlation to identify significant and informative clusters that capture most of the covariance in the dataset. Third, the utility of statistical contrasts was studied for classification of temporal patterns of gene expression. Specifically, polynomial and Helmert contrast analyses were shown to provide a means of labeling the co-expressed gene sets because they showed similar temporal profiles.
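The first two steps described above (pair-wise co-expression estimation, then network construction) can be sketched as follows. This is only an illustration using plain Pearson correlation with a hard cutoff; the gene names, expression values, function names, and the 0.9 threshold are hypothetical, and the dissertation's Bayesian estimate and paraclique clustering are not reproduced here.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold.

    The threshold is an assumed, illustrative cutoff.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges

# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)
print(edges)
```

Swapping `pearson` for another similarity measure changes which edges survive the cutoff, which is the kind of effect the dissertation evaluates across measures.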

University of Tennessee, Knoxville: Trace

Multi-Objective Differential Evolution for Automatic Clustering with Application to Micro-Array Data Analysis

Author: Ajith Abraham
Debarati Kundu
Kaushik Suresh
Sang Yong Han
Sayan Ghosh
Swagatam Das
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2009

This paper applies the Differential Evolution (DE) algorithm to the task of automatic fuzzy clustering in a Multi-objective Optimization (MO) framework. It compares the performance of two multi-objective variants of DE on the fuzzy clustering problem, where two conflicting fuzzy validity indices are optimized simultaneously. The resulting Pareto-optimal set from each algorithm consists of a number of non-dominated solutions, from which the user can choose the most promising ones according to the problem specifications. A real-coded representation of the search variables, accommodating a variable number of cluster centers, is used for DE. The performance of the multi-objective DE variants is also contrasted with that of the two best-known schemes for MO clustering, namely the Non Dominated Sorting Genetic Algorithm (NSGA II) and Multi-Objective Clustering with an unknown number of Clusters K (MOCK). Experimental results using six artificial and four real-life datasets of varying complexity indicate that DE holds immense promise as a candidate algorithm for devising MO clustering schemes.
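The notion of a non-dominated (Pareto-optimal) set used above can be illustrated with a short sketch. It assumes both objectives are to be minimized, and the objective values are made up for illustration; it shows only the filtering step, not the DE variants, NSGA II, or MOCK.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (assuming all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (index_1, index_2) values for five candidate clusterings.
solutions = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.6), (0.8, 0.1), (0.9, 0.9)]
front = pareto_front(solutions)
print(front)
```

Every point left in `front` trades one objective against the other, so choosing among them is a matter of problem-specific preference rather than further optimization.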

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

An Experimental Study on Microarray Expression Data from Plants under Salt Stress by using Clustering Methods

Author: Barigou Fatiha
Bouamrane Karim
Fyad Houda
Publication venue: Universidad Internacional de La Rioja
Publication date: 29/03/2022

Current genome-wide advances in gene chip technology give “omics” research (genomics, proteomics, and transcriptomics) an opportunity to analyze the expression levels of thousands of genes across multiple experiments. Many machine learning approaches have been proposed to deal with this deluge of information. Clustering methods are one such approach: they group data (gene profiles) into homogeneous clusters using distance measurements. Various clustering techniques have been applied, but there is no consensus on the best one. In this context, seven clustering algorithms were compared and tested on gene expression datasets from three model plants under salt stress. The techniques were evaluated using internal and relative validity measures. The AGNES algorithm appears to be the best on internal validity measures for the three plant datasets, while K-Means tends to lead on relative validity measures for these datasets.

Re-UNIR