Search CORE

48,698 research outputs found

Noise resistant generalized parametric validity index of clustering for gene expression data

Author: Fa R
Nandi AK
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2014
Field of study

This article has been made available through the Brunel Open Access Publishing Fund.Validity indices have been investigated for decades. However, since there is no study of noise-resistance performance of these indices in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare the ability of determining the number of clusters with eight existing indices. We also test the GPV in three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements

Crossref

Brunel University Research Archive

Partial mixture model for tight clustering of gene expression time-course

Author: Li Chang-Tsun
Wilson Roland
Yuan Yinyin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the ombination of both partial mixture model and minimum distance estimator in this field. We show that tight clustering not only is capable to generate more profound understanding of the dataset under study well in accordance to established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidences that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion

Deakin Research Online

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Identifying Geographic Clusters: A Network Analytic Approach

Author: Catini Roberto
Karamshuk Dmytro
Penner Orion
Riccaboni Massimo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

In recent years there has been a growing interest in the role of networks and clusters in the global economy. Despite being a popular research topic in economics, sociology and urban studies, geographical clustering of human activity has often studied been by means of predetermined geographical units such as administrative divisions and metropolitan areas. This approach is intrinsically time invariant and it does not allow one to differentiate between different activities. Our goal in this paper is to present a new methodology for identifying clusters, that can be applied to different empirical settings. We use a graph approach based on k-shell decomposition to analyze world biomedical research clusters based on PubMed scientific publications. We identify research institutions and locate their activities in geographical clusters. Leading areas of scientific production and their top performing research institutions are consistently identified at different geographic scales

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Munich RePEc Personal Archive

Crossref

Archivio della ricerca della Scuola IMT Alti Studi Lucca

King's Research Portal

IMT Institutional Repository

K-core decomposition of Internet graphs: hierarchies, self-similarity and measurement biases

Author: Alvarez-Hamelin José Ignacio
Barrat Alain
Dall'Asta Luca
Vespignani Alessandro
Publication venue
Publication date: 01/01/2008
Field of study

We consider the

k

-core decomposition of network models and Internet graphs at the autonomous system (AS) level. The

k

-core analysis allows to characterize networks beyond the degree distribution and uncover structural properties and hierarchies due to the specific architecture of the system. We compare the

k

-core structure obtained for AS graphs with those of several network models and discuss the differences and similarities with the real Internet architecture. The presence of biases and the incompleteness of the real maps are discussed and their effect on the

k

-core analysis is assessed with numerical experiments simulating biased exploration on a wide range of network models. We find that the

k

-core analysis provides an interesting characterization of the fluctuations and incompleteness of maps as well as information helping to discriminate the original underlying structure

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

HAL Descartes

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Hal-Diderot

PORTO Publications Open Repository TOrino

clValid: An R Package for Cluster Validation

Author: Guy Brock
Somnath Datta
Susmita Datta
Vasyl Pihur
Publication venue
Publication date
Field of study

The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.

Research Papers in Economics