19 research outputs found

    Comparison and validation of community structures in complex networks

    Full text link
    The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with the one of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is NP-hard, most effort has gone into the construction of search algorithms, and less to the question of other measures of community structures, similarities between various partitionings and the validation with respect to external information. Here we concentrate on a class of computer generated networks and on three well-studied real networks which constitute a bench-mark for network studies; the karate club, the US college football teams and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show by examples features and drawbacks. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is here possible since we know everything of the computer generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process.Comment: To appear in Physica A; 25 page

    Next station in microarray data analysis: GEPAS

    Get PDF
    The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive although powerful web-based interface that offers diverse analysis options from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options), to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and include different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers of many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at

    CarGene: Characterisation of sets of genes based on metabolic pathways analysis

    Get PDF
    The great amount of biological information provides scientists with an incomparable framework for testing the results of new algorithms. Several tools have been developed for analysing gene-enrichment and most of them are Gene Ontology-based tools. We developed a Kyoto Encyclopedia of Genes and Genomes (Kegg)-based tool that provides a friendly graphical environment for analysing gene-enrichment. The tool integrates two statistical corrections and simultaneously analysing the information about many groups of genes in both visual and textual manner. We tested the usefulness of our approach on a previous analysis (Huttenshower et al.). Furthermore, our tool is freely available (http://www.upo.es/eps/bigs/cargene.html).Ministerio de Ciencia y Tecnología TIN2007-68084-C02-00Ministerio de Ciencia e Innovación PCI2006-A7-0575Junta de Andalucía P07-TIC-02611Junta de Andalucía TIC-20

    Clusterv : a tool for assessing the reliability of clusters discovered in DNA microarray data

    Get PDF
    We present a new R package for the assessment of the reliability of clusters discovered in high dimensional DNA microarray data. The package implements methods based on random projections that approximately preserve distances between examples in the projected subspaces

    Transcriptional regulatory network discovery via multiple method integration: application to e. coli K12

    Get PDF
    Transcriptional regulatory network (TRN) discovery from one method (e.g. microarray analysis, gene ontology, phylogenic similarity) does not seem feasible due to lack of sufficient information, resulting in the construction of spurious or incomplete TRNs. We develop a methodology, TRND, that integrates a preliminary TRN, microarray data, gene ontology and phylogenic similarity to accurately discover TRNs and apply the method to E. coli K12. The approach can easily be extended to include other methodologies. Although gene ontology and phylogenic similarity have been used in the context of gene-gene networks, we show that more information can be extracted when gene-gene scores are transformed to gene-transcription factor (TF) scores using a preliminary TRN. This seems to be preferable over the construction of gene-gene interaction networks in light of the observed fact that gene expression and activity of a TF made of a component encoded by that gene is often out of phase. TRND multi-method integration is found to be facilitated by the use of a Bayesian framework for each method derived from its individual scoring measure and a training set of gene/TF regulatory interactions. The TRNs we construct are in better agreement with microarray data. The number of gene/TF interactions we discover is actually double that of existing networks

    Construction of gene regulatory networks using biclustering and bayesian networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling.</p> <p>Results</p> <p>In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method.</p> <p>Conclusions</p> <p>Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.</p

    Transcriptional regulatory network refinement and quantification through kinetic modeling, gene expression microarray data and information theory

    Get PDF
    BACKGROUND: Gene expression microarray and other multiplex data hold promise for addressing the challenges of cellular complexity, refined diagnoses and the discovery of well-targeted treatments. A new approach to the construction and quantification of transcriptional regulatory networks (TRNs) is presented that integrates gene expression microarray data and cell modeling through information theory. Given a partial TRN and time series data, a probability density is constructed that is a functional of the time course of transcription factor (TF) thermodynamic activities at the site of gene control, and is a function of mRNA degradation and transcription rate coefficients, and equilibrium constants for TF/gene binding. RESULTS: Our approach yields more physicochemical information that compliments the results of network structure delineation methods, and thereby can serve as an element of a comprehensive TRN discovery/quantification system. The most probable TF time courses and values of the aforementioned parameters are obtained by maximizing the probability obtained through entropy maximization. Observed time delays between mRNA expression and activity are accounted for implicitly since the time course of the activity of a TF is coupled by probability functional maximization, and is not assumed to be proportional to expression level of the mRNA type that translates into the TF. This allows one to investigate post-translational and TF activation mechanisms of gene regulation. Accuracy and robustness of the method are evaluated. A kinetic formulation is used to facilitate the analysis of phenomena with a strongly dynamical character while a physically-motivated regularization of the TF time course is found to overcome difficulties due to omnipresent noise and data sparsity that plague other methods of gene expression data analysis. An application to Escherichia coli is presented. CONCLUSION: Multiplex time series data can be used for the construction of the network of cellular processes and the calibration of the associated physicochemical parameters. We have demonstrated these concepts in the context of gene regulation understood through the analysis of gene expression microarray time series data. Casting the approach in a probabilistic framework has allowed us to address the uncertainties in gene expression microarray data. Our approach was found to be robust to error in the gene expression microarray data and mistakes in a proposed TRN
    corecore