
    Systems biology approaches to the dynamics of gene expression and chemical reactions

    Systems biology is an emerging interdisciplinary field whose main goal is to understand the global properties and functions of a biological system by investigating its structure and dynamics [74]. This high-level knowledge can be reached only through a coordinated approach involving researchers with different backgrounds in molecular biology, the various omics (such as genomics, proteomics and metabolomics), computer science and dynamical systems theory. The history of systems biology as a distinct discipline began in the 1960s, and the field has grown impressively since 2000, driven by the accumulation of biological information, the development of high-throughput experimental techniques, the use of powerful computer systems for calculations and database hosting, and the spread of the Internet as the standard medium for information diffusion [77]. In the last few years, our research group has tackled a set of systems biology problems which look quite diverse but share topics such as biological networks and system dynamics, which interest us and are clearly fundamental to this field. The first issue we studied (covered in Part I) was the reverse engineering of large-scale gene regulatory networks. Inferring a gene network is the process of identifying interactions among genes from experimental data (typically microarray expression profiles) using computational methods [6]. Our aim was to compare some of the most popular association network algorithms (the only ones applicable at a genome-wide level) under different conditions. In particular, we verified the predictive power of similarity measures both of direct type (such as correlations and mutual information) and of conditional type (partial correlations and conditional mutual information), applied to different kinds of experiments (such as data taken at equilibrium or time courses) and to both synthetic and real microarray data (for E. coli and S. cerevisiae).
In our simulations we saw that all network inference algorithms perform better on data produced with “structural” perturbations (such as gene knockouts at steady state) than on data produced with purely dynamical perturbations (such as time course measurements or changes of the initial expression levels). Moreover, our analysis showed differences in the performance of the algorithms: direct methods are more robust in detecting stable relationships (such as belonging to the same protein complex), while conditional methods are better at detecting causal interactions (e.g. transcription factor–binding site interactions), especially in the presence of combinatorial transcriptional regulation. Even if time course microarray experiments are not particularly useful for inferring gene networks, they can give a great deal of information about the dynamical evolution of a biological process, provided that the measurements have good time resolution. Such a dataset was recently published [119] for the yeast metabolic cycle, a well-known process in which yeast cells synchronize with respect to oxidative and reductive functions. In that paper, the long-period respiratory oscillations were shown to be reflected in genome-wide periodic patterns in gene expression. As explained in Part II, we analyzed these time series in order to elucidate the dynamical role of post-transcriptional regulation (in particular mRNA stability) in the coordination of the cycle. We found that for periodic genes, arranged in classes according either to expression profile or to function, the pulses of mRNA abundance have phase and width which are directly proportional to the corresponding turnover rates. Moreover, the cascade of events which occurs during the yeast metabolic cycle (and their correlation with mRNA turnover) reflects to a large extent the gene expression program observable in other dynamical contexts such as the response to stresses or stimuli.
The concepts of network and of system dynamics return as major themes of Part III. There we present a study of some dynamical properties of chemical reaction networks, which are sets of chemical species among which a certain number of reactions can occur. These networks can be modeled as systems of ordinary differential equations for the species concentrations, and the dynamical evolution of these systems has been studied theoretically since the 1970s [47, 65]. Over time, several independent conditions have been proved concerning the capacity of a reaction network, regardless of the (often poorly known) reaction parameters, to exhibit multiple equilibria. This is a particularly interesting characteristic for biological systems, since it is required for the switch-like behavior observed during processes such as intracellular signaling and cell differentiation. Inspired by those works, we developed a new open source software package for MATLAB, called ERNEST, which, by checking these various criteria on the structure of a chemical reaction network, can exclude the multistationarity of the corresponding reaction system. The results of this analysis can be used, for example, for model discrimination: if there are multiple candidate reaction models for a multistable biological process, it is possible to eliminate some of them by proving that they are always monostationary. Finally, we considered the related property of monotonicity for a reaction network. Monotone dynamical systems tend to converge to an equilibrium and do not exhibit chaotic behavior. Most biological systems have the same features, and are therefore considered to be monotone or near-monotone [85, 116]. Using the notion of fundamental cycles from graph theory, we proved some theoretical results in order to determine how far a given biological network is from being monotone.
In particular, we showed that the distance to monotonicity of a network is equal to the minimal number of negative fundamental cycles of the corresponding J-graph, a signed multigraph which can be uniquely associated with a dynamical system.
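The distinction drawn in Part I between direct and conditional similarity measures can be illustrated with a minimal sketch. It assumes Pearson correlation as the direct measure and first-order partial correlation as the conditional one; the profiles x, y and z are hypothetical toy data, not taken from the thesis, in which a regulator z drives both x and y.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b, conditioning on c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical toy profiles: z drives both x and y; e1 and e2 are
# perturbations orthogonal to z and to each other.
z = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
x = [2 * zi + ei for zi, ei in zip(z, e1)]
y = [2 * zi + ei for zi, ei in zip(z, e2)]

direct = pearson(x, y)               # co-regulated genes look associated
conditional = partial_corr(x, y, z)  # association disappears given z
print(direct, conditional)
```

On this toy data the direct measure links x and y (correlation 0.8), while the conditional measure is zero up to rounding once z is accounted for, so only the edges from z remain.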

    Improved prediction of protein interaction from microarray data using asymmetric correlation

    Background: Detection of correlated gene expression is a fundamental process in the characterization of gene functions using microarray data. Commonly used methods such as the Pearson correlation can detect only a fraction of interactions between genes or their products. However, the performance of correlation analysis can be significantly improved either by providing additional biological information or by combining correlation with other techniques that can extract various mathematical or statistical properties of gene expression from microarray data. In this article, I will test the performance of three correlation methods, namely the Pearson correlation, the rank (Spearman) correlation, and the Mutual Information approach, in detecting protein-protein interactions, and I will further examine the properties of these techniques when they are used together. I will also develop a new correlation measure which can be used with other measures to improve predictive power.
 
Results: Using data from 5,896 microarray hybridizations, the three measures were obtained for 30,499 known protein-interacting pairs in the Human Protein Reference Database (HPRD). Pearson correlation showed the best sensitivity (0.305) but the three measures showed similar specificity (0.240-0.257). When the three measures were compared, it was found that better specificity could be obtained at a high Pearson coefficient combined with a low Spearman coefficient or Mutual Information. Using a toy model of two gene interactions, I found that such measure combinations were most likely to exist at stronger curvature. I therefore introduced a new measure, termed asymmetric correlation (AC), which directly quantifies the degree of curvature in the expression levels of two genes as a degree of asymmetry. I found that AC performed better than the other measures, particularly when high specificity was required. Moreover, a combination of AC with other measures significantly improved specificity and sensitivity, by up to 50%.
 
Conclusions: A combination of correlation measures, particularly AC and Pearson correlation, can improve prediction of protein-protein interactions. Further studies are required to assess the biological significance of asymmetry in expression patterns of gene pairs. 
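The observation that the Pearson and Spearman coefficients diverge when the relationship between two expression profiles is curved can be sketched as follows. This is not the AC measure, whose definition is not given in this abstract; it only compares the two standard coefficients on hypothetical data, and the rank helper assumes there are no tied values.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def ranks(values):
    """1-based ranks of the values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical pair of profiles with a strongly curved, monotone relationship.
gene_a = list(range(1, 11))
gene_b = [2 ** level for level in gene_a]

r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
print(r_pearson, r_spearman)
```

Because the relationship is monotone, the rank correlation is perfect, whereas the linear coefficient is noticeably lower; the gap between the two is one simple indicator of curvature.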

    Inferring the Transcriptional Landscape of Bovine Skeletal Muscle by Integrating Co-Expression Networks

    Background: Despite modern technologies and novel computational approaches, decoding causal transcriptional regulation remains challenging. This is particularly true for less well studied organisms and when only gene expression data is available. In muscle a small number of well characterised transcription factors are proposed to regulate development. Therefore, muscle appears to be a tractable system for proposing new computational approaches. Methodology/Principal Findings: Here we report a simple algorithm that asks "which transcriptional regulator has the highest average absolute co-expression correlation to the genes in a co-expression module?" It correctly infers a number of known causal regulators of fundamental biological processes, including cell cycle activity (E2F1), glycolysis (HLF), mitochondrial transcription (TFB2M), adipogenesis (PIAS1), neuronal development (TLX3), immune function (IRF1) and vasculogenesis (SOX17), within a skeletal muscle context. However, none of the canonical pro-myogenic transcription factors (MYOD1, MYOG, MYF5, MYF6 and MEF2C) were linked to muscle structural gene expression modules. Co-expression values were computed using developing bovine muscle from 60 days post conception (early foetal) to 30 months post natal (adulthood) for two breeds of cattle, in addition to a nutritional comparison with a third breed. A number of transcriptional landscapes were constructed and integrated into an always correlated landscape. One notable feature was a 'metabolic axis' formed from glycolysis genes at one end, nuclear-encoded mitochondrial protein genes at the other, and centrally tethered by mitochondrially-encoded mitochondrial protein genes. Conclusions/Significance: The new module-to-regulator algorithm complements our recently described Regulatory Impact Factor analysis. 
Together with a simple examination of a co-expression module's contents, these three gene expression approaches are starting to illuminate the in vivo transcriptional regulation of skeletal muscle development.
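The module-to-regulator question quoted above (which regulator has the highest average absolute co-expression correlation to the genes in a module) can be sketched in a few lines. This is a minimal illustration that assumes Pearson correlation as the co-expression measure; the function name best_regulator, the gene and regulator labels, and the expression values are all hypothetical and do not come from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def best_regulator(expression, regulators, module):
    """Return the regulator with the highest mean |correlation| to the module."""
    scores = {}
    for reg in regulators:
        targets = [g for g in module if g != reg]
        total = sum(abs(pearson(expression[reg], expression[g])) for g in targets)
        scores[reg] = total / len(targets)
    return max(scores, key=scores.get), scores

# Hypothetical expression profiles over five samples.
expression = {
    "TF_A": [1, 2, 3, 4, 6],
    "TF_B": [3, 1, 4, 1, 5],
    "gene1": [1, 2, 3, 4, 5],
    "gene2": [5, 4, 3, 2, 1],   # anti-correlated genes still count (absolute value)
    "gene3": [2, 4, 6, 8, 11],
}

best, scores = best_regulator(expression, ["TF_A", "TF_B"], ["gene1", "gene2", "gene3"])
print(best, scores)
```

Taking the absolute value means that a regulator strongly anti-correlated with part of a module is scored as highly as one that is positively correlated, which matches the wording of the question.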

    CORUM: the comprehensive resource of mammalian protein complexes—2009

    CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing ∼16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a ‘Phylogenetic Conservation’ analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at (http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html)

    Comparative analysis of module-based versus direct methods for reverse-engineering transcriptional regulatory networks

    We have compared a recently developed module-based algorithm LeMoNe for reverse-engineering transcriptional regulatory networks to a mutual information based direct algorithm CLR, using benchmark expression data and databases of known transcriptional regulatory interactions for Escherichia coli and Saccharomyces cerevisiae. A global comparison using recall versus precision curves hides the topologically distinct nature of the inferred networks and is not informative about the specific subtasks for which each method is most suited. Analysis of the degree distributions and a regulator specific comparison show that CLR is 'regulator-centric', making true predictions for a higher number of regulators, while LeMoNe is 'target-centric', recovering a higher number of known targets for fewer regulators, with limited overlap in the predicted interactions between both methods. Detailed biological examples in E. coli and S. cerevisiae are used to illustrate these differences and to prove that each method is able to infer parts of the network where the other fails. Biological validation of the inferred networks cautions against over-interpreting recall and precision values computed using incomplete reference networks. Comment: 13 pages, 1 table, 6 figures + 6 pages supplementary information (1 table, 5 figures)
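The caution about recall and precision computed against incomplete reference networks can be made concrete with a small sketch. The edge sets below are hypothetical, and the function precision_recall is an illustrative helper, not part of LeMoNe or CLR.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted edges against a reference edge set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical (regulator, target) edges.
predicted = {("R1", "t1"), ("R1", "t2"), ("R2", "t3"), ("R2", "t4")}
incomplete_reference = {("R1", "t1"), ("R1", "t2"), ("R3", "t5")}

# Suppose ("R2", "t3") is a real interaction that the reference database lacks.
fuller_reference = incomplete_reference | {("R2", "t3")}

p_incomplete, r_incomplete = precision_recall(predicted, incomplete_reference)
p_fuller, r_fuller = precision_recall(predicted, fuller_reference)
print(p_incomplete, r_incomplete, p_fuller, r_fuller)
```

With the incomplete reference, the correct prediction ("R2", "t3") is counted as a false positive, so precision is 0.5; once the missing edge is added to the reference, precision rises to 0.75 without any change to the predictions.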

    The Era of Commercialized Genetics: Examining the Intersection of DNA, Identity, and Personal Origin


    Researchers' Assumptions and Mathematical Models: A Philosophical Study of Metabolic Systems Biology

    This thesis examines the philosophical implications of the assumptions made by researchers involved in the development of mathematical models of metabolism. It does this through an analysis of several detailed historical case studies of models between the 1960s and the present day, thus also contributing to the growing literature on the historiography of biochemical systems biology. The chapters focus on four main topics: the relationship between models and theory, temporal decomposition as a simplifying strategy for building models of complex metabolic systems, interactions between modellers and experimental biochemists, and the role of biochemical data. Four categories of assumptions are shown to play a significant role in these different aspects of model development: ontological assumptions, idealising assumptions, assumptions about data, and researchers’ commitments. Building on this analysis, the thesis brings to light the importance of researchers’ ontological and idealising assumptions about the temporal organisation, alongside the spatial organisation, of metabolic systems. It also offers an account of different forms of interactions between research groups (hostile interactions, closed collaboration, and open collaboration) on the basis of differences in the characteristics of researchers’ commitments. Throughout the case studies, biological data play a powerful role in model development by virtue of the contents of available data sets, as well as researchers’ perceptions of those data, which are in turn influenced by their ontological assumptions. The historical trajectories explored illustrate how the relationships between different facets of model building, and their associated philosophical abstractions, are often best understood as transient features within a highly dynamic research process, whose role depends on the specific stage of modelling in which they are enacted.
This thesis provides an expanded perspective on the different types and roles of assumptions in the development of mathematical models of metabolism, which is firmly grounded in a historical analysis of scientific practice.AHR