12 research outputs found

    Analysis of heat kernel highlights the strongly modular and heat-preserving structure of proteins

    In this paper, we study the structure and dynamical properties of protein contact networks with respect to other biological networks, together with simulated archetypal models acting as probes. We consider both classical topological descriptors, such as the modularity and statistics of the shortest paths, and different interpretations in terms of diffusion provided by the discrete heat kernel, which is derived from the normalized graph Laplacians. A principal component analysis shows high discrimination among the network types using either the topological or the heat-kernel-based vector characterization. Furthermore, a canonical correlation analysis demonstrates strong agreement between those two characterizations, thus providing an important justification for the heat kernel in terms of interpretability. Finally, and most importantly, the focused analysis of the heat kernel yields insights into the fact that proteins have to satisfy specific structural design constraints that the other considered networks do not need to obey. Notably, the heat trace decay of an ensemble of varying-size proteins denotes subdiffusion, a peculiar property of proteins
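    The heat kernel and heat trace mentioned in this abstract can be illustrated with a minimal sketch. It assumes an undirected graph with no isolated nodes and the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2); the paper may use a different normalization or estimator. The function names `normalized_laplacian`, `heat_kernel` and `heat_trace` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def normalized_laplacian(adjacency):
    """Symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2).

    Assumes an undirected graph (symmetric adjacency) without isolated nodes.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def heat_kernel(adjacency, t):
    """Heat kernel H(t) = exp(-t L), computed from the eigendecomposition of L."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adjacency))
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T


def heat_trace(adjacency, times):
    """Heat trace Tr(exp(-t L)) = sum_i exp(-t * lambda_i) for each t in times."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adjacency))
    return np.array([np.exp(-t * eigvals).sum() for t in times])


if __name__ == "__main__":
    # Toy example: a path graph on three nodes (not a protein contact network).
    path = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
    print(heat_trace(path, [0.0, 1.0, 10.0]))
```

    At t = 0 the heat trace equals the number of nodes, and for a connected graph it decays toward 1 as t grows. How fast it decays across networks of different sizes is the quantity the abstract relates to subdiffusion.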

    Gene Regulatory Network Reconstruction Using Bayesian Networks, the Dantzig Selector, the Lasso and Their Meta-Analysis

    Modern technologies, and especially next-generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth “Dialogue for Reverse Engineering Assessments and Methods” (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on “Systems Genetics” proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods, ranging from Bayesian networks to penalised linear regressions, to analyse such data, and proposed a simple yet very powerful meta-analysis that combines these inference methods. We present results of the Challenge as well as a more in-depth analysis of the predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics

    Reconstruction of large-scale regulatory networks based on perturbation graphs and transitive reduction: improved methods and their evaluation

    BACKGROUND: The data-driven inference of intracellular networks is one of the key challenges of computational and systems biology. As suggested by recent works, a simple yet effective approach for reconstructing regulatory networks comprises the following two steps. First, the observed effects induced by directed perturbations are collected in a signed and directed perturbation graph (PG). In a second step, Transitive Reduction (TR) is used to identify and eliminate those edges in the PG that can be explained by paths and are therefore likely to reflect indirect effects. RESULTS: In this work we introduce novel variants for PG generation and TR, leading to significantly improved performances. The key modifications concern: (i) use of novel statistical criteria for deriving a high-quality PG from experimental data; (ii) the application of local TR which allows only short paths to explain (and remove) a given edge; and (iii) a novel strategy to rank the edges with respect to their confidence. To compare the new methods with existing ones we not only apply them to a recent DREAM network inference challenge but also to a novel and unprecedented synthetic compendium consisting of 30 5000-gene networks simulated with varying biological and measurement error variances resulting in a total of 270 datasets. The benchmarks clearly demonstrate the superior reconstruction performance of the novel PG and TR variants compared to existing approaches. Moreover, the benchmark enabled us to draw some general conclusions. For example, it turns out that local TR restricted to paths with a length of only two is often sufficient or even favorable. We also demonstrate that considering edge weights is highly beneficial for TR whereas consideration of edge signs is of minor importance. We explain these observations from a graph-theoretical perspective and discuss the consequences with respect to a greatly reduced computational demand to conduct TR. 
Finally, as a realistic application scenario, we use our framework for inferring gene interactions in yeast based on a library of gene expression data measured in mutants with single knockouts of transcription factors. The reconstructed network shows a significant enrichment of known interactions, especially within the 100 most confident (and for experimental validation most relevant) edges. CONCLUSIONS: This paper presents several major achievements. The novel methods introduced herein can be seen as state of the art for inference techniques relying on perturbation graphs and transitive reduction. Another key result of the study is the generation of a new and unprecedented large-scale in silico benchmark dataset accounting for different noise levels and providing a solid basis for unbiased testing of network inference methodologies. Finally, applying our approach to Saccharomyces cerevisiae suggested several new gene interactions with high confidence awaiting experimental validation
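    The local transitive reduction described in this abstract, restricted to explaining paths of length two, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the function name `local_transitive_reduction`, the rule that both edges of the two-step path must be at least as strong as the direct edge, and the optional sign-consistency check are all hypothetical choices made for this example.

```python
def local_transitive_reduction(edges, check_sign=True):
    """Remove edges of a perturbation graph explained by a path of length two.

    edges: dict mapping (source, target) -> signed, non-zero weight.
    Assumed rule (not taken from the paper): edge u->v is removed if some
    intermediate node m has edges u->m and m->v that are both at least as
    strong as u->v and, when check_sign is True, whose sign product matches
    the sign of u->v. All edges are tested against the original graph.
    """
    successors = {}
    for (u, v) in edges:
        successors.setdefault(u, set()).add(v)

    kept = {}
    for (u, v), w_uv in edges.items():
        explained = False
        for m in successors.get(u, ()):
            if m == u or m == v or (m, v) not in edges:
                continue
            w_um, w_mv = edges[(u, m)], edges[(m, v)]
            strong_enough = min(abs(w_um), abs(w_mv)) >= abs(w_uv)
            sign_ok = (not check_sign) or ((w_um * w_mv > 0) == (w_uv > 0))
            if strong_enough and sign_ok:
                explained = True
                break
        if not explained:
            kept[(u, v)] = w_uv
    return kept


if __name__ == "__main__":
    # Toy perturbation graph: A->C is weaker than the path A->B->C.
    pg = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5}
    print(sorted(local_transitive_reduction(pg)))
```

    Limiting the search to one intermediate node means only each node's direct successors are inspected, which fits the abstract's remark that local TR greatly reduces the computational demand.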

    Simulation and identification of gene regulatory networks

    Gene regulatory networks are a well-established model to represent the functioning, at gene level, of highly elaborate biological networks. Studying and understanding such models of gene communication might enable researchers to better target costly laboratory experiments, e.g. by selecting a small set of genes deemed to be responsible for a particular disease, or by indicating with confidence which molecule is likely to be susceptible to certain drug treatments. This thesis explores two main aspects of gene regulatory networks: (i) the simulation of realistic perturbative and systems genetics experiments in gene networks, and (ii) the inference of gene networks from simulated and real data measurements. In detail, the following themes are discussed: (i) SysGenSIM, an open source software to produce gene networks with realistic topology and simulate systems genetics or targeted perturbative experiments; (ii) two state-of-the-art algorithms for the structural identification of gene networks from single-gene knockout measurements; (iii) an approach to reverse-engineering gene networks from heterogeneous compendia; (iv) a methodology to infer gene interactions from systems genetics datasets. These works have been positively recognized by the scientific community. In particular, SysGenSIM has been used, in addition to providing valuable test benches for the development of the above inference algorithms, to generate benchmark datasets for international competitions such as the DREAM5 Systems Genetics challenge and the StatSeq workshop. The identification methodologies proved their worth by accurately reverse-engineering gene networks at established contests, namely the DREAM Network Inference challenges. Results are explained and discussed thoroughly in the thesis

    AN INTEGRATIVE SYSTEMS BIOINFORMATICS APPROACH OF THE ENVIRONMENTAL, GENETIC AND MOLECULAR FACTORS REGULATING SLEEP

    Environmental changes and genetic variations are two important drivers of biological diversity. In complex traits, a multitude of genetic and environmental factors interact and combine in cryptic ways to direct the phenotypic variation. Sleep is a classic illustration of a complex trait that is vital and heritable but still poorly understood. Many aspects of sleep, like the timing, duration and quality, are regulated by the interaction of two processes: the circadian oscillations and the sleep homeostasis. In the context of a study that aimed at uncovering more clearly the molecular pathways regulating the sleep homeostat through the ambiguous relationship that exists between the sleep-wake cycle and metabolism, we built, assembled, and analyzed an extensive multi-scaled dataset using the systems genetics design. Machine learning algorithms and novel high-throughput sequencing technology make it possible to appraise more precisely and broadly the plethora of physiological and molecular phenotypes that contribute to sleep under disparate circumstances and genetic backgrounds, in order to build novel hypotheses based on data-driven discoveries. This dataset is composed of 33 recombinant inbred lines (RIL) from the BXD panel that were interrogated under sleep deprivation and undisturbed conditions for 341 sleep-wake related physiological phenotypes, 124 blood plasma metabolites, and cortical and liver transcriptomics. First analyses pointed out the pervasive effects of sleep deprivation and genetics at both the molecular and behavioral levels and the complex interaction between genetic and environmental factors at all phenotypic layers. Then, two novel integrative methods were developed, the first to prioritize candidate genes within large associated genomic regions for physiological or metabolic phenotypes and the second to visualize the meta-dimensionality of the molecular network using the deterministic structure of hive plots. 
Our findings led to the discovery of a bidirectional relationship between fatty acid turnover and sleep homeostasis, as well as between brain slow-wave activity and ionotropic glutamate receptor transport. Using markup language and cloud-based technologies, we aimed at transforming this resourceful, multidisciplinary dataset into an exploitable digital research object. The generation of dynamic analysis reports and workflow metadata promoted the reproducibility of this data object. In addition, tools were developed for the exploration and mining of integrated data. The resulting database and associated web interface ensure the reusability of this dataset and associated methodologies. -- Biological diversity is driven by two important factors: environmental changes and genetic variations. For so-called complex traits, variation results from numerous genetic and environmental factors that interact and combine, often in cryptic ways. Sleep is a typical example of a complex trait: it is vital and heritable but fundamentally poorly understood. The regulation of many aspects of sleep, such as its duration, timing or quality, involves two processes: circadian oscillations and sleep homeostasis. To better delineate the pathways that regulate the mechanism of sleep homeostasis, in particular the one involving metabolism, we created, assembled and analyzed a large dataset using a so-called systems genetics approach. With the help of machine learning algorithms and new high-throughput sequencing technologies, we were able to measure, under different conditions and genetic contexts, numerous molecular or physiological phenotypes that contribute to the regulation of sleep. Our approach is thus mainly focused on data-driven hypothesis building. 
This dataset is composed of 33 recombinant inbred mouse lines (BXD), for which we examined, under sleep deprivation and control conditions, 341 physiological phenotypes related to sleep and wakefulness, 124 blood plasma metabolites, and the cortex and liver transcriptomes. The first analyses pointed to the acute effect of sleep deprivation, of genetics, and of their interaction at all phenotypic levels. Then, two new integration methods were developed, the first to prioritize genes driving sleep and metabolism within large genomic regions, the second to visualize the meta-dimensionality of the molecular data through a hive plot structure. We highlighted a bidirectional relationship between fatty acid modifications and sleep homeostasis, as well as between brain slow-wave activity and ionotropic glutamate receptor transport. Using markup language and cloud-based technologies, we sought to transform this dataset into a digital research object. The reproducibility of this object was improved by generating dynamic analysis reports and metadata. In addition, tools were developed for data exploration and mining through a web interface, ensuring the reuse of this dataset and its associated methodologies

    Statistical model identification : dynamical processes and large-scale networks in systems biology

    Magdeburg, Univ., Faculty of Process and Systems Engineering (Fak. für Verfahrens- und Systemtechnik), dissertation, 2014, by Robert Johann Flassi