TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription
Motivation: Predictive modelling of gene expression is a powerful framework
for the in silico exploration of transcriptional regulatory interactions
through the integration of high-throughput -omics data. A major limitation of
previous approaches is their inability to handle conditional and synergistic
interactions that emerge when collectively analysing genes subject to different
regulatory mechanisms. This limitation reduces overall predictive power and
thus the reliability of downstream biological inference.
Results: We introduce an analytical modelling framework (TREEOME: tree of
models of expression) that integrates epigenetic and transcriptomic data by
separating genes into putative regulatory classes. Current predictive modelling
approaches have found both DNA methylation and histone modification epigenetic
data to provide little or no improvement in accuracy of prediction of
transcript abundance despite, for example, distinct anti-correlation between
mRNA levels and promoter-localised DNA methylation. To improve on this, in
TREEOME we evaluate four possible methods of formulating gene-level DNA
methylation metrics, which provide a foundation for identifying gene-level
methylation events and subsequent differential analysis, whereas most previous
techniques operate at the level of individual CpG dinucleotides. We demonstrate
TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone
modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript
abundance (RNA-seq) for H1-hESC and GM12878 cell lines.
Availability: TREEOME is implemented using open-source software and made
available as a pre-configured bootable reference environment. All scripts and
data presented in this study are available online at
http://sourceforge.net/projects/budden2015treeome/.
Comment: 14 pages, 6 figures
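As a rough illustration of what a gene-level DNA methylation metric can look like, the sketch below averages CpG-level methylation fractions inside a promoter window around the transcription start site (TSS). This is only one plausible formulation and is not claimed to be any of the four methods evaluated in TREEOME; the function name, the window sizes, and the input layout are all assumptions made for the example.

```python
from statistics import mean


def gene_level_methylation(cpgs, tss, strand="+", upstream=1000, downstream=500):
    """Mean methylation fraction of CpGs in a promoter window around the TSS.

    cpgs: iterable of (position, methylated_fraction) pairs (hypothetical layout).
    The window sizes are illustrative defaults, not values from the paper.
    Returns None when no CpG falls inside the window.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    values = [frac for pos, frac in cpgs if start <= pos <= end]
    return mean(values) if values else None


# Toy CpG calls: (genomic position, fraction of reads methylated).
cpgs = [(900, 0.9), (1500, 0.25), (1900, 0.75), (5000, 0.8)]
score = gene_level_methylation(cpgs, tss=2000)
print(score)
```

A per-gene score of this kind can then be compared between cell lines or used as a model feature, which is harder to do with individual CpG dinucleotides.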
Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.
A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.
Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana
Regulation of gene expression is crucial for organism growth, and it is one of the challenges in Systems Biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation
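The out-degree mentioned at the end of this abstract is the number of targets a gene regulates in the inferred directed network. A minimal sketch of ranking nodes by out-degree follows; the edge list and the gene labels are hypothetical placeholders, not interactions reported in the study.

```python
from collections import Counter


def out_degrees(edges):
    """Count outgoing edges per node in a directed regulator -> target network."""
    counts = Counter()
    for regulator, target in edges:
        counts[regulator] += 1
        counts[target] += 0  # make sure pure targets appear with out-degree 0
    return counts


# Hypothetical inferred edges (regulator, target); labels are placeholders.
edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
]

degrees = out_degrees(edges)
# Nodes sorted from highest to lowest out-degree.
ranked = degrees.most_common()
print(ranked)
```

Nodes at the top of such a ranking are the candidate regulators that the abstract suggests may warrant further investigation.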
Modelling signaling networks underlying plant defence
Transcriptional reprogramming plays a significant role in governing plant responses to pathogens. The underlying regulatory networks are complex and dynamic, responding to numerous input signals. Most network modelling studies to date have used large-scale expression data sets from public repositories but defence network models with predictive ability have also been inferred from single time series data sets, and sophisticated biological insights generated from focused experiments containing multiple network perturbations. Using multiple network inference methods, or combining network inference with additional data, such as promoter motifs, can enhance the ability of the model to predict gene function or regulatory relationships. Network topology can highlight key signaling components and provides a systems level understanding of plant defence
Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks
Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that all of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations, including: (i) where different but overlapping sets of transcription factors are expected to bind in the different experimental conditions, that is, where switching events could potentially arise under the different treatments; and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets.
Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses.
Availability: The methods outlined in this article have been implemented in Matlab and are available on request
Systems biology in animal sciences
Systems biology is a rapidly expanding field of research and is applied in a number of biological disciplines. In animal sciences, omics approaches are increasingly used, yielding vast amounts of data, but systems biology approaches to extract an understanding of biological processes and animal traits from these data are not yet frequently used. This paper aims to explain what systems biology is and which areas of animal sciences could benefit from systems biology approaches. Systems biology aims to understand whole biological systems working as a unit, rather than investigating their individual components. Therefore, systems biology can be considered a holistic approach, as opposed to reductionism. The recently developed 'omics' technologies enable biological sciences to characterize the molecular components of life with ever increasing speed, yielding vast amounts of data. However, biological functions do not follow from the simple addition of the properties of system components, but rather arise from the dynamic interactions of these components. Systems biology combines statistics, bioinformatics and mathematical modeling to integrate and analyze large amounts of data in order to extract a better understanding of the biology from these huge data sets and to predict the behavior of biological systems. A 'system' approach and mathematical modeling in biological sciences are not new in themselves, as they were used in biochemistry, physiology and genetics long before the name systems biology was coined. However, the present combination of mass biological data and of computational and modeling tools is unprecedented and truly represents a major paradigm shift in biology. Significant advances have been made using systems biology approaches, especially in the field of bacterial and eukaryotic cells and in human medicine. Similarly, progress is being made with 'system approaches' in animal sciences, providing exciting opportunities to predict and modulate animal traits.
Model-guided design of ligand-regulated RNAi for programmable control of gene expression
Progress in constructing biological networks will rely on the development of more advanced components that can be predictably modified to yield optimal system performance. We have engineered an RNA-based platform, which we call an shRNA switch, that provides for integrated ligand control of RNA interference (RNAi) by modular coupling of an aptamer, competing strand, and small hairpin (sh) RNA stem into a single component that links ligand concentration and target gene expression levels. A combined experimental and mathematical modelling approach identified multiple tuning strategies and moves towards a predictable framework for the forward design of shRNA switches. The utility of our platform is highlighted by the demonstration of fine-tuning, multi-input control, and model-guided design of shRNA switches with an optimized dynamic range. Thus, shRNA switches can serve as an advanced component for the construction of complex biological systems and offer a controlled means of activating RNAi in disease therapeutics
Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-depleted Murine Embryonic Stem Cells
Embryonic stem cells (ESC) have the potential to self-renew indefinitely and
to differentiate into any of the three germ layers. The molecular mechanisms
for self-renewal, maintenance of pluripotency and lineage specification are
poorly understood, but recent results point to a key role for epigenetic
mechanisms. In this study, we focus on quantifying the impact of histone 3
acetylation (H3K9,14ac) on gene expression in murine embryonic stem cells. We
analyze genome-wide histone acetylation patterns and gene expression profiles
measured over the first five days of cell differentiation triggered by
silencing Nanog, a key transcription factor in ESC regulation. We explore the
temporal and spatial dynamics of histone acetylation data and its correlation
with gene expression using supervised and unsupervised statistical models. On a
genome-wide scale, changes in acetylation are significantly correlated to
changes in mRNA expression and, surprisingly, this coherence increases over
time. We quantify the predictive power of histone acetylation for gene
expression changes in a balanced cross-validation procedure. In an in-depth
study we focus on genes central to the regulatory network of mouse ESC,
including those identified in a recent genome-wide RNAi screen and in the
PluriNet, a computationally derived stem cell signature. We find that compared
to the rest of the genome, ESC-specific genes show significantly more
acetylation signal and a much stronger decrease in acetylation over time, which
is often not reflected in a concordant expression change. These results shed
light on the complexity of the relationship between histone acetylation and
gene expression and are a step towards dissecting the multilayer regulatory
mechanisms that determine stem cell fate.
Comment: accepted at PLoS Computational Biology
An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli.
Given the vast behavioral repertoire and biological complexity of even the simplest organisms, accurately predicting phenotypes in novel environments and unveiling their biological organization is a challenging endeavor. Here, we present an integrative modeling methodology that unifies under a common framework the various biological processes and their interactions across multiple layers. We trained this methodology on an extensive normalized compendium for the gram-negative bacterium Escherichia coli, which incorporates gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, and metabolic pathways, as well as growth measurements. Comparison with measured growth and high-throughput data demonstrates the enhanced ability of the integrative model to predict phenotypic outcomes in various environmental and genetic conditions, even in cases where their underlying functions are under-represented in the training set. This work paves the way toward integrative techniques that extract knowledge from a variety of biological data to achieve more than the sum of their parts in the context of prediction, analysis, and redesign of biological systems