Assessing the impact of non-additive noise on modelling transcriptional regulation with Gaussian processes
In transcriptional regulation, transcription factors (TFs) are often
unobservable at mRNA level or may be controlled outside of the system being
modelled. Gaussian processes are a promising approach for dealing with these
difficulties as a prior distribution can be defined over the latent TF activity
profiles and the posterior distribution inferred from the observed expression levels
of potential target genes. However, previous approaches have been based on the
assumption of additive Gaussian noise to maintain analytical tractability. We
investigate the influence of a more realistic form of noise on a biologically accurate
system based on Michaelis-Menten kinetics.
Modelling transcriptional regulation with Gaussian processes
A challenging problem in systems biology is the quantitative modelling
of transcriptional regulation. Transcription factors (TFs), which are the
key proteins at the centre of the regulatory processes, may be subject
to post-translational modification, rendering them unobservable at the
mRNA level, or they may be controlled outside of the subsystem being
modelled. In both cases, a mechanistic model description of the regulatory
system needs to be able to deal with latent activity profiles of the key
regulators. A promising approach to deal with these difficulties is based
on using Gaussian processes to define a prior distribution over the latent
TF activity profiles. Inference is based on the principles of non-parametric
Bayesian statistics, consistently inferring the posterior distribution of the
unknown TF activities from the observed expression levels of potential
target genes. The present work provides explicit solutions to the differential
equations needed to model the data in this manner, as well as the
derivatives needed for effective optimisation. The work further explores
identifiability issues not fully shown in previous work and looks at how
this can cause difficulties with inference. We subsequently look at how the
method works on two different TFs, including looking at how the model
works with a more biologically realistic mechanistic model. Finally, we
analyse the effect of more biologically realistic non-Gaussian noise on the
biologically realistic model, showing how this can cause a reduction in the
accuracy of the inference.
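The two abstracts above describe a mechanistic model in which a latent TF activity drives target-gene expression, with Michaelis-Menten kinetics as the more realistic variant. A minimal sketch of that kind of model follows, assuming a single-gene ODE of the form dx/dt = B + S f(t) / (gamma + f(t)) - D x(t). The parameter names (basal, sensitivity, decay, gamma), the function names, and the forward-Euler integrator are illustrative assumptions and are not taken from the papers, which report explicit solutions rather than numerical integration.

```python
def michaelis_menten_response(tf_level, gamma):
    """Saturating activation: rises with TF level and levels off at 1."""
    return tf_level / (gamma + tf_level)


def simulate_mrna(tf_activity, basal, sensitivity, decay, gamma,
                  x0, t_end, dt=0.001):
    """Integrate dx/dt = basal + sensitivity * g(f(t)) - decay * x.

    tf_activity is a callable t -> f(t) giving the latent TF profile.
    Forward Euler is used only to keep the sketch short (assumption).
    Returns the list of mRNA levels at times 0, dt, 2*dt, ..., t_end.
    """
    n_steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for k in range(n_steps):
        t = k * dt
        production = basal + sensitivity * michaelis_menten_response(
            tf_activity(t), gamma)
        x += dt * (production - decay * x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Constant TF activity: the mRNA level relaxes to
    # (basal + sensitivity * f / (gamma + f)) / decay.
    levels = simulate_mrna(lambda t: 2.0, basal=0.1, sensitivity=1.5,
                           decay=0.5, gamma=1.0, x0=0.0, t_end=40.0)
    print(levels[-1])
```

In a Gaussian process treatment, f(t) would be a latent function with a GP prior rather than a fixed callable. The saturating response is what makes the likelihood non-linear in f, which is one reason the realistic model is harder to infer than the linear one.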
Kinetic modelling of competition and depletion of shared miRNAs by competing endogenous RNAs
Non-coding RNAs play a key role in the post-transcriptional regulation of
mRNA translation and turnover in eukaryotes. miRNAs, in particular, interact
with their target RNAs through protein-mediated, sequence-specific binding,
giving rise to extended and highly heterogeneous miRNA-RNA interaction
networks. Within such networks, competition to bind miRNAs can generate an
effective positive coupling between their targets. Competing endogenous RNAs
(ceRNAs) can in turn regulate each other through miRNA-mediated crosstalk.
Albeit potentially weak, ceRNA interactions can occur both dynamically,
affecting e.g. the regulatory clock, and at stationarity, in which case ceRNA
networks as a whole can be implicated in the composition of the cell's
proteome. Many features of ceRNA interactions, including the conditions under
which they become significant, can be unraveled by mathematical and in silico
models. We review the understanding of the ceRNA effect obtained within such
frameworks, focusing on the methods employed to quantify it, its role in the
processing of gene expression noise, and how network topology can determine its
reach. Comment: review article, 29 pages, 7 figures
Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks
Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series was generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets.
Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in the Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses.
Availability: The methods outlined in this article have been implemented in Matlab and are available on request.
Identifying stochastic oscillations in single-cell live imaging time series using Gaussian processes
Multiple biological processes are driven by oscillatory gene expression at
different time scales. Pulsatile dynamics are thought to be widespread, and
single-cell live imaging of gene expression has led to a surge of dynamic,
possibly oscillatory, data for different gene networks. However, the regulation
of gene expression at the level of an individual cell involves reactions
between finite numbers of molecules, and this can result in inherent randomness
in expression dynamics, which blurs the boundaries between aperiodic
fluctuations and noisy oscillators. Thus, there is an acute need for an
objective statistical method for classifying whether an experimentally derived
noisy time series is periodic. Here we present a new data analysis method that
combines mechanistic stochastic modelling with the powerful methods of
non-parametric regression with Gaussian processes. Our method can distinguish
oscillatory gene expression from random fluctuations of non-oscillatory
expression in single-cell time series, despite peak-to-peak variability in
period and amplitude of single-cell oscillations. We show that our method
outperforms the Lomb-Scargle periodogram in successfully classifying cells as
oscillatory or non-oscillatory in data simulated from a simple genetic
oscillator model and in experimental data. Analysis of bioluminescent live cell
imaging shows a significantly greater number of oscillatory cells when
luciferase is driven by a Hes1 promoter (10/19), which has previously
been reported to oscillate, than the constitutive MoMuLV 5' LTR (MMLV) promoter
(0/25). The method can be applied to data from any gene network to both
quantify the proportion of oscillating cells within a population and to measure
the period and quality of oscillations. It is publicly available as a MATLAB
package. Comment: 36 pages, 17 figures
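The abstract above describes classifying a noisy time series as oscillatory or not by means of Gaussian process models. One way to sketch the general idea is to compare the marginal likelihood of the data under an aperiodic covariance function with that under a quasi-periodic one. The specific kernels below (an Ornstein-Uhlenbeck kernel and its damped-cosine counterpart), the function names, and the fixed hyperparameters are assumptions for illustration. The abstract does not state which covariance functions or test statistic the published package uses.

```python
import math


def ou_kernel(tau, variance, alpha):
    """Aperiodic covariance: exponentially decaying correlation."""
    return variance * math.exp(-alpha * abs(tau))


def ou_osc_kernel(tau, variance, alpha, beta):
    """Quasi-periodic covariance: damped cosine with angular frequency beta."""
    return variance * math.exp(-alpha * abs(tau)) * math.cos(beta * tau)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def gp_log_likelihood(times, values, kernel, noise_variance):
    """Log marginal likelihood of zero-mean GP data with Gaussian noise."""
    n = len(times)
    cov = [[kernel(times[i] - times[j]) + (noise_variance if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    # Forward substitution: solve lower * z = values.
    z = []
    for i in range(n):
        partial = sum(lower[i][k] * z[k] for k in range(i))
        z.append((values[i] - partial) / lower[i][i])
    half_log_det = sum(math.log(lower[i][i]) for i in range(n))
    return (-0.5 * sum(v * v for v in z) - half_log_det
            - 0.5 * n * math.log(2.0 * math.pi))


def log_likelihood_ratio(times, values, variance, alpha, beta, noise_variance):
    """Oscillatory minus non-oscillatory log marginal likelihood."""
    oscillatory = gp_log_likelihood(
        times, values,
        lambda tau: ou_osc_kernel(tau, variance, alpha, beta), noise_variance)
    aperiodic = gp_log_likelihood(
        times, values,
        lambda tau: ou_kernel(tau, variance, alpha), noise_variance)
    return oscillatory - aperiodic
```

A practical analysis would fit the hyperparameters of each model to the data and calibrate the ratio against simulated non-oscillatory series before classifying a cell. Here they are passed in as fixed values to keep the sketch short.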
Bayesian correlated clustering to integrate multiple datasets
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets.
Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
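The MDI abstract states that dependencies between the per-dataset mixture models are captured by parameters describing agreement among the datasets. A minimal sketch of how such an agreement parameter could couple the component allocations of one gene across two datasets is given below. The multiplicative up-weighting by (1 + phi) when the two labels match, and the names `joint_allocation_probs` and `agreement_probability`, are assumptions for illustration, not details given in the abstract.

```python
def joint_allocation_probs(weights_1, weights_2, phi):
    """Joint prior over (label in dataset 1, label in dataset 2) for one gene.

    weights_1, weights_2: mixture weights for each dataset (same length).
    phi: non-negative agreement parameter; phi = 0 means the datasets are
         independent, larger phi favours identical labels (assumed form).
    """
    n = len(weights_1)
    if len(weights_2) != n:
        raise ValueError("both datasets must use the same number of components")
    unnormalised = [
        [weights_1[a] * weights_2[b] * ((1.0 + phi) if a == b else 1.0)
         for b in range(n)]
        for a in range(n)
    ]
    total = sum(sum(row) for row in unnormalised)
    return [[value / total for value in row] for row in unnormalised]


def agreement_probability(joint):
    """Prior probability that the gene gets the same label in both datasets."""
    return sum(joint[a][a] for a in range(len(joint)))


if __name__ == "__main__":
    independent = joint_allocation_probs([0.5, 0.5], [0.5, 0.5], phi=0.0)
    coupled = joint_allocation_probs([0.5, 0.5], [0.5, 0.5], phi=2.0)
    print(agreement_probability(independent), agreement_probability(coupled))
```

With phi set to zero the joint prior factorises into the two sets of mixture weights, so each dataset is clustered on its own. Increasing phi shifts prior mass towards matching labels, which is the sense in which the datasets share structure.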