150 research outputs found
Wisdom of crowds for robust gene network inference
Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.National Institutes of Health (U.S.)National Centers for Biomedical Computing (U.S.) (Roadmap Initiative (U54CA121852))Howard Hughes Medical InstituteNational Institutes of Health (U.S.) (Director's Pioneer Award DPI OD003644)Swiss National Science Foundation (Fellowship
Bayesian Covariate-Dependent Quantile Directed Acyclic Graphical Models for Individualized Inference
We propose an approach termed ``qDAGx'' for Bayesian covariate-dependent
quantile directed acyclic graphs (DAGs) where these DAGs are individualized, in
the sense that they depend on individual-specific covariates. The
individualized DAG structure of the proposed approach can be uniquely
identified at any given quantile, based on purely observational data without
strong assumptions such as a known topological ordering. To scale the proposed
method to a large number of variables and covariates, we propose for the model
parameters a novel parameter expanded horseshoe prior that affords a number of
attractive theoretical and computational benefits to our approach. By modeling
the conditional quantiles, qDAGx overcomes the common limitations of mean
regression for DAGs, which can be sensitive to the choice of likelihood, e.g.,
an assumption of multivariate normality, as well as to the choice of priors. We
demonstrate the performance of qDAGx through extensive numerical simulations
and via an application in precision medicine, which infers patient-specific
protein--protein interaction networks in lung cancer.Comment: 35 pages, 5 figure
Practical Approaches to Biological Network Discovery
This dissertation addresses a current outstanding problem in the field of systems biology, which is to identify the structure of a transcriptional network from high-throughput experimental data. Understanding of the connectivity of a transcriptional network is an important piece of the puzzle, which relates the genotype of an organism to its phenotypes. An overwhelming number of computational approaches have been proposed to perform integrative analyses on large collections of high-throughput gene expression datasets to infer the structure of transcriptional networks. I put forth a methodology by which these tools can be evaluated and compared against one another to better understand their strengths and weaknesses. Next I undertake the task of utilizing high-throughput datasets to learn new and interesting network biology in the pathogenic fungus Cryptococcus neoformans. Finally I propose a novel computational method for mapping out transcriptional networks that unifies two orthogonal strategies for network inference. I apply this method to map out the transcriptional network of Saccharomyces cerevisiae and demonstrate how network inference results can complement chromatin immunoprecipitation: ChIP) experiments, which directly probe the binding events of transcriptional regulators. Collectively, my contributions improve both the accessibility and practicality of network inference methods
Inferential stability in systems biology
The modern biological sciences are fraught with statistical difficulties. Biomolecular
stochasticity, experimental noise, and the “large p, small n” problem all contribute to
the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful
conclusions from observations. In this thesis, we explore methods for assessing
the effects of data variability upon downstream inference, in an attempt to quantify and
promote the stability of the inferences we make.
We start with a review of existing methods for addressing this problem, focusing upon the
bootstrap and similar methods. The key requirement for all such approaches is a statistical
model that approximates the data generating process.
We move on to consider biomarker discovery problems. We present a novel algorithm for
proposing putative biomarkers on the strength of both their predictive ability and the stability
with which they are selected. In a simulation study, we find our approach to perform
favourably in comparison to strategies that select on the basis of predictive performance
alone.
We then consider the real problem of identifying protein peak biomarkers for HAM/TSP,
an inflammatory condition of the central nervous system caused by HTLV-1 infection.
We apply our algorithm to a set of SELDI mass spectral data, and identify a number of
putative biomarkers. Additional experimental work, together with known results from the
literature, provides corroborating evidence for the validity of these putative biomarkers.
Having focused on static observations, we then make the natural progression to time
course data sets. We propose a (Bayesian) bootstrap approach for such data, and then
apply our method in the context of gene network inference and the estimation of parameters
in ordinary differential equation models. We find that the inferred gene networks
are relatively unstable, and demonstrate the importance of finding distributions of ODE
parameter estimates, rather than single point estimates
Recommended from our members
Novel methods for biological network inference: an application to circadian Ca2+ signaling network
Biological processes involve complex biochemical interactions among a large number of species like cells, RNA, proteins and metabolites. Learning these interactions is essential to interfering artificially with biological processes in order to, for example, improve crop yield, develop new therapies, and predict new cell or organism behaviors to genetic or environmental perturbations. For a biological process, two pieces of information are of most interest. For a particular species, the first step is to learn which other species are regulating it. This reveals topology and causality. The second step involves learning the precise mechanisms of how this regulation occurs. This step reveals the dynamics of the system. Applying this process to all species leads to the complete dynamical network. Systems biology is making considerable efforts to learn biological networks at low experimental costs. The main goal of this thesis is to develop advanced methods to build models for biological networks, taking the circadian system of Arabidopsis thaliana as a case study. A variety of network inference approaches have been proposed in the literature to study dynamic biological networks. However, many successful methods either require prior knowledge of the system or focus more on topology. This thesis presents novel methods that identify both network topology and dynamics, and do not depend on prior knowledge. Hence, the proposed methods are applicable to general biological networks. These methods are initially developed for linear systems, and, at the cost of higher computational complexity, can also be applied to nonlinear systems. Overall, we propose four methods with increasing computational complexity: one-to-one, combined group and element sparse Bayesian learning (GESBL), the kernel method and reversible jump Markov chain Monte Carlo method (RJMCMC). All methods are tested with challenging dynamical network simulations (including feedback, random networks, different levels of noise and number of samples), and realistic models of circadian system of Arabidopsis thaliana. These simulations show that, while the one-to-one method scales to the whole genome, the kernel method and RJMCMC method are superior for smaller networks. They are robust to tuning variables and able to provide stable performance. The simulations also imply the advantage of GESBL and RJMCMC over the state-of-the-art method. We envision that the estimated models can benefit a wide range of research. For example, they can locate biological compounds responsible for human disease through mathematical analysis and help predict the effectiveness of new treatments
Recommended from our members
Understanding transcriptional regulation through computational analysis of single-cell transcriptomics
Gene expression is tightly regulated by complex transcriptional regulatory mechanisms to achieve specific expression patterns, which are essential to facilitate important biological processes such as embryonic development. Dysregulation of gene expression can lead to diseases such as cancers. A better understanding of the transcriptional regulation will therefore not only advance the understanding of fundamental biological processes, but also provide mechanistic insights into diseases.
The earlier versions of high-throughput expression profiling techniques were limited to measuring average gene expression across large pools of cells. In contrast, recent technological improvements have made it possible to perform expression profiling in single cells. Single-cell expression profiling is able to capture heterogeneity among single cells, which is not possible in conventional bulk expression profiling.
In my PhD, I focus on developing new algorithms, as well as benchmarking and utilising existing algorithms to study the transcriptomes of various biological systems using single-cell expression data. I have developed two different single-cell specific network inference algorithms, BTR and SPVAR, which are based on two different formalisms, Boolean and autoregression frameworks respectively. BTR was shown to be useful for improving existing Boolean models with single-cell expression data, while SPVAR was shown to be a conservative predictor of gene interactions using pseudotime-ordered single-cell expression data.
In addition, I have obtained novel biological insights by analysing single-cell RNAseq data from the epiblast stem cells reprogramming and the leukaemia systems. Three different driver genes, namely Esrrb, Klf2 and GY118F, were shown to drive reprogramming of epiblast stem cells via different reprogramming routes. As for the leukaemia system, FLT3-ITD and IDH1-R132H mutations were shown to interact with each other and potentially predispose some cells for developing acute myeloid leukaemia.Wellcome Trust and Cambridge Trus
Statistical inference in mechanistic models: time warping for improved gradient matching
Inference in mechanistic models of non-linear differential equations is a challenging problem in current computational statistics. Due to the high computational costs of numerically solving the differential equations in every step of an iterative parameter adaptation scheme, approximate methods based on gradient matching have become popular. However, these methods critically depend on the smoothing scheme for function interpolation. The present article adapts an idea from manifold learning and demonstrates that a time warping approach aiming to homogenize intrinsic length scales can lead to a significant improvement in parameter estimation accuracy. We demonstrate the effectiveness of this scheme on noisy data from two dynamical systems with periodic limit cycle, a biopathway, and an application from soft-tissue mechanics. Our study also provides a comparative evaluation on a wide range of signal-to-noise ratios
- …