
    Modeling gene expression regulatory networks with the sparse vector autoregressive model

    Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the networks of gene products involved is required. To define and understand such molecular networks, several statistical methods have been proposed in the literature for estimating gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. First, information flow needs to be inferred, in addition to the correlation between genes. Second, we usually try to identify large networks involving a large number of genes (parameters) from a smaller number of microarray experiments (samples). In this situation, which is common in bioinformatics, it is difficult to perform statistical tests with methods that model large gene-gene networks. In addition, most models rely on dimension reduction through clustering, so the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems.
    Results: We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks from gene expression profiles obtained in time-series microarray experiments. Through extensive simulations applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even when the number of samples is smaller than the number of genes. Moreover, false positives can be controlled, a significant advantage over other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.
    Conclusion: The proposed SVAR method can model gene regulatory networks in the frequent situation in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible with other gene regulatory network models.
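    The central idea of a sparse VAR can be sketched as one L1-penalized regression per gene: each gene at time t is regressed on all genes at time t-1, and nonzero coefficients are read as candidate directed edges. This is a minimal sketch under stated assumptions: a first-order model, centered data, an illustrative penalty `lam`, and a plain lasso solved by coordinate descent. It does not reproduce the paper's estimator or its false-discovery-rate test, and all function names are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    No intercept is fitted, so X and y are assumed to be (roughly) centered."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb; b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual that excludes b_j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def sparse_var1(series, lam):
    """series: list of T time points, each a list of p expression values.
    Returns A with A[i][j] = estimated effect of gene j at t-1 on gene i at t;
    a nonzero A[i][j] is read as a candidate edge j -> i."""
    X = series[:-1]  # predictors: all genes at t-1
    p = len(series[0])
    return [lasso_cd(X, [row[i] for row in series[1:]], lam) for i in range(p)]


# Usage on simulated data: gene 0 drives gene 1 with a one-step delay,
# gene 2 evolves independently.
random.seed(0)
A_true = [[0.5, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.5]]
x = [0.0, 0.0, 0.0]
series = []
for _ in range(300):
    x = [sum(A_true[i][j] * x[j] for j in range(3)) + random.gauss(0.0, 1.0)
         for i in range(3)]
    series.append(x)
A_hat = sparse_var1(series, lam=0.05)
```

    Larger `lam` values zero out more coefficients; in practice the penalty would be chosen by cross-validation or an information criterion rather than fixed as in this sketch.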

    Discovering Graphical Granger Causality Using the Truncating Lasso Penalty

    Components of biological systems interact with each other in order to carry out vital cell functions. Information about these interactions can be used to improve estimation and inference, and to obtain better insights into the underlying cellular mechanisms. Discovering regulatory interactions among genes is therefore an important problem in systems biology. Whole-genome expression data collected over time provide an opportunity to determine how the expression levels of genes are affected by changes in the transcription levels of other genes, and can therefore be used to discover regulatory interactions among genes. In this paper, we propose a novel penalization method, called truncating lasso, for estimating causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the underlying time series and improves the performance of lasso-type estimators. Moreover, the resulting estimate provides information on the time lag between the activation of transcription factors and their effects on regulated genes. We provide an efficient algorithm for estimating the model parameters, and show that the proposed method can consistently discover causal relationships in the large p, small n setting. The proposed model compares favorably on both simulated and real data examples. The truncating lasso method is implemented in the R package grangerTlasso and is available at http://www.stat.lsa.umich.edu/~shojaie. Comment: 12 pages, 4 figures, 1 table.
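    Graphical Granger models with several lags regress each gene at time t on all genes at lags 1 through d, so the estimated coefficients split into one block per lag. The sketch below builds that lagged design and applies a deliberately simplified truncation rule: keep lags up to the first lag whose block has no nonzero coefficient. This rule is an assumption for illustration only and is not the truncating lasso penalty itself; the helper names are hypothetical, and grangerTlasso is the authors' implementation.

```python
def lagged_design(series, d):
    """Row for time t (t = d, ..., T-1) is
    x_{t-1}[0..p-1] + x_{t-2}[0..p-1] + ... + x_{t-d}[0..p-1]."""
    rows = []
    for t in range(d, len(series)):
        row = []
        for lag in range(1, d + 1):
            row.extend(series[t - lag])
        rows.append(row)
    return rows


def response(series, d, gene):
    """Matching response vector: expression of `gene` at t = d, ..., T-1."""
    return [series[t][gene] for t in range(d, len(series))]


def split_by_lag(coefs, p, d):
    """Split a length p*d coefficient vector into d per-lag blocks of length p."""
    return [coefs[k * p:(k + 1) * p] for k in range(d)]


def truncate_lags(lag_blocks, tol=0.0):
    """Simplified truncation (hypothetical rule): the estimated order is the
    number of lags before the first all-zero block; later blocks are zeroed."""
    order = 0
    for block in lag_blocks:
        if all(abs(c) <= tol for c in block):
            break
        order += 1
    kept = [blk if k < order else [0.0] * len(blk) for k, blk in enumerate(lag_blocks)]
    return order, kept


series = [[0, 0], [1, 10], [2, 20], [3, 30]]
print(lagged_design(series, 2))   # [[1, 10, 0, 0], [2, 20, 1, 10]]
print(response(series, 2, 1))     # [20, 30]
print(truncate_lags(split_by_lag([0.4, 0.0, 0.0, 0.0, 0.2, 0.0], p=2, d=3)))
# (1, [[0.4, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

    Note that the nonzero lag-3 coefficient is discarded because lag 2 is empty; this is what makes the estimated order, and hence the time lag of each regulatory effect, explicit.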

    Network estimation in State Space Model with L1-regularization constraint

    Biological networks have emerged as an attractive paradigm in genomic science since the introduction of large-scale genomic technologies, which carried the promise of elucidating relationships in functional genomics. Microarray technologies coupled with appropriate mathematical or statistical models have made it possible to identify dynamic regulatory networks or to measure the time course of the expression levels of many genes simultaneously. However, one limitation is the high-dimensional nature of such data, coupled with the fact that gene expression data are known to include hidden processes. We are therefore concerned with deriving a method for inferring a sparse dynamic network in a high-dimensional setting. We assume that the observations are noisy measurements of gene expression in the form of mRNAs, whose dynamics can be described by an unknown or hidden process. We build an input-dependent linear state space model from these hidden states and demonstrate how an L1 regularization constraint incorporated into an Expectation-Maximization (EM) algorithm can be used to reverse engineer transcriptional networks from gene expression profiling data. This corresponds to estimating the model interaction parameters. The proposed method is illustrated on time-course microarray data from a well-established T-cell dataset. At the optimal tuning parameters, we found genes TRAF5, JUND, CDK4, CASP4, CD69, and C3X1 to have a higher number of inward-directed connections, and FYB, CCNA2, AKT1, and CASP8 to have a higher number of outward-directed connections. We recommend these genes for further investigation. Caspase 4 was also found to activate the expression of JunD, which in turn represses the cell cycle regulator CDC2. Comment: arXiv admin note: substantial text overlap with arXiv:1308.359

    Stochastic dynamic modeling of short gene expression time-series data

    Copyright 2008 IEEE. In this paper, the expectation maximization (EM) algorithm is applied to model the gene regulatory network from gene time-series data. The gene regulatory network is viewed as a stochastic dynamic model consisting of noisy gene measurements from microarrays and a first-order autoregressive (AR) stochastic dynamic process for gene regulation. Using the EM algorithm, both the model parameters and the actual values of the gene expression levels can be identified simultaneously. Moreover, the algorithm handles sparse parameter identification and noisy data efficiently. It is also shown that the EM algorithm can handle microarray gene expression data with a large number of variables but a small number of observations. Gene expression stochastic dynamic models are constructed for four real-world gene expression data sets to demonstrate the advantages of the introduced algorithm. Several indices are proposed to evaluate the inferred gene regulatory network models, and the relevant biological properties are discussed.
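    The model described here (noisy measurements of a first-order AR process, fitted by EM) can be illustrated in its simplest scalar form: x_t = a x_{t-1} + w_t for the true expression level and y_t = x_t + v_t for the microarray measurement. The sketch below is a single-gene illustration under stated assumptions: the E-step uses a Kalman filter and Rauch-Tung-Striebel smoother, the M-step uses the standard closed-form updates for a, q = Var(w_t) and r = Var(v_t), and the prior on the first state (mu0, p0) is held fixed. It omits the multivariate and sparse parts of the paper's method, and all names are hypothetical.

```python
import random


def kalman_smoother(y, a, q, r, mu0=0.0, p0=10.0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for the scalar model
    x_t = a*x_{t-1} + w_t,  y_t = x_t + v_t,  w_t ~ N(0, q),  v_t ~ N(0, r),
    with prior N(mu0, p0) on the first state."""
    T = len(y)
    xp, pp, xf, pf = [0.0] * T, [0.0] * T, [0.0] * T, [0.0] * T
    for t in range(T):
        xp[t] = mu0 if t == 0 else a * xf[t - 1]          # predicted mean
        pp[t] = p0 if t == 0 else a * a * pf[t - 1] + q    # predicted variance
        k = pp[t] / (pp[t] + r)                            # Kalman gain
        xf[t] = xp[t] + k * (y[t] - xp[t])                 # filtered mean
        pf[t] = (1.0 - k) * pp[t]                          # filtered variance
    xs, ps, lag = xf[:], pf[:], [0.0] * T  # lag[t] = Cov(x_t, x_{t-1} | all y)
    for t in range(T - 2, -1, -1):
        j = pf[t] * a / pp[t + 1]                          # smoother gain
        xs[t] = xf[t] + j * (xs[t + 1] - xp[t + 1])
        ps[t] = pf[t] + j * j * (ps[t + 1] - pp[t + 1])
        lag[t + 1] = j * ps[t + 1]
    return xs, ps, lag


def em_ar1_noisy(y, a=0.5, q=1.0, r=1.0, n_iter=200):
    """EM estimates of (a, q, r). E-step: smoothed moments; M-step: closed form."""
    T = len(y)
    for _ in range(n_iter):
        xs, ps, lag = kalman_smoother(y, a, q, r)
        exx = [ps[t] + xs[t] ** 2 for t in range(T)]             # E[x_t^2]
        exl = [lag[t] + xs[t] * xs[t - 1] for t in range(1, T)]  # E[x_t x_{t-1}]
        a = sum(exl) / sum(exx[:-1])
        q = sum(exx[t] - a * exl[t - 1] for t in range(1, T)) / (T - 1)
        r = sum(y[t] ** 2 - 2.0 * y[t] * xs[t] + exx[t] for t in range(T)) / T
    return a, q, r


# Usage on simulated data with a = 0.8, q = 1.0, r = 0.5.
random.seed(0)
x, y = 0.0, []
for _ in range(500):
    x = 0.8 * x + random.gauss(0.0, 1.0)
    y.append(x + random.gauss(0.0, 0.5 ** 0.5))
a_hat, q_hat, r_hat = em_ar1_noisy(y)
```

    The smoothed means `xs` returned by the E-step are the estimates of the actual expression levels, which is the sense in which EM recovers parameters and hidden values together.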

    Modeling and identification of gene regulatory networks: A Granger causality approach

    It is of increasing interest in systems biology to discover gene regulatory networks (GRNs) from time-series genomic data, i.e., to explore the interactions among a large number of genes and gene products over time. One common current approach is based on Granger causality, which models the time-series genomic data as a vector autoregressive (VAR) process and estimates the GRNs from the VAR coefficient matrix. The main challenge in identifying VAR models is the high dimensionality of genes relative to the limited number of time points, which results in statistically inefficient solutions and high computational complexity. Fast and efficient variable selection techniques are therefore highly desirable. In this paper, an introductory review of identification methods and variable selection techniques for VAR models in learning GRNs is presented. Furthermore, a dynamic VAR (DVAR) model, which accounts for GRNs that change over time during the experimental cycle, and its identification methods are introduced. © 2010 IEEE. The 9th International Conference on Machine Learning and Cybernetics (ICMLC 2010), Qingdao, China, 11-14 July 2010. In Proceedings of the 9th ICMLC, 2010, v. 6, p. 3073-307
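    Reading a network off the VAR coefficient matrix can be sketched as follows, under stated assumptions: a first-order VAR without intercept (centered data), ordinary least squares per gene via the normal equations, and a hypothetical magnitude cutoff in place of a formal test. The normal equations are solvable only when there are at least as many transitions (T-1) as genes, and in practice more, which is the dimensionality problem that motivates the variable selection techniques reviewed here. Function names are hypothetical.

```python
import random


def solve(M, v):
    """Solve M b = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular normal equations (e.g. fewer time points than genes)")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_var1(series):
    """Least-squares VAR(1) without intercept (data assumed centered).
    A[i][j] is the estimated effect of gene j at t-1 on gene i at t."""
    X, Y = series[:-1], series[1:]
    p = len(series[0])
    xtx = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    A = []
    for i in range(p):
        xty = [sum(x[a] * yy[i] for x, yy in zip(X, Y)) for a in range(p)]
        A.append(solve(xtx, xty))
    return A


def granger_edges(A, cutoff):
    """Directed edges (j, i), meaning j -> i, for off-diagonal |A[i][j]| > cutoff."""
    p = len(A)
    return [(j, i) for i in range(p) for j in range(p)
            if i != j and abs(A[i][j]) > cutoff]


# Usage: gene 0 drives gene 1; gene 2 is independent.
random.seed(1)
A_true = [[0.5, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, -0.5]]
x, series = [0.0, 0.0, 0.0], []
for _ in range(300):
    x = [sum(A_true[i][j] * x[j] for j in range(3)) + random.gauss(0.0, 1.0)
         for i in range(3)]
    series.append(x)
edges = granger_edges(ols_var1(series), cutoff=0.3)  # hypothetical cutoff
```

    With two time points and three genes, `ols_var1` raises `ValueError`, which illustrates why penalized or otherwise regularized estimators are needed when genes outnumber time points.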

    Weighted-Lasso for Structured Network Inference from Time Course Data

    We present a weighted-Lasso method to infer the parameters of a first-order vector autoregressive model that describes time-course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to have a prior internal structure of connectivity, which drives the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks: first, the yeast cell cycle regulation network, by analyzing Spellman et al.'s dataset; and second, the E. coli SOS DNA repair network, by analyzing data from U. Alon's lab.
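    A weighted lasso replaces the single penalty lam with a per-coefficient penalty lam * w_j, so that prior structure can make some edges cheaper to include than others. The sketch below is a hypothetical illustration: coordinate descent with per-coefficient thresholds, plus a toy weighting rule that gives a smaller weight to genes assumed to be regulators. The paper's actual weights, derived from prior knowledge or inferred structure, are not reproduced here, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def weighted_lasso_cd(X, y, lam, weights, n_iter=100):
    """Minimize (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j| by
    coordinate descent. weights[j] = 0 leaves b_j unpenalized; larger
    weights push b_j toward zero."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb; b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def prior_weights(regulators, p, low=0.5, high=2.0):
    """Hypothetical prior structure: genes assumed to be regulators get the
    smaller weight `low`, all other genes the larger weight `high`."""
    return [low if j in regulators else high for j in range(p)]


# Toy design with orthogonal columns, so each coefficient is a soft-thresholded
# least-squares value.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 1.0, 2.0, 1.0]
b = weighted_lasso_cd(X, y, lam=0.3, weights=[1.0, 2.0])
# Column 0: threshold 0.3, coefficient (1.0 - 0.3) / 0.5 = 1.4.
# Column 1: threshold 0.6 exceeds its correlation 0.5, coefficient 0.0.
```

    In a per-gene VAR regression, `prior_weights` would be passed as `weights`, so predictors believed to be regulators survive a penalty that removes the others.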

    Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters

    To relax the homogeneity assumption of classical dynamic Bayesian networks (DBNs), various recent studies have combined DBNs with multiple changepoint processes. The underlying assumption is that the parameters associated with time series segments delimited by multiple changepoints are a priori independent. Under weak regularity conditions, the parameters can be integrated out in the likelihood, leading to a closed-form expression of the marginal likelihood. However, the assumption of prior independence is unrealistic in many real-world applications, where the segment-specific regulatory relationships among the interdependent quantities tend to undergo gradual evolutionary adaptations. We therefore propose a Bayesian coupling scheme to introduce systematic information sharing among the segment-specific interaction parameters. We investigate the effect of this model improvement on network reconstruction accuracy in a reverse engineering context, where the objective is to learn the structure of a gene regulatory network from temporal gene expression profiles.

    Identifying interactions in the time and frequency domains in local and global networks: a Granger causality approach

    Background: Several well-established reverse-engineering approaches, such as Bayesian network inference, ordinary differential equations (ODEs), information theory and Granger causality, are widely applied to derive causal relationships in dynamic networks among elements such as genes, proteins, metabolites, neurons and brain areas, based on multi-dimensional spatial and temporal data.
    Results: Here we focused on Granger causality in both the time and frequency domains and in local and global networks, and applied our approach to experimental data (genes and proteins). For a small gene network, Granger causality outperformed the other three approaches mentioned above. A global protein network of 812 proteins was reconstructed using a novel approach. The results fitted well with known experimental findings and predicted many experimentally testable results. In addition to interactions in the time domain, interactions in the frequency domain were also recovered.
    Conclusions: The results on the proteomic and gene data confirm that Granger causality is a simple and accurate approach to recovering network structure. Our approach is general and can easily be applied to other types of temporal data.
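    For a pair of genes, time-domain Granger causality with one lag can be computed by comparing two autoregressions: y_t regressed on its own past (restricted model) versus its own past plus the past of x (full model). The sketch below returns the log ratio of residual sums of squares, ln(RSS_restricted / RSS_full), a common time-domain Granger measure, together with the F statistic for the single added coefficient. It assumes lag order 1 and is only an illustration; the frequency-domain decomposition and the network-level analysis in the paper are not shown, and the function names are hypothetical.

```python
import math
import random


def solve(M, v):
    """Solve M b = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rss_ols(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_lag1(x, y):
    """Time-domain Granger causality x -> y with one lag.
    Restricted: y_t = c + a*y_{t-1};  full: y_t = c + a*y_{t-1} + b*x_{t-1}.
    Returns (ln(RSS_restricted / RSS_full), F statistic for b = 0)."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_f = rss_ols(restricted, target), rss_ols(full, target)
    n, k = len(target), 3                          # observations, full-model parameters
    f_stat = (rss_r - rss_f) / (rss_f / (n - k))   # one restriction
    return math.log(rss_r / rss_f), f_stat


# Usage: x drives y with a one-step delay, but not the other way round.
random.seed(2)
x, y = [0.0], [0.0]
for _ in range(299):
    x.append(0.5 * x[-1] + random.gauss(0.0, 1.0))
    y.append(0.3 * y[-1] + 0.7 * x[-2] + random.gauss(0.0, 1.0))
gc_xy, f_xy = granger_lag1(x, y)  # x -> y: expected to be large
gc_yx, f_yx = granger_lag1(y, x)  # y -> x: expected to be near zero
```

    The F statistic would be compared with an F(1, n - 3) critical value to decide significance; that step is left out here to keep the sketch within the standard library.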