5 research outputs found

    Discovering Graphical Granger Causality Using the Truncating Lasso Penalty

    Full text link
    Components of biological systems interact with each other in order to carry out vital cell functions. Such information can be used to improve estimation and inference, and to obtain better insights into the underlying cellular mechanisms. Discovering regulatory interactions among genes is therefore an important problem in systems biology. Whole-genome expression data over time provides an opportunity to determine how the expression levels of genes are affected by changes in transcription levels of other genes, and can therefore be used to discover regulatory interactions among genes. In this paper, we propose a novel penalization method, called truncating lasso, for estimation of causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the underlying time series, and improves the performance of the lasso-type estimators. Moreover, the resulting estimate provides information on the time lag between activation of transcription factors and their effects on regulated genes. We provide an efficient algorithm for estimation of model parameters, and show that the proposed method can consistently discover causal relationships in the large pp, small nn setting. The performance of the proposed model is evaluated favorably in simulated, as well as real, data examples. The proposed truncating lasso method is implemented in the R-package grangerTlasso and is available at http://www.stat.lsa.umich.edu/~shojaie.Comment: 12 pages, 4 figures, 1 tabl

    Inferring cluster-based networks from differently stimulated multiple time-course gene expression data

    Get PDF
    Motivation: Clustering and gene network inference often help to predict the biological functions of gene subsets. Recently, researchers have accumulated a large amount of time-course transcriptome data collected under different treatment conditions to understand the physiological states of cells in response to extracellular stimuli and to identify drug-responsive genes. Although a variety of statistical methods for clustering and inferring gene networks from expression profiles have been proposed, most of these are not tailored to simultaneously treat expression data collected under multiple stimulation conditions

    Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

    Full text link
    Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g. wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network.Comment: 24 pages, 4 figures, 6 table

    Methods for Reconstructing Networks with Incomplete Information.

    Full text link
    Network representations of complex systems are widespread and reconstructing unknown networks from data has been intensively researched in statistical and scientific communities more broadly. Two challenges in network reconstruction problems include having insufficient data to illuminate the full structure of the network and needing to combine information from different data sources. Addressing these challenges, this thesis contributes methodology for network reconstruction in three respects. First, we consider sequentially choosing interventions to discover structure in directed networks focusing on learning a partial order over the nodes. This focus leads to a new model for intervention data under which nodal variables depend on the lengths of paths separating them from intervention targets rather than on parent sets. Taking a Bayesian approach, we present partial-order based priors and develop a novel Markov-Chain Monte Carlo (MCMC) method for computing posterior expectations over directed acyclic graphs. The utility of the MCMC approach comes from designing new proposals for the Metropolis algorithm that move locally among partial orders while independently sampling graphs from each partial order. The resulting Markov Chains mix rapidly and are ergodic. We also adapt an existing strategy for active structure learning, develop an efficient Monte Carlo procedure for estimating the resulting decision function, and evaluate the proposed methods numerically using simulations and benchmark datasets. We next study penalized likelihood methods using incomplete order information as arising from intervention data. To make the notion of incomplete information precise, we introduce and formally define incomplete partial orders which subsumes the important special case of a known total ordering of the nodes. This special case lies along an information lattice and we study the reconstruction performance of penalized likelihood methods at different points along this lattice. Finally, we present a method for ranking a network's potential edges using time-course data. The novelty is our development of a nonparametric gradient-matching procedure and a related summary statistic for measuring the strength of relationships among components in dynamic systems. Simulation studies demonstrate that given sufficient signal moving using this procedure to move from linear to additive approximations leads to improved rankings of potential edges.PhDStatisticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113316/1/jbhender_1.pd

    Modeling and Estimation of High-dimensional Vector Autoregressions.

    Full text link
    Vector Autoregression (VAR) represents a popular class of time series models in applied macroeconomics and finance, widely used for structural analysis and simultaneous forecasting of a number of temporally observed variables. Over the years it has gained popularity in the fields of control theory, statistics, economics, finance, genetics and neuroscience. In addition to the "curse of dimensionality" introduced by a quadratically growing dimension of the parameter space, VAR estimation poses considerable challenges due to the temporal and cross-sectional dependence in the data. In the first part of this thesis, we discuss modeling and estimation of high-dimensional VAR from short panels of time series, with applications to reconstruction of gene regulatory network from time course gene expression data. We investigate adaptively thresholded lasso regularized estimation of VAR models and propose a thesholded group lasso regularization framework to incorporate a priori available pathway information in the model. The properties of the proposed methods are assessed both theoretically and via numerical experiments. The study is illustrated on two motivating examples coming from functional genomics and financial econometrics. The second part of this thesis focuses on modeling and estimation of high-dimensional VAR in the traditional time series setting, where one observes a single replicate of a long, stationary time series. We investigate the theoretical properties of l1-regularized and thresholded estimators in high-dimensional VAR, stochastic regression and covariance estimation problems in a non-asymptotic framework. We establish consistency of the estimators under high-dimensional scaling and propose a measure of stability that provides insight into the effect of temporal and cross-sectional dependence on the accuracy of the regularized estimates. We also propose a low-rank plus sparse modeling strategy of high-dimensional VAR in the presence of latent variables. We study the theoretical properties of the proposed estimator in a non-asymptotic framework, establish its estimation consistency under high-dimensional scaling and compare its performance with existing methods via extensive simulation studies.PhDStatisticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/109029/1/sumbose_1.pd
    corecore