Grouped graphical Granger modeling for gene expression regulatory networks discovery
We consider the problem of discovering gene regulatory networks from time-series microarray data. Recently, graphical Granger modeling has gained considerable attention as a promising direction for addressing this problem. These methods apply graphical modeling methods to time-series data and invoke the notion of "Granger causality" to make assertions about causality through inference on time-lagged effects. Existing algorithms, however, have neglected an important aspect of the problem: the group structure that the time series naturally impose on the lagged temporal variables. Existing methods in computational biology share this shortcoming and also have computational limitations that prevent their effective application to large datasets with many genes and many data points. In this article, we propose a novel methodology, which we term the "grouped graphical Granger modeling method", that overcomes these limitations by applying a regression method suited to high-dimensional, large-scale data and by exploiting the grouping of lagged temporal variables according to the time series they belong to. We demonstrate the effectiveness of the proposed methodology on both simulated and real gene expression data, specifically the human cancer cell (HeLa S3) cycle data. The simulation results show that the proposed methodology generally recovers the underlying causal structure more accurately. The results on the gene expression data show that it predicts known links more accurately and also uncovers additional causal relationships not captured by earlier work.
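A minimal sketch of the grouping idea, assuming a group-lasso penalty (one standard regression of this kind; the article's exact regression method may differ): the d lags of each candidate regulator form one group, and a series is declared a Granger cause of a target when its group of lagged coefficients is nonzero. The function names, the maximum lag `d`, the penalty `lam`, and the simulated data are hypothetical choices for illustration.

```python
import numpy as np

def lagged_design(X, d):
    """Lagged design for a (T, p) series matrix X with maximum lag d.

    Column j*d + (lag - 1) holds series j at that lag, so the d lags of
    series j form one contiguous group. Targets are rows d..T-1 of X.
    """
    T, p = X.shape
    cols = [X[d - lag:T - lag, j] for j in range(p) for lag in range(1, d + 1)]
    return np.column_stack(cols), X[d:, :]

def group_lasso(Z, y, groups, lam, n_iter=2000):
    """Proximal gradient for (1/2n)||y - Zb||^2 + lam * sum_g sqrt(|g|) ||b_g||."""
    n = Z.shape[0]
    step = n / np.linalg.norm(Z, 2) ** 2          # 1 / Lipschitz constant of the loss
    b = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        v = b - step * (Z.T @ (Z @ b - y) / n)    # gradient step
        for g in groups:                          # block soft-thresholding per group
            norm = np.linalg.norm(v[g])
            thresh = step * lam * np.sqrt(v[g].size)
            v[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * v[g]
        b = v
    return b

def grouped_granger(X, d, lam):
    """Return A with A[j, i] True if series j is selected as a Granger cause of series i."""
    Z, Y = lagged_design(X, d)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # standardize lagged predictors
    p = X.shape[1]
    groups = [slice(j * d, (j + 1) * d) for j in range(p)]
    A = np.zeros((p, p), dtype=bool)
    for i in range(p):
        b = group_lasso(Z, Y[:, i] - Y[:, i].mean(), groups, lam)
        A[:, i] = [np.linalg.norm(b[g]) > 0 for g in groups]
    return A

# Simulated example: series 1 is driven by series 0 at lag 1; series 2 is pure noise.
rng = np.random.default_rng(0)
T = 500
X = rng.normal(size=(T, 3))
X[1:, 1] = 0.9 * X[:-1, 0] + 0.3 * rng.normal(size=T - 1)
A = grouped_granger(X, d=2, lam=0.2)
print(A.astype(int))
```

Penalizing whole groups means a regulator is kept or dropped as a unit across all its lags, which is the structural point the abstract makes about lagged variables belonging to one time series.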
Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction
Longitudinal analysis is important in many disciplines, such as the study of
behavioral transitions in social science. Only very recently, feature selection
has drawn adequate attention in the context of longitudinal modeling. Standard
techniques, such as generalized estimating equations, have been modified to
select features by imposing sparsity-inducing regularizers. However, they do
not explicitly model how a dependent variable relies on features measured at
proximal time points. Recent graphical Granger models can select features at
lagged time points but ignores the temporal correlations within an individual's
repeated measurements. We propose an approach to automatically and
simultaneously determine both the relevant features and the relevant temporal
points that impact the current outcome of the dependent variable. Meanwhile,
the proposed model takes into account the non-i.i.d. nature of the data by
estimating the within-individual correlations. This approach decomposes model
parameters into a sum of two components and imposes a separate block-wise
LASSO penalty on each component when building a linear model in terms of the
past measurements of features. One component is used to select features
whereas the other is used to select temporal contingent points. An accelerated
gradient descent algorithm is developed to efficiently solve the resulting
optimization problem, together with detailed convergence and asymptotic
analyses. Computational results on both synthetic and real-world problems
demonstrate the superior performance of the proposed approach over existing
techniques.
Comment: Proceedings of the 21st ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 201
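A minimal sketch of the decomposition described in this abstract, under simplifying assumptions: the coefficient matrix W (features by lags) is written as W = U + V, a row-wise group penalty on U removes whole features, a column-wise group penalty on V removes whole lags, and the problem is solved with FISTA-style accelerated proximal gradient. Unlike the paper, it treats observations as independent (no within-individual correlation is estimated); the function names, the data layout of `Xlag`, the penalty values, and the simulated data are hypothetical.

```python
import numpy as np

def row_shrink(M, tau):
    """Prox of tau * sum_j ||M[j, :]||_2: block soft-thresholding of each row."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12)) * M

def longitudinal_lasso_sketch(Xlag, y, lam_feat, lam_time, n_iter=3000):
    """Assumed layout: Xlag[i, j, k] is feature j measured k + 1 steps before y[i].

    Fits y ~ <Xlag[i], U + V> with a row-group penalty on U (features) and a
    column-group penalty on V (lags), using accelerated proximal gradient.
    """
    n, p, d = Xlag.shape
    Xf = Xlag.reshape(n, p * d)                 # column j*d + k <-> (feature j, lag k+1)
    L = 2.0 * np.linalg.norm(Xf, 2) ** 2 / n    # Lipschitz constant of the loss in (U, V)
    U, V = np.zeros((p, d)), np.zeros((p, d))
    U_prev, V_prev = U, V
    t = 1.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        w = (t - 1.0) / t_next                  # momentum weight
        U_y, V_y = U + w * (U - U_prev), V + w * (V - V_prev)
        resid = Xf @ (U_y + V_y).ravel() - y
        G = (Xf.T @ resid / n).reshape(p, d)    # loss gradient, identical for U and V
        U_prev, V_prev = U, V
        U = row_shrink(U_y - G / L, lam_feat / L)          # zero whole features
        V = row_shrink((V_y - G / L).T, lam_time / L).T    # zero whole lags
        t = t_next
    return U, V

# Simulated data: features 0 and 1 matter, and only at lag 1.
rng = np.random.default_rng(1)
n, p, d = 400, 5, 3
Xlag = rng.normal(size=(n, p, d))
W_true = np.zeros((p, d))
W_true[0, 0], W_true[1, 0] = 1.0, -1.0
y = Xlag.reshape(n, -1) @ W_true.ravel() + 0.5 * rng.normal(size=n)
U, V = longitudinal_lasso_sketch(Xlag, y, lam_feat=0.2, lam_time=0.2)
W_hat = U + V
```

Because the two penalties act on separate components, the proximal step splits into an independent row-wise and column-wise shrinkage, while both components share the same loss gradient.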
Discovering Graphical Granger Causality Using the Truncating Lasso Penalty
Components of biological systems interact with each other in order to carry
out vital cell functions. Information about these interactions can be used to improve estimation
and inference, and to obtain better insights into the underlying cellular
mechanisms. Discovering regulatory interactions among genes is therefore an
important problem in systems biology. Whole-genome expression data over time
provides an opportunity to determine how the expression levels of genes are
affected by changes in transcription levels of other genes, and can therefore
be used to discover regulatory interactions among genes.
In this paper, we propose a novel penalization method, called truncating
lasso, for estimation of causal relationships from time-course gene expression
data. The proposed penalty can correctly determine the order of the underlying
time series, and improves the performance of the lasso-type estimators.
Moreover, the resulting estimate provides information on the time lag between
activation of transcription factors and their effects on regulated genes. We
provide an efficient algorithm for estimation of model parameters, and show
that the proposed method can consistently discover causal relationships in the
large p, small n setting. The proposed model performs favorably on
simulated as well as real data examples. The proposed
truncating lasso method is implemented in the R-package grangerTlasso and is
available at http://www.stat.lsa.umich.edu/~shojaie.
Comment: 12 pages, 4 figures, 1 table
Forecasting and Granger Modelling with Non-linear Dynamical Dependencies
Traditional linear methods for forecasting multivariate time series are not
able to satisfactorily model the non-linear dependencies that may exist in
non-Gaussian series. We build on the theory of learning vector-valued functions
in the reproducing kernel Hilbert space and develop a method for learning
prediction functions that accommodate such non-linearities. The method not only
learns the predictive function but also the matrix-valued kernel underlying the
function search space directly from the data. Our approach is based on learning
multiple matrix-valued kernels, each composed of a set of input
kernels and a set of output kernels learned in the cone of positive
semi-definite matrices. In addition to superior predictive performance in the
presence of strong non-linearities, our method also recovers the hidden dynamic
relationships between the series and thus is a new alternative to existing
graphical Granger techniques.
Comment: Accepted for ECML-PKDD 201
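As a much simpler relative of the approach in this abstract, the sketch below fits a vector-valued function in an RKHS with a single, fixed separable matrix-valued kernel K(x, x') = k(x, x') B, where k is a Gaussian kernel and B is a fixed positive semi-definite output matrix. The paper instead learns multiple input and output kernels from the data; the kernel choice, `gamma`, `lam`, the matrix `Bout`, and the simulated bivariate series are assumptions for illustration.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def fit(Xin, Yout, Bout, gamma=1.0, lam=1e-2):
    """Kernel ridge regression with matrix-valued kernel k(x, x') * Bout.

    Solves (K kron Bout + lam I) c = vec(Y); row i of the result is c_i.
    """
    n, p = Yout.shape
    K = gaussian_gram(Xin, Xin, gamma)
    system = np.kron(K, Bout) + lam * np.eye(n * p)
    return np.linalg.solve(system, Yout.reshape(n * p)).reshape(n, p)

def predict(Xnew, Xin, C, Bout, gamma=1.0):
    """f(x) = sum_i k(x, x_i) Bout c_i, evaluated for each row of Xnew."""
    return gaussian_gram(Xnew, Xin, gamma) @ C @ Bout.T

# Simulated bivariate series with non-linear cross-dependencies.
rng = np.random.default_rng(2)
T = 150
s = np.zeros((T, 2))
for t in range(1, T):
    s[t, 0] = 0.9 * np.sin(s[t - 1, 1]) + 0.1 * rng.normal()
    s[t, 1] = 0.9 * np.tanh(s[t - 1, 0]) + 0.1 * rng.normal()
Xin, Yout = s[:-1], s[1:]                      # one-step-ahead forecasting pairs
Bout = np.array([[1.0, 0.5], [0.5, 1.0]])      # assumed output coupling (PSD)
lam = 1e-2
C = fit(Xin, Yout, Bout, lam=lam)
forecast = predict(s[-1:], Xin, C, Bout)       # next-step forecast, shape (1, 2)
```

With `Bout` set to the identity, the system decouples into independent scalar kernel ridge regressions, one per series; off-diagonal entries let the outputs share information.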
Sparse Vector Autoregressive Modeling
The vector autoregressive (VAR) model has been widely used for modeling
temporal dependence in a multivariate time series. For large (and even
moderate) dimensions, the number of AR coefficients can be prohibitively large,
resulting in noisy estimates, unstable predictions and difficult-to-interpret
temporal dependence. To overcome such drawbacks, we propose a 2-stage approach
for fitting sparse VAR (sVAR) models in which many of the AR coefficients are
zero. The first stage selects non-zero AR coefficients based on an estimate of
the partial spectral coherence (PSC) together with the use of BIC. The PSC is
useful for quantifying the conditional relationship between marginal series in
a multivariate process. A second, refinement stage is then applied to further
reduce the number of parameters. The performance of this 2-stage approach is
illustrated with simulation results. The 2-stage approach is also applied to
two real data examples: the first is the Google Flu Trends data and the second
is a time series of concentration levels of air pollutants.
Comment: 39 pages, 7 figures
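A rough sketch of the kind of first-stage screening this abstract describes: estimate the spectral density matrix f(w) with a smoothed periodogram, invert it to get g = f^{-1}, compute the partial spectral coherence PSC_ij(w) = -g_ij(w) / sqrt(g_ii(w) g_jj(w)), and rank each pair of series by the largest squared modulus of PSC over frequencies. The Daniell smoother, its `half_width`, the function names, and the simulated data are assumptions; the BIC-based selection and the refinement stage are not reproduced.

```python
import numpy as np

def smoothed_spectral_matrix(X, half_width=10):
    """Daniell-smoothed periodogram matrices at the Fourier frequencies of rfft."""
    T = X.shape[0]
    D = np.fft.rfft(X - X.mean(axis=0), axis=0)               # (F, p) DFTs
    I = np.einsum("fi,fj->fij", D, D.conj()) / (2 * np.pi * T)
    F = I.shape[0]
    S = np.empty_like(I)
    for k in range(F):
        lo, hi = max(0, k - half_width), min(F, k + half_width + 1)
        S[k] = I[lo:hi].mean(axis=0)                           # average neighbouring frequencies
    return S

def max_squared_psc(X, half_width=10):
    """sup over frequencies of |PSC_ij(w)|^2 for every pair (i, j); diagonal set to 0."""
    G = np.linalg.inv(smoothed_spectral_matrix(X, half_width)[1:])  # skip frequency 0
    scale = np.sqrt(np.real(np.einsum("fii->fi", G)))
    psc = -G / (scale[:, :, None] * scale[:, None, :])
    out = np.max(np.abs(psc) ** 2, axis=0)
    np.fill_diagonal(out, 0.0)
    return out

# Simulated data: series 1 depends on series 0 at lag 1; series 2 is independent noise.
rng = np.random.default_rng(3)
T = 1024
X = rng.normal(size=(T, 3))
X[1:, 1] = 0.8 * X[:-1, 0] + 0.5 * rng.normal(size=T - 1)
scores = max_squared_psc(X)
```

Pairs with large scores are candidates for non-zero AR coefficients; in the two-stage approach described above, how many of the top-ranked pairs to keep is then decided with BIC.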