A comparative study of covariance selection models for the inference of gene regulatory networks
Three different models for inferring gene networks from microarray data are proposed. The most sensitive approach is selected by an exhaustive simulation study. The method reveals a cross-talk between the isoprenoid biosynthesis pathways in Arabidopsis thaliana. The method highlights 9 genes in the HRAS signature regulated by the transcription factor RREB1.
Motivation: The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology.
Results: In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems arising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method computes correlations between regression residuals and (c) the 'ℓ2C' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ2C outperformed the other two methods, having the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a few observations were available. Applying this method to infer gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data from a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS and sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling.
Availability: Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip
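The pseudoinverse-based estimate described in the abstract above can be illustrated with a minimal sketch. It is not the authors' Matlab code: the function name `partial_correlations_pinv` is hypothetical, and the sketch assumes that partial correlations are read off a precision-matrix estimate with the standard formula r_ij = -w_ij / sqrt(w_ii * w_jj), where the precision matrix is taken to be the Moore-Penrose pseudoinverse of the sample covariance.

```python
import numpy as np


def partial_correlations_pinv(X):
    """Partial correlations from the pseudoinverse of the sample covariance.

    X is an (n samples) x (p genes) array. This is an illustrative sketch,
    not the implementation distributed with the paper.
    """
    X = np.asarray(X, dtype=float)
    # Sample covariance between columns (genes); singular when n < p.
    S = np.cov(X, rowvar=False)
    # The Moore-Penrose pseudoinverse stands in for the inverse covariance.
    omega = np.linalg.pinv(S)
    d = np.sqrt(np.diag(omega))
    if np.any(d <= 0):
        # Assumption: no gene is constant across samples.
        raise ValueError("zero diagonal entry in the precision estimate")
    # Standard relation between precision entries and partial correlations.
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```

With more samples than genes this reduces to the usual inverse-covariance estimate. With n smaller than p the pseudoinverse still returns a matrix, which is what makes the approach usable in the small-sample setting the abstract describes.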
Inferring causal relations from multivariate time series : a fast method for large-scale gene expression data
Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. Previously, these techniques have proved their effectiveness on economic and neurophysiological data, which normally consist of hundreds of samples. However, in their application to gene regulatory inference, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the potential challenges they face in gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution to the inference of causal/regulatory relations on small sample gene expression data. Comparative evaluations of the existing techniques and the proposed method are presented. To draw reliable conclusions, comprehensive benchmarking on data sets of various setups is essential. Three experiments are designed to assess these methods in a coherent manner. Detailed analysis of the experimental results not only reveals the good accuracy of the proposed DPC method in large-scale prediction, but also gives much insight into all methods under evaluation.
A comparative study of Gaussian Graphical Model approaches for genomic data
The inference of networks of dependencies by Gaussian Graphical models on
high-throughput data is an open issue in modern molecular biology. In this
paper we provide a comparative study of three methods to obtain small sample
and high dimension estimates of partial correlation coefficients: the
Moore-Penrose pseudoinverse (PINV), residual correlation (RCM) and
covariance-regularized method (ℓ2C). We first compare them on simulated
datasets and we find that PINV is less stable in terms of AUC performance when
the number of variables changes. The two regularized methods have comparable
performances but ℓ2C is much faster than RCM. Finally, we present the
results of an application of ℓ2C for the inference of a gene network
for isoprenoid biosynthesis pathways in Arabidopsis thaliana.
Comment: 7 pages, 1 figure, RevTex4, version to appear in the proceedings of
1st International Workshop on Pattern Recognition, Proteomics, Structural
Biology and Bioinformatics: PR PS BB 2011, Ravenna, Italy, 13 September 201
Defining a robust biological prior from Pathway Analysis to drive Network Inference
Inferring genetic networks from gene expression data is one of the most
challenging tasks in the post-genomic era, partly due to the vast space of
possible networks and the relatively small amount of data available. In this
field, Gaussian Graphical Model (GGM) provides a convenient framework for the
discovery of biological networks. In this paper, we propose an original
approach for inferring gene regulation networks using a robust biological prior
on their structure in order to limit the set of candidate networks.
Pathways, which represent biological knowledge on the regulatory networks,
will be used as informative prior knowledge to drive Network Inference. This
approach is based on the selection of a relevant set of genes, called the
"molecular signature", associated with a condition of interest (for instance,
the genes involved in disease development). In this context, differential
expression analysis is a well-established strategy. However, outcome signatures
are often not consistent and show little overlap between studies. Thus, we will
dedicate the first part of our work to the improvement of the standard process
of biomarker identification to guarantee the robustness and reproducibility of
the molecular signature.
Our approach makes it possible to compare the networks inferred between two
conditions of interest (for instance case and control networks) and aids the
biological interpretation of results. Thus it allows the identification of
differential regulations that occur in these conditions. We illustrate the
proposed approach by applying our method to a study of breast cancer's response
to treatment.
Inferring dynamic genetic networks with low order independencies
In this paper, we propose a novel inference method for dynamic genetic
networks which makes it possible to cope with a number of time measurements n
much smaller than the number of genes p. The approach is based on the concept
of low order conditional dependence graph that we extend here to the case of
Dynamic Bayesian Networks. Most of our results are based on the theory of
graphical models associated with the Directed Acyclic Graphs (DAGs). In this
way, we define a minimal DAG G which describes exactly the full order
conditional dependencies given the past of the process. Then, to cope with the
large p and small n estimation case, we propose to approximate DAG G by
considering low order conditional independencies. We introduce partial qth
order conditional dependence DAGs G(q) and analyze their probabilistic
properties. In general, DAGs G(q) differ from DAG G but still reflect relevant
dependence facts for sparse networks such as genetic networks. By using this
approximation, we set out a non-Bayesian inference method and demonstrate the
effectiveness of this approach on both simulated and real data analysis. The
inference procedure is implemented in the R package 'G1DBN' freely available
from the CRAN archive.
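The idea of scoring an edge with first-order (q = 1) conditional dependencies can be illustrated with a small sketch. This is not the G1DBN procedure: the function names are hypothetical, and the sketch assumes a simplified score in which a candidate edge from gene j at time t-1 to gene i at time t is rated by the smallest absolute first-order partial correlation obtained when conditioning on each other single gene at time t-1.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_order_partial(x, y, z):
    """Partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def lagged_first_order_score(series, j, i):
    """Score for the edge j(t-1) -> i(t), conditioning on one gene at a time.

    `series` is a list of at least two time series of equal length. The score
    is the minimum absolute first-order partial correlation over all single
    conditioning genes k != j taken at time t-1 (a simplifying assumption).
    """
    past_j = series[j][:-1]
    present_i = series[i][1:]
    scores = []
    for k in range(len(series)):
        if k == j:
            continue
        past_k = series[k][:-1]
        scores.append(abs(first_order_partial(past_j, present_i, past_k)))
    return min(scores)
```

Each score needs only three variables at a time, which is why low order conditioning remains feasible when the number of time points is much smaller than the number of genes.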
Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks
We present a procedure for effective estimation of entropy and mutual
information from small-sample data, and apply it to the problem of inferring
high-dimensional gene association networks. Specifically, we develop a
James-Stein-type shrinkage estimator, resulting in a procedure that is highly
efficient statistically as well as computationally. Despite its simplicity, we
show that it outperforms eight other entropy estimation procedures across a
diverse range of sampling scenarios and data-generating models, even in cases
of severe undersampling. We illustrate the approach by analyzing E. coli gene
expression data and computing an entropy-based gene-association network from
these data. A computer program is available that implements the
proposed shrinkage estimator.
Comment: 18 pages, 3 figures, 1 tabl
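A James-Stein-type shrinkage estimate of entropy can be sketched as follows. This is not the authors' program: the function names are hypothetical, and the sketch assumes a uniform shrinkage target over the observed cells and the closed-form shrinkage intensity commonly given for this kind of estimator, clipped to the interval [0, 1].

```python
import math


def shrinkage_frequencies(counts):
    """Shrink maximum-likelihood cell frequencies toward the uniform target.

    Returns (shrunken frequencies, shrinkage intensity). The uniform target
    and the intensity formula below are assumptions of this sketch.
    """
    n = sum(counts)
    p = len(counts)
    if n <= 0 or p == 0:
        raise ValueError("need at least one cell and one observation")
    ml = [c / n for c in counts]
    target = 1.0 / p
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0:
        # ML frequencies already equal the target, or n == 1.
        lam = 1.0
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom
        lam = min(1.0, max(0.0, lam))
    shrunk = [lam * target + (1.0 - lam) * f for f in ml]
    return shrunk, lam


def entropy(freqs):
    """Plug-in Shannon entropy in nats, skipping empty cells."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


def shrinkage_entropy(counts):
    """Entropy of the shrunken frequencies."""
    freqs, _ = shrinkage_frequencies(counts)
    return entropy(freqs)
```

Because the shrunken frequencies are a convex combination of the observed frequencies and the uniform distribution, the resulting entropy is never below the plain plug-in estimate, which counteracts the downward bias of the plug-in estimate in undersampled data.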