On Identifying Significant Edges in Graphical Models of Molecular Networks
Objective: Modelling the associations from high-throughput experimental
molecular data has provided unprecedented insights into biological pathways and
signalling mechanisms. Graphical models and networks have proven especially
useful abstractions in this regard. However, ad-hoc thresholds are often used in
conjunction with structure learning algorithms to decide which associations are
significant. The present study overcomes this limitation by proposing a
statistically motivated approach for identifying significant associations in a
network.
Methods and Materials: A new method is proposed that identifies significant
associations in graphical models by estimating the threshold that minimises the
norm between the cumulative distribution function (CDF) of the observed edge
confidences and that of its asymptotic counterpart. The
effectiveness of the proposed method is demonstrated on popular synthetic data
sets as well as publicly available experimental molecular data corresponding to
gene and protein expression profiles.
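As a rough illustration of the thresholding idea in the Methods paragraph, the Python sketch below assumes two details the text does not state: that the norm is the L1 norm, and that the asymptotic CDF is a step function placing mass F_hat(t) at 0 and 1 - F_hat(t) at 1 (every edge ideally has confidence 0 or 1). Under those assumptions it tries each observed confidence as a threshold and keeps the one with the smallest L1 distance. The function names and confidence values are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_conf, x):
    """Empirical CDF: fraction of edge confidences that are <= x."""
    return bisect_right(sorted_conf, x) / len(sorted_conf)


def l1_distance(sorted_conf, t):
    """Exact L1 distance on [0, 1] between the empirical CDF and the assumed
    asymptotic step CDF, which equals ecdf(t) on [0, 1) and 1 at x = 1."""
    p = ecdf(sorted_conf, t)
    # The empirical CDF is constant between consecutive distinct confidences,
    # so the integral reduces to a sum over those intervals.
    points = sorted({0.0, 1.0, *(c for c in sorted_conf if 0.0 < c < 1.0)})
    total = 0.0
    for left, right in zip(points, points[1:]):
        total += abs(ecdf(sorted_conf, left) - p) * (right - left)
    return total


def estimate_threshold(confidences):
    """Return the observed confidence value that minimises the L1 distance."""
    conf = sorted(confidences)
    return min(sorted(set(conf)), key=lambda t: l1_distance(conf, t))


# Hypothetical edge confidences, e.g. bootstrap selection frequencies.
confidences = [0.0, 0.0, 0.05, 0.1, 0.9, 0.95, 1.0, 1.0]
threshold = estimate_threshold(confidences)
significant = [c for c in confidences if c > threshold]
print(threshold, significant)  # 0.1 [0.9, 0.95, 1.0, 1.0]
```

Because the empirical CDF is a step function, the integral is computed exactly as a sum over intervals rather than by numerical quadrature.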
Results: The improved performance of the proposed approach is demonstrated
across the synthetic data sets using sensitivity, specificity and accuracy as
performance metrics. The results are also demonstrated across varying sample
sizes and three different structure learning algorithms with widely varying
assumptions. In all cases, the proposed approach has specificity and accuracy
close to 1, while sensitivity increases linearly in the logarithm of the sample
size. The estimated threshold systematically outperforms common ad-hoc ones in
terms of sensitivity while maintaining comparable levels of specificity and
accuracy. Networks from experimental data sets are reconstructed accurately
with respect to the results from the original papers.
Comment: 21 pages, 9 figures. Presented at the Conference for Artificial
Intelligence in Medicine (AIME '11), Workshop on Probabilistic Problem
Solving in Biomedicine
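The Results paragraph of the first abstract scores learned networks by sensitivity, specificity and accuracy. A minimal Python sketch of how these can be computed against a known synthetic network, assuming edges are compared as undirected node pairs and every possible pair counts as one binary prediction; the function name and example networks are hypothetical.

```python
from itertools import combinations


def edge_metrics(true_edges, learned_edges, nodes):
    """Return (sensitivity, specificity, accuracy) of a learned network.

    Every unordered pair of nodes is one prediction: edge or no edge.
    Assumes the true network has at least one edge and one non-edge."""
    truth = {frozenset(e) for e in true_edges}
    learned = {frozenset(e) for e in learned_edges}
    tp = fp = tn = fn = 0
    for a, b in combinations(nodes, 2):
        pair = frozenset((a, b))
        if pair in truth and pair in learned:
            tp += 1
        elif pair in learned:
            fp += 1
        elif pair in truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


nodes = ["A", "B", "C", "D"]
true_net = [("A", "B"), ("B", "C"), ("C", "D")]
learned_net = [("B", "A"), ("B", "C")]  # edge direction is ignored
print(edge_metrics(true_net, learned_net, nodes))
```

Here the learned network misses one true edge and adds none, so sensitivity is 2/3, specificity is 1 and accuracy is 5/6.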
Defining a robust biological prior from Pathway Analysis to drive Network Inference
Inferring genetic networks from gene expression data is one of the most
challenging tasks in the post-genomic era, partly due to the vast space of
possible networks and the relatively small amount of data available. In this
field, the Gaussian Graphical Model (GGM) provides a convenient framework for the
discovery of biological networks. In this paper, we propose an original
approach for inferring gene regulation networks using a robust biological prior
on their structure in order to limit the set of candidate networks.
Pathways, which represent biological knowledge about regulatory networks,
will be used as informative prior knowledge to drive network inference. This
approach is based on the selection of a relevant set of genes, called the
"molecular signature", associated with a condition of interest (for instance,
the genes involved in disease development). In this context, differential
expression analysis is a well-established strategy. However, the resulting
signatures are often inconsistent and show little overlap between studies. We
will therefore dedicate the first part of our work to improving the standard
process of biomarker identification, to guarantee the robustness and
reproducibility of the molecular signature.
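One simple way a pathway prior can limit the set of candidate networks is sketched below in Python: only pairs of signature genes that share at least one pathway are kept as candidate edges for the GGM. This is an illustrative assumption, not the authors' procedure, and the gene and pathway names are placeholders.

```python
from itertools import combinations


def pathway_prior_edges(pathways, signature):
    """Candidate edges: pairs of signature genes that share at least one pathway.

    pathways maps a pathway name to its member genes; signature is the set of
    genes retained by the (assumed) differential expression step."""
    allowed = set()
    for members in pathways.values():
        genes = sorted(set(members) & set(signature))
        allowed.update(combinations(genes, 2))
    return allowed


# Hypothetical pathways and molecular signature.
pathways = {"P1": {"A", "B", "C"}, "P2": {"C", "D", "E"}}
signature = {"A", "B", "C", "D"}
print(sorted(pathway_prior_edges(pathways, signature)))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

On this example, 4 of the 6 possible pairs among the four signature genes remain candidates, so the GGM search space shrinks before any data are used.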
Our approach makes it possible to compare the networks inferred under two
conditions of interest (for instance, case and control networks) and aids the
biological interpretation of results. It thus allows the identification of
differential regulations that occur between these conditions. We illustrate the
proposed approach by applying it to a study of breast cancer's response to treatment.
Application of new probabilistic graphical models in the genetic regulatory networks studies
This paper introduces two new probabilistic graphical models for the
reconstruction of genetic regulatory networks using DNA microarray data. One is
an Independence Graph (IG) model with either a forward or a backward search
algorithm and the other is a Gaussian Network (GN) model with a novel
greedy search method. The performance of both models was evaluated on four
MAPK pathways in yeast and three simulated data sets. Generally, an IG model
provides a sparse graph, whereas a GN model produces a dense graph in which more
information about gene-gene interactions is preserved. Additionally, we found
two key limitations in the prediction of genetic regulatory networks using DNA
microarray data: first, the sample size may be insufficient, and second, the
complexity of network structures may not be captured without additional
data at the protein level. These limitations are present in all prediction
methods that use only DNA microarray data.
Comment: 38 pages, 3 figures
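The abstract mentions forward and greedy search without giving details. The Python sketch below shows a generic greedy forward search over edges: start from the empty graph, repeatedly add the edge that most improves a score, and stop when no addition helps. The additive, penalised score is a hypothetical stand-in, not the scoring used by the IG or GN models.

```python
def greedy_forward_search(candidate_edges, score):
    """Greedy forward search: starting from no edges, add the single candidate
    edge that most increases score(edges); stop when no edge increases it."""
    edges = frozenset()
    current = score(edges)
    while True:
        best_edge, best_score = None, current
        for edge in candidate_edges - edges:
            new_score = score(edges | {edge})
            if new_score > best_score:
                best_edge, best_score = edge, new_score
        if best_edge is None:
            return set(edges)
        edges, current = edges | {best_edge}, best_score


# Hypothetical score: sum of per-edge evidence minus a fixed penalty per edge.
weights = {("A", "B"): 3.0, ("A", "C"): 2.0, ("B", "C"): 0.5}
PENALTY = 1.0


def penalised_score(edges):
    return sum(weights[e] for e in edges) - PENALTY * len(edges)


print(greedy_forward_search(set(weights), penalised_score))
```

The penalty is what keeps the graph sparse: the weak B-C edge lowers the score and is never added. A backward search would run the same loop in reverse, starting from the full graph and removing edges.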
Modeling dependent gene expression
In this paper, we propose a Bayesian approach for inference about dependence
in high-throughput gene expression. Our goals are to use prior knowledge about
pathways to anchor inference about dependence among genes; to account for this
dependence while making inferences about differences in mean expression across
phenotypes; and to explore differences in the dependence itself across
phenotypes. Useful features of the proposed approach are a model-based
parsimonious representation of expression as an ordinal outcome, a novel and
flexible representation of prior information on the nature of dependencies, and
the use of a coherent probability model over both the structure and strength of
the dependencies of interest. We evaluate our approach through simulations and
in the analysis of data on expression of genes in the Complement and
Coagulation Cascade pathway in ovarian cancer.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS525 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
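The abstract does not give the model's details. One standard way to treat expression as an ordinal outcome is to link a latent continuous variable to ordered categories through cutpoints, as in ordinal probit regression; this is not necessarily the authors' formulation. The minimal Python sketch below uses hypothetical cutpoints and level names to map latent values to three levels; in a Bayesian model the cutpoints would typically be estimated rather than fixed.

```python
from bisect import bisect_left

# Hypothetical three-level coding of expression.
LEVELS = ("under-expressed", "baseline", "over-expressed")


def ordinal_level(z, cutpoints=(-1.0, 1.0)):
    """Map a latent continuous value z to an ordinal level.

    The level index equals the number of cutpoints strictly below z, so with
    cutpoints c1 < c2 the levels are z <= c1, c1 < z <= c2 and z > c2."""
    return LEVELS[bisect_left(cutpoints, z)]


print([ordinal_level(z) for z in (-2.0, 0.0, 2.0)])
# ['under-expressed', 'baseline', 'over-expressed']
```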
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software tools to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2
figures
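At its simplest, testing one variant for an eQTL effect amounts to regressing a gene's expression on genotype dosage (0, 1 or 2 copies of an allele). The minimal Python sketch below computes the least-squares slope and Pearson correlation for made-up data; the reviewed methods also handle issues such as covariates and multiple testing across many variant-gene pairs, which the sketch omits.

```python
from statistics import mean


def eqtl_effect(genotypes, expression):
    """Return the least-squares slope of expression on genotype dosage and the
    Pearson correlation between the two."""
    g_bar, e_bar = mean(genotypes), mean(expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(genotypes, expression))
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((e - e_bar) ** 2 for e in expression)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Made-up dosages (copies of the alternative allele) and expression levels.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, r = eqtl_effect(genotypes, expression)
print(round(slope, 3), round(r, 3))
```

Each extra copy of the allele raises expression by about one unit here, which is the kind of additive effect an eQTL scan looks for.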
Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis
Graphical models have recently gained considerable attention as a tool for
learning and representing dependencies among variables in multivariate data.
Often, domain scientists are looking specifically for differences among the
dependency networks of different conditions or populations (e.g. differences
between regulatory networks of different species, or differences between
dependency networks of diseased versus healthy populations). The standard
method for finding these differences is to learn the dependency networks for
each condition independently and compare them. We show that this approach is
prone to high false discovery rates (low precision) that can render the
analysis useless. We then show that by imposing a bias towards learning similar
dependency networks for each condition, the false discovery rates can be reduced
to acceptable levels, at the cost of finding a reduced number of differences.
Algorithms developed in the transfer learning literature can be used to vary
the strength of the imposed similarity bias and provide a natural mechanism to
smoothly adjust this differential precision-recall tradeoff to cater to the
requirements of the analysis conducted. We present real case studies
(oncological and neurological) where domain experts use the proposed technique
to extract useful differential networks that shed light on the biological
processes involved in cancer and brain function.
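The precision problem described above can be made concrete with a small Python sketch: differential edges are those present in exactly one of the two condition-specific networks, and precision and recall compare the estimated differences with the true ones. The toy networks are hypothetical, chosen so that two independently estimated networks produce spurious differences.

```python
def differential_edges(net_a, net_b):
    """Undirected edges present in exactly one of the two networks."""
    a = {frozenset(e) for e in net_a}
    b = {frozenset(e) for e in net_b}
    return a ^ b


def precision_recall(predicted, true):
    """Precision and recall of predicted differences against the true ones.
    By convention, an empty prediction (or empty truth) scores 1.0."""
    hits = len(predicted & true)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(true) if true else 1.0
    return precision, recall


# Hypothetical true networks for two conditions (one real difference: C-D).
true_case = [("A", "B"), ("B", "C"), ("C", "D")]
true_control = [("A", "B"), ("B", "C")]
# Hypothetical networks estimated independently for each condition.
est_case = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
est_control = [("A", "B"), ("B", "C"), ("B", "D")]

predicted = differential_edges(est_case, est_control)
true_diff = differential_edges(true_case, true_control)
print(precision_recall(predicted, true_diff))
```

Here the one true difference is recovered along with two spurious ones (A-D and B-D), giving recall 1 but precision 1/3: each estimation error in either network shows up as a false difference.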
Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles
Reconstructing transcriptional regulatory networks is an important task in
functional genomics. Data obtained from experiments that perturb genes by
knockouts or RNA interference contain useful information for addressing this
reconstruction problem. However, such data can be limited in size, expensive to
acquire, or both. On the other hand, observational data of the organism in
steady state (e.g. wild-type) are more readily available, but their
informational content is inadequate for the task at hand. We develop a
computational approach to appropriately utilize both data sources for
estimating a regulatory network. The proposed approach is based on a three-step
algorithm that estimates the underlying directed but cyclic network, using both
perturbation screens and steady state gene expression data as input. In the
first step, the algorithm determines causal orderings of the genes that are
consistent with the perturbation data, by combining an exhaustive search method
with a fast heuristic that in turn couples a Monte Carlo technique with a fast
search algorithm. In the second step, for each obtained causal ordering, a
regulatory network is estimated using a penalized likelihood based method,
while in the third step a consensus network is constructed from the highest
scored ones. Extensive computational experiments show that the algorithm
performs well in reconstructing the underlying network and clearly outperforms
competing approaches that rely only on a single data source. Further, it is
established that the algorithm produces a consistent estimate of the regulatory
network.
Comment: 24 pages, 4 figures, 6 tables
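Steps one and three of the algorithm can be sketched in Python under simplifying assumptions: a perturbation effect (a, b) is read as "perturbing gene a changes gene b", so a consistent causal ordering must place a before b, and the consensus keeps directed edges that appear in at least a given fraction of the highest-scored networks. The search over orderings, the penalised-likelihood fit of step two, and the 0.5 cutoff are omitted or hypothetical.

```python
from collections import Counter


def consistent_with_perturbations(ordering, effects):
    """Step 1 check: each effect (a, b), read as 'perturbing a changes b',
    requires a to come before b in the causal ordering."""
    position = {gene: i for i, gene in enumerate(ordering)}
    return all(position[a] < position[b] for a, b in effects)


def consensus_network(networks, min_fraction=0.5):
    """Step 3: keep directed edges found in at least min_fraction of networks."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, n in counts.items() if n / len(networks) >= min_fraction}


effects = [("A", "B"), ("B", "C")]  # hypothetical perturbation screen results
print(consistent_with_perturbations(["A", "B", "C"], effects))  # True
print(consistent_with_perturbations(["B", "A", "C"], effects))  # False

# Hypothetical highest-scored networks from step 2.
top_networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C")},
    {("A", "B")},
]
print(consensus_network(top_networks))
```

Edge A-to-B appears in all four networks and B-to-C in half of them, so both survive; A-to-C appears once and is dropped.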