Bayesian variable selection and data integration for biological regulatory networks
A substantial focus of research in molecular biology is gene regulatory
networks: the set of transcription factors and target genes which control the
involvement of different biological processes in living cells. Previous
statistical approaches for identifying gene regulatory networks have used gene
expression data, ChIP binding data or promoter sequence data, but each of these
resources provides only partial information. We present a Bayesian hierarchical
model that integrates all three data types in a principled variable selection
framework. The gene expression data are modeled as a function of the unknown
gene regulatory network which has an informed prior distribution based upon
both ChIP binding and promoter sequence data. We also present a variable
weighting methodology for the principled balancing of multiple sources of prior
information. We apply our procedure to the discovery of gene regulatory
relationships in Saccharomyces cerevisiae (yeast), for which we can use several
external sources of information to validate our results. Our inferred
relationships show greater biological relevance on the external validation
measures than previous data integration methods. Our model also estimates
synergistic and antagonistic interactions between transcription factors, many
of which are validated by previous studies. We also evaluate the results from
our procedure for weighting multiple sources of prior information.
Finally, we discuss our methodology in the context of previous approaches to
data integration and Bayesian variable selection.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A Posterior Probability Approach for Gene Regulatory Network Inference in Genetic Perturbation Data
Inferring gene regulatory networks is an important problem in systems
biology. However, these networks can be hard to infer from experimental data
because of the inherent variability in biological data as well as the large
number of genes involved. We propose a fast, simple method for inferring
regulatory relationships between genes from knockdown experiments in the NIH
LINCS dataset by calculating posterior probabilities, incorporating prior
information. We show that the method is able to find previously identified
edges from TRANSFAC and JASPAR and discuss the merits and limitations of this
approach.
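As a rough illustration of the posterior probability calculation described above, the following Python sketch applies Bayes' rule to a single candidate edge. The function name `edge_posterior` and its inputs (a prior edge probability, and the likelihood of the observed knockdown response with and without the edge) are assumptions made for illustration. The abstract does not say how the authors define these quantities.

```python
def edge_posterior(prior, lik_edge, lik_no_edge):
    """Posterior probability that a regulatory edge exists (Bayes' rule).

    prior       -- assumed prior probability of the edge, in [0, 1]
    lik_edge    -- assumed likelihood of the data if the edge exists
    lik_no_edge -- assumed likelihood of the data if the edge is absent
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    numerator = prior * lik_edge
    evidence = numerator + (1.0 - prior) * lik_no_edge
    if evidence == 0.0:
        raise ValueError("data have zero likelihood under both hypotheses")
    return numerator / evidence


if __name__ == "__main__":
    # Hypothetical numbers: a weak prior combined with supportive data.
    print(edge_posterior(prior=0.1, lik_edge=0.9, lik_no_edge=0.1))
```

With a prior of 0.1, data that are nine times more likely under the edge hypothesis raise the posterior to 0.5. A more informative prior moves the result in the same way.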
A model for gene deregulation detection using expression data
In tumoral cells, gene regulation mechanisms are severely altered, and these
modifications in the regulations may be characteristic of different subtypes of
cancer. However, these alterations do not necessarily induce differential
expression between the subtypes. To address this issue, we propose a
statistical methodology to identify the misregulated genes given a reference
network and gene expression data. Our model is based on a regulatory process in
which all genes are allowed to be deregulated. We derive an EM algorithm where
the hidden variables correspond to the status (under/over/normally expressed)
of the genes and where the E-step is solved using a message-passing
algorithm. Our procedure provides posterior probabilities of deregulation in a
given sample for each gene. We assess the performance of our method by
numerical experiments on simulations and on a bladder cancer data set.
Weighted-Lasso for Structured Network Inference from Time Course Data
We present a weighted-Lasso method to infer the parameters of a first-order
vector auto-regressive model that describes time course expression data
generated by directed gene-to-gene regulation networks. These networks are
assumed to have a prior internal structure of connectivity which drives the
inference method. This prior structure can be either derived from prior
biological knowledge or inferred by the method itself. We illustrate the
performance of this structure-based penalization both on synthetic data and on
two canonical regulatory networks: the yeast cell cycle regulation network,
using the dataset of Spellman et al., and the E. coli S.O.S. DNA repair
network, using data from U. Alon's lab.
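The idea of a weighted-Lasso fit of a first-order vector auto-regressive model, x(t+1) = A x(t) + noise, can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses ordinary coordinate descent, and the names `soft_threshold`, `weighted_lasso_row` and `fit_var1`, as well as the convention that a smaller weight marks an edge favoured by the prior structure, are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso_row(X, y, weights, lam, n_iter=200):
    """Minimise 0.5 * ||y - X a||^2 + lam * sum_j weights[j] * |a_j|
    by cyclic coordinate descent. X is a list of rows, y a list."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    col_sq = [sum(X[t][j] ** 2 for t in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero predictor carries no signal
            rho = 0.0
            for t in range(n):
                # Prediction from all predictors except j.
                partial = sum(coef[k] * X[t][k] for k in range(p) if k != j)
                rho += X[t][j] * (y[t] - partial)
            coef[j] = soft_threshold(rho, lam * weights[j]) / col_sq[j]
    return coef


def fit_var1(series, weights, lam):
    """Estimate the VAR(1) matrix A one target gene (row) at a time.

    series  -- list of time points, each a list of p expression values
    weights -- p x p penalty weights; weights[i][j] penalises edge j -> i
    """
    X = series[:-1]  # predictors: expression at time t
    p = len(series[0])
    return [
        weighted_lasso_row(X, [row[i] for row in series[1:]], weights[i], lam)
        for i in range(p)
    ]


if __name__ == "__main__":
    # Toy data generated by A = [[0.5, 0], [0, 0.2]] without noise.
    series = [[1.0, 1.0], [0.5, 0.2], [0.25, 0.04], [0.125, 0.008]]
    weights = [[1.0, 1000.0], [1000.0, 1.0]]  # heavy penalty off the diagonal
    for row in fit_var1(series, weights, lam=0.01):
        print(row)
```

Setting a large weight on an entry effectively removes that edge from the model, which is one simple way a prior connectivity structure could steer the estimate.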
Sparse regulatory networks
In many organisms the expression levels of each gene are controlled by the
activation levels of known "Transcription Factors" (TF). A problem of
considerable interest is that of estimating the "Transcription Regulation
Networks" (TRN) relating the TFs and genes. While the expression levels of
genes can be observed, the activation levels of the corresponding TFs are
usually unknown, greatly increasing the difficulty of the problem. Based on
previous experimental work, it is often the case that partial information about
the TRN is available. For example, certain TFs may be known to regulate a given
gene or in other cases a connection may be predicted with a certain
probability. In general, the biology of the problem indicates there will be
very few connections between TFs and genes. Several methods have been proposed
for estimating TRNs. However, they all suffer from problems such as unrealistic
assumptions about prior knowledge of the network structure or computational
limitations. We propose a new approach that can directly utilize prior
information about the network structure in conjunction with observed gene
expression data to estimate the TRN. Our approach uses penalties on the
network to ensure a sparse structure. This has the advantage of being
computationally efficient as well as making many fewer assumptions about the
network structure. We use our methodology to construct the TRN for E. coli and
show that the estimate is biologically sensible and compares favorably with
previous estimates.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS350 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)