Motif Discovery through Predictive Modeling of Gene Regulation
We present MEDUSA, an integrative method for learning motif models of
transcription factor binding sites by incorporating promoter sequence and gene
expression data. We use a modern large-margin machine learning approach, based
on boosting, to enable feature selection from the high-dimensional search space
of candidate binding sequences while avoiding overfitting. At each iteration of
the algorithm, MEDUSA builds a motif model whose presence in the promoter
region of a gene, coupled with activity of a regulator in an experiment, is
predictive of differential expression. In this way, we learn motifs that are
functional and predictive of regulatory response rather than motifs that are
simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model
of the transcriptional control logic that can predict the expression of any
gene in the organism, given the sequence of the promoter region of the target
gene and the expression state of a set of known or putative transcription
factors and signaling molecules. Each motif model is either a k-length
sequence, a dimer, or a PSSM that is built by agglomerative probabilistic
clustering of sequences with similar boosting loss. By applying MEDUSA to a set
of environmental stress response expression data in yeast, we learn motifs
whose ability to predict differential expression of target genes outperforms
motifs from the TRANSFAC dataset and from a previously published candidate set
of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed
binding sites associated with environmental stress response from the
literature.
Comment: RECOMB 200
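The abstract above describes rules of the form "motif present in the promoter AND regulator active in the experiment" that are selected by boosting. The following is a minimal sketch of that idea, not the MEDUSA implementation: it uses confidence-rated boosting with abstaining rules on a hand-made toy data set. All names (`boost_rules`, `predict`, `motif_hits`, `regulator_state`), the smoothing constant `eps`, and the data are hypothetical, and the motif models here are fixed binary indicators rather than learned k-mers, dimers, or PSSMs.

```python
import math
from itertools import product


def boost_rules(motif_hits, regulator_state, labels, rounds=4, eps=1e-3):
    """Select (motif, regulator, state) rules by confidence-rated boosting.

    motif_hits[g][m]      : 1 if motif m occurs in the promoter of gene g (assumed input)
    regulator_state[e][r] : -1, 0 or +1 state of regulator r in experiment e
    labels[g][e]          : +1 (up), -1 (down), 0 (within noise, skipped)
    """
    # One training example per (gene, experiment) pair with a nonzero label.
    examples = [(g, e, y) for g, row in enumerate(labels)
                for e, y in enumerate(row) if y != 0]
    weights = [1.0 / len(examples)] * len(examples)
    candidates = list(product(range(len(motif_hits[0])),
                              range(len(regulator_state[0])),
                              (-1, +1)))
    model = []
    for _ in range(rounds):
        best = None
        for m, r, s in candidates:
            w_pos = w_neg = 0.0
            for w, (g, e, y) in zip(weights, examples):
                # The rule fires only if the motif is present AND the regulator is in state s.
                if motif_hits[g][m] and regulator_state[e][r] == s:
                    if y > 0:
                        w_pos += w
                    else:
                        w_neg += w
            # Boosting loss of a rule that abstains wherever it does not fire.
            z = (1.0 - w_pos - w_neg) + 2.0 * math.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                best = (z, m, r, s, w_pos, w_neg)
        _, m, r, s, w_pos, w_neg = best
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))  # smoothed vote
        model.append((m, r, s, alpha))
        # Down-weight examples the new rule gets right, up-weight its mistakes.
        updated = []
        for w, (g, e, y) in zip(weights, examples):
            fires = motif_hits[g][m] and regulator_state[e][r] == s
            updated.append(w * math.exp(-y * alpha) if fires else w)
        total = sum(updated)
        weights = [w / total for w in updated]
    return model


def predict(model, gene_motifs, experiment_regulators):
    """Sign of the summed votes of all rules that fire (0 if none fire)."""
    score = sum(alpha for m, r, s, alpha in model
                if gene_motifs[m] and experiment_regulators[r] == s)
    return (score > 0) - (score < 0)


# Toy data: genes 0 and 1 carry motif 0, gene 2 carries motif 1.
motif_hits = [[1, 0], [1, 0], [0, 1]]
regulator_state = [[+1], [-1]]           # one regulator, up in exp 0, down in exp 1
labels = [[+1, -1], [+1, -1], [-1, +1]]  # motif-0 genes follow the regulator, gene 2 opposes it
model = boost_rules(motif_hits, regulator_state, labels, rounds=4)
```

Because each rule abstains when its condition is not met, a new gene can be scored from its motif content and the regulator states alone, which mirrors the abstract's claim that the learned model predicts expression from promoter sequence plus regulator state.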
Improved Network Performance via Antagonism: From Synthetic Rescues to Multi-drug Combinations
Recent research shows that a faulty or sub-optimally operating metabolic
network can often be rescued by the targeted removal of enzyme-coding
genes--the exact opposite of what traditional gene therapy would suggest.
Predictions go as far as to assert that certain gene knockouts can restore the
growth of otherwise nonviable gene-deficient cells. Many questions follow from
this discovery: What are the underlying mechanisms? How generalizable is this
effect? What are the potential applications? Here, I will approach these
questions from the perspective of compensatory perturbations on networks.
Relations will be drawn between such synthetic rescues and naturally occurring
cascades of reaction inactivation, as well as their analogues in physical and
other biological networks. In particular, I will discuss how rescue interactions can
lead to the rational design of antagonistic drug combinations that select
against resistance and how they can illuminate medical research on cancer,
antibiotics, and metabolic diseases.
Comment: Online Open "Problems and Paradigms" articl
Predicting Genetic Regulatory Response Using Classification
We present a novel classification-based method for learning to predict gene
regulatory response. Our approach is motivated by the hypothesis that in simple
organisms such as Saccharomyces cerevisiae, we can learn a decision rule for
predicting whether a gene is up- or down-regulated in a particular experiment
based on (1) the presence of binding site subsequences (``motifs'') in the
gene's regulatory region and (2) the expression levels of regulators such as
transcription factors in the experiment (``parents''). Thus our learning task
integrates two qualitatively different data sources: genome-wide cDNA
microarray data across multiple perturbation and mutant experiments along with
motif profile data from regulatory sequences. We convert the regression task of
predicting real-valued gene expression measurement to a classification task of
predicting +1 and -1 labels, corresponding to up- and down-regulation beyond
the levels of biological and measurement noise in microarray measurements. The
learning algorithm employed is boosting with a margin-based generalization of
decision trees, alternating decision trees. This large-margin classifier is
sufficiently flexible to allow complex logical functions, yet sufficiently
simple to give insight into the combinatorial mechanisms of gene regulation. We
observe encouraging prediction accuracy on experiments based on the Gasch S.
cerevisiae dataset, and we show that we can accurately predict up- and
down-regulation on held-out experiments. Our method thus provides predictive
hypotheses, suggests biological experiments, and provides interpretable insight
into the structure of genetic regulatory networks.
Comment: 8 pages, 4 figures, presented at Twelfth International Conference on
Intelligent Systems for Molecular Biology (ISMB 2004), supplemental website:
http://www.cs.columbia.edu/compbio/geneclas
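The abstract above converts real-valued expression measurements into +1 and -1 labels for changes that exceed the noise level. A minimal sketch of such a discretization step follows. The function name `discretize`, the use of log ratios, the symmetric cutoff, and the value `noise_threshold=1.0` are assumptions for illustration; the abstract does not state the threshold the authors used or how within-noise measurements were handled (here they are marked 0).

```python
def discretize(log_ratios, noise_threshold=1.0):
    """Map log expression ratios to +1 (up), -1 (down), or 0 (within noise).

    log_ratios is a list of rows (one per gene), each a list of values
    (one per experiment). The threshold is a hypothetical noise cutoff.
    """
    labels = []
    for row in log_ratios:
        labels.append([
            1 if x > noise_threshold else -1 if x < -noise_threshold else 0
            for x in row
        ])
    return labels


# Toy input: one gene measured in three experiments.
toy_labels = discretize([[2.3, -0.4, -1.7]], noise_threshold=1.0)
```

Only the entries labeled +1 or -1 would then serve as training examples for a classifier; entries labeled 0 would be left out under this assumption.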