Motif Discovery through Predictive Modeling of Gene Regulation
We present MEDUSA, an integrative method for learning motif models of
transcription factor binding sites by incorporating promoter sequence and gene
expression data. We use a modern large-margin machine learning approach, based
on boosting, to enable feature selection from the high-dimensional search space
of candidate binding sequences while avoiding overfitting. At each iteration of
the algorithm, MEDUSA builds a motif model whose presence in the promoter
region of a gene, coupled with activity of a regulator in an experiment, is
predictive of differential expression. In this way, we learn motifs that are
functional and predictive of regulatory response rather than motifs that are
simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model
of the transcriptional control logic that can predict the expression of any
gene in the organism, given the sequence of the promoter region of the target
gene and the expression state of a set of known or putative transcription
factors and signaling molecules. Each motif model is either a k-length
sequence, a dimer, or a PSSM that is built by agglomerative probabilistic
clustering of sequences with similar boosting loss. By applying MEDUSA to a set
of environmental stress response expression data in yeast, we learn motifs
whose ability to predict differential expression of target genes outperforms
motifs from the TRANSFAC dataset and from a previously published candidate set
of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed
binding sites associated with environmental stress response from the
literature. Comment: RECOMB 200
Predicting Genetic Regulatory Response Using Classification
We present a novel classification-based method for learning to predict gene
regulatory response. Our approach is motivated by the hypothesis that in simple
organisms such as Saccharomyces cerevisiae, we can learn a decision rule for
predicting whether a gene is up- or down-regulated in a particular experiment
based on (1) the presence of binding site subsequences (``motifs'') in the
gene's regulatory region and (2) the expression levels of regulators such as
transcription factors in the experiment (``parents''). Thus our learning task
integrates two qualitatively different data sources: genome-wide cDNA
microarray data across multiple perturbation and mutant experiments along with
motif profile data from regulatory sequences. We convert the regression task of
predicting real-valued gene expression measurement to a classification task of
predicting +1 and -1 labels, corresponding to up- and down-regulation beyond
the levels of biological and measurement noise in microarray measurements. The
learning algorithm employed is boosting with a margin-based generalization of
decision trees, alternating decision trees. This large-margin classifier is
sufficiently flexible to allow complex logical functions, yet sufficiently
simple to give insight into the combinatorial mechanisms of gene regulation. We
observe encouraging prediction accuracy on experiments based on the Gasch S.
cerevisiae dataset, and we show that we can accurately predict up- and
down-regulation on held-out experiments. Our method thus provides predictive
hypotheses, suggests biological experiments, and provides interpretable insight
into the structure of genetic regulatory networks. Comment: 8 pages, 4 figures, presented at Twelfth International Conference on
Intelligent Systems for Molecular Biology (ISMB 2004), supplemental website:
http://www.cs.columbia.edu/compbio/geneclas
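
The conversion from regression to classification described in this abstract can be illustrated with a minimal sketch. The function name `discretize_expression`, the threshold value, and the use of a 0 label for measurements inside the noise band are assumptions made for illustration; the abstract does not state how the noise level was chosen.

```python
def discretize_expression(log_ratios, noise_threshold=1.0):
    """Map real-valued log expression ratios to class labels.

    +1 means up-regulated beyond the noise band, -1 means down-regulated
    beyond it, and 0 marks values inside the band (no confident call).
    The threshold is a hypothetical value, not one taken from the paper.
    """
    labels = []
    for value in log_ratios:
        if value > noise_threshold:
            labels.append(1)
        elif value < -noise_threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Four hypothetical measurements for one gene across four experiments.
print(discretize_expression([2.3, -0.4, -1.7, 0.9]))
```

Examples labeled 0 would typically be excluded before training, so that the classifier only sees confident up or down calls.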
Mathematical and computational modelling of post-transcriptional gene regulation by micro-RNA
Mathematical models and computational simulations have proved valuable in many areas of cell biology, including gene regulatory networks. When properly calibrated against experimental data, kinetic models can be used to describe how the concentrations of key species evolve over time. A reliable model allows "what if" scenarios to be investigated quantitatively in silico, and also provides a means to compare competing hypotheses about the underlying biological mechanisms at work. Moreover, models at different scales of resolution can be merged into a bigger picture "systems" level description. In the case where gene regulation is post-transcriptionally affected by microRNAs, biological understanding and experimental techniques have only recently matured to the extent that we can postulate and test kinetic models. In this chapter, we summarize some recent work that takes the first steps towards realistic modelling, focusing on the contributions of the authors. Using a deterministic ordinary differential equation framework, we derive models from first principles and test them for consistency with recent experimental data, including microarray and mass spectrometry measurements. We first consider typical mis-expression experiments, where the microRNA level is instantaneously boosted or depleted and thereafter remains at a fixed level. We then move on to a more general setting where the microRNA is simply treated as another species in the reaction network, with microRNA-mRNA binding forming the basis for the post-transcriptional repression. We include some speculative comments about the potential for kinetic modelling to contribute to the more widespread sequence and network based approaches in the qualitative investigation of microRNA based gene regulation. We also consider what new combinations of experimental data will be needed in order to make sense of the increased systems-level complexity introduced by microRNAs.
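
As a rough illustration of the "microRNA as another species" setting, the sketch below integrates a three-species mass-action system (free mRNA `m`, free microRNA `s`, and their complex `c`) with a forward Euler scheme. The reaction scheme, the rate constants, and the function name `simulate` are all assumptions chosen for illustration; this is not the authors' model, and the values are not calibrated to any data.

```python
def simulate(k_m=1.0, k_s=0.5, d_m=0.1, d_s=0.05,
             k_on=0.2, k_off=0.01, d_c=0.3,
             t_end=200.0, dt=0.01):
    """Forward-Euler integration of a toy microRNA-mRNA binding model.

    Assumed equations (hypothetical, for illustration only):
        dm/dt = k_m - d_m*m - k_on*m*s + k_off*c
        ds/dt = k_s - d_s*s - k_on*m*s + k_off*c
        dc/dt = k_on*m*s - k_off*c - d_c*c
    Degradation of the complex removes both bound molecules.
    Returns the final (m, s, c) concentrations.
    """
    m = s = c = 0.0
    for _ in range(int(t_end / dt)):
        bind = k_on * m * s      # complex formation flux
        unbind = k_off * c       # complex dissociation flux
        dm = k_m - d_m * m - bind + unbind
        ds = k_s - d_s * s - bind + unbind
        dc = bind - unbind - d_c * c
        m += dt * dm
        s += dt * ds
        c += dt * dc
    return m, s, c


# Compare free mRNA with and without microRNA production.
m_without, _, _ = simulate(k_s=0.0)
m_with, _, _ = simulate()
print(m_without, m_with)
```

With microRNA production switched off, free mRNA settles near `k_m / d_m`; switching it on lowers the free mRNA level, which is the qualitative repression effect the abstract describes.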
Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing.
Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R^2 = 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
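
One simple way to think about step (iii) is to compare k-mer frequencies in bound sequences against a background pool. The sketch below does this with a smoothed frequency ratio. It is an illustrative assumption, not the Bind-n-Seq analysis pipeline; the function names, the pseudocount, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment(kmer, bound, background, pseudocount=1.0):
    """Smoothed frequency ratio of a k-mer in bound vs. background reads.

    The pseudocount (an assumed smoothing choice) avoids division by zero
    for k-mers that never occur in the background pool.
    """
    k = len(kmer)
    b = kmer_counts(bound, k)
    g = kmer_counts(background, k)
    b_freq = (b[kmer] + pseudocount) / (sum(b.values()) + pseudocount)
    g_freq = (g[kmer] + pseudocount) / (sum(g.values()) + pseudocount)
    return b_freq / g_freq


# Toy data: both "bound" reads share the 5-mer GGGCG.
bound_reads = ["ACGTGGGCG", "TTGGGCGAA"]
background_reads = ["ACGTACGTA", "TTTTAAAAC"]
print(enrichment("GGGCG", bound_reads, background_reads))
```

A ratio above 1 suggests the k-mer is over-represented among bound sequences; turning such ratios into relative affinities would require a calibration step that this sketch does not attempt.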
Transcription factor target prediction using multiple short expression time series from Arabidopsis thaliana
BACKGROUND: The central role of transcription factors (TFs) in higher eukaryotes has led to much interest in deciphering transcriptional regulatory interactions. Even in the best case, experimental identification of TF target genes is error prone, and has been shown to be improved by considering additional forms of evidence such as expression data. Previous expression based methods have not explicitly tried to associate TFs with their targets and therefore largely ignored the treatment specific and time dependent nature of transcription regulation. RESULTS: In this study we introduce CERMT, Covariance based Extraction of Regulatory targets using Multiple Time series. Using simulated and real data we show that using multiple expression time series, selecting treatments in which the TF responds, allowing time shifts between TFs and their targets and using covariance to identify highly responding genes appear to be a good strategy. We applied our method to published TF - target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment and in half of the cases this is sufficient to deliver a usable list of high-confidence target genes. CONCLUSION: CERMT could be immediately useful in refining possible target genes of candidate TFs using publicly available data, particularly for organisms lacking comprehensive TF binding data. In the future, we believe its incorporation with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks
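
The idea of allowing a time shift between a TF and its target and scoring the pair by covariance can be sketched as follows. This is a minimal illustration under stated assumptions, not the CERMT implementation: the function names, the maximum shift, and the rule of keeping the shift with the largest absolute covariance are all hypothetical choices.

```python
def covariance(x, y):
    """Sample covariance of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def best_shifted_covariance(tf, target, max_shift=2):
    """Return (shift, covariance) for the delay with the largest |covariance|.

    A shift of s pairs the TF at time t with the target at time t + s,
    so positive shifts model a target that responds after the TF.
    """
    best = None
    for shift in range(max_shift + 1):
        x = tf[:len(tf) - shift]
        y = target[shift:]
        if len(x) < 3:  # too few points for a meaningful estimate
            continue
        cov = covariance(x, y)
        if best is None or abs(cov) > abs(best[1]):
            best = (shift, cov)
    return best


# Toy series: the target repeats the TF profile one time point later.
tf_profile = [0, 1, 3, 1, 0, 0]
target_profile = [0, 0, 1, 3, 1, 0]
print(best_shifted_covariance(tf_profile, target_profile))
```

In a multi-treatment setting, such a score could be computed per time series and combined only over treatments in which the TF itself responds, as the abstract suggests.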
A new procedure to analyze RNA non-branching structures
RNA structure prediction and structural motifs analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with already existing software to reach a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder)
Rocaglates convert DEAD-box protein eIF4A into a sequence-selective translational repressor.
Rocaglamide A (RocA) typifies a class of protein synthesis inhibitors that selectively kill aneuploid tumour cells and repress translation of specific messenger RNAs. RocA targets eukaryotic initiation factor 4A (eIF4A), an ATP-dependent DEAD-box RNA helicase; its messenger RNA selectivity is proposed to reflect highly structured 5' untranslated regions that depend strongly on eIF4A-mediated unwinding. However, rocaglate treatment may not phenocopy the loss of eIF4A activity, as these drugs actually increase the affinity between eIF4A and RNA. Here we show that secondary structure in 5' untranslated regions is only a minor determinant for RocA selectivity and that RocA does not repress translation by reducing eIF4A availability. Rather, in vitro and in cells, RocA specifically clamps eIF4A onto polypurine sequences in an ATP-independent manner. This artificially clamped eIF4A blocks 43S scanning, leading to premature, upstream translation initiation and reducing protein expression from transcripts bearing the RocA-eIF4A target sequence. In elucidating the mechanism of selective translation repression by this lead anti-cancer compound, we provide an example of a drug stabilizing sequence-selective RNA-protein interactions