Efficient Algorithms for Probing the RNA Mutation Landscape
The diversity and importance of the role played by RNAs in the regulation and development of the cell are now well-known and well-documented. This broad range of functions is achieved through specific structures that have been (presumably) optimized through evolution. State-of-the-art methods, such as McCaskill's algorithm, use a statistical mechanics framework based on the computation of the partition function over the canonical ensemble of all possible secondary structures on a given sequence. Although secondary structure predictions from thermodynamics-based algorithms are not as accurate as methods employing comparative genomics, the former methods are the only available tools to investigate novel RNAs, such as the many RNAs of unknown function recently reported by the ENCODE consortium. In this paper, we generalize the McCaskill partition function algorithm to sum over the grand canonical ensemble of all secondary structures of all mutants of the given sequence. Specifically, our new program, RNAmutants, simultaneously computes for each integer k the minimum free energy structure MFE(k) and the partition function Z(k) over all secondary structures of all k-point mutants, even allowing the user to specify certain positions required not to mutate and certain positions required to base-pair or remain unpaired. This technically important extension allows us to study the resilience of an RNA molecule to pointwise mutations. By computing the mutation profile of a sequence, a novel graphical representation of the mutational tendency of nucleotide positions, we analyze the deleterious nature of mutating specific nucleotide positions or groups of positions. We have successfully applied RNAmutants to investigate deleterious mutations (mutations that radically modify the secondary structure) in the Hepatitis C virus cis-acting replication element and to evaluate the evolutionary pressure applied on different regions of the HIV trans-activation response element. 
In particular, we show qualitative agreement between published Hepatitis C and HIV experimental mutagenesis studies and our analysis of deleterious mutations using RNAmutants. Our work also predicts other deleterious mutations, which could be verified experimentally. Finally, we provide evidence that the 3′ UTR of the GB RNA virus C has been optimized to preserve evolutionarily conserved stem regions from a deleterious effect of pointwise mutations. We hope that there will be long-term potential applications of RNAmutants in de novo RNA design and drug design against RNA viruses. This work also suggests potential applications for large-scale exploration of the RNA sequence-structure network. Binary distributions are available at http://RNAmutants.csail.mit.edu/
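The quantities Z and Z(k) described in this abstract can be illustrated with a minimal sketch. It is not the RNAmutants algorithm: it uses a toy energy model in which every base pair contributes the same hypothetical energy (`PAIR_ENERGY`), a McCaskill-style recursion over that model, and brute-force enumeration of k-point mutants instead of the efficient generalized recursion the paper reports. All names and constants below are assumptions made for illustration.

```python
import math
from functools import lru_cache
from itertools import combinations, product

# Toy model (assumption): Watson-Crick and wobble pairs, one fixed energy per pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3          # minimum number of unpaired bases enclosed by a pair
RT = 0.616            # approx. kcal/mol at 37 C
PAIR_ENERGY = -1.0    # hypothetical energy per base pair, kcal/mol
W = math.exp(-PAIR_ENERGY / RT)  # Boltzmann weight of one base pair


def partition_function(seq):
    """Sum of Boltzmann weights over all secondary structures of seq."""

    @lru_cache(maxsize=None)
    def z(i, j):
        # Intervals too short to hold a pair contain only the empty structure.
        if j - i < MIN_LOOP + 1:
            return 1.0
        total = z(i, j - 1)  # case 1: position j is unpaired
        for k in range(i, j - MIN_LOOP):  # case 2: j pairs with some k
            if (seq[k], seq[j]) in PAIRS:
                total += z(i, k - 1) * W * z(k + 1, j - 1)
        return total

    return z(0, len(seq) - 1)


def k_point_mutants(seq, k):
    """Yield every sequence that differs from seq at exactly k positions."""
    for positions in combinations(range(len(seq)), k):
        choices = [[b for b in "ACGU" if b != seq[p]] for p in positions]
        for substitution in product(*choices):
            mutant = list(seq)
            for p, b in zip(positions, substitution):
                mutant[p] = b
            yield "".join(mutant)


def z_k(seq, k):
    """Brute-force Z(k): partition function summed over all k-point mutants."""
    return sum(partition_function(m) for m in k_point_mutants(seq, k))


if __name__ == "__main__":
    for k in range(3):
        print(k, z_k("GGGAAACCC", k))
```

The enumeration grows as C(n, k)·3^k, which is why a recursion that sums over mutants and structures simultaneously, as the abstract describes, is needed for realistic sequence lengths.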
High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur
Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′ UTRs, coding exons), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5′ completeness of expressed sequences, contribute to more successful experimental designs, as well as more accurate gene annotation. Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that allows us, by performing computational TSS prediction with very high sensitivity, to annotate, with high accuracy and in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we utilize various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
CSMET: Comparative Genomic Motif Detection via Multi-Resolution Phylogenetic Shadowing
Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, is a common event during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at the nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny, conditioned on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.
Characterization of Transcription from TATA-Less Promoters: Identification of a New Core Promoter Element XCPE2 and Analysis of Factor Requirements
More than 80% of mammalian protein-coding genes are driven by TATA-less promoters which often show multiple transcriptional start sites (TSSs). However, little is known about the core promoter DNA sequences or mechanisms of transcriptional initiation for this class of promoters. Here we identify a new core promoter element XCPE2 (X core promoter element 2) (consensus sequence: A/C/G-C-C/T-C-G/A-T-T-G/A-C-C/A(+1)-C/T) that can direct specific transcription from the second TSS of hepatitis B virus X gene mRNA. XCPE2 sequences can also be found in human promoter regions and typically appear to drive one of the start sites within multiple TSS-containing TATA-less promoters. To gain insight into mechanisms of transcriptional initiation from this class of promoters, we examined requirements of several general transcription factors by in vitro transcription experiments using immunodepleted nuclear extracts and purified factors. Our results show that XCPE2-driven transcription uses at least TFIIB, either TFIID or free TBP, RNA polymerase II (RNA pol II) and the MED26-containing mediator complex but not Gcn5. Therefore, XCPE2-driven transcription can be carried out by a mechanism which differs from previously described TAF-dependent mechanisms for initiator (Inr)- or downstream promoter element (DPE)-containing promoters, the TBP- and SAGA (Spt-Ada-Gcn5-acetyltransferase)-dependent mechanism for yeast TATA-containing promoters, or the TFTC (TBP-free-TAF-containing complex)-dependent mechanism for certain Inr-containing TATA-less promoters. EMSA assays using XCPE2 promoter and purified factors further suggest that XCPE2 promoter recognition requires a set of factors different from those for TATA box, Inr, or DPE promoter recognition. We identified a new core promoter element XCPE2 that is found in multiple TSS-containing TATA-less promoters. 
Mechanisms of promoter recognition and transcriptional initiation for XCPE2-driven promoters appear different from previously shown mechanisms for classical promoters that show single "focused" TSSs. Our studies provide insight into novel mechanisms of RNA Pol II transcription from multiple TSS-containing TATA-less promoters.
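As a small illustration of the XCPE2 consensus quoted in this abstract (A/C/G-C-C/T-C-G/A-T-T-G/A-C-C/A(+1)-C/T), the sketch below translates it into a regular expression and scans the forward strand of a DNA string. The function name, the tuple layout, and the choice to scan only one strand are assumptions made for this example; a consensus match alone does not imply a functional element.

```python
import re

# The 11-nt consensus from the abstract, one character class per position.
# A lookahead is used so that overlapping matches are also reported.
XCPE2 = re.compile(r"(?=([ACG]C[CT]C[GA]TT[GA]C[CA][CT]))")

# 0-based offset of the +1 position (the C/A at motif position 10).
TSS_OFFSET = 9


def find_xcpe2(seq):
    """Return (motif_start, motif, tss_index) for each forward-strand match."""
    seq = seq.upper()
    return [
        (m.start(), m.group(1), m.start() + TSS_OFFSET)
        for m in XCPE2.finditer(seq)
    ]


if __name__ == "__main__":
    for start, motif, tss in find_xcpe2("TTTACCCGTTGCCCAAA"):
        print(f"motif {motif} at {start}, predicted +1 at index {tss}")
```

Scanning the reverse strand would require applying the same pattern to the reverse complement and mapping the coordinates back.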
Quantitative Epistasis Analysis and Pathway Inference from Genetic Interaction Data
Inferring regulatory and metabolic network models from quantitative genetic interaction data remains a major challenge in systems biology. Here, we present a novel quantitative model for interpreting epistasis within pathways responding to an external signal. The model provides the basis of an experimental method to determine the architecture of such pathways, and establishes a new set of rules to infer the order of genes within them. The method also allows the extraction of quantitative parameters enabling a new level of information to be added to genetic network models. It is applicable to any system where the impact of combinatorial loss-of-function mutations can be quantified with sufficient accuracy. We test the method by conducting a systematic analysis of a thoroughly characterized eukaryotic gene network, the galactose utilization pathway in Saccharomyces cerevisiae. For this purpose, we quantify the effects of single and double gene deletions on two phenotypic traits, fitness and reporter gene expression. We show that applying our method to fitness traits reveals the order of metabolic enzymes and the effects of accumulating metabolic intermediates. Conversely, the analysis of expression traits reveals the order of transcriptional regulatory genes, secondary regulatory signals and their relative strength. Strikingly, when the analyses of the two traits are combined, the method correctly infers ~80% of the known relationships without any false positives.
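The abstract does not spell out its quantitative model, so the sketch below shows only a generic baseline that is commonly used when single and double deletions are compared: the deviation of the double-mutant phenotype from a multiplicative expectation. The function names, the tolerance, and the labels are assumptions for illustration and should not be read as the authors' inference rules.

```python
def epistasis(w_a, w_b, w_ab):
    """Deviation of the double mutant from the multiplicative expectation.

    w_a, w_b: phenotype of each single deletion relative to wild type (= 1.0).
    w_ab:     phenotype of the double deletion relative to wild type.
    """
    return w_ab - w_a * w_b


def classify(eps, tol=0.05):
    """Label an interaction; tol is a hypothetical measurement tolerance."""
    if eps > tol:
        return "alleviating"   # double mutant is healthier than expected
    if eps < -tol:
        return "aggravating"   # double mutant is sicker than expected
    return "no interaction"


if __name__ == "__main__":
    # Two deletions in one linear pathway: the double mutant looks like a single one.
    print(classify(epistasis(0.5, 0.5, 0.5)))
```

Under this baseline, genes acting in the same linear pathway tend to show alleviating interactions, which is one reason double-deletion measurements carry information about pathway architecture.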
Transcriptional Regulation: Effects of Promoter Proximal Pausing on Speed, Synchrony and Reliability
Recent whole-genome polymerase binding assays in the Drosophila embryo have shown that a substantial proportion of uninduced genes have pre-assembled RNA polymerase-II transcription initiation complex (PIC) bound to their promoters. These constitute a subset of promoter proximally paused genes for which mRNA elongation instead of promoter access is regulated. This difference can be described as a rearrangement of the regulatory topology to control the downstream transcriptional process of elongation rather than the upstream transcriptional initiation event. It has been shown experimentally that genes regulated at elongation tend to be induced faster and more synchronously, and that promoter-proximal pausing is observed mainly in metazoans, in accord with a posited impact on synchrony. However, it has not been shown whether or not it is the change in the regulated step per se that is causal. We investigate this question by proposing and analyzing a continuous-time Markov chain model of PIC assembly regulated at one of two steps: initial polymerase association with DNA, or release from a paused, transcribing state. Our analysis demonstrates that, over a wide range of physical parameters, increased speed and synchrony are functional consequences of elongation control. Further, we make new predictions about the effect of elongation regulation on the consistent control of total transcript number between cells. We also identify which elements in the transcription induction pathway are most sensitive to molecular noise and thus possibly the most evolutionarily constrained. Our methods produce symbolic expressions for quantities of interest with reasonable computational effort and they can be used to explore the interplay between interaction topology and molecular noise in a broader class of biochemical networks. We provide general-purpose code implementing these methods.
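The intuition behind regulating a late step rather than an early one can be sketched with a deliberately simplified chain. The example assumes an irreversible linear continuous-time Markov chain in which every step before the regulated one has already completed when the inducing signal arrives. The time to first transcript is then a sum of independent exponential waiting times, whose mean and variance have closed forms. The rates and names are hypothetical, and this is not the authors' model, which the abstract describes only in outline.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InductionStats:
    mean: float      # expected time from signal to first transcript
    variance: float  # cell-to-cell spread of that time (lower = more synchronous)

    @property
    def cv(self):
        """Coefficient of variation of the induction time."""
        return math.sqrt(self.variance) / self.mean


def first_passage_stats(rates):
    """Mean and variance of a sum of independent exponential step times."""
    mean = sum(1.0 / r for r in rates)
    variance = sum(1.0 / r ** 2 for r in rates)
    return InductionStats(mean, variance)


def induction_time(rates, regulated_step):
    """Steps before regulated_step are assumed complete before induction."""
    return first_passage_stats(rates[regulated_step:])


if __name__ == "__main__":
    # Hypothetical rates: polymerase association, PIC assembly, pause release.
    rates = [1.0, 2.0, 4.0]
    print("initiation-regulated:", induction_time(rates, regulated_step=0))
    print("elongation-regulated:", induction_time(rates, regulated_step=2))
```

In this toy setting, moving the regulated step downstream removes the upstream waiting times from the response, which lowers both the mean and the variance of the induction time. Reversible steps and competing reactions would require solving the full first-passage problem.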
Seven-Pass Transmembrane Cadherins: Roles and Emerging Mechanisms in Axonal and Dendritic Patterning
The Flamingo/Celsr seven-transmembrane cadherins represent a conserved subgroup of the cadherin superfamily involved in multiple aspects of development. In the developing nervous system, Fmi/Celsr control axonal blueprint and dendritic morphogenesis from invertebrates to mammals. As expected from their molecular structure, seven-transmembrane cadherins can induce cell–cell homophilic interactions but also intracellular signaling. Fmi/Celsr is known to regulate planar cell polarity (PCP) through interactions with PCP proteins. In the nervous system, Fmi/Celsr can function in collaboration with or independently of other PCP genes. Here, we focus on recent studies which show that seven-transmembrane cadherins use distinct molecular mechanisms to achieve diverse functions in the development of the nervous system.
Computational analyses of eukaryotic promoters
Computational analysis of eukaryotic promoters is one of the most difficult problems in computational genomics and is essential for understanding gene expression profiles and reverse-engineering gene regulation network circuits. Here I give a basic introduction to the problem and a recent update on both experimental and computational approaches. More details may be found in the extended references. This review is based on a summer lecture given at Max Planck Institute at Berlin in 2005.
Non-Coding RNA Prediction and Verification in Saccharomyces cerevisiae
Non-coding RNAs (ncRNAs) play an important and varied role in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but the ability to do so has remained elusive due to a lack of apparent genomic features. In this work, thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window sizes and step sizes on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates, common to both S. cerevisiae and S. bayanus, were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript is well supported but the data are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements.
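The windowed Z-score scan mentioned in this abstract can be sketched as follows. The abstract does not name the folding program or the shuffling scheme, so the energy function is passed in as a parameter (`energy_fn`, hypothetical) and the background is built from simple mononucleotide shuffles. Both are assumptions of this example, as are the default window and step sizes.

```python
import random
import statistics


def windows(seq, size, step):
    """Yield (start, subsequence) for each full window along seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def z_score(native_energy, background_energies):
    """Standard deviations by which the native energy differs from background.

    Strongly negative values mean the native sequence is more stable than
    its shuffled versions.
    """
    mu = statistics.mean(background_energies)
    sigma = statistics.pstdev(background_energies)
    if sigma == 0:
        return 0.0  # background carries no spread; no evidence either way
    return (native_energy - mu) / sigma


def scan(seq, energy_fn, size=50, step=10, n_shuffles=100, seed=0):
    """Z-score per window; energy_fn(sequence) -> folding free energy."""
    rng = random.Random(seed)
    results = []
    for start, sub in windows(seq, size, step):
        background = []
        for _ in range(n_shuffles):
            letters = list(sub)
            rng.shuffle(letters)  # preserves mononucleotide composition
            background.append(energy_fn("".join(letters)))
        results.append((start, z_score(energy_fn(sub), background)))
    return results
```

In practice `energy_fn` would wrap a minimum free energy calculation from an RNA folding package. The window and step sizes trade sensitivity against run time, which is the effect the abstract reports exploring.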
Abnormal Dosage Compensation of Reporter Genes Driven by the Drosophila Glass Multiple Reporter (GMR) Enhancer-Promoter
In Drosophila melanogaster the male specific lethal (MSL) complex is required for upregulation of expression of most X-linked genes in males, thereby achieving X chromosome dosage compensation. The MSL complex is highly enriched across most active X-linked genes with a bias towards the 3′ end. Previous studies have shown that gene transcription facilitates MSL complex binding but the type of promoter did not appear to be important. We have made the surprising observation that genes driven by the glass multiple reporter (GMR) enhancer-promoter are not dosage compensated at X-linked sites. The GMR promoter is active in all cells in, and posterior to, the morphogenetic furrow of the developing eye disc. Using phiC31 integrase-mediated targeted integration, we measured expression of lacZ reporter genes driven by either the GMR or armadillo (arm) promoters at each of three X-linked sites. At all sites, the arm-lacZ reporter gene was dosage compensated but GMR-lacZ was not. We have investigated why GMR-driven genes are not dosage compensated. Earlier or constitutive expression of GMR-lacZ did not affect the level of compensation. Neither did proximity to a strong MSL binding site. However, replacement of the hsp70 minimal promoter with a minimal promoter from the X-linked 6-Phosphogluconate dehydrogenase gene did restore partial dosage compensation. Similarly, insertion of binding sites for the GAGA and DREF factors upstream of the GMR promoter led to significantly higher lacZ expression in males than females. GAGA and DREF have been implicated in dosage compensation. We conclude that the gene promoter can affect MSL complex-mediated upregulation and dosage compensation. Further, it appears that the nature of the basal promoter and the presence of binding sites for specific factors influence the ability of a gene promoter to respond to the MSL complex.