
    A Protein Classification Benchmark collection for machine learning

    Protein classification by machine learning algorithms is now widely used in the structural and functional annotation of proteins. The Protein Classification Benchmark collection () was created to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant for method developers and users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative and training/test sets in several ways. There is a total of 6405 classification tasks: 3297 on protein sequences, 3095 on protein structures and 10 on protein-coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. For hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds, superfamilies, etc.). For each dataset, distance matrices are available that contain all-vs-all comparisons of the data based on various sequence or structure comparison methods, together with a set of classification performance measures computed with various classifier algorithms.
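    As an illustration of how a precomputed all-vs-all distance matrix can be turned into a classifier and scored, here is a minimal sketch. It assumes the matrix is available as a nested Python list indexed by item, uses a simple nearest-neighbour scoring rule, and measures performance with ROC AUC, which is one common classification performance measure. The function names, the toy matrix and the train/test split are hypothetical and do not reflect the benchmark's actual file formats or classifier settings.

```python
from typing import List, Sequence

# Hypothetical helpers: no particular benchmark file format is assumed;
# the distance matrix is simply a nested list, dist[i][j] = distance(i, j).
def nn_scores(dist: Sequence[Sequence[float]],
              train_pos: Sequence[int],
              test: Sequence[int]) -> List[float]:
    """Score each test item by its negated distance to the nearest positive
    training item, so that higher scores mean 'more like the positive class'."""
    if not train_pos:
        raise ValueError("need at least one positive training item")
    return [-min(dist[t][p] for p in train_pos) for t in test]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the probability that a randomly chosen positive
    (label 1) outscores a randomly chosen negative (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative test item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy symmetric all-vs-all distance matrix for five items (made-up values).
    D = [
        [0.0, 0.1, 0.3, 0.7, 0.9],
        [0.1, 0.0, 0.2, 0.6, 0.8],
        [0.3, 0.2, 0.0, 0.5, 0.6],
        [0.7, 0.6, 0.5, 0.0, 0.2],
        [0.9, 0.8, 0.6, 0.2, 0.0],
    ]
    # Item 0 is the only positive training item; items 1-4 form the test set,
    # of which 1 and 2 are positives and 3 and 4 are negatives.
    scores = nn_scores(D, train_pos=[0], test=[1, 2, 3, 4])
    print(roc_auc(scores, [1, 1, 0, 0]))  # prints 1.0
```

    The pairwise loop in roc_auc is quadratic in the test-set size but makes the definition explicit; for large test sets a rank-based formula gives the same value more efficiently.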

    RNA Helicase DDX1 Converts RNA G-Quadruplex Structures into R-Loops to Promote IgH Class Switch Recombination

    Class switch recombination (CSR) at the immunoglobulin heavy-chain (IgH) locus is associated with the formation of R-loop structures over switch (S) regions. While these often occur co-transcriptionally between nascent RNA and template DNA, we now show that they also form as part of a post-transcriptional mechanism targeting AID to IgH S-regions. This depends on the RNA helicase DDX1, which is also required for CSR in vivo. DDX1 binds to G-quadruplex (G4) structures present in intronic switch transcripts and converts them into S-region R-loops. This in turn targets the cytidine deaminase enzyme AID to S-regions, thereby promoting CSR. Notably, R-loop levels over S-regions are diminished by chemical stabilization of G4 RNA or by the expression of a DDX1 ATPase-deficient mutant that acts as a dominant-negative protein to reduce CSR efficiency. In effect, we provide evidence for how S-region transcripts interconvert between G4 and R-loop structures to promote CSR in the IgH locus.

    Cell-Cycle Modulation of Transcription Termination Factor Sen1

    Many non-coding transcripts (ncRNA) generated by RNA polymerase II in S. cerevisiae are terminated by the Nrd1-Nab3-Sen1 (NNS) complex. However, Sen1 helicase levels are surprisingly low compared with those of Nrd1 and Nab3, raising questions about how ncRNA can be terminated in an efficient and timely manner. We show that Sen1 levels increase during the S and G2 phases of the cell cycle, leading to increased termination activity of NNS. Overexpression of Sen1, or failure to modulate its abundance by ubiquitin-proteasome-mediated degradation, greatly decreases cell fitness. Sen1 toxicity is suppressed by mutations in other termination factors, and NET-seq analysis shows that its overexpression leads to a decrease in ncRNA production and altered mRNA termination. We conclude that Sen1 levels are carefully regulated to prevent aberrant termination. We suggest that ncRNA levels and coding-gene transcription termination are modulated by Sen1 to fulfill critical cell cycle-specific functions.

    Mammalian NET-Seq reveals genome-wide nascent transcription coupled to RNA processing

    © 2015 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

    Transcription is a highly dynamic process. Consequently, we have developed native elongating transcript sequencing technology for mammalian chromatin (mNET-seq), which generates single-nucleotide-resolution nascent transcription profiles. Nascent RNA was detected in the active site of RNA polymerase II (Pol II) along with associated RNA processing intermediates. In particular, we detected 5' splice site cleavage by the spliceosome, showing that cleaved upstream exon transcripts are associated with Pol II CTD phosphorylated at the serine 5 position (S5P), which accumulates over downstream exons. Also, depletion of termination factors substantially reduces Pol II pausing at gene ends, leading to termination defects. Notably, termination factors play an additional promoter role by restricting non-productive RNA synthesis in a Pol II CTD S2P-specific manner. Our results suggest that CTD phosphorylation patterns established for yeast transcription are significantly different in mammals. Taken together, mNET-seq provides dynamic and detailed snapshots of the complex events underlying transcription in mammals. This work was supported by funding to N.J.P. (Wellcome Trust Programme [091805/Z/10/Z] and ERC Advanced [339270] Grants) and to M.C.-F. (Fundação Ciência e Tecnologia, Portugal).

    Deregulated Expression of Mammalian lncRNA through Loss of SPT6 Induces R-Loop Formation, Replication Stress, and Cellular Senescence

    Extensive tracts of the mammalian genome that lack protein-coding function are still transcribed into long noncoding RNA. While these lncRNAs are generally short lived, length restricted, and non-polyadenylated, how their expression is distinguished from that of protein-coding genes remains enigmatic. Surprisingly, depletion of the ubiquitous Pol-II-associated transcription elongation factor SPT6 promotes a redistribution of H3K36me3 histone marks from active protein-coding genes to lncRNA genes, which correlates with increased lncRNA transcription. SPT6 knockdown also impairs the recruitment of the Integrator complex to chromatin, which results in a transcriptional termination defect for lncRNA genes. This leads to the formation of extended, polyadenylated lncRNAs that are both chromatin restricted and form increased levels of RNA:DNA hybrids (R-loops) that are associated with DNA damage. Additionally, these deregulated lncRNAs overlap with DNA replication origins, leading to localized DNA replication stress and a cellular senescence phenotype. Overall, our results underline the importance of restricting lncRNA expression.

    Nuclear fate of yeast snoRNA is determined by co-transcriptional Rnt1 cleavage

    Small nucleolar RNA (snoRNA) are conserved and essential non-coding RNA that are transcribed by RNA Polymerase II (Pol II). Two snoRNA classes, formerly distinguished by their structure and ribonucleoprotein composition, act as guide RNA to target RNA such as ribosomal RNA, and thereby introduce specific modifications. We have studied the 5' end processing of individually transcribed snoRNA in S. cerevisiae to define its role in snoRNA biogenesis and functionality. Here we show that pre-snoRNA processing by the endonuclease Rnt1 occurs co-transcriptionally, with removal of the m7G cap facilitating the formation of box C/D snoRNA. Failure of this process causes aberrant 3' end processing and mislocalization of snoRNA to the cytoplasm. Consequently, Rnt1-dependent 5' end processing of box C/D snoRNA is critical for snoRNA-dependent methylation of ribosomal RNA. Our results reveal that the 5' end processing of box C/D snoRNA defines their distinct pathway of maturation.

    Microprocessor mediates transcriptional termination of long noncoding RNA transcripts hosting microRNAs

    MicroRNA (miRNA) play a major role in the post-transcriptional regulation of gene expression. Mammalian miRNA biogenesis begins with co-transcriptional cleavage of RNA polymerase II (Pol II) transcripts by the Microprocessor complex. While most miRNA are located within introns of protein-coding genes, a substantial minority of miRNA originate from long non-coding (lnc) RNA, where transcript processing is largely uncharacterized. Here, by detailed characterization of liver-specific lnc-pri-miR-122 and genome-wide analysis, we show that most lnc-pri-miRNA do not use the canonical cleavage and polyadenylation (CPA) pathway but instead use Microprocessor cleavage to terminate transcription. Microprocessor inactivation leads to extensive transcriptional readthrough of lnc-pri-miRNA and transcriptional interference with downstream genes. Consequently, we define a novel RNase III-mediated, polyadenylation-independent mechanism of Pol II transcription termination in mammalian cells.