4 research outputs found

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

    New group of transmembrane proteins associated with desiccation tolerance in the anhydrobiotic midge Polypedilum vanderplanki

    © 2020, The Author(s). Larvae of the sleeping chironomid Polypedilum vanderplanki are known for their extraordinary ability to survive complete desiccation in an ametabolic state called “anhydrobiosis”. A unique feature of the P. vanderplanki genome is the presence of expanded gene clusters associated with anhydrobiosis. While several such clusters represent orthologues of known genes, there is a distinct set of genes unique to P. vanderplanki. These include Lea-Island-Located (LIL) genes with no known orthologues except two LEA genes of P. vanderplanki, PvLea1 and PvLea3. However, PvLIL proteins lack typical features of LEA proteins, such as intrinsic disorder, hydrophilicity and the characteristic LEA_4 motif. They possess four to five transmembrane domains each, and we confirmed membrane targeting for three PvLILs. Conserved amino acids in PvLIL are located in or near the transmembrane domains. PvLEA1 and PvLEA3 proteins are chimeras combining LEA-like parts and transmembrane domains shared with PvLIL proteins. We have found that PvLil genes are highly upregulated during anhydrobiosis induction in both larvae of P. vanderplanki and the P. vanderplanki-derived cultured cell line Pv11. Thus, PvLil genes are a new and intriguing group that is likely to be associated with anhydrobiosis, given their common origin with some LEA genes and their induction during anhydrobiosis.

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network


    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    10.1038/s41467-021-23143-7 Nature Communications 121329