
    Quantitative model of R-loop forming structures reveals a novel level of RNA–DNA interactome complexity

    An R-loop is a structure formed co-transcriptionally between the nascent RNA transcript and the DNA template, leaving the non-transcribed DNA strand unpaired. This structure can be involved in hypermutation and dsDNA breaks in mammalian immunoglobulin (Ig) genes, oncogenes and neurodegenerative disease-related genes. R-loops have not yet been studied at the genome scale. To identify R-loops, we developed a computational algorithm and mapped R-loop forming sequences (RLFS) onto 66 803 sequences defined by UCSC as ‘known’ genes. We found that ∼59% of these transcribed sequences contain at least one RLFS. We created R-loopDB (http://rloop.bii.a-star.edu.sg/), a database that collects all RLFS identified within over half of the human genes and links to the UCSC Genome Browser for information integration and visualisation across a variety of bioinformatics sources. We found that many oncogenes and tumour suppressors (e.g. Tp53, BRCA1, BRCA2, Kras and Ptprd) and neurodegenerative disease-related genes (e.g. ATM, Park2, Ptprd and GLDC) could be prone to significant R-loop formation. Our findings suggest that R-loops provide a novel level of RNA–DNA interactome complexity, playing key roles in gene expression control, mutagenesis, recombination, chromosomal rearrangement, alternative splicing, DNA editing and epigenetic modifications. RLFSs could be used as a novel source of prospective therapeutic targets.

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with a high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

    How to discriminate between potentially novel and considered biomarkers within molecular signature?

    DOI: 10.1109/CIBCB.2013.6595405. Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013, pp. 176-18

    Dynamic transcriptomic m 5


    Two SARS-CoV-2 Genome Sequences of Isolates from Rural U.S. Patients Harboring the D614G Mutation, Obtained Using Nanopore Sequencing.

    Two coding-complete sequences of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were obtained from samples from two patients in Arkansas, in the southeastern corner of the United States. The viral genome was obtained using the ARTIC Network protocol and Oxford Nanopore Technologies sequencing.

    Ecotropic viral integration site 1 (EVI1) regulates multiple cellular processes important for cancer and is a synergistic partner for FOS protein in invasive tumors

    Ecotropic viral integration site 1 (EVI1) is an oncogenic dual domain zinc finger transcription factor that plays an essential role in the regulation of hematopoietic stem cell renewal, and its overexpression in myeloid leukemia and epithelial cancers is associated with poor patient survival. Despite the discovery of EVI1 in 1988 and its emerging role as a dominant oncogene in various types of cancer, few EVI1 target genes are known. This lack of knowledge has precluded a clear understanding of exactly how EVI1 contributes to cancer. Using a combination of ChIP-Seq and microarray studies in human ovarian carcinoma cells, we show that the two zinc finger domains of EVI1 bind to DNA independently and regulate different sets of target genes. Strikingly, an enriched fraction of EVI1 target genes are cancer genes or genes associated with cancer. We also show that more than 25% of EVI1-occupied genes contain linked EVI1 and activator protein (AP)1 DNA binding sites, and this finding provides evidence for a synergistic cooperative interaction between EVI1 and the AP1 family member FOS in the regulation of cell adhesion, proliferation, and colony formation. An increased number of dual EVI1/AP1 target genes are also differentially regulated in late-stage ovarian carcinomas, further confirming the importance of the functional cooperation between EVI1 and FOS. Collectively, our data indicate that EVI1 is a multipurpose transcription factor that synergizes with FOS in invasive tumors.
    Emilie A. Bard-Chapeau, Justin Jeyakani, Chung H. Kok, Julius Muller, Belinda Q. Chua, Jayantha Gunaratne, Arsen Batagov, Piroon Jenjaroenpun, Vladimir A. Kuznetsov, Chia-Lin Wei, Richard J. D'Andrea, Guillaume Bourque, Nancy A. Jenkins, and Neal G. Copelan