Quantitative model of R-loop forming structures reveals a novel level of RNA–DNA interactome complexity
An R-loop is a structure formed co-transcriptionally between a nascent RNA transcript and its DNA template, leaving the non-transcribed DNA strand unpaired. This structure can be involved in hypermutation and dsDNA breaks in mammalian immunoglobulin (Ig) genes, oncogenes and neurodegenerative disease-related genes. R-loops have not yet been studied at the genome scale. To identify potential R-loops, we developed a computational algorithm and mapped R-loop forming sequences (RLFS) onto 66 803 sequences defined by UCSC as ‘known’ genes. We found that ∼59% of these transcribed sequences contain at least one RLFS. We created R-loopDB (http://rloop.bii.a-star.edu.sg/), a database that collects all RLFS identified within over half of the human genes and links to the UCSC Genome Browser for information integration and visualisation across a variety of bioinformatics sources. We found that many oncogenes and tumour suppressors (e.g. Tp53, BRCA1, BRCA2, Kras and Ptprd) and neurodegenerative disease-related genes (e.g. ATM, Park2, Ptprd and GLDC) could be prone to significant R-loop formation. Our findings suggest that R-loops provide a novel level of RNA–DNA interactome complexity, playing key roles in gene expression control, mutagenesis, recombination, chromosomal rearrangement, alternative splicing, DNA editing and epigenetic modifications. RLFSs could be used as a novel source of prospective therapeutic targets.
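The abstract does not specify the RLFS model. As a rough illustration only, the sketch below assumes that a candidate RLFS is a G-cluster region (two or more runs of at least three guanines separated by 1 to 10 nucleotides) and computes the fraction of transcript sequences containing at least one such region, the kind of statistic behind the ∼59% figure. The pattern, the thresholds and the function names `find_candidate_rlfs` and `fraction_with_hit` are hypothetical simplifications, not the published algorithm.

```python
import re

# Hypothetical stand-in for an RLFS model: a G-cluster region made of two or
# more runs of >=3 G separated by 1-10 nt. NOT the published R-loop model.
G_CLUSTER = re.compile(r"G{3,}(?:[ACGT]{1,10}?G{3,})+")


def find_candidate_rlfs(seq: str) -> list[tuple[int, int]]:
    """Return non-overlapping (start, end) spans of G-cluster regions in seq."""
    return [(m.start(), m.end()) for m in G_CLUSTER.finditer(seq.upper())]


def fraction_with_hit(seqs: dict[str, str]) -> float:
    """Fraction of sequences (e.g. transcripts) with at least one candidate."""
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs.values() if find_candidate_rlfs(s))
    return hits / len(seqs)


# Usage with toy sequences (not real transcripts):
toy = {"tx1": "ATGGGTTGGGAC", "tx2": "ATATATAT"}
print(find_candidate_rlfs(toy["tx1"]))  # [(2, 10)]
print(fraction_with_hit(toy))           # 0.5
```

Only the given strand is scanned here; a real genome-scale scan would also need strand and transcript-coordinate handling.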
Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network
Using Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of these TSSs could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models that predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR-surrounding sequences, not only to distinguish STR classes but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation levels, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add a further layer of complexity to STR polymorphism.
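As a hedged illustration of what "a CAGE peak initiating at an STR" can mean operationally, the sketch below flags perfect tandem repeats with 1 to 6 nt units repeated at least three times and reports which peak positions fall inside one. The thresholds, the regular-expression approach and the function names `find_strs` and `peaks_in_strs` are assumptions for illustration; the abstract does not describe how the authors annotated STRs.

```python
import re


# Hypothetical thresholds: units of 1-6 nt repeated at least 3 times in tandem.
def find_strs(seq: str, min_unit: int = 1, max_unit: int = 6, min_copies: int = 3):
    """Return (start, end, unit, copies) for perfect tandem repeats in seq.

    The lazy unit quantifier prefers the shortest repeating unit at each
    position; matches are non-overlapping and scanned left to right.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // len(unit)))
    return hits


def peaks_in_strs(seq: str, peak_positions: list[int]) -> list[int]:
    """Return the 0-based peak positions that fall inside a detected STR."""
    spans = [(start, end) for start, end, _, _ in find_strs(seq)]
    return [p for p in peak_positions if any(s <= p < e for s, e in spans)]


# Usage with a toy sequence and toy peak positions:
print(find_strs("GGCACACACATT"))                         # [(2, 10, 'CA', 4)]
print(peaks_in_strs("GGCACACACATT", [0, 3, 9, 10]))      # [3, 9]
```

Only perfect repeats are detected; interrupted or imperfect microsatellites would need a more tolerant method.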
How to discriminate between potentially novel and considered biomarkers within molecular signature?
10.1109/CIBCB.2013.6595405. Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013, 176-18
Activation-induced cytidine deaminase localizes to G-quadruplex motifs at mutation hotspots in lymphoma
Diffuse large B-cell lymphoma (DLBCL) is a molecularly heterogeneous group of malignancies with frequent genetic abnormalities. G-quadruplex (G4) DNA structures may facilitate this genomic instability through association with activation-induced cytidine deaminase (AID), an antibody diversification enzyme implicated in the mutation of oncogenes in B-cell lymphomas. Chromatin immunoprecipitation sequencing (ChIP-seq) analyses in this study revealed that AID hotspots in both activated B cells and lymphoma cells in vitro were highly enriched for G4 elements. A representative set of these targeted sequences was validated for characteristic, stable G4 structure formation, including previously unknown G4s in the lymphoma-associated genes CBFA2T3, SPIB, BCL6, HLA-DRB5 and MEF2C, along with the established BCL2 and MYC structures. Frequent genome-wide G4 formation was also detected for the first time in DLBCL patient-derived tissues using BG4, a structure-specific G4 antibody. Tumors with greater BG4 staining were more likely to have concurrent BCL2 and MYC oncogene amplification and BCL2 mutations. Ninety-seven percent of the BCL2 mutations occurred within G4 sites that overlapped with AID binding. G4 localization at sites of mutation, and within aggressive DLBCL tumors harboring amplified BCL2 and MYC, supports a role for G4 structures in events that lead to a loss of genomic integrity, a critical step in B-cell lymphomagenesis. © The Author(s) 2020. Open access journal.
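The abstract reports G4 enrichment at AID hotspots and that 97% of BCL2 mutations fell within G4 sites. One widely used way to nominate candidate G4 sequences computationally is the putative quadruplex rule of four runs of at least three guanines separated by loops of 1 to 7 nucleotides (G3+N1-7G3+N1-7G3+N1-7G3+). The sketch below applies that generic rule on both strands (C-runs on the given strand indicate a G4 on the complementary strand) and computes the fraction of mutation positions inside a motif. It is an assumed method for illustration; the abstract does not state that the authors used it, and `find_g4_motifs` and `fraction_in_motifs` are hypothetical names.

```python
import re

# Generic putative-G4 rule: four G-runs (>=3 G) joined by 1-7 nt loops.
G4_PLUS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# C-runs on the given strand correspond to a G4 motif on the opposite strand.
G4_MINUS = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) putative G4 spans, non-overlapping per strand."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


def fraction_in_motifs(seq: str, positions: list[int]) -> float:
    """Fraction of 0-based positions (e.g. mutations) inside any putative G4."""
    if not positions:
        return 0.0
    spans = [(s, e) for s, e, _ in find_g4_motifs(seq)]
    inside = sum(1 for p in positions if any(s <= p < e for s, e in spans))
    return inside / len(positions)


# Usage with a toy sequence and toy mutation positions:
toy = "TTGGGAGGGAGGGAGGGTT"
print(find_g4_motifs(toy))                      # [(2, 17, '+')]
print(fraction_in_motifs(toy, [0, 5, 16, 18]))  # 0.5
```

A sequence rule like this only predicts G4 potential; the study itself relied on experimental validation and BG4 staining to confirm structure formation.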
Two SARS-CoV-2 Genome Sequences of Isolates from Rural U.S. Patients Harboring the D614G Mutation, Obtained Using Nanopore Sequencing.
Two coding-complete sequences of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were obtained from samples collected from two patients in Arkansas, in the southeastern corner of the United States. The viral genomes were obtained using the ARTIC Network protocol and Oxford Nanopore Technologies sequencing.
Contrasting expression patterns of coding and noncoding parts of the human genome upon oxidative stress
10.1038/srep09737. Scientific Reports 5, 973
Ecotropic viral integration site 1 (EVI1) regulates multiple cellular processes important for cancer and is a synergistic partner for FOS protein in invasive tumors
Ecotropic viral integration site 1 (EVI1) is an oncogenic dual domain zinc finger transcription factor that plays an essential role in the regulation of hematopoietic stem cell renewal, and its overexpression in myeloid leukemia and epithelial cancers is associated with poor patient survival. Despite the discovery of EVI1 in 1988 and its emerging role as a dominant oncogene in various types of cancer, few EVI1 target genes are known. This lack of knowledge has precluded a clear understanding of exactly how EVI1 contributes to cancer. Using a combination of ChIP-Seq and microarray studies in human ovarian carcinoma cells, we show that the two zinc finger domains of EVI1 bind to DNA independently and regulate different sets of target genes. Strikingly, an enriched fraction of EVI1 target genes are cancer genes or genes associated with cancer. We also show that more than 25% of EVI1-occupied genes contain linked EVI1 and activator protein (AP)1 DNA binding sites, and this finding provides evidence for a synergistic cooperative interaction between EVI1 and the AP1 family member FOS in the regulation of cell adhesion, proliferation, and colony formation. An increased number of dual EVI1/AP1 target genes are also differentially regulated in late-stage ovarian carcinomas, further confirming the importance of the functional cooperation between EVI1 and FOS. Collectively, our data indicate that EVI1 is a multipurpose transcription factor that synergizes with FOS in invasive tumors.
Emilie A. Bard-Chapeau, Justin Jeyakani, Chung H. Kok, Julius Muller, Belinda Q. Chua, Jayantha Gunaratne, Arsen Batagov, Piroon Jenjaroenpun, Vladimir A. Kuznetsov, Chia-Lin Wei, Richard J. D'Andrea, Guillaume Bourque, Nancy A. Jenkins, and Neal G. Copelan