18 research outputs found

    Long genes and genes with multiple splice variants are enriched in pathways linked to cancer and other multigenic diseases.

    BACKGROUND: The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. With the view that complex genes should be particularly vulnerable to such events, here we explore the link between the simple properties of the human genes, such as transcript length, number of splice variants, exon/intron composition, and their involvement in the pathways linked to cancer and other multigenic diseases. RESULTS: We reveal a substantial enrichment of cancer pathways with long genes and genes that have multiple splice variants. Although the latter two factors are interdependent, we show that the overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey for the pathways enriched with top lengthy genes and with genes that have multiple splice variants reveals, along with cancer pathways, the pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We outline a correlation between the gene length and the number of somatic mutations. CONCLUSIONS: Our work is a step forward in the assessment of the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. Unlike the cancer pathways, we note that the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to also be a potential contributing factor in the emergence of pathologies, should topoisomerases become impaired. This research was supported by Cancer Research UK and the Herchel Smith Fund. SB is a Wellcome Trust Senior Investigator.

    Single genome retrieval of context-dependent variability in mutation rates for human germline

    BACKGROUND: Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS: The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and produce a Trek-based estimate of the overall mutation rate in humans. CONCLUSIONS: The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.

    Generalised interrelations among mutation rates drive the genomic compliance of Chargaff's second parity rule

    Chargaff's second parity rule (PR-2), where the complementary base and k-mer contents are matching within the same strand of a double-stranded DNA (dsDNA), is a phenomenon that invited many explanations. The strict compliance of nearly all nuclear dsDNA to PR-2 implies that the explanation should also be similarly adamant. In this work, we revisited the possibility of mutation rates driving PR-2 compliance. Starting from the assumption-free approach, we constructed kinetic equations for unconstrained simulations. The results were analysed for their PR-2 compliance by employing symbolic regression and machine learning techniques. We arrived at a generalised set of mutation rate interrelations in place in most species that allow for their full PR-2 compliance. Importantly, our constraints explain PR-2 in genomes out of the scope of the prior explanations based on the equilibration under mutation rates with simpler no-strand-bias constraints. We thus reinstate the role of mutation rates in PR-2 through its molecular core, now shown, under our formulation, to be tolerant to previously noted strand biases and incomplete compositional equilibration. We further investigate the time for any genome to reach PR-2, showing that it is generally earlier than the compositional equilibrium, and well within the age of life on Earth.
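
    As an illustration of the PR-2 definition given in this abstract (complementary base and k-mer contents matching within a single strand), the following is a minimal sketch of how compliance could be measured on one strand. It is not the authors' kinetic or machine learning method; the function names and the deviation score (0 for perfect compliance, 1 for maximal imbalance) are assumptions made for this example.

```python
from collections import Counter

# Translation table for complementing A/C/G/T; other symbols (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers on a single strand."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pr2_deviation(seq: str, k: int = 1) -> float:
    """Hypothetical PR-2 score: 0.0 when every k-mer is as frequent as its
    reverse complement on the same strand, 1.0 when no k-mer has a
    reverse-complement partner on that strand."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Include reverse complements that never occur, so their absence is counted.
    kmers = set(counts) | {reverse_complement(m) for m in counts}
    # Each non-palindromic pair is visited twice, hence the factor of 2 below.
    diff = sum(abs(counts[m] - counts[reverse_complement(m)]) for m in kmers)
    return diff / (2 * total)
```

    For k = 1 the score compares A with T and C with G counts; larger k extends the same comparison to k-mers and their reverse complements.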

    Structural Analysis using SHALiPE to Reveal RNA G-Quadruplex Formation in Human Precursor MicroRNA

    RNA G-quadruplex (rG4) structures are of fundamental importance to biology. A novel approach is introduced to detect and structurally map rG4s at single-nucleotide resolution in RNAs. The approach, denoted SHALiPE, couples selective 2'-hydroxyl acylation with lithium ion-based primer extension, and identifies characteristic structural fingerprints for rG4 mapping. We apply SHALiPE to interrogate the human precursor microRNA 149, and reveal the formation of an rG4 structure in this non-coding RNA. Additional analyses support the SHALiPE results and uncover that this rG4 has a parallel topology, is thermally stable, and is conserved in mammals. An in vitro Dicer assay shows that this rG4 inhibits Dicer processing, supporting the potential role of rG4 structures in microRNA maturation and post-transcriptional regulation of mRNAs. This is the accepted manuscript. The final version is available at http://dx.doi.org/10.1002/anie.201603562

    Selective Chemical Labeling of Natural T Modifications in DNA.

    We present a chemical method to selectively tag and enrich thymine modifications, 5-formyluracil (5-fU) and 5-hydroxymethyluracil (5-hmU), found naturally in DNA. Inherent reactivity differences have enabled us to tag 5-fU chemoselectively over its C modification counterpart, 5-formylcytosine (5-fC). We rationalized the enhanced reactivity of 5-fU compared to 5-fC via ab initio quantum mechanical calculations. We exploited this chemical tagging reaction to provide proof of concept for the enrichment of 5-fU containing DNA from a pool that contains 5-fC or no modification. We further demonstrate that 5-hmU can be chemically oxidized to 5-fU, providing a strategy for the enrichment of 5-hmU. These methods will enable the mapping of 5-fU and 5-hmU in genomic DNA, to provide insights into their functional role and dynamics in biology. R.E.H. is supported by The University of Cambridge, F.K. is supported by the Wellcome Trust, and A.B.S. is supported by the Herchel Smith Fund. The Balasubramanian group is core-funded by a Wellcome Trust Senior Investigator Award and by Cancer Research UK. Departmental NMR facilities are supported by EPSRC grant EP/K039520/1. This is the final version. It was first published by ACS at http://pubs.acs.org/doi/abs/10.1021/jacs.5b03730

    Whole genome experimental maps of DNA G-quadruplexes in multiple species.

    Genomic maps of DNA G-quadruplexes (G4s) can help elucidate the roles that these secondary structures play in various organisms. Herein, we employ an improved version of a G-quadruplex sequencing method (G4-seq) to generate whole genome G4 maps for 12 species that include widely studied model organisms and also pathogens of clinical relevance. We identify G4 structures that form under physiological K+ conditions and also G4s that are stabilized by the G4-targeting small molecule pyridostatin (PDS). We discuss the various structural features of the experimentally observed G-quadruplexes (OQs), highlighting differences in their prevalence and enrichment across species. Our study describes diversity in sequence composition and genomic location for the OQs in the different species and reveals that the enrichment of OQs in gene promoters is particular to mammals such as mouse and human, among the species studied. The multi-species maps have been made publicly available as a resource to the research community. The maps can serve as blueprints for biological experiments in those model organisms, where G4 structures may play a role. The S.B. research group is supported by programme grant funding from Cancer Research UK (C9681/A18618), European Research Council Advanced Grant No. 339778, a Wellcome Trust Senior Investigator Award (grant 209441/z/17/z) and by core funding from Cancer Research UK (C14303/A17197). We are grateful to the Biotechnology and Biological Sciences Research Council (BBSRC) and Illumina for the CASE studentship supporting V.S.C. (BB/I015477/1).

    Machine learning model for sequence-driven DNA G-quadruplex formation.

    We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities
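
    For context on the "putative quadruplex sequences" mentioned in this abstract, the sketch below scans one strand for the commonly used motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This is only the simple pattern-matching baseline, written under that assumed motif definition; it is not the machine learning model the abstract describes, and the names `find_pqs` and `_PQS` are hypothetical.

```python
import re

# Assumed motif: G{3,} followed by three repeats of (loop of 1-7 nt, G{3,}).
_PQS = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")


def find_pqs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for non-overlapping motif matches on the
    given strand. The complementary strand is not scanned here."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in _PQS.finditer(seq)]


# Example: four GGG runs separated by single-nucleotide loops.
hits = find_pqs("ttGGGAGGGAGGGAGGGtt")
```

    A pattern match of this kind only marks candidates; according to the abstract, many such candidates do not form stable G4 structures, which is the distinction the described model is meant to capture.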

    G-quadruplex structures within the 3' UTR of LINE-1 elements stimulate retrotransposition

    Long interspersed nuclear elements (LINEs) are ubiquitous transposable elements in higher eukaryotes that have a significant role in shaping genomes, owing to their abundance. Here we report that guanine-rich sequences in the 3' untranslated regions (UTRs) of hominoid-specific LINE-1 elements are coupled with retrotransposon speciation and contribute to retrotransposition through the formation of G-quadruplex (G4) structures. We demonstrate that stabilization of the G4 motif of a human-specific LINE-1 element by small-molecule ligands stimulates retrotransposition. S.B. is a Wellcome Trust Senior Investigator (grant 099232/z/12/z). The Balasubramanian group is supported by European Research Council Advanced Grant 339778, and receives core (C14303/A17197) and program (C9681/A18618) funding from Cancer Research UK.