157 research outputs found

    Cryptic Variation in the Human Mutation Rate

    The mutation rate is known to vary between adjacent sites within the human genome as a consequence of context, the most well-studied example being the influence of CpG dinucleotides. We investigated whether there is additional variation by testing whether there is an excess of sites at which both humans and chimpanzees have a single-nucleotide polymorphism (SNP). We found a highly significant excess of such sites, and we demonstrated that this excess is not due to neighbouring nucleotide effects, ancestral polymorphism, or natural selection. We therefore infer that there is cryptic variation in the mutation rate. However, although this variation in the mutation rate is not associated with the adjacent nucleotides, we show that there are highly nonrandom patterns of nucleotides that extend ~80 base pairs on either side of sites with coincident SNPs, suggesting that there are extensive and complex context effects. Finally, we estimate the level of variation needed to produce the excess of coincident SNPs and show that there is a similar, or higher, level of variation in the mutation rate associated with this cryptic process than there is associated with adjacent nucleotides, including the CpG effect. We conclude that there is substantial variation in the mutation rate that has, until now, been hidden from view.

    Patterns of mutation in the human genome

    The processes that underlie point mutations in the human genome are largely unknown. However, the cumulative effect of these processes has a large impact on how mutation rates vary across a number of different scales and contexts, and consequently guides our understanding of human disease and evolution. Although variation in the mutation rate has been characterized on many different levels, the extent to which the rate of mutation can vary outside of the general patterns already observed is not fully understood. Beginning with the Human Genome Project, many studies have produced large unbiased sequence datasets within a number of human populations. To that end, we analysed a number of sequence datasets in an attempt to better understand the patterns and causes of variation in the rate of mutation that exists across the genome. Firstly, we find that the mutation rates of single sites vary by more than is currently understood, and that this variation is not associated with any specific process or feature on either a local or genomic scale. Although we have been unable to uncover the source of such variation, understanding the range of mutability at sites in the human genome is important since it may point to functional regions and disease phenotypes, and prompt further ideas on the underlying mechanisms associated with such a result. Furthermore, we find evidence that a mutational process that can generate two new alleles simultaneously within the same individual, during a single mutation event or a tightly linked series of events, increases the number of tri-allelic sites in the human genome. There are a number of potential mechanisms that may drive this process, and the consequences of such an event may be far reaching, as the generation of two new alleles at a single site in functional regions may allow a more rapid exploration of evolutionary space.
    Furthermore, this process appears to make a reasonable contribution to variation in the human genome, thus providing a substrate for evolutionary change. Finally, we observe significant variation in the mutation rate over all scales in cancer genomes. Part of this result can be explained by the actions of specific carcinogens; however, it is striking that patterns of mutation can be consistent across different cancer types yet very different between individuals with the same type of cancer over different scales. This result points to the idea that the patterns of mutation may vary widely between different genomes under different conditions, and the identification of general patterns in a small number of samples may not fully describe the extent to which mutation rates can vary. Taken together, these conclusions suggest that the patterns and processes underlying mutation are highly complex, and require further analysis if they are to be fully understood.

    Highly accurate quantification of allelic gene expression for population and disease genetics

    Publisher Copyright: © 2022 Saukkonen et al. Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ~80% and increasing the number of sites that can be reliably quantified by ~3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type–specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease-relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes. Peer reviewed.

    ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R

    We present ensemblQueryR, a package providing an R interface to the Ensembl REST API that facilitates flexible, fast, user-friendly and R-workflow-integrable querying of Ensembl REST API linkage disequilibrium (LD) endpoints, optimised for high-throughput querying. ensemblQueryR achieves this through functions that are intuitive and amenable to custom code integration, use of familiar R object types as inputs and outputs, code optimisation and optional parallelisation functionality. For each LD endpoint, ensemblQueryR provides two functions, permitting both single-query and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate that ensemblQueryR has improved performance in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase over analogous software whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through availability of Docker and Singularity images, making this tool widely accessible to the scientific community.

    ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R

    We present ensemblQueryR, an R package for querying Ensembl linkage disequilibrium (LD) endpoints. This package is flexible, fast and user-friendly, and optimised for high-throughput querying. ensemblQueryR uses functions that are intuitive and amenable to custom code integration, takes familiar R object types as inputs and outputs, and provides parallelisation functionality. For each Ensembl LD endpoint, ensemblQueryR provides two functions, permitting both single- and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate improved computational performance of ensemblQueryR over an existing tool in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through Docker and Singularity images, making this tool widely accessible to the scientific community.

    Vacuum spacetimes of embedding class two

    Doubt is cast on the much quoted results of Yakupov that the torsion vector in embedding class two vacuum space-times is necessarily a gradient vector and that class 2 vacua of Petrov type III do not exist. The first result is equivalent to the fact that the two second fundamental forms associated with the embedding necessarily commute and has been assumed in most later investigations of class 2 vacuum space-times. Yakupov stated the result without proof, but hinted that it followed purely algebraically from his identity $R_{ijkl} C^{kl} = 0$, where $C_{ij}$ is the commutator of the two second fundamental forms of the embedding. From Yakupov's identity, it is shown that the only class two vacua with non-zero commutator $C_{ij}$ must necessarily be of Petrov type III or N. Several examples are presented of non-commuting second fundamental forms that satisfy Yakupov's identity and the vacuum condition following from the Gauss equation; both Petrov type N and type III examples occur. Thus it appears unlikely that his results could follow purely algebraically. The results obtained so far do not constitute definite counter-examples to Yakupov's results as the non-commuting examples could turn out to be incompatible with the Codazzi and Ricci embedding equations. This question is currently being investigated.

    The Genomic Distribution and Local Context of Coincident SNPs in Human and Chimpanzee

    We have previously shown that there is an excess of sites that are polymorphic at orthologous positions in humans and chimpanzees and that this is most likely due to cryptic variation in the mutation rate. We showed that this might be a consequence of complex context effects since we found significant heterogeneity in triplet frequencies around coincident single nucleotide polymorphism (SNP) sites. Here, we show that the heterogeneity in triplet frequencies is not specifically associated with coincident SNPs but is instead driven by base composition bias around CpG dinucleotides. As a result, we suggest that cryptic variation in the mutation rate is truly cryptic, in the sense that the mutation rate does not appear to depend on any specific primary sequence context. Furthermore, we propose that the patterns around CpG dinucleotides are driven by the mutability of CpG dinucleotides in different DNA contexts. We also show that the genomic distribution of coincident SNPs is nonuniform and that there are some subtle differences between the distributions of single and coincident SNPs. Furthermore, we identify regions that contain high numbers of coincident SNPs and suggest that one in particular, a region containing the gene PRIM2, may be under balancing selection.

    Dark matter searches at LHC

    Besides Standard Model measurements and other Beyond Standard Model studies, the ATLAS and CMS experiments at the LHC will search for Supersymmetry, one of the most attractive explanations for dark matter. The SUSY discovery potential with early data is presented here together with some first results obtained with 2010 collision data at 7 TeV. Emphasis is placed on measurements and parameter determination that can be performed to disentangle the possible SUSY models and SUSY look-alikes, and on the interpretation of a possible positive supersymmetric signal as an explanation of dark matter. Comment: 15 pages, 14 figures, Invited plenary talk given at DISCRETE 2010: Symposium On Prospects In The Physics Of Discrete Symmetries, 6-11 Dec 2010, Rome, Italy

    Genetic variation at mouse and human ribosomal DNA influences associated epigenetic states

    Background: Ribosomal DNA (rDNA) displays substantial inter-individual genetic variation in human and mouse. A systematic analysis of how this variation impacts epigenetic states and expression of the rDNA has thus far not been performed. Results: Using a combination of long- and short-read sequencing, we establish that 45S rDNA units in the C57BL/6J mouse strain exist as distinct genetic haplotypes that influence the epigenetic state and transcriptional output of any given unit. DNA methylation dynamics at these haplotypes are dichotomous and life-stage specific: at one haplotype, the DNA methylation state is sensitive to the in utero environment, but refractory to post-weaning influences, whereas other haplotypes entropically gain DNA methylation during aging only. On the other hand, individual rDNA units in human show limited evidence of genetic haplotypes, and hence little discernible correlation between genetic and epigenetic states. However, in both species, adjacent units show similar epigenetic profiles, and the overall epigenetic state at rDNA is strongly positively correlated with the total rDNA copy number. Analysis of different mouse inbred strains reveals that in some strains, such as 129S1/SvImJ, the rDNA copy number is only approximately 150 copies per diploid genome and DNA methylation levels are < 5%. Conclusions: Our work demonstrates that rDNA-associated genetic variation has a considerable influence on rDNA epigenetic state and consequently rRNA expression outcomes. In the future, it will be important to consider the impact of inter-individual rDNA (epi)genetic variation on mammalian phenotypes and diseases.

    Recent advances in candidate-gene and whole-genome approaches to the discovery of anthelmintic resistance markers and the description of drug/receptor interactions

    Anthelmintic resistance has a great impact on livestock production systems worldwide, is an emerging concern in companion animal medicine, and represents a threat to our ongoing ability to control human soil-transmitted helminths. The Consortium for Anthelmintic Resistance and Susceptibility (CARS) provides a forum for scientists to meet and discuss the latest developments in the search for molecular markers of anthelmintic resistance. Such markers are important for detecting drug-resistant worm populations and for indicating the likely impact of the resistance on drug efficacy. The molecular basis of resistance is also important for understanding how anthelmintics work, and how drug-resistant populations arise. Changes to target receptors, drug efflux and other biological processes can be involved. This paper reports on the CARS group meeting held in August 2013 in Perth, Australia. The latest knowledge on the development of molecular markers for resistance to each of the principal classes of anthelmintics is reviewed. The molecular basis of resistance is best understood for the benzimidazole group of compounds, and we examine recent work to translate this knowledge into useful diagnostics for field use. We examine recent candidate-gene and whole-genome approaches to understanding anthelmintic resistance and identifying markers. We also look at drug transporters, both as useful markers for resistance and as opportunities to overcome resistance through the targeting of the transporters themselves with inhibitors. Finally, we describe the tools available for the application of the newest high-throughput sequencing technologies to the study of anthelmintic resistance.