157 research outputs found
Cryptic Variation in the Human Mutation Rate
The mutation rate is known to vary between adjacent sites within the human genome as a consequence of context, the most well-studied example being the influence of CpG dinucleotides. We investigated whether there is additional variation by testing whether there is an excess of sites at which both humans and chimpanzees have a single-nucleotide polymorphism (SNP). We found a highly significant excess of such sites, and we demonstrated that this excess is not due to neighbouring nucleotide effects, ancestral polymorphism, or natural selection. We therefore infer that there is cryptic variation in the mutation rate. However, although this variation in the mutation rate is not associated with the adjacent nucleotides, we show that there are highly nonrandom patterns of nucleotides that extend ~80 base pairs on either side of sites with coincident SNPs, suggesting that there are extensive and complex context effects. Finally, we estimate the level of variation needed to produce the excess of coincident SNPs and show that there is a similar, or higher, level of variation in the mutation rate associated with this cryptic process than there is associated with adjacent nucleotides, including the CpG effect. We conclude that there is substantial variation in the mutation rate that has, until now, been hidden from view.
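The logic of the coincident-SNP test can be illustrated with a small calculation. The following is a minimal sketch, not the authors' method: it assumes SNPs fall independently in the two species, that each site has a relative mutability with mean 1 shared by both species, and it uses hypothetical counts. Under those assumptions the ratio of observed to expected coincident SNPs equals 1 plus the variance of the relative mutability.

```python
def expected_coincident(n_sites, n_human_snps, n_chimp_snps):
    """Expected number of sites polymorphic in both species if SNPs
    are placed independently and uniformly across n_sites."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_human_snps * n_chimp_snps / n_sites


def excess_ratio(observed, n_sites, n_human_snps, n_chimp_snps):
    """Observed coincident SNPs divided by the independence expectation."""
    return observed / expected_coincident(n_sites, n_human_snps, n_chimp_snps)


def implied_rate_variance(ratio):
    """Variance of relative mutability (mean 1) implied by the excess ratio,
    assuming the same per-site mutability acts in both species."""
    return ratio - 1.0


# Hypothetical counts, chosen only for illustration.
N_SITES = 1_000_000
HUMAN_SNPS = 2_000
CHIMP_SNPS = 3_000
OBSERVED_COINCIDENT = 12

expected = expected_coincident(N_SITES, HUMAN_SNPS, CHIMP_SNPS)
ratio = excess_ratio(OBSERVED_COINCIDENT, N_SITES, HUMAN_SNPS, CHIMP_SNPS)
print(f"expected={expected}, ratio={ratio}, "
      f"implied variance={implied_rate_variance(ratio)}")
```

A real analysis would also need to control for neighbouring-nucleotide effects, ancestral polymorphism and selection, which the abstract reports as having been excluded.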
Patterns of mutation in the human genome
The processes that underlie point mutations in the human genome are largely unknown.
However, the cumulative effect of these processes has a large impact on how mutation
rates vary across a number of different scales and contexts, and consequently guide our
understanding of human disease and evolution. Although variation in the mutation rate
has been characterized on many different levels, the extent to which the rate of mutation
can vary outside of the general patterns already observed is not fully understood.
Beginning with the human genome project, many studies have produced large unbiased
sequence datasets within a number of human populations. To that end, we analysed a
number of sequence datasets in an attempt to better understand the patterns and causes
of variation in the rate of mutation that exists across the genome. Firstly, we find that
the mutation rates of single sites vary by more than is currently understood, and that this
variation is not associated with any specific process or feature on either a local or
genomic scale. Although we have been unable to uncover the source of such variation,
understanding the range of mutability at sites in the human genome is important since it
may point to functional regions and disease phenotypes, and prompt further ideas on the
underlying mechanisms associated with such a result. Furthermore, we find evidence
that a mutational process that simultaneously produces two new alleles within the
same individual, during a single mutation event or a tightly linked series of mutation
events, increases the number of tri-allelic sites in the human genome. There are a
number of potential mechanisms that may drive this process, and the consequences of
such an event may be far reaching, as the generation of two new alleles at a single site
in functional regions may allow a more rapid exploration of evolutionary space.
Furthermore, this process appears to make a reasonable contribution to variation in the
human genome, thus providing a substrate for evolutionary change. Finally, we observe
significant variation in the mutation rate over all scales in cancer genomes. Part of this
result can be explained by the actions of specific carcinogens; however, it is striking that
patterns of mutation can be both consistent across different cancer types and very
different between individuals with the same type of cancer over different scales. This
result points to the idea that the patterns of mutation may vary widely between different
genomes under different conditions, and the identification of general patterns in a small
number of samples may not fully describe the extent to which mutation rates can vary.
Taken together, these conclusions suggest that the patterns and processes underlying
mutation are highly complex, and require further analysis if they are to be fully
understood.
Highly accurate quantification of allelic gene expression for population and disease genetics
Publisher Copyright: © 2022 Saukkonen et al. Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ~80% and increasing the number of sites that can be reliably quantified by ~3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type-specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease-relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes. Peer reviewed
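To make the quantity being measured concrete, the sketch below computes a reference-allele ratio at a heterozygous site and an exact two-sided binomial test against balanced expression. It is a generic illustration, not PAC's implementation: the function names are hypothetical, and reading the abstract's ">10%" as a deviation of more than 0.10 from a ratio of 0.5 is an assumption made here for the example.

```python
from math import comb


def ref_ratio(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for the null of balanced expression.

    Sums the probabilities of all outcomes that are no more likely
    than the observed reference-read count.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def deviates_from_balance(ref_reads, alt_reads, threshold=0.10):
    """True if the reference ratio is more than `threshold` away from 0.5.

    The threshold definition is an assumption of this sketch.
    """
    return abs(ref_ratio(ref_reads, alt_reads) - 0.5) > threshold


# Hypothetical read counts at one heterozygous site.
print(ref_ratio(30, 10), binomial_two_sided_p(30, 10), deviates_from_balance(30, 10))
```

Alignment bias towards the reference allele inflates `ref_reads` at sites like this, which is why the abstract stresses personalised alignment before any such test is applied.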
ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R
We present ensemblQueryR, a package providing an R interface to the Ensembl
REST API that facilitates flexible, fast, user-friendly and R workflow
integrable querying of Ensembl REST API linkage disequilibrium (LD) endpoints,
optimised for high-throughput querying. ensemblQueryR achieves this through
functions that are intuitive and amenable to custom code integration, use of
familiar R object types as inputs and outputs, code optimisation and optional
parallelisation functionality. For each LD endpoint, ensemblQueryR provides two
functions, permitting both single-query and multi-query modes of operation. The
multi-query functions are optimised for large query sizes and provide optional
parallelisation to leverage available computational resources and minimise
processing time. We demonstrate that ensemblQueryR has improved performance in
terms of random access memory (RAM) usage and speed, delivering a 10-fold speed
increase over analogous software whilst using a third of the RAM. Finally,
ensemblQueryR is near-agnostic to operating system and computational
architecture through availability of Docker and Singularity images, making this
tool widely accessible to the scientific community.
ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R
We present ensemblQueryR, an R package for querying Ensembl linkage disequilibrium (LD) endpoints. This package is flexible, fast and user-friendly, and optimised for high-throughput querying. ensemblQueryR provides functions that are intuitive and amenable to custom code integration, takes and returns familiar R object types, and offers optional parallelisation. For each Ensembl LD endpoint, ensemblQueryR provides two functions, permitting both single- and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate improved computational performance of ensemblQueryR over an existing tool in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through Docker and Singularity images, making this tool widely accessible to the scientific community.
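For readers unfamiliar with the underlying service, the following Python sketch shows roughly what a single-query and a parallel multi-query call to the Ensembl REST LD endpoint might look like. It does not reproduce ensemblQueryR's R interface: the function names are hypothetical, the variant and population identifiers are illustrative, and the endpoint path `ld/:species/:id/:population_name` should be checked against the current Ensembl REST documentation.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote
from urllib.request import Request, urlopen

SERVER = "https://rest.ensembl.org"


def ld_url(rsid, population, species="human"):
    """Build the URL for LD between one variant and its neighbours."""
    return (f"{SERVER}/ld/{quote(species)}/{quote(rsid)}/"
            f"{quote(population, safe=':')}")


def fetch_ld(rsid, population, species="human", timeout=30):
    """Single-query mode: request LD records for one variant as JSON."""
    request = Request(ld_url(rsid, population, species),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_many(rsids, population, max_workers=4):
    """Multi-query mode: run several single queries on a thread pool."""
    rsids = list(rsids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda rsid: fetch_ld(rsid, population), rsids)
        return dict(zip(rsids, results))


# Building a URL needs no network access; fetching does.
print(ld_url("rs1042779", "1000GENOMES:phase_3:KHV"))
```

Threads suit this workload because each query spends most of its time waiting on the network; a production client would also need to respect the server's rate limits.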
Vacuum spacetimes of embedding class two
Doubt is cast on the much-quoted results of Yakupov that the torsion vector in embedding class two vacuum space-times is necessarily a gradient vector and that class 2 vacua of Petrov type III do not exist. The first result is equivalent to the statement that the two second fundamental forms associated with the embedding necessarily commute and has been assumed in most later investigations of class 2 vacuum space-times. Yakupov stated the result without proof, but hinted that it followed purely algebraically from his identity $R_{ijkl} C^{kl} = 0$, where $C_{ij}$ is the commutator of the two second fundamental forms of the embedding. From Yakupov's identity, it is shown that the only class two vacua with non-zero commutator $C_{ij}$ must necessarily be of Petrov type III or N. Several examples are presented of non-commuting second fundamental forms that satisfy Yakupov's identity and the vacuum condition following from the Gauss equation; both Petrov type N and type III examples occur. Thus it appears unlikely that his results could follow purely algebraically. The results obtained so far do not constitute definite counter-examples to Yakupov's results as the non-commuting examples could turn out to be incompatible with the Codazzi and Ricci embedding equations. This question is currently being investigated.
The Genomic Distribution and Local Context of Coincident SNPs in Human and Chimpanzee
We have previously shown that there is an excess of sites that are polymorphic at orthologous positions in humans and chimpanzees and that this is most likely due to cryptic variation in the mutation rate. We showed that this might be a consequence of complex context effects since we found significant heterogeneity in triplet frequencies around coincident single nucleotide polymorphism (SNP) sites. Here, we show that the heterogeneity in triplet frequencies is not specifically associated with coincident SNPs but is instead driven by base composition bias around CpG dinucleotides. As a result, we suggest that cryptic variation in the mutation rate is truly cryptic, in the sense that the mutation rate does not appear to depend on any specific primary sequence context. Furthermore, we propose that the patterns around CpG dinucleotides are driven by the mutability of CpG dinucleotides in different DNA contexts. We also show that the genomic distribution of coincident SNPs is nonuniform and that there are some subtle differences between the distributions of single and coincident SNPs. Furthermore, we identify regions that contain high numbers of coincident SNPs and suggest that one in particular, a region containing the gene PRIM2, may be under balancing selection.
Dark matter searches at LHC
Besides Standard Model measurements and other Beyond Standard Model studies,
the ATLAS and CMS experiments at the LHC will search for Supersymmetry, one of
the most attractive explanations for dark matter. The SUSY discovery potential
with early data is presented here together with some first results obtained
with 2010 collision data at 7 TeV. Emphasis is placed on measurements and
parameter determination that can be performed to disentangle the possible SUSY
models and SUSY look-alikes, and on the interpretation of a possible positive
supersymmetric signal as an explanation of dark matter.
Comment: 15 pages, 14 figures, Invited plenary talk given at DISCRETE 2010:
Symposium On Prospects In The Physics Of Discrete Symmetries, 6-11 Dec 2010,
Rome, Italy
Genetic variation at mouse and human ribosomal DNA influences associated epigenetic states
Background: Ribosomal DNA (rDNA) displays substantial inter-individual genetic variation in human and mouse. A systematic analysis of how this variation impacts epigenetic states and expression of the rDNA has thus far not been performed.
Results: Using a combination of long- and short-read sequencing, we establish that 45S rDNA units in the C57BL/6J mouse strain exist as distinct genetic haplotypes that influence the epigenetic state and transcriptional output of any given unit. DNA methylation dynamics at these haplotypes are dichotomous and life-stage specific: at one haplotype, the DNA methylation state is sensitive to the in utero environment, but refractory to post-weaning influences, whereas other haplotypes entropically gain DNA methylation during aging only. On the other hand, individual rDNA units in human show limited evidence of genetic haplotypes, and hence little discernible correlation between genetic and epigenetic states. However, in both species, adjacent units show similar epigenetic profiles, and the overall epigenetic state at rDNA is strongly positively correlated with the total rDNA copy number. Analysis of different mouse inbred strains reveals that in some strains, such as 129S1/SvImJ, the rDNA copy number is only approximately 150 copies per diploid genome and DNA methylation levels are < 5%.
Conclusions: Our work demonstrates that rDNA-associated genetic variation has a considerable influence on rDNA epigenetic state and consequently rRNA expression outcomes. In the future, it will be important to consider the impact of inter-individual rDNA (epi)genetic variation on mammalian phenotypes and diseases.
Recent advances in candidate-gene and whole-genome approaches to the discovery of anthelmintic resistance markers and the description of drug/receptor interactions
Anthelmintic resistance has a great impact on livestock production systems
worldwide, is an emerging concern in companion animal medicine, and represents
a threat to our ongoing ability to control human soil-transmitted helminths.
The Consortium for Anthelmintic Resistance and Susceptibility (CARS) provides
a forum for scientists to meet and discuss the latest developments in the
search for molecular markers of anthelmintic resistance. Such markers are
important for detecting drug resistant worm populations, and indicating the
likely impact of the resistance on drug efficacy. The molecular basis of
resistance is also important for understanding how anthelmintics work, and how
drug resistant populations arise. Changes to target receptors, drug efflux and
other biological processes can be involved. This paper reports on the CARS
group meeting held in August 2013 in Perth, Australia. The latest knowledge on
the development of molecular markers for resistance to each of the principal
classes of anthelmintics is reviewed. The molecular basis of resistance is
best understood for the benzimidazole group of compounds, and we examine
recent work to translate this knowledge into useful diagnostics for field use.
We examine recent candidate-gene and whole-genome approaches to understanding
anthelmintic resistance and identifying markers. We also look at drug
transporters, both as useful markers for resistance and as
opportunities to overcome resistance through the targeting of the
transporters themselves with inhibitors. Finally, we describe the tools
available for the application of the newest high-throughput sequencing
technologies to the study of anthelmintic resistance.