38 research outputs found

    Estimates of the effect of natural selection on protein-coding content

    Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (ω) distinguishes neutrally evolving sequences (ω = 1) from those subjected to purifying (ω < 1) or positive (ω > 1) selection. We show that current models used to estimate ω are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with ∼10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
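
To make the synonymous/nonsynonymous distinction behind ω concrete, here is a minimal Python sketch. It is not the model described in the abstract: it only classifies aligned codon pairs that differ at a single nucleotide, using the standard genetic code, and maps a given ω value to the regime named in the text. The function names `count_single_step_differences` and `selection_regime` are hypothetical. Raw counts like these are not an estimate of ω, which also requires normalising by the number of synonymous and nonsynonymous sites.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_single_step_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts over aligned codons
    that differ at exactly one nucleotide position."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue  # identical codons, or multi-step changes with ambiguous paths
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def selection_regime(omega):
    """Map an omega value to the regime named in the abstract."""
    if omega < 1:
        return "purifying"
    if omega > 1:
        return "positive"
    return "neutral"
```

For example, comparing `"TTTCTTATG"` with `"TTCCTAATA"` yields two synonymous changes (Phe to Phe, Leu to Leu) and one nonsynonymous change (Met to Ile).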

    PyEvolve: a toolkit for statistical modelling of molecular evolution

    BACKGROUND: Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences – ignoring the biological significance of sequence differences. A suite of sophisticated likelihood based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. RESULTS: Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpG's, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from ~10 days to ~6 hours. CONCLUSION: PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. 
The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter rich likelihood functions solvable within hours on multi-cpu hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software

    Cryptococcus gattii in North American Pacific Northwest: Whole-Population Genome Analysis Provides Insights into Species Evolution and Dispersal

    The emergence of distinct populations of Cryptococcus gattii in the temperate North American Pacific Northwest (PNW) was surprising, as this species was previously thought to be confined to tropical and semitropical regions. Beyond a new habitat niche, the dominant emergent population displayed increased virulence and caused primary pulmonary disease, as opposed to the predominantly neurologic disease seen previously elsewhere. Whole-genome sequencing was performed on 118 C. gattii isolates, including the PNW subtypes and the global diversity of molecular type VGII, to better ascertain the natural source and genomic adaptations leading to the emergence of infection in the PNW. Overall, the VGII population was highly diverse, demonstrating large numbers of mutational and recombinational events; however, the three dominant subtypes from the PNW were of low diversity and were completely clonal. Although strains of VGII were found on at least five continents, all genetic subpopulations were represented or were most closely related to strains from South America. The phylogenetic data are consistent with multiple dispersal events from South America to North America and elsewhere. Numerous gene content differences were identified between the emergent clones and other VGII lineages, including genes potentially related to habitat adaptation, virulence, and pathology. Evidence was also found for possible gene introgression from Cryptococcus neoformans var. grubii that is rarely seen in global C. gattii but that was present in all PNW populations. These findings provide greater... IMPORTANCE: Cryptococcus gattii emerged in the temperate North American Pacific Northwest (PNW) in the late 1990s. Beyond a new environmental niche, these emergent populations displayed increased virulence and resulted in a different pattern of clinical disease. In particular, severe pulmonary infections predominated in contrast to presentation with neurologic disease as seen previously elsewhere. 
We employed population-level whole-genome sequencing and analysis to explore the genetic relationships and gene content of the PNW C. gattii populations. We provide evidence that the PNW strains originated from South America and identified numerous genes potentially related to habitat adaptation, virulence expression, and clinical presentation. Characterization of these genetic features may lead to improved diagnostics and therapies for such fungal infections. The data indicate that there were multiple recent introductions of C. gattii into the PNW. Public health vigilance is warranted for emergence in regions where C. gattii is not thought to be endemic

    Loss of ACTN3 gene function alters mouse muscle metabolism and shows evidence of positive selection in humans

    More than a billion humans worldwide are predicted to be completely deficient in the fast skeletal muscle fiber protein α-actinin-3 owing to homozygosity for a premature stop codon polymorphism, R577X, in the ACTN3 gene. The R577X polymorphism is associ

    Do genomic datasets resolve the correct relationship among the placental, marsupial and monotreme lineages?

    Did the mammal radiation arise through initial divergence of prototherians from a common ancestor of metatherians and eutherians, the Theria hypothesis, or of eutherians from a common ancestor of metatherians and prototherians, the Marsupionta hypothesis? Molecular phylogenetic analyses of point substitutions applied to this problem have been contradictory: mtDNA-encoded sequences supported Marsupionta, whereas nuclear-encoded sequences and RY (purine-pyrimidine)-recoded mtDNA supported Theria. The consistency property of maximum likelihood guarantees convergence on the true tree only with longer alignments. Results from analyses of genome datasets should therefore be impervious to choice of outgroup. We assessed whether important hypotheses concerning mammal evolution, including Theria/Marsupionta and the branching order of rodents, carnivorans and primates, are resolved by phylogenetic analyses using ∼2.3 megabases of protein-coding sequence from genome projects. In each case, only two tree topologies were being compared and thus inconsistency in resolved topologies can only derive from flawed models of sequence divergence. The results from all substitution models strongly supported Theria. For the eutherian lineages, all models were sensitive to the outgroup. We argue that phylogenetic inference from point substitutions will remain unreliable until substitution models that better match biological mechanisms of sequence divergence have been developed.
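
The RY recoding mentioned above can be illustrated with a short Python sketch. It assumes the usual convention that purines (A, G) become R and pyrimidines (C, T, or U) become Y, with gaps and ambiguity codes passed through unchanged. The function name `ry_recode` is hypothetical and is not taken from the study.

```python
# Purines (A, G) map to R; pyrimidines (C, T, U) map to Y.
# Any other character (gaps, ambiguity codes) is left as it is.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_recode(sequence):
    """Recode a nucleotide sequence into the two-state purine/pyrimidine alphabet."""
    return sequence.upper().translate(RY_MAP)
```

Recoding discards transitions (A↔G, C↔T) and keeps only transversions, which is why it is commonly applied to compositionally biased data such as mtDNA.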

    Modeling the Impact of DNA Methylation on the Evolution of BRCA1 in Mammals

    The modified base 5-methylcytosine (mC) plays an important functional role in the biology of mammals as an epigenetic modification and appears to exert a striking impact on the molecular evolution of mammal genomes. The collective epigenetic functions o

    Regional Context in the Alignment of Biological Sequence Pairs

    Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes both in protein and non-protein coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. The two PHMMs had their own set of parameters to model rates in their respective regions. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates for conserved and non-conserved secondary structure components, respectively. With polypeptides we found both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues. We also found that the average concordance of our alignments with corresponding curated alignments improves markedly when the model allows either of the two fast rates to co-locate with hydrophilic residues. With rRNA molecules, our model did not detect co-location between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.

    Non-replicability of disease gene results: A modelling perspective


    Testing for concordant equilibria between population samples

    A substantial body of theory has been developed to assess the effect of evolutionary forces on the distribution of genotypes, both single and multilocus, within populations. One area where the potential for application of this theory has not been fully appreciated concerns the extent to which population samples differ. Within populations, the divergence of genotype or haplotype frequencies from that expected under Hardy-Weinberg (HW) or linkage equilibrium can be measured as disequilibria coefficients. To assess population samples for concordant equilibria, an analytical framework for comparing disequilibria coefficients between populations is necessary. Here we present log-linear models to evaluate such hypotheses. These models have broad utility ranging from conventional population genetics to genetic epidemiology. We demonstrate the use of these log-linear models (1) as a test for genetic association with disease and (2) as a test for different levels of linkage disequilibria between human populations
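
As a small illustration of a disequilibrium coefficient, the following Python sketch computes the Hardy-Weinberg coefficient for a biallelic locus under the common textbook definition D_A = P_AA − p_A², where P_AA is the observed homozygote frequency and p_A the allele frequency. This is an assumption about which coefficient is meant, and it does not reproduce the log-linear models the abstract describes. The function name `hw_disequilibrium` is hypothetical.

```python
def hw_disequilibrium(n_AA, n_Aa, n_aa):
    """Return D_A = P_AA - p_A**2 from genotype counts at a biallelic locus.

    D_A is zero when genotype frequencies match Hardy-Weinberg proportions,
    positive with an excess of homozygotes, and negative with a deficit.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p_A = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    return n_AA / n - p_A ** 2
```

Comparing this coefficient between two samples by eye is not a test. The abstract's point is that a formal framework is needed to assess whether such coefficients differ between populations.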

    Dynamic evolution of venom proteins in squamate reptiles

    Phylogenetic analyses of toxin gene families have revolutionised our understanding of the origin and evolution of reptile venoms, leading to the current hypothesis that venom evolved once in squamate reptiles. However, because of a lack of homologous squamate non-toxin sequences, these conclusions rely on the implicit assumption that recruitments of protein families into venom are both rare and irreversible. Here we use sequences of homologous non-toxin proteins from two snake species to test these assumptions. Phylogenetic and ancestral-state analyses revealed frequent nesting of 'physiological' proteins within venom toxin clades, suggesting early ancestral recruitment into venom followed by reverse recruitment of toxins back to physiological roles. These results provide evidence that protein recruitment into venoms from physiological functions is not a one-way process, but dynamic, with reversal of function and/or co-expression of toxins in different tissues. This requires a major reassessment of our previous understanding of how animal venoms evolve