27 research outputs found
Rapid forward-in-time simulation at the chromosome and genome level
Background: In population genetics, simulation is a fundamental tool for analyzing how basic evolutionary forces such as natural selection, recombination, and mutation shape the genetic landscape of a population. Forward simulation represents the most powerful, but, at the same time, most compute-intensive approach for simulating the genetic material of a population.
Results: We introduce AnA-FiTS, a highly optimized forward simulation software, that is up to two orders of magnitude faster than current state-of-the-art software. In addition, we present a novel algorithm that further improves runtimes by up to an additional order of magnitude, for simulations where a fraction of the mutations is neutral (e.g., only 10% of mutations have an effect on fitness). Apart from simulated sequences, our tool also generates a graph structure that depicts the complete observable history of neutral mutations.
Conclusions: The substantial performance improvements allow for conducting forward simulations at the chromosome and genome level. The graph structure generated by our algorithm can give rise to novel approaches for visualizing and analyzing the output of forward simulations.
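For readers unfamiliar with the term, the basic idea of forward-in-time simulation can be sketched as a toy haploid Wright-Fisher model. This is only an illustration under simplifying assumptions (no recombination, multiplicative fitness, one new mutation at most per offspring). It is not the AnA-FiTS algorithm, and the function name and all parameters are hypothetical.

```python
import random


def wright_fisher(pop_size, generations, mu, s, seed=0):
    """Toy haploid Wright-Fisher forward simulation (illustrative only).

    Each individual is a frozenset of mutation ids. Fitness is assumed to be
    multiplicative: (1 + s) ** number_of_mutations, with s > -1.
    mu is the per-offspring probability of gaining one new mutation.
    """
    rng = random.Random(seed)
    population = [frozenset() for _ in range(pop_size)]
    next_id = 0
    for _ in range(generations):
        # Selection: parents are drawn with probability proportional to fitness.
        weights = [(1.0 + s) ** len(ind) for ind in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        offspring = []
        for genome in parents:
            # Mutation: occasionally add a brand-new mutation id.
            if rng.random() < mu:
                genome = genome | {next_id}
                next_id += 1
            offspring.append(genome)
        population = offspring
    return population


if __name__ == "__main__":
    final = wright_fisher(pop_size=100, generations=50, mu=0.05, s=0.01)
    print(sum(len(g) for g in final) / len(final))  # mean mutations per genome
```

Every generation touches every individual, which is why this approach becomes compute-intensive for large populations and long sequences.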
The Phylogenetic Likelihood Library
We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter as well as branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. Using two likelihood-based phylogenetic codes as examples, we show that the PLL improves the sequential performance of current software by a factor of 2–10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). Funding: DFG, German Research Foundation; STA/860-4. F.I.-C. DFG, German Research Foundation; STA/860-3. DFG, German Research Foundation; STA/860-2. L.-T.N. University of Vienna; I059-N. Austrian Science Fund; I760-B1.
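The "numerical scaling for preventing floating point underflow" mentioned above can be illustrated with a generic sketch. It assumes a simple scheme in which a running product of positive factors is multiplied by a fixed constant whenever it drops below a threshold, and the number of rescalings is tracked so the logarithm can be recovered. This is not the actual code of PLL or BEAGLE; the constants and function names are hypothetical.

```python
import math

SCALE_EXP = 256                  # assumed scaling exponent (illustrative)
SCALE = 2.0 ** SCALE_EXP
THRESHOLD = 2.0 ** -SCALE_EXP


def scaled_product(values):
    """Multiply positive factors without underflowing to zero.

    Returns (acc, count) such that the true product equals
    acc * SCALE ** (-count). Assumes each factor is positive and not so
    small that a single multiplication underflows on its own.
    """
    acc = 1.0
    count = 0
    for v in values:
        acc *= v
        if acc < THRESHOLD:
            acc *= SCALE         # rescale and remember that we did so
            count += 1
    return acc, count


def log_of_scaled(acc, count):
    """Recover the natural log of the true product from the scaled form."""
    return math.log(acc) - count * SCALE_EXP * math.log(2.0)


if __name__ == "__main__":
    factors = [1e-5] * 200       # the naive product underflows to 0.0
    print(log_of_scaled(*scaled_product(factors)))
```

The extra comparisons and multiplications are the cost of scaling, which is why benchmark results are reported separately for runs with scaling enabled.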
Next-generation sequencing reveals the impact of repetitive DNA in phylogenetically closely related genomes of Orobanchaceae
We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae of known phylogenetic relationships, different life forms, and including a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%–28.34% of the nine genomes and Ty1/Copia elements comprise 8.09%–22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types.
Cabbage and fermented vegetables: From death rate heterogeneity in countries to candidates for mitigation strategies of severe COVID-19
Large differences in COVID-19 death rates exist between countries and between regions of the same country. Some areas with very low death rates, such as Eastern Asia, Central Europe, or the Balkans, share a common feature of eating large quantities of fermented foods. Although biases exist when examining ecological studies, fermented vegetables or cabbage have been associated with low death rates in European countries. SARS-CoV-2 binds to its receptor, the angiotensin-converting enzyme 2 (ACE2). As a result of SARS-CoV-2 binding, ACE2 downregulation enhances the angiotensin II receptor type 1 (AT(1)R) axis associated with oxidative stress. This leads to insulin resistance as well as lung and endothelial damage, two severe outcomes of COVID-19. The nuclear factor (erythroid-derived 2)-like 2 (Nrf2) is the most potent antioxidant in humans and can block in particular the AT(1)R axis. Cabbage contains precursors of sulforaphane, the most active natural activator of Nrf2. Fermented vegetables contain many lactobacilli, which are also potent Nrf2 activators. Three examples are: kimchi in Korea, westernized foods, and the slum paradox. It is proposed that fermented cabbage is a proof-of-concept of dietary manipulations that may enhance Nrf2-associated antioxidant effects, helpful in mitigating COVID-19 severity.
Nrf2-interacting nutrients and COVID-19: time for research to develop adaptation strategies
There are large between- and within-country variations in COVID-19 death rates. Some very low death rate settings such as Eastern Asia, Central Europe, the Balkans and Africa have a common feature of eating large quantities of fermented foods whose intake is associated with the activation of the Nrf2 (Nuclear factor (erythroid-derived 2)-like 2) anti-oxidant transcription factor. There are many Nrf2-interacting nutrients (berberine, curcumin, epigallocatechin gallate, genistein, quercetin, resveratrol, sulforaphane) that all act similarly to reduce insulin resistance, endothelial damage, lung injury and cytokine storm. They also act on the same mechanisms (mTOR: Mammalian target of rapamycin, PPAR gamma: Peroxisome proliferator-activated receptor, NF kappa B: Nuclear factor kappa B, ERK: Extracellular signal-regulated kinases and eIF2 alpha: Elongation initiation factor 2 alpha). They may as a result be important in mitigating the severity of COVID-19, acting through the endoplasmic reticulum stress or ACE-Angiotensin-II-AT(1)R axis (AT(1)R) pathway. Many Nrf2-interacting nutrients also interact with TRPA1 and/or TRPV1. Interestingly, geographical areas with very low COVID-19 mortality are those with the lowest prevalence of obesity (Sub-Saharan Africa and Asia). It is tempting to propose that Nrf2-interacting foods and nutrients can re-balance insulin resistance and have a significant effect on COVID-19 severity. It is therefore possible that the intake of these foods may restore an optimal natural balance for the Nrf2 pathway and may be of interest in the mitigation of COVID-19 severity.
Data from: An efficient independence sampler for updating branches in Bayesian Markov chain Monte Carlo sampling of phylogenetic trees
Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a Γ distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ∼90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes.
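The general mechanics of a Metropolis-Hastings independence sampler with a Γ proposal can be sketched as follows. This is a generic illustration, not the authors' implementation: the shape and scale of the proposal are simply passed in here, whereas the abstract describes deriving the approximation from the log conditional posterior and its derivatives. The function names and the toy target are hypothetical.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def independence_sampler(log_target, shape, scale, n_steps, x0, seed=0):
    """Metropolis-Hastings with a Gamma proposal that ignores the current state.

    log_target is an unnormalized log density over positive branch lengths.
    Returns the chain and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = rng.gammavariate(shape, scale)   # proposal does not depend on x
        # Hastings ratio for an independence proposal q:
        #   [pi(y) * q(x)] / [pi(x) * q(y)]
        log_ratio = (log_target(y) - log_target(x)
                     + gamma_logpdf(x, shape, scale)
                     - gamma_logpdf(y, shape, scale))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    # Toy target: an unnormalized Gamma(3, 0.1) density standing in for a
    # conditional branch-length posterior (an assumption for illustration).
    def toy(b):
        return 2.0 * math.log(b) - b / 0.1

    chain, rate = independence_sampler(toy, 3.0, 0.1, 5000, x0=0.3)
    print(rate, sum(chain) / len(chain))
```

The closer the Γ proposal matches the target density, the closer the acceptance rate gets to 1, which is the intuition behind approximating the conditional posterior well.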
Data from: Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice
The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive web-service implementing this algorithm. Compared to our previous method, the new algorithm is up to four orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world datasets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116,334 taxa each. Using simulated datasets, we show that, when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum likelihood trees that are topologically closer to the respective true trees.