19 Dubious Ways to Compute the Marginal Likelihood of a Phylogenetic Tree Topology.
The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators
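To make the definition above concrete, here is a minimal sketch of the simplest possible estimator: draw parameters from the prior and average the likelihood. It uses a toy one-parameter binomial model with a uniform prior, not a phylogenetic model, because the exact answer 1/(n+1) is known there and can be checked. The function names are illustrative assumptions and are not taken from the paper, and nothing here is claimed to be one of the 19 benchmarked methods.

```python
import math
import random


def binomial_likelihood(k, n, theta):
    """Likelihood of k successes in n trials given success probability theta."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def naive_marginal_likelihood(k, n, num_samples=100_000, seed=0):
    """Estimate the integral of likelihood * prior by averaging the
    likelihood over draws from the prior (here Uniform(0, 1))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        theta = rng.random()  # one draw from the uniform prior
        total += binomial_likelihood(k, n, theta)
    return total / num_samples


def exact_marginal_likelihood(n):
    """Closed form for the uniform prior: the integral equals 1 / (n + 1)."""
    return 1.0 / (n + 1)


estimate = naive_marginal_likelihood(k=7, n=10)
exact = exact_marginal_likelihood(n=10)
```

With one parameter the prior draws cover the parameter space well. With many parameters, such as the branch lengths of a tree, most prior draws land where the likelihood is negligible, which is one reason the computational burden grows with dimension.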
Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies.
Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, "likelihood" of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method "phylogenetic topographer" (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies
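The search strategy described above can be sketched as a graph traversal with a visited set and a likelihood cutoff. The following is a single-threaded illustration under stated assumptions, not the authors' implementation: the toy graph, the log-likelihood values, and all function names are hypothetical, a plain Python set stands in for the nonblocking hash table, and a lookup table stands in for branch-length optimization.

```python
import math
from collections import deque


def explore_high_likelihood_set(starts, neighbors, log_likelihood, max_drop):
    """Explore outward from the starting topologies, never visiting a
    topology twice and only continuing through topologies whose
    log-likelihood is within max_drop of the best one seen so far."""
    order = list(dict.fromkeys(starts))  # de-duplicate, keep order
    visited = set(order)  # stand-in for the shared hash table
    queue = deque(order)
    best = max(log_likelihood(t) for t in order)
    accepted = {}
    while queue:
        topo = queue.popleft()
        ll = log_likelihood(topo)
        best = max(best, ll)
        if ll < best - max_drop:
            continue  # below threshold: do not explore past this topology
        accepted[topo] = ll
        for nb in neighbors(topo):
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    # The best value may have improved during the search, so re-apply the cutoff.
    return {t: ll for t, ll in accepted.items() if ll >= best - max_drop}


def normalize(log_liks):
    """Turn log-likelihoods into normalized weights that sum to one."""
    m = max(log_liks.values())
    weights = {t: math.exp(ll - m) for t, ll in log_liks.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}


# Hypothetical toy "tree space": five topologies linked by rearrangements.
TOY_NEIGHBORS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
                 "D": ["B"], "E": ["C"]}
TOY_LOGLIK = {"A": -100.0, "B": -101.0, "C": -110.0, "D": -100.5, "E": -99.0}

high_set = explore_high_likelihood_set(
    ["A"], TOY_NEIGHBORS.__getitem__, TOY_LOGLIK.__getitem__, max_drop=3.0)
proxy_posterior = normalize(high_set)
```

In this toy run topology E is never reached, even though it has the highest log-likelihood, because the only path to it passes through the low-likelihood topology C. This is why the description above starts from several local maxima rather than one.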
Demonstration of the synchrotron-type spectrum of laser-produced Betatron radiation
Betatron X-ray radiation in laser-plasma accelerators is produced when
electrons are accelerated and wiggled in the laser-wakefield cavity. This
femtosecond source, producing intense X-ray beams in the multi-kiloelectronvolt
range, has been observed in different interaction regimes using high-power
lasers from 10 to 100 TW. However, none of the spectral measurements performed
had sufficient resolution, bandwidth and signal-to-noise ratio to precisely
determine the shape of the spectra in a single laser shot, which is needed to
avoid shot-to-shot fluctuations. In this letter, the Betatron radiation
produced using an 80 TW laser is characterized using a single-photon-counting
method. We measure single-shot spectra from 8 to 21 keV with a resolution
better than 350 eV. The results obtained are in excellent agreement with
theoretical predictions and demonstrate the synchrotron-type nature of this
radiation mechanism. The critical energy is found to be Ec = 5.6 ± 1 keV for
our experimental conditions. In addition, the features of the source in this
energy range open novel perspectives for applications in time-resolved X-ray
science.
Comment: 5 pages, 4 figures
State selective measurements of HCI produced by strong ultrashort laser clusters interaction
We have performed studies of keV x-ray production from (Ar)n, (Kr)n and (Xe)n rare gas clusters (with n between 10^4 and 10^6 atoms/cluster) subjected to intense (~10^18 W/cm2) infrared (790 nm) laser pulses. We have determined the photon energies and the absolute photon emission yields as a function of several physical parameters governing the interaction: the size and atomic number of the clusters, and the peak intensity of the laser. Up to 10^6 3 keV photons per pulse at a moderate (10^15/cm3) atomic density have been observed. High-resolution spectroscopy studies of (Ar)n clusters have also been performed, giving unambiguous evidence of the production of highly charged (up to heliumlike) ions with K vacancies. The results obtained indicate that X-rays are emitted before cluster explosion on a subpicosecond time scale, and shed some light on the mechanisms involved in the first stage of the production of the nanoplasma induced from each cluster.
Effective online Bayesian phylogenetics via sequential Monte Carlo with guided proposals
Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this paper we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop "guided" proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that, relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.
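The contrast between prior-like, guided, and heated proposals can be illustrated with a deliberately small sketch. It assumes a toy setting in which a new sequence can attach at one of a few discrete positions, each with a known attachment likelihood, and in which the prior over positions is uniform. All names and numbers are hypothetical; this is not the OPSMC algorithm from the paper, which also has to propose branch lengths.

```python
import random


def heated_guided_proposal(attach_likelihoods, temperature=1.0):
    """Proposal proportional to likelihood ** (1 / temperature).
    temperature = 1 is the plain guided proposal; larger values flatten it."""
    powered = [lik ** (1.0 / temperature) for lik in attach_likelihoods]
    total = sum(powered)
    return [p / total for p in powered]


def smc_attach_step(attach_likelihoods, proposal, num_particles, seed=0):
    """Propose an attachment position for each particle and compute its
    importance weight: likelihood * prior / proposal density."""
    rng = random.Random(seed)
    k = len(attach_likelihoods)
    prior = 1.0 / k  # uniform prior over attachment positions (assumption)
    positions = rng.choices(range(k), weights=proposal, k=num_particles)
    weights = [attach_likelihoods[j] * prior / proposal[j] for j in positions]
    return positions, weights


def effective_sample_size(weights):
    """Standard ESS estimate: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical attachment likelihoods for four candidate positions.
LIKS = [0.5, 0.01, 0.3, 0.001]
N = 1000

uniform = [1.0 / len(LIKS)] * len(LIKS)
_, w_prior = smc_attach_step(LIKS, uniform, N)
_, w_guided = smc_attach_step(LIKS, heated_guided_proposal(LIKS), N)
ess_prior = effective_sample_size(w_prior)
ess_guided = effective_sample_size(w_guided)
```

In this toy setting the unheated guided proposal matches the target exactly, so every weight is equal and the effective sample size equals the number of particles, while the prior-like proposal wastes particles on unlikely positions. Because the guided proposal is exact here, the sketch cannot reproduce the pathological behavior mentioned above; it only shows how the temperature parameter interpolates between the guided and the uniform proposal.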
TreeFlow: probabilistic programming and automatic differentiation for phylogenetics
Probabilistic programming frameworks are powerful tools for statistical modelling and inference. They are not immediately generalisable to phylogenetic problems due to the particular computational properties of the phylogenetic tree object. TreeFlow is a software library for probabilistic programming and automatic differentiation with phylogenetic trees. It implements inference algorithms for phylogenetic tree times and model parameters, given a tree topology. We demonstrate how TreeFlow can be used to quickly implement and assess new models. We also show that it provides reasonable performance for gradient-based inference algorithms compared to specialized computational libraries for phylogenetics.
The data processing pipeline can be found at https://github.com/christiaanjs/treeflow-paper
Tree topologies inferred using RAxML 8.2.12
Tree topologies are rooted using LSD 0.2
BEAST analyses are performed using BEAST 2.6.7
Variational inference analyses are performed using TreeFlow 0.0.1beta
Sequences have been removed from the H3N2 BEAST XML as a result of license conflicts. The complete version of this file is generated by the above pipeline.
Funding provided by: University of Auckland
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001537
Award Number:
Carnivores sequence alignment accessed from the benchmark in the BEAST examples
H3N2 sequence alignment taken from Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics. 2014 Aug 15;30(16):2272-9. doi: 10.1093/bioinformatics/btu20
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences
Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database.
Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview.
Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
From where did the 2009 'swine-origin' influenza A virus (H1N1) emerge?
The swine-origin influenza A (H1N1) virus that appeared in 2009, and was first found in human beings in Mexico, is a reassortant with at least three parents. Six of the genes are closest in sequence to those of H1N2 'triple-reassortant' influenza viruses isolated from pigs in North America around 1999-2000. Its other two genes are from different Eurasian 'avian-like' viruses of pigs; the NA gene is closest to H1N1 viruses isolated in Europe in 1991-1993, and the MP gene is closest to H3N2 viruses isolated in Asia in 1999-2000. The sequences of these genes do not directly reveal the immediate source of the virus, as the closest were from isolates collected more than a decade before the human pandemic started. The three parents of the virus may have been assembled in one place by natural means, such as by migrating birds; however, the consistent link with pig viruses suggests that human activity was involved. We discuss a published suggestion that unsampled pig herds and the intercontinental live pig trade, together with porous quarantine barriers, generated the reassortant. We contrast that suggestion with the possibility that laboratory errors involving the sharing of virus isolates and cultured cells, or perhaps vaccine production, may have been involved. Gene sequences from isolates that bridge the time and phylogenetic gap between the new virus and its parents will distinguish between these possibilities, and we suggest where they should be sought. It is important that the source of the new virus be found if we wish to avoid future pandemics rather than just trying to minimize the consequences after they have emerged. Influenza virus is a very significant zoonotic pathogen. Public confidence in influenza research, and the agribusinesses that are based on influenza's many hosts, has been eroded by several recent events involving the virus. Measures that might restore confidence include establishing a unified international administrative framework coordinating surveillance, research and commercial work with this virus, and maintaining a registry of all influenza isolates.