
    Branch length estimation and divergence dating: estimates of error in Bayesian and maximum likelihood frameworks

    Background: Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates. Results: The accuracy of branch length estimation varied with branch length, dataset size (both number of taxa and sites), branch length heterogeneity, branch depth, dataset complexity, and analytical framework. For simple phylogenies analyzed in a Bayesian framework, branches were increasingly underestimated as branch length increased; in a maximum likelihood framework, longer branch lengths were somewhat overestimated. Longer datasets improved estimates in both frameworks; however, when the number of taxa was increased, estimation accuracy for deeper branches was less than for tip branches. Increasing the complexity of the dataset produced more misestimated branches in a Bayesian framework; however, in an ML framework, more branches were estimated more accurately. Using ML branch length estimates to re-estimate plethodontid salamander divergence dates generally resulted in an increase in the estimated age of older nodes and a decrease in the estimated age of younger nodes. Conclusions: Branch lengths are misestimated in both statistical frameworks for simulations of simple datasets. However, for complex datasets, length estimates are quite accurate in ML (even for short datasets), whereas few branches are estimated accurately in a Bayesian framework. Our reanalysis of empirical data demonstrates the magnitude of effects of Bayesian branch length misestimation on divergence date estimates. Because the length of branches for empirical datasets can be estimated most reliably in an ML framework when branches are <1 substitution/site and datasets are ≥1 kb, we suggest that divergence date estimates using datasets, branch lengths, and/or analytical techniques that fall outside of these parameters should be interpreted with caution.

    Variation in DNA Substitution Rates among Lineages Erroneously Inferred from Simulated Clock-Like Data

    BACKGROUND: The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. Accurate estimates of rate variation, and thus accurate inferences regarding the causes of rate variation, depend on accurate estimates of substitution rates. However, theory suggests that even when the substitution process is clock-like, variable numbers of substitutions can occur among lineages because the substitution process is stochastic. Furthermore, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate the extent to which error in rate estimation could erroneously suggest rate variation among lineages, we examined rate variation estimated for datasets simulated under a molecular clock on trees with equal and variable branch lengths. Thus, any apparent rate variation in these datasets reflects error in rate estimation rather than true differences in the underlying substitution process. We observed substantial rate variation among lineages in our simulations; however, we did not observe rate variation when average substitution rates were compared between different clades. CONCLUSIONS/SIGNIFICANCE: Our results confirm previous theoretical work suggesting that observations of among-lineage rate variation in empirical data may be due to the stochastic substitution process and error in the estimation of substitution rates, rather than true differences in the underlying substitution process among lineages. However, conclusions regarding rate variation drawn from rates averaged across multiple branches are likely due to real, systematic variation in rates between groups.
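The stochastic argument in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes substitutions on each lineage follow a Poisson process with one shared rate, ignores multiple substitutions at a site, and uses hypothetical function names and parameter values chosen only for illustration.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_clock_counts(n_lineages, rate, time, n_sites, seed=0):
    """Substitution counts per lineage when every lineage shares the same rate (strict clock)."""
    rng = random.Random(seed)
    expected = rate * time * n_sites  # expected substitutions per lineage
    return [poisson_draw(rng, expected) for _ in range(n_lineages)]


def estimated_rates(counts, time, n_sites):
    """Naive rate estimate per lineage: observed substitutions per site per unit time."""
    return [c / (time * n_sites) for c in counts]


def clade_means(rates, clade_size):
    """Average the per-lineage rates over consecutive, non-overlapping groups of lineages."""
    stop = len(rates) - clade_size + 1
    return [sum(rates[i:i + clade_size]) / clade_size for i in range(0, stop, clade_size)]
```

Calling `estimated_rates(simulate_clock_counts(40, 0.01, 1.0, 1000), 1.0, 1000)` gives per-lineage rates that scatter around the true value of 0.01 even though the generating rate is identical everywhere; passing those rates to `clade_means` with a group size of 10 should give a visibly narrower spread, which is the qualitative pattern the abstract describes.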

    A Composite Genome Approach to Identify Phylogenetically Informative Data from Next-Generation Sequencing

    We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, genome-genome alignment, and annotation. In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered phylogenies from multiple datasets that were consistent with previous conflicting estimates of the relationships among mammals. SISRS is open source and freely available at https://github.com/rachelss/SISRS. Comment: 12 pages plus 36 figures, 1 supplementary table, 3 supplementary figures

    Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions

    Motivation: Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. Results: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was composed of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. Availability and Implementation: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). Contact: [email protected]. Supplementary information: Supplementary data is available at Bioinformatics online.
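For readers unfamiliar with the model family named in this abstract, the following is a minimal sketch of a Dirichlet-multinomial log-probability and a finite mixture of such components. It uses the textbook parameterization by concentration parameters; the paper's own parameterization (in terms of error, reference bias and overdispersion) is not given in the abstract, so the function names and arguments here are assumptions for illustration only.

```python
import math


def dirichlet_multinomial_log_pmf(counts, alphas):
    """Log-probability of read counts under a Dirichlet-multinomial with concentrations `alphas`."""
    n = sum(counts)
    a = sum(alphas)
    # Multinomial coefficient numerator plus the ratio of gamma functions over the totals.
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, alpha in zip(counts, alphas):
        log_p += math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
    return log_p


def mixture_log_pmf(counts, weights, alpha_sets):
    """Log-probability under a finite mixture, combined with the log-sum-exp trick for stability."""
    terms = [
        math.log(w) + dirichlet_multinomial_log_pmf(counts, alphas)
        for w, alphas in zip(weights, alpha_sets)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In this parameterization, large concentration values make a component behave almost like a plain multinomial (or binomial, for two categories), while small values produce the overdispersion that the abstract attributes to the minor component.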

    Quantification of coronary artery calcium by electron beam computed tomography for determination of severity of angiographic coronary artery disease in younger patients

    Objectives. This study attempted to 1) evaluate five quantitative measures of coronary artery calcium and determine which best agreed with coronary artery disease severity at angiography; and 2) determine optimal quantity cutpoints to distinguish among no, mild and significant disease. Background. Coronary artery calcium identified noninvasively by electron beam computed tomography is a sensitive marker for atherosclerosis. Quantitative assessments of calcium could distinguish among patients with no, mild and significant disease in clinical, screening and research settings. Methods. One hundred sixty patients, 23 to 59 years old, underwent coronary angiography and electron beam computed tomography. Coronary artery calcium was defined as dense (> 130 Hounsfield units) foci ≥ 2 mm² on the tomogram. Regression and receiver operating characteristic analyses were used to evaluate five quantitative measures of calcium as predictors of the largest stenosis in the coronary arteries and to identify optimal cutpoints for distinguishing among disease categories. No disease was defined as no stenosis, mild disease as 10% to 49% diameter stenosis in one or more major branches and significant disease as ≥ 50% diameter stenosis in one or more major branches. Results. All measures evaluated performed well. With calcific area as the quantitative measure, the best cutpoint for discriminating between patients with and without disease was the presence of calcium: sensitivity 81%, specificity 86% and overall accuracy 83%. The best cutpoint for discriminating between patients with and without significant disease was 18 mm²: sensitivity 86%, specificity 81% and accuracy 83%. Conclusions. Because the ranges of calcium quantity overlapped across disease categories, no cutpoints would distinguish among categories with absolute certainty. However, selected cutpoints could rule out disease in most healthy subjects and identify most patients with significant disease.

    A unique bacteriohopanetetrol stereoisomer of marine anammox

    Anaerobic ammonium oxidation (anammox) is a major process of bioavailable nitrogen removal from marine systems. Previously, a bacteriohopanetetrol (BHT) isomer, with unknown stereochemistry, eluting later than BHT using high performance liquid chromatography (HPLC), was detected in ‘Ca. Scalindua profunda’ and proposed as a biomarker for anammox in marine paleo-environments. However, the utility of this BHT isomer as an anammox biomarker is hindered by the fact that four other, non-anammox bacteria are also known to produce a late-eluting BHT stereoisomer. The stereochemistry in Acetobacter pasteurianus, Komagataeibacter xylinus and Frankia sp. was known to be 17β, 21β(H), 22R, 32R, 33R, 34R (BHT-34R). The stereochemistry of the late-eluting BHT in Methylocella palustris was unknown. To determine if marine anammox bacteria produce a unique BHT isomer, we studied the BHT distributions and stereochemistry of known BHT isomer producers and of previously unscreened marine (‘Ca. Scalindua brodeae’) and freshwater (‘Ca. Brocadia sp.’) anammox bacteria using HPLC and gas chromatographic (GC) analysis of acetylated BHTs and ultra high performance liquid chromatography (UHPLC)-high resolution mass spectrometry (HRMS) analysis of non-acetylated BHTs. The 34R stereochemistry was confirmed for the BHT isomers in Ca. Brocadia sp. and Methylocella palustris. However, ‘Ca. Scalindua sp.’ synthesise a stereochemically distinct BHT isomer, with still unconfirmed stereochemistry (BHT-x). Only GC analysis of acetylated BHT and UHPLC analysis of non-acetylated BHT distinguished between late-eluting BHT isomers. Acetylated BHT-x and BHT-34R co-elute by HPLC. As BHT-x is currently only known to be produced by ‘Ca. Scalindua spp.’, it may be a biomarker for marine anammox

    Interacting Spin-2 Fields

    We construct consistent theories of multiple interacting spin-2 fields in arbitrary spacetime dimensions using a vielbein formulation. We show that these theories have the additional primary constraints needed to eliminate potential ghosts, to all orders in the fields, and to all orders beyond any decoupling limit. We postulate that the number of spin-2 fields interacting at a single vertex is limited by the number of spacetime dimensions. We then show that, for the case of two spin-2 fields, the vielbein theory is equivalent to the recently proposed theories of ghost-free massive gravity and bi-metric gravity. The vielbein formulation greatly simplifies the proof that these theories have an extra primary constraint which eliminates the Boulware-Deser ghost. Comment: 42 pages, 3 figures. v3 alternative argument using constrained spatial vielbeins has been removed (see footnote 3)

    Nonthermal Hard X-ray Emission and Iron Kalpha Emission from a Superflare on II Pegasi

    We report on an X-ray flare detected on the active binary system II Pegasi with the Swift telescope. The trigger had a 10-200 keV luminosity of $2.2\times10^{32}$ erg s$^{-1}$, a superflare by comparison with energies of typical stellar flares on active binary systems. The trigger spectrum indicates a hot thermal plasma with $T\sim180\times10^{6}$ K. X-ray spectral analysis from 0.8-200 keV with the X-Ray Telescope and BAT in the next two orbits reveals evidence for a thermal component ($T>80\times10^{6}$ K) and Fe K 6.4 keV emission. A tail of emission out to 200 keV can be fit with either an extremely high temperature thermal plasma ($T\sim3\times10^{8}$ K) or power-law emission. Based on analogies with solar flares, we attribute the excess continuum emission to nonthermal thick-target bremsstrahlung emission from a population of accelerated electrons. We estimate the radiated energy from 0.01-200 keV to be $\sim6\times10^{36}$ erg, the total radiated energy over all wavelengths $\sim10^{38}$ erg, the energy in nonthermal electrons above 20 keV $\sim3\times10^{40}$ erg, and conducted energy $<5\times10^{43}$ erg. The nonthermal interpretation gives a reasonable value for the total energy in electrons $>20$ keV when compared to the upper and lower bounds on the thermal energy content of the flare. This marks the first occasion in which evidence exists for nonthermal hard X-ray emission from a stellar flare. We investigate the emission mechanism responsible for producing the 6.4 keV feature, and find that collisional ionization from nonthermal electrons appears to be more plausible than the photoionization mechanism usually invoked on the Sun and pre-main sequence stars. Comment: 41 pages, 7 figures, accepted for publication in the Astrophysical Journal