Branch length estimation and divergence dating: estimates of error in Bayesian and maximum likelihood frameworks
Background: Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates. Results: The accuracy of branch length estimation varied with branch length, dataset size (both number of taxa and sites), branch length heterogeneity, branch depth, dataset complexity, and analytical framework. For simple phylogenies analyzed in a Bayesian framework, branches were increasingly underestimated as branch length increased; in a maximum likelihood framework, longer branch lengths were somewhat overestimated. Longer datasets improved estimates in both frameworks; however, when the number of taxa was increased, estimation accuracy for deeper branches was less than for tip branches. Increasing the complexity of the dataset produced more misestimated branches in a Bayesian framework; however, in an ML framework, more branches were estimated more accurately. Using ML branch length estimates to re-estimate plethodontid salamander divergence dates generally resulted in an increase in the estimated age of older nodes and a decrease in the estimated age of younger nodes. Conclusions: Branch lengths are misestimated in both statistical frameworks for simulations of simple datasets. However, for complex datasets, length estimates are quite accurate in ML (even for short datasets), whereas few branches are estimated accurately in a Bayesian framework. Our reanalysis of empirical data demonstrates the magnitude of effects of Bayesian branch length misestimation on divergence date estimates. Because the length of branches for empirical datasets can be estimated most reliably in an ML framework when branches are <1 substitution/site and datasets are ≥1 kb, we suggest that divergence date estimates using datasets, branch lengths, and/or analytical techniques that fall outside of these parameters should be interpreted with caution.
Variation in DNA Substitution Rates among Lineages Erroneously Inferred from Simulated Clock-Like Data
BACKGROUND: The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. Accurate estimates of rate variation, and thus accurate inferences regarding the causes of rate variation, depend on accurate estimates of substitution rates. However, theory suggests that even when the substitution process is clock-like, variable numbers of substitutions can occur among lineages because the substitution process is stochastic. Furthermore, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate the extent to which error in rate estimation could erroneously suggest rate variation among lineages, we examined rate variation estimated for datasets simulated under a molecular clock on trees with equal and variable branch lengths. Thus, any apparent rate variation in these datasets reflects error in rate estimation rather than true differences in the underlying substitution process. We observed substantial rate variation among lineages in our simulations; however, we did not observe rate variation when average substitution rates were compared between different clades. CONCLUSIONS/SIGNIFICANCE: Our results confirm previous theoretical work suggesting that observations of among lineage rate variation in empirical data may be due to the stochastic substitution process and error in the estimation of substitution rates, rather than true differences in the underlying substitution process among lineages. 
However, conclusions regarding rate variation drawn from rates averaged across multiple branches are likely due to real, systematic variation in rates between groups.
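The stochastic effect described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design (they simulated data on trees); it assumes a much simpler setup in which every lineage shares one true rate and each site independently acquires at most one substitution. The function name, parameter values, and the split into two "clades" are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate_clock_rates(n_lineages, n_sites, rate, time, seed=0):
    """Estimate per-lineage rates when all lineages share one true rate.

    Assumption: each site gains at most one substitution, with
    probability rate * time (multiple hits are ignored).
    """
    rng = random.Random(seed)
    p_sub = rate * time
    rates = []
    for _ in range(n_lineages):
        # Count sites that acquired a substitution on this lineage.
        n_subs = sum(1 for _ in range(n_sites) if rng.random() < p_sub)
        rates.append(n_subs / (n_sites * time))
    return rates


rates = simulate_clock_rates(n_lineages=20, n_sites=1000, rate=0.05, time=1.0, seed=1)

# Spread of estimated rates among individual lineages.
lineage_range = max(rates) - min(rates)

# Hypothetical clades: the first ten and the last ten lineages.
clade_gap = abs(mean(rates[:10]) - mean(rates[10:]))

print(f"range among lineages:    {lineage_range:.4f}")
print(f"gap between clade means: {clade_gap:.4f}")
```

Although the true rate is identical everywhere, the per-lineage estimates differ, while averaging over several lineages reduces the apparent difference between groups.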
A Composite Genome Approach to Identify Phylogenetically Informative Data from Next-Generation Sequencing
We have developed a novel method to rapidly obtain homologous genomic data
for phylogenetics directly from next-generation sequencing reads without the
use of a reference genome. This software, called SISRS, avoids the
time-consuming steps of de novo whole genome assembly, genome-genome alignment, and
annotation. In simulations, SISRS is able to identify large numbers of loci
containing variable sites with phylogenetic signal. For genomic data from apes,
SISRS identified thousands of variable sites, from which we produced an
accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers
that we used to estimate the phylogeny of placental mammals. We recovered
phylogenies from multiple datasets that were consistent with previous
conflicting estimates of the relationships among mammals. SISRS is open source
and freely available at https://github.com/rachelss/SISRS. Comment: 12 pages plus 36 figures, 1 supplementary table, 3 supplementary
figures
All, Some or None: Synchronous or Asynchronous: Creating a climate to meet your students’ needs – #SWDE2018
As online MSW programs emerge, each must decide how to approach the choice between synchronous and asynchronous teaching. It is important to select a format that is the right fit for the school, the faculty, and the desired student population. The panel will present case studies from three online MSW programs (one asynchronous, one synchronous, and one blended asynchronous-synchronous), explore the pros and cons of these options, and discuss the debates each program worked through in choosing its model. This interactive panel will include opportunities for participants to ask questions and share their own challenges and solutions.
Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions
Motivation: Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. Results: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model comprised two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. Availability and Implementation: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
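For readers unfamiliar with the model family named in this abstract, the following is a minimal sketch of the log-likelihood of one site's read counts under a mixture of Dirichlet-multinomial distributions. It uses the textbook Dirichlet-multinomial probability mass function; the authors' own parameterization and fitting procedure are not given here, so the function names, the two-allele layout, and all parameter values are assumptions for illustration only.

```python
from math import exp, lgamma, log


def dm_logpmf(counts, alpha):
    """Log probability of `counts` under a Dirichlet-multinomial with parameters `alpha`."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_i!)
    out = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Ratio of gamma functions from integrating out the Dirichlet prior.
    out += lgamma(a) - lgamma(n + a)
    out += sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return out


def mixture_loglik(counts, weights, alphas):
    """Log-likelihood of `counts` under a weighted mixture of Dirichlet-multinomials."""
    terms = [log(w) + dm_logpmf(counts, al) for w, al in zip(weights, alphas)]
    top = max(terms)
    # Log-sum-exp for numerical stability.
    return top + log(sum(exp(t - top) for t in terms))


# Hypothetical site with 18 reference reads and 2 alternate reads.
site_counts = (18, 2)
# Hypothetical components: a tight "major" one and an overdispersed "minor" one.
weights = [0.9, 0.1]
alphas = [(99.0, 1.0), (4.0, 1.0)]
print(mixture_loglik(site_counts, weights, alphas))
```

Large parameter sums (as in the first component) make the distribution behave much like a multinomial, while small sums (as in the second) produce the overdispersion the abstract attributes to the minor component.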
Quantification of coronary artery calcium by electron beam computed tomography for determination of severity of angiographic coronary artery disease in younger patients
Objectives. This study attempted to 1) evaluate five quantitative measures of coronary artery calcium and determine which best agreed with coronary artery disease severity at angiography; and 2) determine optimal quantity cutpoints to distinguish among no, mild and significant disease. Background. Coronary artery calcium identified noninvasively by electron beam computed tomography is a sensitive marker for atherosclerosis. Quantitative assessments of calcium could distinguish among patients with no, mild and significant disease in clinical, screening and research settings. Methods. One hundred sixty patients, 23 to 59 years old, underwent coronary angiography and electron beam computed tomography. Coronary artery calcium was defined as dense (>130 Hounsfield units) foci ≥2 mm² on the tomogram. Regression and receiver operating characteristic analyses were used to evaluate five quantitative measures of calcium as predictors of the largest stenosis in the coronary arteries and to identify optimal cutpoints for distinguishing among disease categories. No disease was defined as no stenosis, mild disease as 10% to 49% diameter stenosis in one or more major branches and significant disease as ≥50% diameter stenosis in one or more major branches. Results. All measures evaluated performed well. With calcific area as the quantitative measure, the best cutpoint for discriminating between patients with and without disease was the presence of calcium: sensitivity 81%, specificity 86% and overall accuracy 83%. The best cutpoint for discriminating between patients with and without significant disease was 18 mm²: sensitivity 86%, specificity 81% and accuracy 83%. Conclusions. Because the ranges of calcium quantity overlapped across disease categories, no cutpoints would distinguish among categories with absolute certainty. However, selected cutpoints could rule out disease in most healthy subjects and identify most patients with significant disease.
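The sensitivity, specificity and accuracy figures quoted in this abstract follow from a 2×2 classification at a chosen cutpoint. The sketch below shows that arithmetic on made-up data; the patient values are hypothetical, not the study's. The rule "positive when the measure is at or above the cutpoint" is an assumption, and the function assumes both diseased and non-diseased patients are present.

```python
def cutpoint_metrics(values, diseased, cutpoint):
    """Sensitivity, specificity and accuracy when `value >= cutpoint` counts as a positive test."""
    tp = fp = tn = fn = 0
    for value, has_disease in zip(values, diseased):
        positive = value >= cutpoint
        if positive and has_disease:
            tp += 1
        elif positive and not has_disease:
            fp += 1
        elif not positive and has_disease:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # non-diseased patients correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical calcific areas (mm^2) and disease status for eight patients.
areas = [0, 0, 5, 20, 30, 0, 25, 40]
significant_disease = [False, False, False, False, True, True, True, True]

print(cutpoint_metrics(areas, significant_disease, cutpoint=18))
```

Repeating the calculation over a range of cutpoints and comparing the resulting sensitivity and specificity pairs is the basic idea behind a receiver operating characteristic analysis.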
A unique bacteriohopanetetrol stereoisomer of marine anammox
Anaerobic ammonium oxidation (anammox) is a major process of bioavailable nitrogen removal from marine systems. Previously, a bacteriohopanetetrol (BHT) isomer, with unknown stereochemistry, eluting later than BHT using high performance liquid chromatography (HPLC), was detected in ‘Ca. Scalindua profunda’ and proposed as a biomarker for anammox in marine paleo-environments. However, the utility of this BHT isomer as an anammox biomarker is hindered by the fact that four other, non-anammox bacteria are also known to produce a late-eluting BHT stereoisomer. The stereochemistry in Acetobacter pasteurianus, Komagataeibacter xylinus and Frankia sp. was known to be 17β, 21β(H), 22R, 32R, 33R, 34R (BHT-34R). The stereochemistry of the late-eluting BHT in Methylocella palustris was unknown. To determine if marine anammox bacteria produce a unique BHT isomer, we studied the BHT distributions and stereochemistry of known BHT isomer producers and of previously unscreened marine (‘Ca. Scalindua brodeae’) and freshwater (‘Ca. Brocadia sp.’) anammox bacteria using HPLC and gas chromatographic (GC) analysis of acetylated BHTs and ultra high performance liquid chromatography (UHPLC)-high resolution mass spectrometry (HRMS) analysis of non-acetylated BHTs. The 34R stereochemistry was confirmed for the BHT isomers in ‘Ca. Brocadia sp.’ and Methylocella palustris. However, ‘Ca. Scalindua sp.’ synthesise a stereochemically distinct BHT isomer, with still unconfirmed stereochemistry (BHT-x). Only GC analysis of acetylated BHT and UHPLC analysis of non-acetylated BHT distinguished between late-eluting BHT isomers. Acetylated BHT-x and BHT-34R co-elute by HPLC. As BHT-x is currently only known to be produced by ‘Ca. Scalindua spp.’, it may be a biomarker for marine anammox.
Interacting Spin-2 Fields
We construct consistent theories of multiple interacting spin-2 fields in
arbitrary spacetime dimensions using a vielbein formulation. We show that these
theories have the additional primary constraints needed to eliminate potential
ghosts, to all orders in the fields, and to all orders beyond any decoupling
limit. We postulate that the number of spin-2 fields interacting at a single
vertex is limited by the number of spacetime dimensions. We then show that, for
the case of two spin-2 fields, the vielbein theory is equivalent to the
recently proposed theories of ghost-free massive gravity and bi-metric gravity.
The vielbein formulation greatly simplifies the proof that these theories have
an extra primary constraint which eliminates the Boulware-Deser ghost. Comment: 42 pages, 3 figures. v3 alternative argument using constrained
spatial vielbeins has been removed (see footnote 3)
Nonthermal Hard X-ray Emission and Iron Kalpha Emission from a Superflare on II Pegasi
We report on an X-ray flare detected on the active binary system II Pegasi
with the Swift telescope. The trigger had a 10-200 keV luminosity of
2.2 erg s-- a superflare, by comparison with energies of
typical stellar flares on active binary systems. The trigger spectrum indicates
a hot thermal plasma with T180 K. X-ray spectral analysis
from 0.8-200 keV with the X-Ray Telescope and BAT in the next two orbits
reveals evidence for a thermal component (T80 K) and Fe K 6.4
keV emission. A tail of emission out to 200 keV can be fit with either an
extremely high temperature thermal plasma (TK) or power-law
emission. Based on analogies with solar flares, we attribute the excess
continuum emission to nonthermal thick-target bremsstrahlung emission from a
population of accelerated electrons. We estimate the radiated energy from
0.01--200 keV to be erg, the total radiated energy over
all wavelengths erg, the energy in nonthermal electrons above 20
keV erg, and conducted energy erg. The
nonthermal interpretation gives a reasonable value for the total energy in
electrons 20 keV when compared to the upper and lower bounds on the thermal
energy content of the flare. This marks the first occasion in which evidence
exists for nonthermal hard X-ray emission from a stellar flare. We investigate
the emission mechanism responsible for producing the 6.4 keV feature, and find
that collisional ionization from nonthermal electrons appears to be more
plausible than the photoionization mechanism usually invoked on the Sun and
pre-main sequence stars. Comment: 41 pages, 7 figures, accepted for publication in the Astrophysical
Journal