treeImbalance
An R package containing the functions used for detecting asymmetry as described in the paper
R script for analysis of trees
R script used in the analysis of the HIV, influenza and Ebola trees, and to produce Figures 4, 6 and 7. Requires the ape, cluster, phylobase and adephylo packages in addition to the treeImbalance package included in this repository.
Permuting a time-stamped tree.
The times of the tips (solid blue lines) and internal nodes (dashed blue lines) from the observed tree (top, black) are preserved in the permuted tree (bottom, dark grey).
Measuring Asymmetry in Time-Stamped Phylogenies
Previous work has shown that asymmetry in viral phylogenies may be indicative of heterogeneity in transmission, for example due to acute HIV infection or the presence of ‘core groups’ with higher contact rates. Hence, evidence of asymmetry may provide clues to underlying population structure, even when direct information on, for example, stage of infection or contact rates is missing. However, current tests of phylogenetic asymmetry (a) suffer from false positives when the tips of the phylogeny are sampled at different times and (b) only test for global asymmetry, and hence suffer from false negatives when asymmetry is localised to part of a phylogeny. We present a simple permutation-based approach for testing for asymmetry in a phylogeny, in which we compare the observed phylogeny with random phylogenies that have the same sampling and coalescence times, to reduce the false positive rate. We also demonstrate how profiles of measures of asymmetry calculated over a range of evolutionary times in the phylogeny can be used to identify local asymmetry. Combined with different metrics of asymmetry, this approach offers detailed insights into how phylogenies reconstructed from real viral datasets may deviate from the simplistic assumptions of commonly used coalescent and birth-death process models.
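The permutation idea in this abstract can be illustrated with a minimal Python sketch. It is not the treeImbalance implementation, which is an R package. The function names `permuted_tree_stats` and `permutation_p_value` are hypothetical, and both the uniform choice of lineages to coalesce and the one-sided p-value with a +1 correction are assumptions rather than the paper's stated procedure. The sketch builds a random tree backward in time from fixed tip and internal-node times, then reports the number of cherries and Sackin's index, computed here as the sum over internal nodes of their descendant tip counts.

```python
import random


def permuted_tree_stats(tip_times, node_times, rng):
    """Build one random tree that keeps the given tip (sampling) and
    internal-node (coalescence) times; return (cherries, sackin)."""
    # Process events from the most recent time backward. At equal times,
    # tips (kind 0) are added before nodes (kind 1) coalesce.
    events = [(t, 0) for t in tip_times] + [(t, 1) for t in node_times]
    events.sort(key=lambda e: (-e[0], e[1]))

    active = []  # each active lineage is stored as its descendant tip count
    cherries = 0
    sackin = 0
    for _, kind in events:
        if kind == 0:
            active.append(1)  # a newly sampled tip
            continue
        # Assumption: the two lineages to join are chosen uniformly at random.
        i, j = rng.sample(range(len(active)), 2)
        a, b = active[i], active[j]
        for k in sorted((i, j), reverse=True):
            active.pop(k)
        merged = a + b
        cherries += int(a == 1 and b == 1)  # both children are tips
        sackin += merged  # this node adds one to the depth of each tip below it
        active.append(merged)
    return cherries, sackin


def permutation_p_value(observed_sackin, tip_times, node_times,
                        n_perm=999, seed=1):
    """One-sided p-value: share of permuted trees at least as imbalanced
    (by Sackin's index) as the observed tree, with a +1 correction."""
    rng = random.Random(seed)
    hits = sum(
        permuted_tree_stats(tip_times, node_times, rng)[1] >= observed_sackin
        for _ in range(n_perm)
    )
    return (1 + hits) / (1 + n_perm)
```

With tips sampled at times 3, 3 and 1 and internal nodes at times 2 and 0, only one topology is compatible with the times, so every permuted tree has one cherry and a Sackin's index of 5. This is the situation in which a test that ignores sampling times could wrongly report asymmetry.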
Inferring the Source of Transmission with Phylogenetic Data
Identifying the source of transmission using pathogen genetic data is complicated by numerous biological, immunological, and behavioral factors. A large source of error arises when there is incomplete or sparse sampling of cases. Unsampled cases may act as either a common source of infection or as an intermediary in a transmission chain for hosts infected with genetically similar pathogens. It is difficult to quantify the probability of common source or intermediate transmission events, which has hindered the development of statistical tests to either confirm or deny putative transmission pairs with genetic data. We present a method to incorporate additional information about an infectious disease epidemic, such as incidence and prevalence of infection over time, to inform estimates of the probability that one sampled host is the direct source of infection of another host in a pathogen gene genealogy. These methods enable forensic applications, such as source-case attribution, for infectious disease epidemics with incomplete sampling, which is usually the case for high-morbidity community-acquired pathogens like HIV, influenza and dengue virus. These methods also enable epidemiological applications such as the identification of factors that increase the risk of transmission. We demonstrate these methods in the context of the HIV epidemic in Detroit, Michigan, and we evaluate the suitability of current sequence databases for forensic and epidemiological investigations. We find that currently available sequences collected for drug resistance testing of HIV are unlikely to be useful in most forensic investigations, but are useful for identifying transmission risk factors.
Asymmetry in influenza A H5N1.
a) Tree of 98 influenza A H5N1 haemagglutinin sequences sampled from bird species in Asia. b) Observed cumulative number of cherries over time (black), with results from permuted trees (grey). Inset histogram shows global results. Red lines show the medoid (solid) and 95% confidence interval of the permuted results (dashed). c) Trajectories for the cumulative Sackin’s index. d) Node effect on Sackin’s index over time. Nodes which are significant at an unadjusted p-value of 5% are shown by an open red circle.
Trees in newick format
This is a .zip file containing the within-host HIV, H5N1 influenza and Ebola virus trees in Newick format which were analysed in the paper. Many thanks to Andrew Rambaut (University of Edinburgh) for providing the Ebola phylogeny.
Data originally from:
1) Within-host HIV (P83_HIV.nwk)
Frost SDW, Wrin T, Smith DM, Kosakovsky Pond SL, Liu Y, et al. (2005) Neutralizing antibody responses drive the evolution of human immunodeficiency virus type 1 envelope during recent HIV infection. Proc Natl Acad Sci USA 102: 18514-9.
2) H5N1 influenza (H5N1_flu.nwk)
Wallace RG, HoDac H, Lathrop RH, Fitch WM (2007) A statistical phylogeography of influenza A H5N1. Proc Natl Acad Sci USA 104: 4473-4478.
3) Ebola virus (Ebola.nwk)
Gire SK, Goba A, Andersen KG, Sealfon RSG, Park DJ, et al. (2014) Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345: 1369-1372.
Asymmetry in the Sierra Leone Ebola epidemic.
a) Tree of 78 Ebola virus whole genome sequences. b) Observed cumulative number of cherries over time (black), with results from permuted trees (grey). Inset histogram shows global results. Red lines show the medoid (solid) and 95% confidence interval of the permuted results (dashed). c) Trajectories for the cumulative Sackin’s index. d) Node effect on Sackin’s index over time.
Within-host asymmetry is not always due to immune selection.
a) Tree of 134 HIV envelope sequences from patient 83 [32]. b) Observed cumulative number of cherries over time (black), with results from permuted trees (grey). Inset histogram shows global results. Red lines show the medoid (solid) and 95% confidence interval of the permuted results (dashed). c) Trajectories for the cumulative Sackin’s index. d) Node effect on Sackin’s index over time. Nodes identified as significantly more asymmetric than expected with the Bonferroni correction are marked with a filled red circle in a), and those which are significant at an unadjusted p-value of 5% are shown by an open red circle.
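The two significance levels named in this caption can be sketched as follows. This is an illustrative Python fragment, not code from the paper: the function name `classify_nodes`, the labels it returns, and the use of strict inequalities are all assumptions. It marks a node as `"bonferroni"` when its p-value falls below 0.05 divided by the number of nodes tested, as `"unadjusted"` when it falls below 0.05 only, and as `"none"` otherwise.

```python
def classify_nodes(p_values, alpha=0.05):
    """Label each per-node p-value by the strictest threshold it passes."""
    m = len(p_values)
    if m == 0:
        return []
    bonferroni_threshold = alpha / m  # Bonferroni: divide alpha by the number of tests
    labels = []
    for p in p_values:
        if p < bonferroni_threshold:
            labels.append("bonferroni")   # filled red circle in the caption
        elif p < alpha:
            labels.append("unadjusted")   # open red circle in the caption
        else:
            labels.append("none")
    return labels
```

For four nodes the Bonferroni threshold is 0.0125, so a node with p = 0.03 would get an open circle but not a filled one.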
Comparison of infector probabilities and frequency of transmission events in simulations.
On the left, infector probabilities are calculated for the true transmission genealogy in 20 independent simulated HIV epidemics and samples of 662 individuals. On the right, infector probabilities are based on simulated sequence data for a single simulation and a sample of 662 individuals. Data are pooled from 50 trees sampled from the Bayesian phylogenetic posterior distribution. Middle: The estimated infector probabilities (x-axis) versus whether a transmission actually occurred (hash marks) for all pairs of sampled individuals in the HIV simulation. The red line shows a local average of the frequency of transmission events. The green line shows a linear regression of true transmission events (coded zero or one) on the estimated infector probability. Histograms show the frequency of estimated infector probabilities when transmissions happen (top) and when they don't (bottom).