36 research outputs found
Quantifying within-host diversity of H5N1 influenza viruses in humans and poultry in Cambodia
Avian influenza viruses (AIVs) periodically cross species barriers and infect humans. The likelihood that an AIV will evolve mammalian transmissibility depends on acquiring and selecting mutations during spillover, but data from natural infection are limited. We analyze deep sequencing data from infected humans and domestic ducks in Cambodia to examine how H5N1 viruses evolve during spillover. Overall, viral populations in both species are predominated by low-frequency ([…] 5% frequency within-host. However, short infection times, genetic drift, and purifying selection likely restrict their ability to evolve extensively during a single infection. Applying evolutionary methods to sequence data, we reveal a detailed view of H5N1 virus adaptive potential, and develop a foundation for studying host-adaptation in other zoonotic viruses.
Ferrets as models for influenza virus transmission studies and pandemic risk assessments
The ferret transmission model is extensively used to assess the pandemic potential of emerging influenza viruses, yet experimental conditions and reported results vary among laboratories. Such variation can be a critical consideration when contextualizing results from independent risk-assessment studies of novel and emerging influenza viruses. To streamline interpretation of data generated in different laboratories, we provide a consensus on experimental parameters that define risk-assessment experiments of influenza virus transmissibility, including disclosure of variables known or suspected to contribute to experimental variability in this model, and advocate adoption of more standardized practices. We also discuss current limitations of the ferret transmission model and highlight continued refinements and advances to this model ongoing in laboratories. Understanding, disclosing, and standardizing the critical parameters of ferret transmission studies will improve the comparability and reproducibility of pandemic influenza risk assessment and increase the statistical power and, perhaps, accuracy of this model.
Robust expansion of phylogeny for fast-growing genome sequence data.
Massive sequencing of SARS-CoV-2 genomes has created a need for methods that add new samples to existing phylogenies efficiently instead of inferring trees de novo. TIPars was developed to meet this challenge by integrating parsimony analysis with pre-computed ancestral sequences. It took about 21 seconds and 1.4 gigabytes of memory to insert 100 SARS-CoV-2 genomes into a 100k-taxa reference tree. In benchmarks on four datasets, TIPars achieved the highest accuracy for phylogenies of moderately similar sequences. For highly similar and for divergent sequences, fully parsimony-based and likelihood-based phylogenetic placement methods performed best, respectively, with TIPars second best. TIPars thus expands phylogenies of both similar and divergent sequences efficiently and accurately, which should have broad biological applications beyond SARS-CoV-2. TIPars is accessible at https://tipars.hku.hk/ and the source code is available at https://github.com/id-bioinfo/TIPars.
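To make the idea of parsimony-based insertion with pre-computed ancestral sequences concrete, here is a minimal sketch. It is not TIPars's actual scoring scheme: the cost function (counting sites where the query matches neither end of a branch), the function names, and the toy sequences are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def branch_cost(parent: str, child: str, query: str) -> int:
    """Count aligned sites where the query matches neither end of the branch.

    Simplified stand-in for a parsimony score (an assumption, not the
    published method): each such site would need at least one extra
    substitution on the new pendant branch.
    """
    if not (len(parent) == len(child) == len(query)):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for p, c, q in zip(parent, child, query) if q != p and q != c)


def best_branch(branches: Dict[str, Tuple[str, str]], query: str) -> Tuple[str, int]:
    """Return the (branch name, cost) pair with the lowest cost.

    `branches` maps a hypothetical branch label to the sequences at its
    parent and child nodes; ties go to the first branch encountered.
    """
    scored = ((name, branch_cost(p, c, query)) for name, (p, c) in branches.items())
    return min(scored, key=lambda item: item[1])


# Toy reference tree: ancestral (internal) and tip sequences are made up.
branches = {
    "root->n1": ("ACGTAC", "ACGTAA"),
    "n1->tipA": ("ACGTAA", "ACGAAA"),
    "n1->tipB": ("ACGTAA", "TCGTAA"),
}
query = "ACGAAT"

placement = best_branch(branches, query)
print(placement)
```

Because every branch is scored against stored node sequences, no tree search or re-inference is needed when a new sample arrives, which is the general reason placement can be much faster than de novo inference.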
Multiple sequences insertion performance.
(A) Violin plots showing the distribution of paired differences in RF distance between the trees generated by TIPars and UShER (TIPars − UShER) for random insertions of 100 and 1000 sequences. (B) Violin plots showing the distribution of paired differences in Gamma20 log-likelihood between the trees generated by TIPars and UShER (TIPars − UShER) for random insertions of 100 and 1000 sequences. (C) Violin plots showing the distribution of paired differences in RF distance between the trees generated by TIPars and UShER (TIPars − UShER) for successive insertions of 100 and 1000 sequences. (D) Violin plots showing the distribution of paired differences in Gamma20 log-likelihood between the trees generated by TIPars and UShER (TIPars − UShER) for successive insertions of 100 and 1000 sequences. (E) Distribution of paired differences in RF distance between the trees generated by TIPars and UShER (TIPars − UShER) for random insertions of 50 sequences on the 16S, H3N2 and NDV datasets. (F) Distribution of paired differences in Gamma20 log-likelihood between the trees generated by TIPars and the four other programs (TIPars − Others) for random insertions of 50 sequences on the 16S dataset. (G) Distribution of paired differences in Gamma20 log-likelihood between the trees generated by TIPars and the four other programs (TIPars − Others) for random insertions of 50 sequences on the H3N2 dataset. (H) Distribution of paired differences in Gamma20 log-likelihood between the trees generated by TIPars and the four other programs (TIPars − Others) for random insertions of 50 sequences on the NDV dataset. P-values for the right-sided paired t-tests are indicated by asterisks above each violin: one pink asterisk (*) for p<0.05, two orange asterisks (**) for p<0.01 and three red asterisks (***) for p<0.001.
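The paired-difference comparison described in this caption can be sketched as follows. This is an illustrative calculation only: the function names and the sample values are made up, and only the t statistic is computed. Obtaining the right-sided p-value would additionally require the upper tail of a t distribution with n − 1 degrees of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for the paired differences a[i] - b[i].

    For a right-sided test the alternative hypothesis is that the mean
    difference is greater than zero, so large positive t is evidence for it.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def significance_stars(p: float) -> str:
    """Map a p-value to the asterisk notation used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Made-up scores for two methods on the same four insertion runs.
method_a = [5.0, 7.0, 6.0, 9.0]
method_b = [4.0, 5.0, 5.0, 6.0]
t_value, dof = paired_t_statistic(method_a, method_b)
print(round(t_value, 4), dof)
```

Pairing matters here because both programs are run on the same insertion sets, so differencing removes run-to-run variation that is common to both.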
Performance for (re)placement of a single taxon.
Each taxon was removed individually and used for the placement test with TIPars, UShER, EPA-ng and APPLES-2 on the 16S, H3N2 and NDV datasets. Note that the ancestral sequences of TIPars and the mutation-annotated tree of UShER were not reconstructed for each leave-one-out test because of the high computational requirement, which may bias their accuracies. RAPPAS was excluded because of the large computation required for its ‘pkDB’ database. (A) Bars represent the placement accuracy on the 16S, H3N2 and NDV datasets. The highest accuracy in each dataset is highlighted in red. (B) Bar charts representing the mean RF distance calculated from the single-taxon placement results on the 16S, H3N2 and NDV datasets. The lowest mean RF distance in each dataset is highlighted in red. Panels A and B share the figure legend shown in B. (C) Stacked bar charts showing the proportions of single and multiple placement results on the 16S, H3N2 and NDV datasets. Proportions greater than 0.1% are indicated within the bars. (TIF)
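For readers unfamiliar with the RF (Robinson-Foulds) distance used in panel B, the sketch below counts the non-trivial splits present in one tree but not the other. The nested-tuple tree encoding, the function names, and the five-taxon example are assumptions made for illustration; they are not taken from any of the benchmarked programs.

```python
def leaf_sets(tree, out):
    """Return the leaf set under `tree`, appending each internal node's leaf set to `out`.

    Trees are nested tuples of leaf-name strings, e.g. (("A", "B"), "C").
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial splits of the tree, treating it as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    clades = []
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits
            result.add(side)
    return all_leaves, result


def rf_distance(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    leaves1, s1 = splits(t1)
    leaves2, s2 = splits(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return len(s1 ^ s2)


# Two hypothetical placements of taxa B and C in an otherwise identical tree.
true_tree = ((("A", "B"), "C"), ("D", "E"))
placed_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(true_tree, placed_tree))
```

In a leave-one-out test, a distance of zero between the original tree and the tree with the taxon re-inserted means the taxon was placed back without changing the topology.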
Number of sequences for BA related PANGO Lineages in the newest SARS-CoV-2 sequences from January 1 to June 4, 2022.
Runtime and memory usage for 16S, H3N2 and NDV dataset preparation.
Tests were run on a server with 32 Intel Xeon Gold 6242 CPU cores. (XLSX)
Runtime and memory usage for multiple taxa insertion to the 100k-taxa reference tree.
Tests were run on a server with 32 Intel Xeon Gold 6242 CPU cores for 10 repeated runs. TIPars took less than one day to insert 200k SARS-CoV-2 genome sequences into the 100k-taxa reference tree. (XLSX)
Runtime and memory usage to optimize the branch lengths of a SARS-CoV-2 100k-taxa tree.
Tests were run 100 times using FastTree2 (double-precision version) on a server with eight Intel Xeon Gold 6242R CPU cores. (XLSX)