Search CORE

5,433 research outputs found

Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage

Author: Antoneli Fernando
Janini Luiz Mario R.
Nascimento-Brito Sieberth do
Oliveira Guilherme
Volpini Angela C.
Zukurov Jean P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/04/2015

In this paper we propose a method, and discuss its computational implementation as an integrated tool, for analyzing viral genetic diversity in data generated by high-throughput sequencing. Most methods proposed so far for estimating viral diversity are designed to exploit the longer reads produced by some NGS platforms in order to estimate a population of haplotypes. Our goal here is to take advantage of the distinct strengths of a particular kind of NGS platform (SOLiD, from Life Technologies, is an example) that has received little attention because its short reads make haplotype estimation very difficult. However, this kind of platform has a very low error rate and extremely deep per-site coverage, and our method is designed to exploit these characteristics. We propose to measure the genetic diversity of the population through a family of multinomial probability distributions indexed by the sites of the viral genome, each one representing the population distribution of the diversity at that site. The implementation focuses on two main optimization strategies: (1) a read mapping/alignment procedure that aims to recover the maximum possible number of short reads; and (2) estimation of the multinomial parameters through a Bayesian approach. Unlike simple frequency counting, the Bayesian approach takes the prior information from the control population into account when inferring the posterior for an experimental condition, and it provides a natural way to separate signal from noise, since it automatically furnishes Bayesian confidence intervals. The methods described in this paper have been implemented as an integrated tool called Tanden (Tool for Analysis of Diversity in Viral Populations).
Comment: 30 pages, 5 figures, 2 tables. Tanden is written in C# (Microsoft), runs on the Windows operating system, and can be downloaded from: http://tanden.url.p

arXiv.org e-Print Archive

Springer - Publisher Connector

Repositório Institucional UNIFESP

PubMed Central
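The Bayesian per-site estimation summarized in the abstract above can be illustrated with the standard Dirichlet-multinomial conjugate model. This is a sketch under stated assumptions, not Tanden's implementation: it assumes the prior at a site is a Dirichlet whose parameters are the control population's base counts plus a hypothetical pseudocount `pseudo`, and it reports equal-tailed credible intervals estimated by Monte Carlo sampling. All function names and the counts in the usage example are hypothetical.

```python
import random

BASES = ("A", "C", "G", "T")


def posterior_alpha(control_counts, sample_counts, pseudo=1.0):
    """Dirichlet posterior parameters for one genome site.

    Assumption (hypothetical, not taken from the paper): the prior is
    Dirichlet(control_counts + pseudo), i.e. the control population's base
    counts at this site plus a smoothing pseudocount. Conjugacy then gives
    the posterior Dirichlet(control_counts + pseudo + sample_counts).
    """
    return [c + pseudo + s for c, s in zip(control_counts, sample_counts)]


def posterior_mean(alpha):
    """Posterior mean of each base frequency: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def credible_intervals(alpha, level=0.95, draws=4000, seed=0):
    """Equal-tailed Monte Carlo credible interval for each base frequency."""
    rng = random.Random(seed)
    samples = [[] for _ in alpha]
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1)
        # draws divided by their sum.
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        for i, x in enumerate(g):
            samples[i].append(x / total)
    tail = (1.0 - level) / 2.0
    intervals = []
    for col in samples:
        col.sort()
        lo = col[int(tail * (draws - 1))]
        hi = col[int((1.0 - tail) * (draws - 1))]
        intervals.append((lo, hi))
    return intervals


if __name__ == "__main__":
    control = [990, 5, 3, 2]      # hypothetical A, C, G, T counts (control)
    sample = [9500, 400, 60, 40]  # hypothetical counts (experimental condition)
    alpha = posterior_alpha(control, sample)
    for base, m, (lo, hi) in zip(BASES, posterior_mean(alpha),
                                 credible_intervals(alpha)):
        print(f"{base}: mean={m:.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```

Under this sketch, a base whose credible interval excludes its control-population frequency would be a candidate signal rather than noise; whether Tanden applies exactly this criterion is not stated in the abstract.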

Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage

Author: Angela C. Volpini
Fernando Antoneli
Guilherme C. Oliveira
Jean P. Zukurov
Luiz Mario R. Janini
Sieberth do Nascimento-Brito
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Viral population estimation using pyrosequencing

Author: Baback Gharizadeh
Chunlin Wang
Lior Pachter
Mostafa Ronaghi
Nicholas Eriksson
Niko Beerenwinkel
Robert W. Shafer
Soo-Yon Rhee
Yumi Mitsuya
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2008

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used to quantify this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction through extensive simulations and through comparison with 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
Comment: 23 pages, 13 figures

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors
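The EM step for haplotype frequencies described in the abstract above can be illustrated with a simplified model. This sketch assumes each read is drawn from one haplotype with probability equal to that haplotype's frequency and is equally likely under every haplotype it is consistent with, so only the read-to-haplotype compatibility sets matter. It is not the paper's full model (which also includes error correction and haplotype construction); the function name and input layout are hypothetical.

```python
def em_haplotype_frequencies(compat, n_haplotypes, max_iter=1000, tol=1e-12):
    """Estimate haplotype frequencies from read/haplotype compatibility.

    compat[r] is the set of haplotype indices consistent with read r.
    Reads compatible with no haplotype are ignored.
    """
    p = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(max_iter):
        expected = [0.0] * n_haplotypes
        for hs in compat:
            z = sum(p[h] for h in hs)
            if z == 0.0:
                continue
            # E-step: split this read among its compatible haplotypes
            # in proportion to their current frequencies.
            for h in hs:
                expected[h] += p[h] / z
        total = sum(expected)
        if total == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: new frequencies are the normalized expected read counts.
        new_p = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Two reads fit only haplotype 0, one fits only haplotype 1,
    # and one read is ambiguous between them.
    print(em_haplotype_frequencies([{0}, {0}, {1}, {0, 1}], 2))
```

In the usage example the likelihood is proportional to p0²·p1, which is maximized at p0 = 2/3, so EM converges to approximately [0.667, 0.333]: the ambiguous read is shared in proportion to the unambiguous evidence.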

Methods for Viral Intra-Host and Inter-Host Data Analysis for Next-Generation Sequencing Technologies

Author: Knyazev Sergey
Publication venue: ScholarWorks @ Georgia State University
Publication date: 10/08/2021

The deep coverage offered by next-generation sequencing (NGS) technology has facilitated the reconstruction of intra-host RNA viral populations at an unprecedented level of detail. However, NGS data require sophisticated analysis to deal with millions of error-prone short reads. This dissertation first reviews the challenges of, and methods for, viral NGS genomic data analysis. Second, it presents CliqueSNV, a software tool for inferring viral quasispecies by extracting pairs of statistically linked mutations from noisy reads, which effectively reduces sequencing noise and enables the identification of minority haplotypes with frequencies below the sequencing error rate. Finally, the dissertation describes the algorithms VOICE and MinDistB for inferring relatedness between viral samples, identifying transmission clusters, and identifying sources of infection.

ScholarWorks @ Georgia State University
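CliqueSNV's own linkage test is not described in the abstract above. As a generic stand-in, the sketch below computes the classical linkage disequilibrium statistics D and r² for one pair of candidate mutations, using only reads that cover both positions. The read representation ('-' for an uncovered position) and the function name are assumptions for illustration.

```python
def pair_linkage(reads, i, j, var_i, var_j):
    """Return (n, D, r2) for variant var_i at position i and var_j at j.

    reads: aligned strings in reference coordinates; '-' marks a position
    the read does not cover. n is the number of reads covering both sites.
    """
    n = n_i = n_j = n_ij = 0
    for read in reads:
        a, b = read[i], read[j]
        if a == "-" or b == "-":
            continue  # read does not span both positions
        n += 1
        has_i = a == var_i
        has_j = b == var_j
        n_i += int(has_i)
        n_j += int(has_j)
        n_ij += int(has_i and has_j)
    if n == 0:
        return 0, 0.0, 0.0
    p_i, p_j, p_ij = n_i / n, n_j / n, n_ij / n
    d = p_ij - p_i * p_j  # excess co-occurrence over independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    r2 = d * d / denom if denom > 0 else 0.0
    return n, d, r2


if __name__ == "__main__":
    # Perfectly linked variants G (position 0) and T (position 1).
    print(pair_linkage(["AC", "AC", "GT", "GT"], 0, 1, "G", "T"))
```

A pair with r² near 1 supported by many reads is unlikely to arise from two independent sequencing errors, which is the intuition behind using linked pairs of mutations to separate minority haplotypes from noise.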

Accurate Viral Population Assembly From Ultra-Deep Sequencing Data

Author: Eskin Eleazar
Mancuso Nicholas
Mangul Serghei
Sun Ren
Wu Nicholas C.
Zelikovskiy Alexander
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/06/2014

Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish rare variants from sequencing errors.

Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population, allowing the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation–maximization algorithm to estimate the abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads.

ScholarWorks @ Georgia State University

PubMed Central

eScholarship - University of California
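The abstract above does not specify how the barcoded protocol collapses reads, so the sketch below uses a common generic approach as an assumption: group reads by barcode and take a column-wise majority vote within each barcode family, discarding families with too few reads to vote. The `min_family` threshold and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def barcode_consensus(tagged_reads, min_family=3):
    """Collapse reads sharing a barcode into one consensus sequence.

    tagged_reads: iterable of (barcode, sequence) pairs. Reads within a
    family are assumed to be the same length and already aligned; zip()
    would silently truncate to the shortest read otherwise.
    Returns {barcode: consensus} for families with >= min_family reads.
    """
    families = defaultdict(list)
    for barcode, seq in tagged_reads:
        families[barcode].append(seq)
    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family:
            continue  # too few copies to outvote a sequencing error
        bases = []
        for column in zip(*seqs):
            # Majority base in this column; ties go to the base seen first.
            base, _ = Counter(column).most_common(1)[0]
            bases.append(base)
        consensus[barcode] = "".join(bases)
    return consensus


if __name__ == "__main__":
    reads = [("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"), ("CCC", "GGGG")]
    print(barcode_consensus(reads))
```

In the usage example the error at the third position of one "AAA" read is outvoted, while the single-read "CCC" family is dropped because it cannot be error-corrected by voting.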

Recent advances in inferring viral diversity from high-throughput sequencing data

Author: Beerenwinkel Niko
Posada-Cespedes Susana
Seifert David
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017

Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, the analysis of HTS data from virus populations is challenging because of short, error-prone reads. To account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue to be complemented by technological developments.
ISSN: 0168-170

Repository for Publications and Research Data

Elsevier - Publisher Connector
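One widely used local diversity measure, of the kind the review above surveys, is the Shannon entropy of the base distribution at each alignment column. The sketch below computes it in bits from equal-length reads in reference coordinates, skipping gaps and ambiguous calls; it is a generic example and not a method attributed to this review. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, ignore="-N"):
    """Shannon entropy (bits) of the observed bases in one column."""
    counts = Counter(b for b in column if b not in ignore)
    n = sum(counts.values())
    if n == 0:
        return 0.0  # no informative bases at this site
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return max(0.0, h)  # avoid returning -0.0 for monomorphic columns


def site_entropies(aligned_reads):
    """Per-site entropy; reads are equal-length strings in reference coordinates."""
    return [column_entropy(col) for col in zip(*aligned_reads)]


if __name__ == "__main__":
    print(site_entropies(["AC", "AC", "GC", "GC"]))
```

Entropy is 0 at an invariant site and reaches 2 bits when all four bases are equally frequent. On its own it does not correct for sequencing errors, so error-prone reads inflate it; this is one reason the review emphasizes statistical methods that account for read errors.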

Deep sequencing of virus derived small interfering RNAs and RNA from viral particles shows highly similar mutational landscape of a plant virus population.

Author: Curk T.
Gutiérrez Aguirre I.
Kreuze J.F.
Kutnjak D.
Ravnikar M.
Rupar M.
Publication venue: 'American Society for Microbiology'
Publication date: 11/02/2015

RNA viruses exist within a host as a population of mutant sequences, often referred to as quasispecies. Within a host, the sequences of RNA viruses constitute several distinct but interconnected pools, such as RNA packaged in viral particles, double-stranded RNA, and virus-derived small interfering RNAs. We aimed to test whether the same representation of within-host viral population structure could be obtained by sequencing different viral sequence pools. Using ultradeep Illumina sequencing, we investigated the diversity of two coexisting Potato virus Y sequence pools present within a plant: RNA isolated from viral particles and virus-derived small interfering RNAs (the derivatives of a plant RNA silencing mechanism). The mutational landscape of the within-host virus population was highly similar between both pools, with no notable hotspots across the viral genome. Notably, all of the single-nucleotide polymorphisms (SNPs) with a frequency higher than 1.6% were found in both pools. Some unique SNPs with very low frequencies were found in each of the pools, with more of them occurring in the small RNA (sRNA) pool, possibly arising through genetic drift in localized virus populations within the plant and through errors introduced during amplification of the silencing signal. Sequencing of the viral particle pool enhanced the efficiency of consensus viral genome sequence reconstruction. Nonhomologous recombination events were commonly detected in the viral particle pool, with a hot spot in the 3′ untranslated and coat protein regions of the genome. We stress that such events represent an important but often overlooked aspect of virus population diversity.

IMPORTANCE This study is the most comprehensive whole-genome characterization of a within-plant virus population to date and the first study comparing the diversity of different pools of viral sequences within a host. We show that both virus-derived small RNAs and RNA from viral particles could be used to assess the diversity of a within-plant virus population, since they give a highly congruent portrayal of the virus mutational landscape within a plant. The study is an important baseline for future studies of virus population dynamics, for example, during adaptation to a new host. The comparison of the two virus sequence enrichment techniques (sequencing of virus-derived small interfering RNAs and of RNA from purified viral particles) shows the strength of the latter for detecting recombinant viral genomes and reconstructing the complete consensus viral genome sequence.

PubMed Central

CGSpace
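The pool comparison behind the 1.6% observation in the abstract above can be sketched as follows: compute non-reference allele frequencies per position in each pool, then check which SNPs above a frequency threshold in either pool are also detected in the other. The data layout, function names and counts are hypothetical; only the 1.6% threshold comes from the abstract, and the study's actual variant-calling pipeline is not described there.

```python
def snp_frequencies(counts, ref):
    """Non-reference allele frequencies.

    counts: {position: {base: read count}}; ref: {position: reference base}.
    Returns {(position, base): frequency} for every observed non-reference base.
    """
    freqs = {}
    for pos, base_counts in counts.items():
        depth = sum(base_counts.values())
        if depth == 0:
            continue
        for base, c in base_counts.items():
            if base != ref[pos] and c > 0:
                freqs[(pos, base)] = c / depth
    return freqs


def compare_pools(pool_a, pool_b, threshold):
    """Split SNPs above `threshold` in either pool into shared / pool-specific."""
    above = {k for k, f in pool_a.items() if f > threshold}
    above |= {k for k, f in pool_b.items() if f > threshold}
    detected_in_both = set(pool_a) & set(pool_b)
    shared = above & detected_in_both
    return shared, above - shared


if __name__ == "__main__":
    ref = {1: "A", 2: "C"}
    particles = snp_frequencies({1: {"A": 980, "G": 20}, 2: {"C": 995, "T": 5}}, ref)
    srnas = snp_frequencies({1: {"A": 970, "G": 30}, 2: {"C": 1000}}, ref)
    # SNPs above 1.6% in either pool, split into shared and pool-specific sets.
    print(compare_pools(particles, srnas, 0.016))
```

With these made-up counts, the 2% and 3% SNP at position 1 is shared, while the 0.5% SNP at position 2 only becomes a pool-specific call if the threshold is lowered below 0.5%, mirroring the pattern of rare unique SNPs reported in the abstract.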