Search CORE

133 research outputs found

Recent advances in inferring viral diversity from high-throughput sequencing data

Author: Beerenwinkel Niko
Posada-Cespedes Susana
Seifert David
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017
Field of study

Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus population. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments.ISSN:0168-170

Repository for Publications and Research Data

Elsevier - Publisher Connector

Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations

Author: Glebova Olga
Publication venue: ScholarWorks @ Georgia State University
Publication date: 15/12/2016
Field of study

Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability and speed, facilitating the studies of biological systems. These technologies allow to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement massive viral surveillance and transcriptome quantification. However, in order to fully exploit the capabilities of NGS technology we need to develop computational methods able to analyze billions of reads for assembly and characterization of sampled RNA populations. In this work we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) method for mass spectrometry data analysis; iii) combinatorial pooling method; iv) computational methods for analysis of intra-host viral populations

ScholarWorks @ Georgia State University

Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

Author: Tork Bassam A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2013
Field of study

The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Characterization of viral diversity can be addressed by high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single genome assembly and cannot be used to assemble and estimate the frequency of closely related quasispecies sequences. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods

CiteSeerX

ScholarWorks @ Georgia State University

Inferring viral quasispecies spectra from 454 pyrosequencing reads

Author: A Sundquist
Alex Zelikovsky
AR Quinlan
B Gaschen
Bassam Tork
D Brinza
DC Douek
E Domingo
E Martinez-Salas
EA Duarte
G Myers
H Fakhrai-Rad
Ion Măndoiu
Irina Astrovskaya
JC de la Torre
JC Venter
JI Esteban
JJ Holland
JW Drake
K Westbrooks
Kelly Westbrooks
M Eigen
M Margulies
MC Prosperi
MJ Chaisson
N Beerenwinkel
N Eriksson
NM Laird
O Zagordi
O Zagordi
Peter Balfe
R Lippert
S Balser
S Hoffmann
S-Y Rhee
Serghei Mangul
SL Fishman
ST O’Neil
T von Hahn
V Bansal
W Brockman
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences. Results In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at <url>http://alla.cs.gsu.edu/~software/VISPA/vispa.html</url>. Conclusions ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.</p

Crossref

ScholarWorks @ Georgia State University

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

De novo approaches to haplotype-aware genome assembly

Author: Baaijens J.A. (Jasmijn)
Publication venue
Publication date: 25/09/2019
Field of study

CWI's Institutional Repository

Algorithms for Viral Population Analysis

Author: Mancuso Nicholas
Publication venue: ScholarWorks @ Georgia State University
Publication date: 12/08/2014
Field of study

The genetic structure of an intra-host viral population has an effect on many clinically important phenotypic traits such as escape from vaccine induced immunity, virulence, and response to antiviral therapies. Next-generation sequencing provides read-coverage sufficient for genomic reconstruction of a heterogeneous, yet highly similar, viral population; and more specifically, for the detection of rare variants. Admittedly, while depth is less of an issue for modern sequencers, the short length of generated reads complicates viral population assembly. This task is worsened by the presence of both random and systematic sequencing errors in huge amounts of data. In this dissertation I present completed work for reconstructing a viral population given next-generation sequencing data. Several algorithms are described for solving this problem under the error-free amplicon (or sliding-window) model. In order for these methods to handle actual real-world data, an error-correction method is proposed. A formal derivation of its likelihood model along with optimization steps for an EM algorithm are presented. Although these methods perform well, they cannot take into account paired-end sequencing data. In order to address this, a new method is detailed that works under the error-free paired-end case along with maximum a-posteriori estimation of the model parameters

ScholarWorks @ Georgia State University

Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage

Author: A Bruselles
A Huang
A Narayanan
A O’Hagan
A Töpfer
AJ Nederbragt
AMN Tsibris
Angela C. Volpini
AR Macalalad
B Efron
B Efron
B Langmead
B Verbist
C Wang
CK Okoro
DC Koboldt
DI Lou
E Domingo
F Pukelsheim
FD Giallonardo
Fernando Antoneli
G Ronning
Guilherme C. Oliveira
H Li
H Li
H Li
HE Peckham
I Kinde
IM Wallace
J Shendure
Jean P. Zukurov
K Pearson
KW Ng
LD Brown
LM Mansky
Luiz Mario R. Janini
M Lataillade
MCF Prosperi
N Beerenwinkel
N Beerenwinkel
N Eriksson
N Homer
N Wicker
NCT Schopman
O Zagordi
P Flicek
Q Fu
R Goya
RJ Roberts
S Bao
S Duffy
S Kumar
S Mangul
S Nascimento-Brito
SF Altschul
SH Eshleman
Sieberth do Nascimento-Brito
SM Willerth
W Fischer
Wan-Ping Lee MPS
WHH Press
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Methods for Viral Intra-Host and Inter-Host Data Analysis for Next-Generation Sequencing Technologies

Author: Knyazev Sergey
Publication venue: ScholarWorks @ Georgia State University
Publication date: 10/08/2021
Field of study

The deep coverage offered by next-generation sequencing (NGS) technology has facilitated the reconstruction of intra-host RNA viral populations at an unprecedented level of detail. However, NGS data requires sophisticated analysis dealing with millions of error-prone short reads. This dissertation will first review the challenges and methods for viral NGS genomic data analysis in the NGS era. Second, it presents a software tool CliqueSNV for inferring viral quasispecies based on extracting pairs of statistically linked mutations from noisy reads, which effectively reduces sequencing noise and enables identifying minority haplotypes with a frequency below the sequencing error rate. Finally, the dissertation describes algorithms VOICE and MinDistB for inference of relatedness between viral samples, identification of transmission clusters, and sources of infection

ScholarWorks @ Georgia State University

Viral population estimation using pyrosequencing

Author: A Dempster
A Rambaut
AMN Tsibris
B Gaschen
Baback Gharizadeh
C Wang
Chunlin Wang
D O'Meara
DC Douek
E Domingo
E Halperin
EH Simpson
ES Lander
Glenn Tesler
GS Gottlieb
GW Tyson
H Fakhrai-Rad
I Malet
IM Rouzine
J Kececioglu
JE Hopcroft
JF Simons
K Chen
KJ Metzner
L Bacheler
L Doukhan
L Excoffier
Lior Pachter
LR Ford
M Breitbart
M Eigen
M Margulies
M Stephens
MA Nowak
MJ Gonzales
ML Collins
ML Sogin
Mostafa Ronaghi
MT Tammi
N Beerenwinkel
Nicholas Eriksson
Niko Beerenwinkel
P Jenkins
PA Pevzner
R Schmid
R Shankarappa
Robert W. Shafer
RP Dilworth
S Huse
S-Y Rhee
S-Y Rhee
Soo-Yon Rhee
VA Johnson
Yumi Mitsuya
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2008
Field of study

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.Comment: 23 pages, 13 figure

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors