Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Inferring viral quasispecies spectra from 454 pyrosequencing reads

Author: A Sundquist
Alex Zelikovsky
AR Quinlan
B Gaschen
Bassam Tork
D Brinza
DC Douek
E Domingo
E Martinez-Salas
EA Duarte
G Myers
H Fakhrai-Rad
Ion Măndoiu
Irina Astrovskaya
JC de la Torre
JC Venter
JI Esteban
JJ Holland
JW Drake
K Westbrooks
Kelly Westbrooks
M Eigen
M Margulies
MC Prosperi
MJ Chaisson
N Beerenwinkel
N Eriksson
NM Laird
O Zagordi
Peter Balfe
R Lippert
S Balser
S Hoffmann
S-Y Rhee
Serghei Mangul
SL Fishman
ST O’Neil
T von Hahn
V Bansal
W Brockman
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences. Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (they do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches, whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html. Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.
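
The abstract above calls a reconstructed sequence "viable" when it contains no internal stop codons. As a rough illustration of that check only (not ViSpA's or ShoRAH's code), the following minimal Python sketch tests a nucleotide sequence for internal stop codons. The function names `has_internal_stop` and `is_viable` are hypothetical, and the sketch assumes the sequence is already in frame, starting at its first base, and uses the standard genetic code.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(orf: str) -> bool:
    """Return True if any codon other than the last complete one is a stop codon.

    Assumption: `orf` is in frame, i.e. codon 1 starts at the first base.
    """
    seq = orf.upper().replace("U", "T")  # accept RNA input as well
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The final complete codon may legitimately be a terminal stop.
    return any(codon in STOP_CODONS for codon in codons[:-1])


def is_viable(orf: str) -> bool:
    """A sequence counts as viable here if it has no internal stop codon."""
    return not has_internal_stop(orf)
```

For example, `is_viable("ATGGCCTAA")` holds because the only stop codon is terminal, while `is_viable("ATGTAAGCC")` does not because `TAA` occurs in the middle of the frame.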

ScholarWorks @ Georgia State University

Directory of Open Access Journals

A cost-effective and universal strategy for complete prokaryotic genomic sequencing proposed by computer simulation

Author: Au Chun Hang
Jiang Jingwei
Kam Kai Man
Kwan Hoi Shan
Leung Frederick C
Li Jun
Li Lei
Lun Ling Julia Mei
Wan Law Patrick Tik
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Pyrosequencing techniques allow scientists to perform prokaryotic genome sequencing and obtain draft genomic sequences within a few days. However, assemblies from shotgun sequencing are usually composed of hundreds of contigs. A further multiplex PCR procedure is needed to fill all the gaps and link contigs into a complete chromosomal sequence, which is the basis for prokaryotic comparative genomic studies. In this article, we study various pyrosequencing strategies by simulated assembly of 100 prokaryotic genomes. Findings: The simulation study shows that a single end 454 Jr. run combined with a paired end 454 Jr. run (8 kb library) can produce: 1) ∼90% of 100 assemblies with 99.99%; 4) average false gene duplication rate is < 0.7%; 5) average false gene loss rate is < 0.4%. Conclusions: A single end 454 Jr. run combined with a paired end 454 Jr. run (8 kb library) is a cost-effective way to perform prokaryotic whole genome sequencing. This strategy provides a solution for producing high quality draft assemblies for most prokaryotic organisms within days. Due to the small number of assembled scaffolds, the subsequent multiplex PCR procedure (for gap filling) would be easy. As a result, large scale prokaryotic whole genome sequencing projects may be finished within weeks. © 2012 Jiang et al; BioMed Central Ltd.

HKU Scholars Hub

GemSIM: general, error-model based simulator of next-generation sequencing data

Author: Luciani Fabio
McElroy Kerensa E
Thomas Torsten
Publication venue: BioMed Central
Publication date: 01/01/2012

Institutional Repository of the Freie Universität Berlin

Mason – A Read Simulator for Second Generation Sequencing Data

Author: Holtgrewe M.
Publication venue
Publication date: 01/01/2010

We present read simulator software for Illumina, 454 and Sanger reads. Its features include position-specific error rates and base quality values. For Illumina reads, we give a comprehensive analysis with empirical data for the error and quality model. For the other technologies, we use models from the literature. The simulator has been written with performance in mind and can sample reads from large genomes. The C++ source code is extensible and freely available under the GPL/LGPL.

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Grinder: a versatile amplicon and shotgun sequence simulator

Author: Afgan
Angly
Balzer
Bennasar
Berger
Beszteri
Caporaso
Crosby
Dana Willner
DeSantis
Edgar
Engle
Florent E. Angly
Forest Rohwer
Gene W. Tyson
Ghannoum
Giardine
Gomez-Alvarez
Haas
Henn
Holtgrewe
Hur
Huse
Kan
Korbel
Kunin
Lank
L’Ecuyer
Margulies
Matsumoto
Mavromatis
Myers
Ochman
Philip Hugenholtz
Pinard
Pruesse
Pruitt
Quince
R development core team. R
Richter
Rothberg
Simons
Sogin
Stajich
Sun
Treangen
Tringe
Turnbaugh
Ulrich
Vivancos
Wang
Willner
Yilmaz
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

We introduce Grinder (http://sourceforge.net/projects/biogrinder/), an open-source bioinformatic tool to simulate amplicon and shotgun (genomic, metagenomic, transcriptomic and metatranscriptomic) datasets from reference sequences. This is the first tool to simulate amplicon datasets (e.g. 16S rRNA) widely used by microbial ecologists. Grinder can create sequence libraries with a specific community structure, α and β diversities and experimental biases (e.g. chimeras, gene copy number variation) for commonly used sequencing platforms. This versatility allows the creation of simple to complex read datasets necessary for hypothesis testing when developing bioinformatic software, benchmarking existing tools or designing sequence-based experiments. Grinder is particularly useful for simulating clinical or environmental microbial communities and complements the use of in vitro mock communities.

Queensland University of Technology ePrints Archive

University of Queensland eSpace

FAAST: Flow-space Assisted Alignment Search Tool

Author: Fredrik Lysholm
Björn Andersson
Bengt Persson
M Margulies
M Droege
SB Needleman
TF Smith
O Gotoh
DJ Lipman
WR Pearson
SF Altschul
MO Dayhoff
V Vacic
R Kofler
S Balzer
J Jerlström-Hultqvist
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2011

Publikationer från Linköpings universitet

Directory of Open Access Journals