Search CORE

22 research outputs found

Inferring Genomic Sequences

Author: Astrovskaya Irina A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2011

Recent advances in next generation sequencing have provided unprecedented opportunities for high-throughput genomic research, inexpensively producing millions of genomic sequences in a single run. Analysis of massive volumes of data results in a more accurate picture of genome complexity and requires adequate bioinformatics support. We explore computational challenges of applying next generation sequencing to particular applications, focusing on the problem of reconstructing the viral quasispecies spectrum from pyrosequencing shotgun reads and the problem of inferring informative single nucleotide polymorphisms (SNPs) that statistically cover the genetic variation of a genome region in genome-wide association studies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software cannot be used to simultaneously assemble and estimate the abundance of multiple closely related (but non-identical) quasispecies sequences. Here, we introduce a new Viral Spectrum Assembler (ViSpA) for inferring the quasispecies spectrum and compare it with the state-of-the-art ShoRAH tool on both synthetic and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. While ShoRAH has an advanced error correction algorithm, ViSpA is better at quasispecies assembly, producing a more accurate reconstruction of a viral population. We also foresee ViSpA being applied to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations. Due to the large data volume in genome-wide association studies, it is desirable to find a small subset of SNPs (tags) that covers the genetic variation of the entire set.
We explore the trade-off between the number of tags used per non-tagged SNP and possible overfitting and propose an efficient 2LR-Tagging heuristic

CiteSeerX

ScholarWorks @ Georgia State University

Viral diversity in children with diarrhea in Gambia

Author: Bo Liu
Irina Astrovskaya
Mihai Pop
Publication venue: Springer Nature
Publication date: 01/01/2011

Springer - Publisher Connector

PubMed Central

Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment

Author: Astrovskaya Irina
Chakraborty Subhra
Corrada Bravo Héctor
Harro Clayton
Li Shan
Lindsay Brianna R.
Parkhill Julian
Paulson Joseph N.
Pop Mihai
Sack David A.
Stine O. Colin
Walker Alan W.
Walker Richard I.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/06/2016

Acknowledgements: The authors wish to thank Mark Stares, Richard Rance, and other members of the Wellcome Trust Sanger Institute’s 454 sequencing team for generating the 16S rRNA gene data. Lili Fox Vélez provided editorial support. Funding: IA, JNP, and MP were partly supported by the NIH, grants R01-AI-100947 to MP, and R21-GM-107683 to Matthias Chung, subcontract to MP. JNP was partly supported by an NSF graduate fellowship number DGE750616. IA, JNP, BRL, OCS and MP were supported in part by the Bill and Melinda Gates Foundation, award number 42917 to OCS. JP and AWW received core funding support from The Wellcome Trust (grant number 098051). AWW, and the Rowett Institute of Nutrition and Health, University of Aberdeen, receive core funding support from the Scottish Government Rural and Environmental Science and Analysis Service (RESAS). Peer reviewed. Publisher PD

Aberdeen University Research

Crossref

PubMed Central

FigShare

De novo likelihood-based measures for comparing genome assemblies

Author: Astrovskaya Irina
Ghodsi Mohammadreza
Hill Christopher M
Koren Sergey
Lin Henry
Pop Mihai
Sommer Dan D
Publication venue: Springer Nature
Publication date: 01/01/2013

The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. These “gold standards” can be expensive to produce and may only cover a small fraction of the genome, which limits their applicability to newly generated genome sequences. Here we introduce a de novo probabilistic measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics. We demonstrate that our de novo score can be computed quickly and accurately in a practical setting even for large datasets, by estimating the score from a relatively small sample of the reads. To demonstrate the benefits of our score, we measure the quality of the assemblies generated in the GAGE and Assemblathon 1 assembly “bake-offs” with our metric. Even without knowledge of the true reference sequence, our de novo metric closely matches the reference-based evaluation metrics used in the studies and outperforms other de novo metrics traditionally used to measure assembly quality (such as N50). Finally, we highlight the application of our score to optimize assembly parameters used in genome assemblers, which enables better assemblies to be produced, even without prior knowledge of the genome being assembled. 
Likelihood-based measures, such as ours proposed here, will become the new standard for de novo assembly evaluation. https://doi.org/10.1186/1756-0500-6-33
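The abstract above defines assembly quality as the conditional probability of observing the sequenced reads given the assembled sequence, and says the score can be estimated from a sample of the reads. The following is a minimal sketch of that idea under a toy model, not the authors' implementation. It assumes that reads are drawn uniformly from forward-strand positions, that errors are independent substitutions at a fixed rate, and that reads are independent of each other. All function names and the `error_rate` parameter are hypothetical.

```python
import math
import random


def read_probability(read, assembly, error_rate=0.01):
    """P(read | assembly) under the toy model: uniform start, substitution errors only."""
    n_starts = len(assembly) - len(read) + 1
    if n_starts <= 0:
        return 0.0  # the read cannot be placed on a shorter assembly
    total = 0.0
    for start in range(n_starts):
        p = 1.0
        for i, base in enumerate(read):
            if assembly[start + i] == base:
                p *= 1.0 - error_rate
            else:
                p *= error_rate / 3.0  # one of three possible wrong bases
        total += p
    return total / n_starts


def log_likelihood(reads, assembly, error_rate=0.01):
    """Sum of log P(read | assembly) over reads, assuming independent reads."""
    score = 0.0
    for read in reads:
        p = read_probability(read, assembly, error_rate)
        if p == 0.0:
            return float("-inf")
        score += math.log(p)
    return score


def sampled_mean_log_likelihood(reads, assembly, sample_size, seed=0):
    """Estimate the per-read score from a random sample of the reads."""
    rng = random.Random(seed)
    sample = rng.sample(reads, min(sample_size, len(reads)))
    return log_likelihood(sample, assembly) / len(sample)


# Two candidate assemblies of the same reads; the second has one wrong base.
reads = ["ACGT", "CGTA", "GTAC"]
exact_assembly = "ACGTAC"
flawed_assembly = "ACGAAC"
exact_score = log_likelihood(reads, exact_assembly)
flawed_score = log_likelihood(reads, flawed_assembly)
```

In this toy case the assembly that explains every read without mismatches gets the higher score, so two assemblies of the same reads can be ranked without a reference sequence.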

Crossref

Columbia University Academic Commons

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition

Author: Adeyemi Mitchell
Ahmed Dilruba
Ahmed Firoz
Alam Meer
Amin Ruhul
Antonio Martin
Astrovskaya Irina
Bravo Hector
Breiman Robert F
Ebruke Chinelo
Hossain M Anowar
Ikumapayi Usman N
Juma Jane
Kotloff Karen
Levine Myron M
Lindsay Brianna
Mai Volker
Mailu Euince
Morris J Glenn
Nataro James P
Ochieng John B
Omore Richard
Ouma Emmanuel
Oundo Joseph
Panchalingam Sandra
Parkhill Julian
Paulson Joseph
Pop Mihai
Rance Richard
Saha Debasish
Siddiqui Sabbir
Stares Mark
Stine O Colin
Tamboura Boubou
Walker Alan W
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Acknowledgments: This work was funded in part by the William and Melinda Gates Foundation, award 42917 to JPN and OCS; US National Institutes of Health grants 5R01HG005220 to HCB, 5R01HG004885 to MP; US National Science Foundation Graduate Research Fellowship award DGE0750616 to JNP; AWW and JP are funded by The Wellcome Trust (Grant No. WT098051). Peer reviewed. Publisher PD

Aberdeen University Research

Crossref

LSHTM Research Online

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Macquarie University ResearchOnline

Original Article

Author: Alan Walker (3508322)
Brianna Lindsay (3508319)
Clayton Harro (2552539)
David Sack (148746)
Héctor Bravo (3413552)
Irina Astrovskaya (3508328)
Joseph Paulson (3495965)
Julian Parkhill (30393)
Mihai Pop (48593)
O. Stine (3508325)
Richard Walker (3126456)
Shan Li (110392)
Subhra Chakraborty (464461)
Publication venue: 千葉医学会
Publication date

The use of malaria therapy for neurosyphilis has declined since penicillin and other antibiotics appeared and the number of neurosyphilis patients fell. Malaria therapy nevertheless remains one of the most effective treatments for neurosyphilis, so a simple way to keep malaria blood viable outside the body is needed. The results were: 1) The storage temperature determined the fate of the malaria blood. Storage at 4℃ or -20℃ was not suitable for keeping malaria blood viable for long. 2) Glycerin at 1.2 to 2.5 mol. was added to a mixture of 4 parts malaria blood to 1 part ACD solution (anticoagulant), which was then frozen rapidly at -79℃, thawed quickly, and injected intramuscularly. Among 65 subjects, infection was established in 54 subjects with no history of malaria. The storage period was 3 to 242 days. The incubation period was 12 to 28 days, with an average of 14.6 days. At present, the longest preservation period is 242 days. Because incubation was only slightly prolonged after long preservation, and judging from the appearance of the parasites in Giemsa-stained smears, preservation for longer than 242 days may be possible. This method is simple and practical for malaria preservation. Whether the blood remained effective depended on the number of parasites in the blood before freezing. 3) Although the freeze-drying method did not succeed this time, it may yet prove feasible, judging from the reconstruction of malaria parasites observed in the example using glycerin. 4) As described above, glycerin also acts effectively in the frozen preservation of malaria protozoa.

FigShare

Inferring viral quasispecies spectra from 454 pyrosequencing reads

Author: A Sundquist
Alex Zelikovsky
AR Quinlan
B Gaschen
Bassam Tork
D Brinza
DC Douek
E Domingo
E Martinez-Salas
EA Duarte
G Myers
H Fakhrai-Rad
Ion Măndoiu
Irina Astrovskaya
JC de la Torre
JC Venter
JI Esteban
JJ Holland
JW Drake
K Westbrooks
Kelly Westbrooks
M Eigen
M Margulies
MC Prosperi
MJ Chaisson
N Beerenwinkel
N Eriksson
NM Laird
O Zagordi
Peter Balfe
R Lippert
S Balser
S Hoffmann
S-Y Rhee
Serghei Mangul
SL Fishman
ST O’Neil
T von Hahn
V Bansal
W Brockman
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract

Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.

Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html.

Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.
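The abstract above calls a reconstructed sequence viable when it contains no internal stop codons. A minimal sketch of such a check follows. It is not taken from ViSpA. It assumes the standard genetic code stop codons (TAA, TAG, TGA), a known reading frame, and that a stop in the final complete codon is terminal rather than internal. The name `is_viable` and the `frame` parameter are hypothetical.

```python
# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_viable(sequence, frame=0):
    """Return True if no stop codon occurs before the last complete codon."""
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA input
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # A stop in the final codon is allowed; only earlier stops count as internal.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


terminal_stop_only = is_viable("ATGGCCTAA")
internal_stop = is_viable("ATGTAAGCC")
```

Here `terminal_stop_only` is `True` and `internal_stop` is `False`. In practice the frame would come from the annotated open reading frame of the reference genome.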

Crossref

ScholarWorks @ Georgia State University

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central