4 research outputs found

    Inferring viral quasispecies spectra from 454 pyrosequencing reads

    Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.
    Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches, whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html.
    Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.

    Haplotype and minimum-chimerism consensus determination using short sequence data

    Investigating mechanisms of genome expansion in Corydoradinae catfish

    The Corydoradinae catfish are a diverse sub-family of Neotropical catfishes (order Siluriformes) with more than 170 species described to date. One of the most compelling features of this sub-family is the enormous amount of variation in genome size. With species containing between 0.5 pg and 4.8 pg of DNA, variation is comparable to that found across the Teleostei as a whole. Previous phylogenetic analysis identified nine distinct lineages within the Corydoradinae, with more basal lineages possessing smaller genomes and with the largest genome sizes found in the most derived lineages. To date, nothing is known about the mechanism that drove this genome expansion in the Corydoradinae, though Whole Genome Duplication (WGD) events have been suggested. Here, the incidence of WGD events has been investigated using a Hox gene and a restriction site associated DNA (RAD) sequencing data set. Both data sets identified a major duplication event at the base of the group, with additional duplication events occurring across the family. These duplication events were shown to have led to relaxed purifying selection and increased functional divergence of HoxA13a copies in the Corydoradinae compared with teleosts that have not undergone additional rounds of WGD. The RAD data set confirmed significant genome-wide shifts in duplicate, multi-haplotype regions across the Corydoradinae, and indicated that several species from lineages 6-9 are functionally polyploid, whereas species that underwent earlier WGDs have largely diploidized and are likely paleopolyploids. An increase in paralogous genes was noted, with Gene Ontology suggesting that gene retention in the Corydoradinae mirrors previously described retention in Tetraodon following the fish-specific genome duplication in the Teleostei. Intriguingly, the RAD data also identified a significant expansion of Transposable Elements (TEs), driven by a DNA TE superfamily (Tc1-Mariner). This expansion significantly contributed to the genome size variation, though to a lesser degree than the WGD events identified within this thesis.