
    Viral population estimation using pyrosequencing

    The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used to quantify this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures
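The frequency-estimation step can be illustrated with a minimal EM sketch. It assumes that reads have already been error-corrected, that each read is given as a (sequence, start offset) pair against a fixed set of equal-length candidate haplotypes, and that a read comes from any compatible haplotype with probability proportional to that haplotype's frequency. The function names and these simplifications are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def compatible(read: str, start: int, haplotype: str) -> bool:
    """True if the read matches the haplotype exactly over the window it covers."""
    return haplotype[start:start + len(read)] == read


def em_haplotype_frequencies(
    reads: Sequence[Tuple[str, int]],
    haplotypes: List[str],
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate haplotype frequencies by EM (simplified model: error-free reads)."""
    k = len(haplotypes)
    # Compatibility matrix: which candidate haplotypes could have produced each read.
    compat = [[compatible(r, s, h) for h in haplotypes] for r, s in reads]
    # Reads explained by no candidate carry no information about the frequencies.
    compat = [row for row in compat if any(row)]
    n = len(compat)
    if n == 0:
        raise ValueError("no read is compatible with any candidate haplotype")
    freqs = [1.0 / k] * k
    for _ in range(max_iter):
        counts = [0.0] * k
        for row in compat:
            # E-step: split each read among its compatible haplotypes
            # in proportion to the current frequency estimates.
            z = sum(f for f, ok in zip(freqs, row) if ok)
            for j, ok in enumerate(row):
                if ok:
                    counts[j] += freqs[j] / z
        # M-step: new frequency = expected fraction of reads assigned to each haplotype.
        new_freqs = [c / n for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

For example, with haplotypes "ACGTAC" and "ACGAAC", three reads ("GT", 2) unique to the first, one read ("GA", 2) unique to the second and four shared reads ("AC", 0), the estimates approach 0.75 and 0.25: the shared reads end up split in the same 3:1 ratio as the unique ones.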

    Comprehensive Primer Design for Analysis of Population Genetics in Non-Sequenced Organisms

    Nuclear sequence markers are useful tools for studying the history and adaptation of populations. However, it is not easy to obtain multiple nuclear primers for organisms with little or no genomic sequence information. Here we used the genomes of fully sequenced organisms to design comprehensive sets of primers that amplify polymorphic genomic fragments of multiple nuclear genes in non-sequenced organisms. First, we identified a large number of candidate polymorphic regions flanked on each side by conserved regions in the reference genomes. We then designed primers based on these conserved sequences and tested whether they could amplify sequences for population genetic analysis in four target species: the montane brown frog (Rana ornativentris), the anole lizard (Anolis sagrei), the guppy (Poecilia reticulata), and the fruit fly (Drosophila melanogaster). We successfully obtained polymorphic markers for all target species studied. In addition, we found that the sequence identity of the regions between the primer sites in the reference genomes affected both the success of DNA amplification and the identification of polymorphic loci in the target genomes, and that exonic primers had a higher success rate than intronic primers in amplifying readable sequences. We conclude that this comparative genomic approach is a time- and cost-effective way to obtain polymorphic markers for non-sequenced organisms, and that it will contribute to the further development of evolutionary ecology and population genetics for these organisms, aiding in the understanding of the genetic basis of adaptation.
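The search for variable regions between conserved flanks can be sketched as a scan over an alignment of the reference genomes. This is a hypothetical illustration, not the authors' pipeline: it assumes a gapped multiple alignment given as equal-length strings, treats a column as conserved only when every genome shares the same non-gap base, and uses illustrative thresholds (primer_len, min_insert, max_insert, max_identity) that do not come from the study.

```python
from typing import List, Tuple


def conserved_columns(alignment: List[str]) -> List[bool]:
    """A column is conserved if every reference genome has the same, non-gap base."""
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*alignment)]


def candidate_regions(
    alignment: List[str],
    primer_len: int = 20,
    min_insert: int = 200,
    max_insert: int = 600,
    max_identity: float = 0.98,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, insert_identity) for amplicons whose two primer-length
    flanks are fully conserved while the insert between them is variable enough."""
    cons = conserved_columns(alignment)
    n = len(cons)
    # Prefix sums give the number of conserved columns in any window in O(1).
    prefix = [0]
    for c in cons:
        prefix.append(prefix[-1] + c)

    def n_conserved(a: int, b: int) -> int:
        return prefix[b] - prefix[a]

    hits = []
    for left in range(n - primer_len + 1):
        if n_conserved(left, left + primer_len) < primer_len:
            continue  # forward-primer site is not conserved
        insert_start = left + primer_len
        for insert_len in range(min_insert, max_insert + 1):
            right = insert_start + insert_len
            if right + primer_len > n:
                break  # reverse-primer site would run past the alignment
            if n_conserved(right, right + primer_len) < primer_len:
                continue  # reverse-primer site is not conserved
            identity = n_conserved(insert_start, right) / insert_len
            if identity <= max_identity:  # keep only inserts with some variation
                hits.append((left, right + primer_len, identity))
    return hits
```

Practical primer design also weighs melting temperature, GC content and primer-dimer formation, which this sketch leaves out; the identity value it reports corresponds to the between-primer sequence identity that the study found to affect amplification success.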

    Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3

    Background: Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene had previously been described. Results: Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene mapped to exons flanked by introns of at least 800 nucleotides, whereas exons separated by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA in species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates.
Conclusion: We found that large introns can promote AS via exon skipping and can promote exon turnover during evolution, likely because of frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.

    Genome-wide evolutionary dynamics of influenza B viruses on a global scale

    The global-scale epidemiology and genome-wide evolutionary dynamics of influenza B remain poorly understood compared with those of influenza A viruses. We compiled a spatio-temporally comprehensive dataset of influenza B viruses, comprising over 2,500 genomes sampled worldwide between 1987 and 2015, including 382 newly sequenced genomes that fill substantial gaps in previous molecular surveillance studies. Our contributed data increase the number of available influenza B virus genomes from Europe, Africa and Central Asia, improving the global context for studying influenza B viruses. We show that Yamagata-lineage diversity results from the co-circulation of two antigenically distinct groups that also segregate genetically across the entire genome, without evidence of intra-lineage reassortment. In contrast, Victoria-lineage diversity stems from the geographic segregation of different genetic clades, with variability in the degree of geographic spread among clades. Differences between the lineages are reflected in their antigenic dynamics: Yamagata-lineage viruses show alternating dominance between antigenic groups, while Victoria-lineage viruses show antigenic drift of a single lineage. Structural mapping of amino acid substitutions on trunk branches of influenza B gene phylogenies further supports these antigenic differences and highlights two potential mechanisms of adaptation for polymerase activity. Our study provides new insights into the epidemiological and molecular processes shaping influenza B virus evolution globally.

    Positive Selection Results in Frequent Reversible Amino Acid Replacements in the G Protein Gene of Human Respiratory Syncytial Virus

    Human respiratory syncytial virus (HRSV) is the major cause of lower respiratory tract infections in children under 5 years of age and in the elderly, causing annual disease outbreaks during the fall and winter. Multiple lineages of the HRSVA and HRSVB serotypes co-circulate within a single outbreak and display a strongly temporal pattern of genetic variation, with dominant genotypes replaced in consecutive years. In the present study we used phylogenetic methods to detect and map sites subject to adaptive evolution in the G protein of HRSVA and HRSVB. A total of 29 and 23 amino acid sites were found to be putatively positively selected in HRSVA and HRSVB, respectively. Several of these sites defined genotypes and lineages within genotypes in both groups, and correlated well with epitopes previously described in group A. Remarkably, 18 of these positively selected sites tended to revert over time to a previous codon state, producing a “flip-flop” phylogenetic pattern. Such frequent evolutionary reversals in HRSV are indicative of a combination of frequent positive selection, reflecting the changing immune status of the human population, and a limited repertoire of functionally viable amino acids at specific amino acid sites.
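The "flip-flop" pattern can be made concrete with a small counter over the codon states observed at one site in time order (for example, states reconstructed along a root-to-tip path of a phylogeny). This is a hypothetical illustration that assumes such a time-ordered series is already available; the study itself detected selected sites and reversals with phylogenetic methods, not with this function.

```python
from typing import Hashable, Sequence


def count_reversions(states: Sequence[Hashable]) -> int:
    """Count changes that return a site to the state it held just before its
    previous change (X -> Y -> X), i.e. one flip-flop per returning change."""
    if not states:
        return 0
    path = [states[0]]  # states along the lineage, consecutive repeats collapsed
    reversions = 0
    for s in states[1:]:
        if s == path[-1]:
            continue  # no change at this node
        if len(path) >= 2 and s == path[-2]:
            reversions += 1  # back to the immediately preceding state
        path.append(s)
    return reversions
```

A site that alternates between two codons, such as AAT, GAT, AAT, GAT, scores two reversions, whereas a site that keeps moving to new codons scores none.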

    rMotifGen: random motif generator for DNA and protein sequences

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites, as well as conserved protein domains. To help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction and the other a command-line interface. The second implementation has the added advantages of being platform-independent and callable in batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where each motif instance can be generated from a position-specific scoring matrix (PSSM) or mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/
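A minimal sketch of the kind of test data described here: random background sequences, each carrying one planted motif instance drawn either from a PSSM (given as per-position residue probabilities) or by mutating a consensus. This is not rMotifGen's code or interface; the function names are hypothetical, substitutions are chosen uniformly rather than from a substitution matrix, and motif insertions are not modeled.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

DNA = "ACGT"


def sample_from_pssm(pssm: Sequence[Dict[str, float]], rng: random.Random) -> str:
    """Draw one motif instance column by column from per-position residue probabilities."""
    return "".join(
        rng.choices(list(col), weights=list(col.values()))[0] for col in pssm
    )


def mutate_consensus(consensus: str, rate: float, rng: random.Random,
                     alphabet: str = DNA) -> str:
    """Replace each position, with probability `rate`, by a different residue chosen uniformly."""
    return "".join(
        rng.choice([a for a in alphabet if a != ch]) if rng.random() < rate else ch
        for ch in consensus
    )


def generate_with_motifs(
    n_seqs: int,
    length: int,
    make_motif: Callable[[random.Random], str],
    alphabet: str = DNA,
    seed: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Return (sequence, motif_start) pairs: uniform random background plus one planted motif."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_seqs):
        motif = make_motif(rng)
        if len(motif) > length:
            raise ValueError("motif is longer than the sequence")
        seq = [rng.choice(alphabet) for _ in range(length)]
        start = rng.randrange(length - len(motif) + 1)
        seq[start:start + len(motif)] = motif  # overwrite background with the motif
        out.append(("".join(seq), start))
    return out
```

For instance, `generate_with_motifs(20, 100, lambda r: mutate_consensus("TATAAT", 0.1, r), seed=1)` yields 20 sequences with lightly mutated TATAAT sites; keeping the planted start positions provides the ground truth against which a motif finder's predictions can be scored.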

    Phylogenetic and Biogeographic Analysis of Sphaerexochine Trilobites

    BACKGROUND: Sphaerexochinae is a speciose and widely distributed group of cheirurid trilobites. Their temporal range extends from the earliest Ordovician through the Silurian, and they survived the end Ordovician mass extinction event (the second largest mass extinction in Earth history). Prior to this study, the evolutionary relationships within the group had not been determined using rigorous phylogenetic methods. Understanding these evolutionary relationships is important for producing a stable classification of the group, and will be useful in elucidating the effects of the end Ordovician mass extinction on the evolutionary and biogeographic history of the group. METHODOLOGY/PRINCIPAL FINDINGS: A cladistic parsimony analysis of cheirurid trilobites assigned to the subfamily Sphaerexochinae was conducted to evaluate phylogenetic patterns and produce a hypothesis of relationships for the group. This study used the program TNT, and the analysis included thirty-one taxa and thirty-nine characters. The results of this analysis were then used in a Lieberman-modified Brooks Parsimony Analysis to analyze biogeographic patterns during the Ordovician-Silurian. CONCLUSIONS/SIGNIFICANCE: The genus Sphaerexochus was found to be monophyletic, consisting of two smaller clades (one composed entirely of Ordovician species and another composed of Silurian and Ordovician species). By contrast, the genus Kawina was found to be paraphyletic. It is a basal grade that also contains taxa formerly assigned to Cydonocephalus. Phylogenetic patterns suggest that Sphaerexochinae is a relatively distinctive trilobite clade because it appears to have been largely unaffected by the end Ordovician mass extinction.
Finally, the biogeographic analysis yields two major conclusions about Sphaerexochus biogeography: Bohemia and Avalonia were close enough during the Silurian to exchange taxa; and during the Ordovician there was dispersal between Eastern Laurentia and the Yangtze block (South China) and between Eastern Laurentia and Avalonia.

    Additions to the Mycosphaerella complex

    Species in the present study were compared based on their morphology, growth characteristics in culture, and DNA sequences of the nuclear ribosomal RNA gene operon (including ITS1, ITS2, 5.8S nrDNA and the first 900 bp of the 28S nrDNA) for all species, and partial actin and translation elongation factor 1-alpha gene sequences for Cladosporium species. New species of Mycosphaerella (Mycosphaerellaceae) introduced in this study include M. cerastiicola (on Cerastium semidecandrum, The Netherlands) and M. etlingerae (on Etlingera elatior, Hawaii). Mycosphaerella holualoana is newly reported on Hedychium coronarium (Hawaii). Epitypes are also designated for Hendersonia persooniae, the basionym of Camarosporula persooniae, and for Sphaerella agapanthi, the basionym of Teratosphaeria agapanthi comb. nov. (Teratosphaeriaceae) on Agapanthus umbellatus from South Africa. The latter pathogen is also newly recorded from A. umbellatus in Europe (Portugal). Furthermore, two sexual species of Cladosporium (Davidiellaceae) are described, namely C. grevilleae (on Grevillea sp., Australia) and C. silenes (on Silene maritima, UK). Finally, the phylogenetic position of two genera is newly confirmed: Camarosporula (based on C. persooniae, teleomorph Anthracostroma persooniae), a leaf pathogen of Persoonia spp. in Australia, belongs to the Teratosphaeriaceae, and Sphaerulina (based on S. myriadea), which occurs on leaves of Fagaceae (Carpinus, Castanopsis, Fagus, Quercus), belongs to the Mycosphaerellaceae.

    A new classification of the long-horned caddisflies (Trichoptera: Leptoceridae) based on molecular data

    Background: Leptoceridae are among the three largest families of Trichoptera (caddisflies). The current classification is founded on phylogenetic work from the 1980s based on morphological characters of adult males, i.e. wing venation, tibial spur formula and genital morphology. To obtain an independent assessment of the relationships within the family, we undertook a molecular study based on sequences from five genes: mitochondrial COI and the four nuclear genes CAD, EF-1 alpha, IDH and POL. Results: The resulting phylogenetic hypotheses are broadly congruent with the morphologically based classification, with most genera and tribes recovered as monophyletic, but with some major differences. To make the two subfamilies Triplectidinae and Leptocerinae monophyletic, one tribe from each was removed and elevated to subfamily status; however, the monophyly of some genera and tribes remains in question. All clades except Leptocerinae were stable across different analysis methods. Conclusions: We elevate the tribes Grumichellini and Leptorussini to subfamily status, as Grumichellinae and Leptorussinae, respectively. We also propose the synonymies of Ptochoecetis with Oecetis and of Condocerus with Hudsonema.

    Higher Level Phylogeny and the First Divergence Time Estimation of Heteroptera (Insecta: Hemiptera) Based on Multiple Genes

    Heteroptera, or true bugs, are the largest, most morphologically diverse and most economically important group of insects with incomplete metamorphosis. However, the phylogenetic relationships within Heteroptera are still disputed, and most previous studies were based on morphological characters or on a single gene (partial or complete 18S rDNA). Moreover, divergence time estimates for Heteroptera have so far relied entirely on the fossil record, and no studies have been performed using molecular divergence rates. Here, for the first time, we used maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) with multiple genes (18S rDNA, 28S rDNA, 16S rDNA and COI) to estimate phylogenetic relationships among the infraorders, and employed the Penalized Likelihood (r8s) and Bayesian (BEAST) molecular dating methods to estimate divergence times of the higher taxa of this suborder. The major results of the present study are as follows. Nepomorpha was placed as the most basal clade in all six trees (MP, ML and Bayesian trees of the nuclear gene data and of the four-gene combined data) with full support values. The sister-group relationship of Cimicomorpha and Pentatomomorpha was also strongly supported. Nepomorpha originated in the early Triassic, and the other six infraorders originated within a very short period in the middle Triassic. Cimicomorpha and Pentatomomorpha underwent a family-level radiation in the Cretaceous, paralleling the proliferation of flowering plants. Our results indicate that the higher-group radiations within hemimetabolous Heteroptera were simultaneous with those of holometabolous Coleoptera and Diptera, which took place in the Triassic. While the aquatic habitat was already colonized by Nepomorpha in the Triassic, the Gerromorpha independently adapted to the semi-aquatic habitat in the Early Jurassic.