
    RECOVIR Software for Identifying Viruses

    Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical to understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from limited sequence data. Novel phylogenetic-tree-based databases of the protein or nucleic acid residues that uniquely characterize these virus strains are created, and the strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with these databases. RECOVIR uses these unique characterizing residues to automatically identify the strains of partial or complete capsid sequences of picornaviruses and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses identified the correct strain for every sequence. This study shows the feasibility of creating databases of previously unknown residues that uniquely characterize the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
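
The residue-wise comparison step can be illustrated with a minimal Python sketch. It is not the RECOVIR implementation: the database layout (strain name mapped to alignment positions and residues), the function names `score_strains` and `predict_strain`, and the toy residues are all hypothetical, and the query is assumed to be already aligned to the reference capsid alignment.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical signature database: strain name -> {alignment position (0-based): residue}.
# Each entry lists residues assumed to uniquely characterize that strain.
SignatureDB = Dict[str, Dict[int, str]]

EXAMPLE_DB: SignatureDB = {
    "strainA": {1: "K", 4: "T"},
    "strainB": {1: "R", 4: "S"},
}


def score_strains(aligned_query: str, db: SignatureDB) -> List[Tuple[str, int, int]]:
    """Return (strain, matching residues, covered positions), best match first.

    Positions beyond the end of a partial query, or gapped ('-') in it, are
    skipped, so a fragment is scored only on the signature residues it spans.
    """
    results = []
    for strain, signature in db.items():
        covered = [pos for pos in signature
                   if pos < len(aligned_query) and aligned_query[pos] != "-"]
        matches = sum(1 for pos in covered if aligned_query[pos] == signature[pos])
        results.append((strain, matches, len(covered)))
    # Rank by the fraction of covered signature residues that match, then by coverage.
    results.sort(key=lambda r: (r[1] / r[2] if r[2] else 0.0, r[2]), reverse=True)
    return results


def predict_strain(aligned_query: str, db: SignatureDB) -> Optional[str]:
    """Return the best-ranked strain whose covered signature residues all match."""
    for strain, matches, covered in score_strains(aligned_query, db):
        if covered and matches == covered:
            return strain
    return None  # no strain is consistent with the query


print(predict_strain("MKAVT", EXAMPLE_DB))  # prints strainA
print(predict_strain("M-AVS", EXAMPLE_DB))  # prints strainB (partial query with a gap)
```

Requiring every covered signature residue to match, rather than picking the highest raw similarity, mirrors the idea that uniquely characterizing residues should give an unambiguous call even for fragments.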

    Using Whole-Genome Sequence Data to Predict Quantitative Trait Phenotypes in Drosophila melanogaster

    Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic phenotype prediction has been applied to data from single-nucleotide polymorphism (SNP) genotyping platforms, but not to complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability by cross-validation as the correlation between predicted genetic values and observed phenotypes, and found a predictive ability of 0.239 ± 0.008 for starvation resistance and 0.230 ± 0.012 for startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than that of GBLUP. Selecting the 5% of SNPs with either the highest absolute effect or the most variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium, and not from population structure or from SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
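
The GBLUP workflow described above can be sketched in pure Python for toy inputs: build a genomic relationship matrix, predict genetic values of held-out lines, and measure predictive ability as the Pearson correlation with observed phenotypes under k-fold cross-validation. This is a minimal sketch under stated assumptions, not the authors' pipeline: the matrix uses VanRaden's first method (the abstract does not say which construction was used), the variance ratio `lam` (residual over genetic variance) is assumed known rather than estimated, the intercept is approximated by the training mean, predictions are pooled across folds before one correlation is taken, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def genomic_relationship(genotypes: Matrix) -> Matrix:
    """VanRaden (method 1) G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    Rows are lines, columns are SNPs coded as 0/1/2 copies of the counted allele;
    Z centers each column by twice its allele frequency p_j.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(z[a][k] * z[b][k] for k in range(m)) / scale for b in range(n)]
            for a in range(n)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def gblup_predict(g: Matrix, y: List[float], train: List[int], test: List[int],
                  lam: float) -> List[float]:
    """Genetic values for test lines: G[test, train] (G[train, train] + lam I)^-1 (y - mu)."""
    mu = sum(y[i] for i in train) / len(train)  # simple stand-in for the GLS intercept
    k = [[g[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(k, [y[i] - mu for i in train])
    return [sum(g[t][j] * a for j, a in zip(train, alpha)) for t in test]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cv_predictive_ability(genotypes: Matrix, y: List[float], folds: int = 5,
                          lam: float = 1.0) -> float:
    """k-fold cross-validation; correlation of pooled out-of-fold predictions with y."""
    g = genomic_relationship(genotypes)
    preds = [0.0] * len(y)
    for f in range(folds):
        test = [i for i in range(len(y)) if i % folds == f]
        train = [i for i in range(len(y)) if i % folds != f]
        for i, value in zip(test, gblup_predict(g, y, train, test, lam)):
            preds[i] = value
    return pearson(preds, y)
```

The dense solve scales cubically with the number of lines and the matrix build scales with lines squared times SNPs, so a real analysis with millions of SNPs would use an optimized linear-algebra library instead.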

    Epistasis dominates the genetic architecture of Drosophila quantitative traits

    Epistasis (nonlinear genetic interactions between polymorphic loci) is the genetic basis of canalization and speciation, and epistatic interactions can be used to infer genetic networks affecting quantitative traits. However, the role that epistasis plays in the genetic architecture of quantitative traits is controversial. Here, we compared the genetic architecture of three Drosophila life history traits in the sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) and in a large outbred, advanced intercross population derived from 40 DGRP lines (Flyland). We assessed allele frequency changes between pools of individuals at the extremes of the distribution for each trait in the Flyland population by deep DNA sequencing. The genetic architecture of all traits was highly polygenic in both analyses. Surprisingly, none of the SNPs associated with the traits in Flyland replicated in the DGRP, and vice versa. However, the majority of these SNPs participated in at least one epistatic interaction in the DGRP. Despite apparent additive effects at largely distinct loci in the two populations, the epistatic interactions perturbed common, biologically plausible, and highly connected genetic networks. Our analysis underscores the importance of epistasis as a principal factor that determines variation for quantitative traits and provides a means to uncover genetic networks affecting these traits. Knowledge of epistatic networks will contribute to our understanding of the genetic basis of evolutionarily and clinically important traits and enhance predictive ability at an individualized level in medicine and agriculture.
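
The extreme-pool comparison can be illustrated as a per-SNP test of the allele-frequency difference between the high and low pools, computed from deep-sequencing read counts. This minimal sketch uses a pooled two-proportion z statistic and treats each read as an independent draw; that is a simplifying assumption (it ignores the extra variance from sampling individuals into the pools), and the function names are hypothetical rather than the test used in the study.

```python
import math
from typing import Tuple


def pool_frequency_difference(alt_high: int, depth_high: int,
                              alt_low: int, depth_low: int) -> Tuple[float, float]:
    """Return (p_high - p_low, z) for one SNP from read counts in two extreme pools.

    Uses a pooled-variance two-proportion z statistic and treats every read as an
    independent draw, which ignores the extra variance from sampling flies into pools.
    """
    p_high = alt_high / depth_high
    p_low = alt_low / depth_low
    p_all = (alt_high + alt_low) / (depth_high + depth_low)
    se = math.sqrt(p_all * (1 - p_all) * (1 / depth_high + 1 / depth_low))
    z = (p_high - p_low) / se if se > 0 else 0.0
    return p_high - p_low, z


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy SNP: 60/100 alternate-allele reads in the high pool, 40/100 in the low pool.
delta, z = pool_frequency_difference(60, 100, 40, 100)
print(round(delta, 3), round(z, 3), round(two_sided_p(z), 4))
```

In a genome-wide scan this test would be applied to every SNP, so the resulting p-values would still need a multiple-testing correction.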

    Finding the missing honey bee genes: lessons learned from a genome upgrade

    BACKGROUND: The first generation of genome sequence assemblies and annotations has had a significant impact on our understanding of the biology of the sequenced species, the phylogenetic relationships among species, and the study of populations within and across species, and it has informed human biology. As only a few metazoan genomes (human, mouse, fly and worm) are approaching finished quality, there is room for improvement in most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution, which affected the quality of the assembly in some regions, and for having fewer genes in the initial gene set (OGSv1.0) than expected from other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what OGSv1.0 suggested. The new genome assembly is more contiguous and complete, and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remainder were inferred from new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. Funding for the project was provided by a grant to RG from the National Human Genome Research Institute, National Institutes of Health (NHGRI, NIH) U54 HG003273. Contributions from members of the CGE lab were supported by Agriculture and Food Research Initiative Competitive grant no. 2010-65205-20407 from the USDA National Institute of Food and Agriculture.
AKB was supported by a Clare Boothe Luce Fellowship at Georgetown University.
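
The gene-count figures in the abstract imply a rough baseline, which a few lines of Python make explicit. The inputs are the abstract's own approximate figures (~5000 additional genes, a 50% increase, about 1/6 attributable to assembly improvements), so the derived numbers are approximate as well.

```python
# Approximate figures quoted in the abstract.
ADDITIONAL_GENES = 5000        # "~5000 more protein-coding genes"
RELATIVE_INCREASE = 0.50       # "50% more than previously reported"
SHARE_FROM_ASSEMBLY = 1 / 6    # "About 1/6 of the additional genes"

implied_ogs_v1_genes = ADDITIONAL_GENES / RELATIVE_INCREASE
from_assembly = ADDITIONAL_GENES * SHARE_FROM_ASSEMBLY
from_new_evidence = ADDITIONAL_GENES - from_assembly  # RNAseq and protein data

print(f"implied OGSv1.0 protein-coding genes: ~{implied_ogs_v1_genes:,.0f}")
print(f"added through assembly improvements:  ~{from_assembly:,.0f}")
print(f"added from RNAseq/protein evidence:   ~{from_new_evidence:,.0f}")
```

This prints roughly 10,000, 833 and 4,167, i.e. an implied OGSv1.0 of about 10,000 protein-coding genes, with most of the additions coming from the new transcript and protein evidence.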

    RECOVIR: An application package to automatically identify some single stranded RNA viruses using capsid protein residues that uniquely distinguish among these viruses

    Background: Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Accurate strain recognition in uncharacterized target capsid sequences is essential for epidemiology, diagnostics, and vaccine development. Strain recognition based on similarity scores between target sequences and the sequences of homology-matched reference strains is often time-consuming and ambiguous. This is especially true if only partial target sequences are available or if different ssRNA virus families are analyzed jointly. In such cases, knowledge of the residues that uniquely distinguish among known reference strains is critical for rapid and unambiguous strain identification. Conventional sequence comparisons cannot identify such capsid residues because of the high sequence divergence among ssRNA virus reference strains. Consequently, general automated methods that reliably identify strains using strain-distinguishing residues are not currently available. Results: We present here RECOVIR ("recognize viruses"), a software tool that automatically detects strains of caliciviruses and picornaviruses by comparing their capsid residues with built-in databases of residues that uniquely distinguish among known reference strains of these viruses. The databases were created by constructing partitioned phylogenetic trees of complete capsid sequences of these viruses. Strains were correctly identified for more than 300 complete and partial target sequences by comparing the database residues with the aligned residues of these sequences. Processing each sequence took about 5 seconds of real time. A Java-based user interface coupled with Perl-coded computational modules ensures high portability of the software. RECOVIR currently runs on Windows XP and Linux platforms. The software generalizes a manual method briefly outlined earlier for human caliciviruses.
Conclusion: This study demonstrates an automated method for identifying virus strains using databases of capsid residues. The method is implemented to detect strains of caliciviruses and picornaviruses, two of the most highly divergent ssRNA virus families and, therefore, especially difficult to identify using a uniform method. It is feasible to incorporate the approach into classification schemes of caliciviruses and picornaviruses and to extend it to recognize and classify other ssRNA virus families.
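
The database-construction step, selecting residues that distinguish each partition of the phylogenetic tree, can be sketched as follows. This is a minimal sketch, not the RECOVIR code: it assumes the partitions are already available as groups of aligned capsid sequences, interprets "uniquely distinguish" as a residue conserved within a group and absent from every other group at that alignment position, and does not reproduce the tree-partitioning procedure itself. The function name and toy sequences are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy partitions: group name -> aligned capsid sequences (equal length).
EXAMPLE_GROUPS: Dict[str, List[str]] = {
    "A": ["MKAVT", "MKGVT"],
    "B": ["MRAVS", "MRCVS"],
}


def distinguishing_residues(groups: Dict[str, List[str]]) -> Dict[str, Dict[int, str]]:
    """Map each group to {alignment position: residue} for residues that are
    shared by every member of the group and carried by no member of any other
    group. Gap characters ('-') are never reported."""
    length = len(next(iter(groups.values()))[0])
    result: Dict[str, Dict[int, str]] = {name: {} for name in groups}
    for pos in range(length):
        # Set of residues observed at this alignment column within each group.
        column = {name: {seq[pos] for seq in seqs} for name, seqs in groups.items()}
        for name, residues in column.items():
            if len(residues) != 1:
                continue  # not conserved within this group
            (res,) = residues
            if res == "-":
                continue
            others = set().union(*(r for other, r in column.items() if other != name))
            if res not in others:
                result[name][pos] = res
    return result


print(distinguishing_residues(EXAMPLE_GROUPS))
# prints {'A': {1: 'K', 4: 'T'}, 'B': {1: 'R', 4: 'S'}}
```

The output maps each partition to its characterizing residues, which is the kind of table a residue-wise comparison against partial or complete target sequences would consult.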