6 research outputs found

    Virus variation resources at the National Center for Biotechnology Information: dengue virus

    Abstract
    Background: There is an increasing number of complete and incomplete virus genome sequences available in public databases. This large body of sequence data harbors information about epidemiology, phylogeny, and virulence. Several specialized databases, such as the NCBI Influenza Virus Resource or the Los Alamos HIV database, offer sophisticated query interfaces along with integrated exploratory data analysis tools for individual virus species to facilitate extracting this information. Thus far, there has not been a comprehensive database for dengue virus, a significant public health threat.
    Results: We have created an integrated web resource for dengue virus. The technology developed for the NCBI Influenza Virus Resource has been extended to process non-segmented dengue virus genomes. In order to allow efficient processing of the dengue genome, which is large in comparison with individual influenza segments, we developed an offline pre-alignment procedure which generates a multiple sequence alignment of all dengue sequences. The pre-calculated alignment is then used to rapidly create alignments of sequence subsets in response to user queries. This improvement in technology will also facilitate the incorporation of additional virus species in the future. The set of virus-specific databases at NCBI, which will be referred to as Virus Variation Resources (VVR), allows users to build complex queries against virus-specific databases and then apply exploratory data analysis tools to the results. The metadata is automatically collected where possible, and extended with data extracted from the literature.
    Conclusion: The NCBI Dengue Virus Resource integrates dengue sequence information with relevant metadata (sample collection time and location, disease severity, serotype, sequenced genome region) and facilitates retrieval and preliminary analysis of dengue sequences using integrated web analysis and visualization tools.
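The pre-alignment idea in the Results of this abstract can be illustrated with a minimal sketch. It is not the NCBI implementation. It assumes the master alignment is held as a mapping from sequence name to an equal-length gapped string, and that a subset alignment is obtained by selecting rows and dropping columns that contain only gaps in the selected rows. The function name `subset_alignment`, the `MASTER` toy data, and the sequence names are hypothetical.

```python
from typing import Dict, List

GAP = "-"


def subset_alignment(master: Dict[str, str], wanted: List[str]) -> Dict[str, str]:
    """Pull the rows named in `wanted` out of a precomputed alignment and
    remove columns that are gaps in every selected row."""
    rows = [master[name] for name in wanted]
    if not rows:
        return {}
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    # Keep a column if at least one selected row has a residue there.
    keep = [i for i in range(length) if any(row[i] != GAP for row in rows)]
    return {
        name: "".join(row[i] for i in keep)
        for name, row in zip(wanted, rows)
    }


# Toy master alignment (hypothetical data, computed "offline" in this sketch).
MASTER = {
    "seqA": "ATG-CC--A",
    "seqB": "ATGTCC--A",
    "seqC": "AT--CCGGA",
}

if __name__ == "__main__":
    for name, row in subset_alignment(MASTER, ["seqA", "seqC"]).items():
        print(name, row)
```

Because no re-alignment is performed at query time, the cost of answering a query in this sketch is linear in the size of the selected rows, which is the kind of saving the abstract attributes to pre-alignment.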

    Tree pruner: An efficient tool for selecting data from a biased genetic database

    Abstract
    Background: Large databases of genetic data are often biased in their representation. Thus, selection of genetic data with desired properties, such as evolutionary representation or shared genotypes, is problematic. Selection on the basis of epidemiological variables may not achieve the desired properties. Available automated approaches to the selection of influenza genetic data make a tradeoff between speed and simplicity on the one hand and control over quality and contents of the dataset on the other hand. A poorly chosen dataset may be detrimental to subsequent analyses.
    Results: We developed a tool, Tree Pruner, for obtaining a dataset with desired evolutionary properties from a large, biased genetic database. Tree Pruner provides the user with an interactive phylogenetic tree as a means of editing the initial dataset from which the tree was inferred. The tree visualization changes dynamically, using colors and shading, reflecting Tree Pruner actions. At the end of a Tree Pruner session, the editing actions are implemented in the dataset. Currently, Tree Pruner is implemented on the Influenza Research Database (IRD). The data management capabilities of the IRD allow the user to store a pruned dataset for additional pruning or for subsequent analysis. Tree Pruner can be easily adapted for use with other organisms.
    Conclusions: Tree Pruner is an efficient, manual tool for selecting a high-quality dataset with desired evolutionary properties from a biased database of genetic sequences. It offers an important alternative to automated approaches to the same goal, by providing the user with a dynamic, visual guide to the ongoing selection process and ultimate control over the contents (and therefore quality) of the dataset.

    PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome

    Abstract
    Background: Results of phylogenetic analysis are often visualized as phylogenetic trees. Such a tree can typically only include up to a few hundred sequences. When more than a few thousand sequences are to be included, analyzing the phylogenetic relationships among them becomes a challenging task. The recent frequent outbreaks of influenza A viruses have resulted in the rapid accumulation of corresponding genome sequences. Currently, there are more than 7500 influenza A virus genomes in the database. There are no efficient ways of representing this huge data set as a whole, thus preventing a further understanding of the diversity of the influenza A virus genome.
    Results: Here we present a new algorithm, "PhyloMap", which combines ordination, vector quantization, and phylogenetic tree construction to give an elegant representation of a large sequence data set. The use of PhyloMap on influenza A virus genome sequences reveals the phylogenetic relationships of the internal genes that cannot be seen when only a subset of sequences is analyzed.
    Conclusions: The application of PhyloMap to influenza A virus genome data shows that it is a robust algorithm for analyzing large sequence data sets. It utilizes the entire data set, minimizes bias, and provides intuitive visualization. PhyloMap is implemented in Java, and the source code is freely available at http://www.biochem.uni-luebeck.de/public/software/phylomap.html

    Panorama phylogenetic diversity and distribution of type A influenza viruses based on their six internal gene sequences

    Abstract
    Background: Type A influenza viruses are important pathogens of humans, birds, pigs, horses and some marine mammals. The viruses have evolved into multiple complicated subtypes, lineages and sublineages. Recently, the overall phylogenetic diversity of type A influenza viruses has been described based on the viral external HA and NA gene sequences, but it remains unclear for their six internal genes (PB2, PB1, PA, NP, MP and NS).
    Methods: In this report, 2798 representative sequences of the six viral internal genes were selected from GenBank using the web servers in NCBI Influenza Virus Resource. Then, the phylogenetic relationships among the representative sequences were calculated using the software tools MEGA 4.1 and RAxML 7.0.4. Lineages and sublineages were classified mainly according to the topology of the phylogenetic trees and the distribution of the viruses in hosts, regions and time.
    Results: The panorama phylogenetic trees of the six internal genes of type A influenza viruses were constructed. Lineages and sublineages within the type based on the six internal genes were classified and designated by a tentative universal numerical nomenclature system. The diversity of influenza viruses circulating in different regions, periods, and hosts was analyzed based on the panorama trees.
    Conclusion: This study presents the first whole view of the phylogenetic diversity and distribution of type A influenza viruses based on their six internal genes. It also proposes a tentative universal nomenclature system for the viral lineages and sublineages. These can serve as a candidate framework for summarizing the history and exploring the future of the viruses, and will facilitate future scientific communications on the phylogenetic diversity and evolution of the viruses. In addition, the study provides a novel phylogenetic view (i.e., the whole view) for understanding the viruses, including the origin of the pandemic A(H1N1) influenza viruses.