GenomeBlast: a web tool for small genome comparison
BACKGROUND: Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. Many tools are available for genome comparison. Unfortunately, most of them cannot identify unique genes or infer phylogenetic relationships in a given set of genomes. RESULTS: GenomeBlast is a Web tool developed for comparative analysis of multiple small genomes. A new parameter called "coverage" was introduced and used along with sequence identity to evaluate global similarity between genes. With GenomeBlast, the following results can be obtained: (1) unique genes in each genome; (2) homologous gene candidates among the compared genomes; (3) 2D plots of homologous gene candidates for all pairwise genome comparisons; and (4) a table of gene presence/absence information and a genome phylogeny. We demonstrated the functions of GenomeBlast with an example analysis of multiple herpesviral genomes and illustrated how GenomeBlast is useful for small genome comparison. CONCLUSION: We developed a Web tool for comparative analysis of small genomes, which allows the user not only to identify unique genes and homologous gene candidates among multiple genomes, but also to view their graphical distributions on genomes and to reconstruct a genome phylogeny. GenomeBlast runs on a Linux server with 4 CPUs and 4 GB memory. The online version of GenomeBlast is available to the public through a Web browser with the URL
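The abstract above does not define "coverage" or give thresholds, so the following is only a minimal sketch of how coverage and identity might be combined to flag homologous gene candidates and unique genes. It assumes coverage means the aligned length as a percentage of the query gene length. The `Hit` record, the function names, the cut-off values, and the gene and genome labels are all hypothetical and are not taken from GenomeBlast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment of a query gene against another genome (hypothetical record)."""
    query_gene: str
    subject_genome: str
    query_length: int    # length of the query gene
    aligned_length: int  # length of the aligned region on the query
    identity: float      # percent identity over the aligned region


def coverage(hit):
    """Assumed definition: aligned length as a percentage of the query gene length."""
    return 100.0 * hit.aligned_length / hit.query_length


def presence_absence(genes, hits, genomes, min_identity=30.0, min_coverage=50.0):
    """Build a gene presence/absence table.

    genes maps each gene name to its home genome. A gene counts as present in
    another genome only if some hit passes BOTH thresholds (assumed values).
    """
    table = {gene: {g: (g == home) for g in genomes} for gene, home in genes.items()}
    for hit in hits:
        if hit.identity >= min_identity and coverage(hit) >= min_coverage:
            table[hit.query_gene][hit.subject_genome] = True
    return table


def unique_genes(table):
    """Genes present in exactly one genome (their own)."""
    return sorted(gene for gene, row in table.items() if sum(row.values()) == 1)


# Hypothetical input: three genes from two genomes and two alignment hits.
genes = {"geneA": "genome1", "geneB": "genome1", "geneC": "genome2"}
hits = [
    Hit("geneA", "genome2", query_length=300, aligned_length=240, identity=45.0),
    Hit("geneB", "genome2", query_length=300, aligned_length=60, identity=80.0),
]
table = presence_absence(genes, hits, ["genome1", "genome2"])
print(unique_genes(table))
```

In this toy input, `geneB` has high identity but the alignment covers only a fifth of the gene, so it is not counted as a homolog candidate. Filtering on coverage as well as identity is what excludes short local matches of this kind.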
Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set
There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
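The abstract above does not specify the layout of the sequence matrix or the rules in the classification tree. As a rough illustration of the idea only, the sketch below assumes one binary row per annotation type and one column per sequence position, and measures how often two feature types coincide. The function names, the half-open interval convention, and the example coordinates are hypothetical.

```python
def build_matrix(length, features):
    """Return {row_name: [0/1 per position]} from half-open (start, end) intervals."""
    matrix = {}
    for name, intervals in features.items():
        row = [0] * length
        for start, end in intervals:
            # Clip each interval to the sequence bounds before marking it.
            for pos in range(max(0, start), min(length, end)):
                row[pos] = 1
        matrix[name] = row
    return matrix


def cooccurrence(matrix, a, b):
    """Fraction of positions annotated with row a that are also annotated with row b."""
    covered_a = sum(matrix[a])
    if covered_a == 0:
        return 0.0
    both = sum(x & y for x, y in zip(matrix[a], matrix[b]))
    return both / covered_a


# Hypothetical annotations on a 20-position sequence.
features = {
    "exon": [(2, 8)],
    "cpg_island": [(0, 5)],
    "repeat": [(12, 18)],
}
sm = build_matrix(20, features)
print(cooccurrence(sm, "exon", "cpg_island"))  # half of the exon positions lie in the CpG island
print(cooccurrence(sm, "exon", "repeat"))      # no shared positions
```

A real implementation would also need non-binary row types (scores, categories, strand) and the classification tree that decides which row pairs are meaningful to compare. Neither is attempted here.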
Ancient, independent evolution and distinct molecular features of the novel human T-lymphotropic virus type 4
BACKGROUND: Human T-lymphotropic virus type 4 (HTLV-4) is a new deltaretrovirus recently identified in a primate hunter in Cameroon. Limited sequence analysis previously showed that HTLV-4 may be distinct from HTLV-1, HTLV-2, and HTLV-3, and their simian counterparts, STLV-1, STLV-2, and STLV-3, respectively. Analysis of full-length genomes can provide basic information on the evolutionary history and the replication and pathogenic potential of new viruses. RESULTS: We report here the first complete HTLV-4 sequence, obtained by PCR-based genome walking using uncultured peripheral blood lymphocyte DNA from an HTLV-4-infected person. The HTLV-4(1863LE) genome is 8791 bp long and is equidistant from HTLV-1, HTLV-2, and HTLV-3, sharing only 62–71% nucleotide identity. HTLV-4 has a prototypic genomic structure with all enzymatic, regulatory, and structural proteins preserved. Like STLV-2, STLV-3, and HTLV-3, HTLV-4 is missing a third 21-bp transcription element found in the long terminal repeats of HTLV-1 and HTLV-2 but instead contains unique c-Myb and pre B-cell leukemic transcription factor binding sites. As in HTLV-2, the PDZ motif important for cellular signal transduction and transformation in HTLV-1 and HTLV-3 is missing in the C-terminus of the HTLV-4 Tax protein. A basic leucine zipper (b-ZIP) region, located in the antisense strand of HTLV-1 and believed to play a role in viral replication and oncogenesis, was also found in the complementary strand of HTLV-4. Detailed phylogenetic analysis shows that HTLV-4 is clearly a monophyletic viral group. Dating using a relaxed molecular clock inferred that the most recent common ancestor of HTLV-4 and HTLV-2/STLV-2 occurred 49,800 to 378,000 years ago, making this the oldest known PTLV lineage. Interestingly, this period coincides with the emergence of Homo sapiens sapiens during the Middle Pleistocene, suggesting that early humans may have been susceptible hosts for the ancestral HTLV-4. CONCLUSION: The inferred ancient origin of HTLV-4 coinciding with the appearance of Homo sapiens, the propensity of STLVs to cross species into humans, and the fact that HTLV-1 and -2 spread globally following migrations of ancient populations all suggest that HTLV-4 may be prevalent. Expanded surveillance and clinical studies are needed to better define the epidemiology and public health importance of HTLV-4 infection.
Curation of viral genomes: challenges, applications and the way forward
BACKGROUND: Whole genome sequence data is a step towards generating the 'parts list' of life to understand the underlying principles of Biocomplexity. Genome sequencing initiatives for human and model organisms are targeted efforts towards understanding principles of evolution, with an envisaged application of improving human health. These efforts culminated in the development of dedicated resources. In contrast, a large number of viral genomes have been sequenced by groups or individuals interested in studying antigenic variation amongst strains and species. These independent efforts enabled viruses to attain the status of 'best-represented taxa' with the highest number of genomes. However, due to a lack of concerted efforts, viral genomic sequences merely remained as entries in the public repositories until recently. RESULTS: VirGen is a curated resource of viral genomes and their analyses. Since its first release, it has grown both in terms of coverage of viral families and development of new modules for annotation and analysis. The current release (2.0) includes data for twenty-five families with broad host range, compared with eight in the first release. The taxonomic description of viruses in VirGen is in accordance with the ICTV nomenclature. A well-characterised strain is identified as a 'representative entry' for every viral species. This non-redundant dataset is used for subsequent annotation and analyses using sequence-based bioinformatics approaches. VirGen archives precomputed data on genome and proteome comparisons. A new data module that provides structures of viral proteins available in PDB has been incorporated recently. One of the unique features of VirGen is the prediction of conformational and sequential epitopes of known antigenic proteins using algorithms developed in-house, a step towards reverse vaccinology. CONCLUSION: Structured organization of genomic data facilitates the use of data mining tools, which provides opportunities for knowledge discovery. One of the approaches to achieve this goal is to carry out functional annotations using comparative genomics. VirGen, a comprehensive viral genome resource that serves as an annotation and analysis pipeline, has been developed for the curation of public domain viral genome data. Various steps in the curation and annotation of the genomic data and applications of the value-added derived data are substantiated with case studies.
Gene and cell survival: lessons from prokaryotic plasmid R1
Plasmids are units of extrachromosomal genetic inheritance found in all kingdoms of life. They replicate autonomously and undergo stable propagation in their hosts. Despite their small size, plasmid replication and gene expression constitute a metabolic burden that compromises their stable maintenance in host cells. This pressure has driven the evolution of strategies to increase plasmid stability—a process accelerated by the ability of plasmids to transfer horizontally between cells and to exchange genetic material with their host and other resident episomal DNAs. These abilities drive the adaptability and diversity of plasmids and their host cells. Indeed, survival functions found in plasmids have chromosomal homologues that have an essential role in cellular responses to stress. An analysis of these functions in the prokaryotic plasmid R1, and of their intricate interrelationships, reveals remarkable overall similarities with other gene- and cell-survival strategies found within and beyond the prokaryotic world