5 research outputs found

    Curation of viral genomes: challenges, applications and the way forward

    BACKGROUND: Whole genome sequence data is a step towards generating the 'parts list' of life to understand the underlying principles of Biocomplexity. Genome sequencing initiatives of human and model organisms are targeted efforts towards understanding principles of evolution, with an envisaged application of improving human health. These efforts culminated in the development of dedicated resources. In contrast, a large number of viral genomes have been sequenced by groups or individuals interested in studying antigenic variation amongst strains and species. These independent efforts enabled viruses to attain the status of 'best-represented taxa' with the highest number of genomes. However, owing to a lack of concerted efforts, viral genomic sequences remained mere entries in the public repositories until recently. RESULTS: VirGen is a curated resource of viral genomes and their analyses. Since its first release, it has grown both in its coverage of viral families and in the development of new modules for annotation and analysis. The current release (2.0) includes data for twenty-five families with a broad host range, compared with eight in the first release. The taxonomic description of viruses in VirGen is in accordance with the ICTV nomenclature. A well-characterised strain is identified as a 'representative entry' for every viral species. This non-redundant dataset is used for subsequent annotation and analyses using sequence-based Bioinformatics approaches. VirGen archives precomputed data on genome and proteome comparisons. A new data module that provides structures of viral proteins available in PDB has been incorporated recently. One of the unique features of VirGen is its set of predicted conformational and sequential epitopes of known antigenic proteins, obtained using in-house developed algorithms, a step towards reverse vaccinology. CONCLUSION: Structured organization of genomic data facilitates the use of data mining tools, which provides opportunities for knowledge discovery. One of the approaches to achieve this goal is to carry out functional annotations using comparative genomics. VirGen, a comprehensive viral genome resource that serves as an annotation and analysis pipeline, has been developed for the curation of public domain viral genome data. Various steps in the curation and annotation of the genomic data and applications of the value-added derived data are substantiated with case studies.

    Comparative Genomics of Cell Envelope Components in Mycobacteria

    Mycobacterial cell envelope components have been a major focus of research because their unique features confer intrinsic resistance to antibiotics and chemicals, in addition to serving as a low-permeability barrier. The complex lipids secreted by Mycobacteria are known to evoke or repress the host immune response and thus contribute to their pathogenicity. This study focuses on the comparative genomics of the biosynthetic machinery of cell wall components across 21 mycobacterial genomes available in GenBank release 179.0. The ability of Mycobacteria to survive in varied environments could be attributed to variation in this biosynthetic machinery. The study detected gene-specific motifs, such as ‘DLLAQPTPAW’ of the ufaA1 gene, and novel functional linkages, such as the involvement of Rv0227c in mycolate biosynthesis, Rv2613c in LAM biosynthesis and Rv1209 in arabinogalactan peptidoglycan biosynthesis. These predictions correlate well with the available mutant and coexpression data from TBDB. The analysis also helped to arrive at a minimal functional gene set for these biosynthetic pathways that complements findings obtained using TraSH.

    VirGen: a comprehensive viral genome resource

    VirGen is a comprehensive viral genome resource that organizes the ‘sequence space’ of viral genomes in a structured fashion. It has been developed with the objective of serving as an annotated and curated database comprising complete genome sequences of viruses, value-added derived data and data mining tools. The current release (v1.1) contains 559 complete genomes in addition to 287 putative genomes of viruses belonging to eight viral families for which the host range includes animals and plants. Viral genomes in VirGen are annotated using sequence-based Bioinformatics approaches. The genomic data is also curated to identify ‘alternate names’ of viral proteins, where available. VirGen archives the results of comparisons of genomes, proteomes and individual proteins within and between viral species. It is the first resource to provide phylogenetic trees of viral species computed using whole-genome sequence data. The module of predicted B-cell antigenic determinants in VirGen is an attempt to link the genome to its vaccinome. Comparative genome analysis data facilitate the study of genome organization and evolution of viruses, which would have implications in applied research to identify candidates for the design of vaccines and antiviral drugs. VirGen is a relational database and is available at http://bioinfo.ernet.in/virgen/virgen.html

    A large-scale evaluation of computational protein function prediction

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.