119 research outputs found

    Functional clues for hypothetical proteins based on genomic context analysis in prokaryotes

    Get PDF
    Three integrated genomic context methods were used to annotate uncharacterized proteins in 102 bacterial genomes. Of 7853 orthologous groups with unknown function containing 45,110 proteins, 1738 groups could be linked to functionally associated partners. In many cases, those partners are uncharacterized themselves (hinting at newly identified modules) or have been described in general terms only. However, we were able to assign pathways, cellular processes or physical complexes for 273 groups (encompassing 3624 previously functionally uncharacterized proteins)

    Orthology prediction methods: a quality assessment using curated protein families

    Get PDF
    The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous group but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community

    eggNOG: automated construction and annotation of orthologous groups of genes

    Get PDF
    The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database ('evolutionary genealogy of genes: Non-supervised Orthologous Groups'), which contains orthologous groups constructed from Smith-Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 course-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.d

    eggNOG: automated construction and annotation of orthologous groups of genes

    Get PDF
    The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database (‘evolutionary genealogy of genes: Non-supervised Orthologous Groups’), which contains orthologous groups constructed from Smith–Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 course-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.de

    PLoS Comput Biol

    Get PDF
    The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide-enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the wide-spread single copy status of a gene.Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the use of the phylogenetic procedure increased the number of single copy orthologs found by over a third more than standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence.To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived

    DAS Writeback: A Collaborative Annotation System

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations.</p> <p>Results</p> <p>We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS writeback is a protocol extension of DAS to provide the functionalities of adding, editing and deleting annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and it was improved after performing usability experiments emulating a real annotation task.</p> <p>Conclusions</p> <p>We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.</p

    eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges

    Get PDF
    Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450 904 orthologous groups (62.5%)

    DAS Writeback: A Collaborative Annotation System

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations.</p> <p>Results</p> <p>We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS writeback is a protocol extension of DAS to provide the functionalities of adding, editing and deleting annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and it was improved after performing usability experiments emulating a real annotation task.</p> <p>Conclusions</p> <p>We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.</p

    The genomes of two key bumblebee species with primitive eusocial organization

    Get PDF
    Background: The shift from solitary to social behavior is one of the major evolutionary transitions. Primitively eusocial bumblebees are uniquely placed to illuminate the evolution of highly eusocial insect societies. Bumblebees are also invaluable natural and agricultural pollinators, and there is widespread concern over recent population declines in some species. High-quality genomic data will inform key aspects of bumblebee biology, including susceptibility to implicated population viability threats. Results: We report the high quality draft genome sequences of Bombus terrestris and Bombus impatiens, two ecologically dominant bumblebees and widely utilized study species. Comparing these new genomes to those of the highly eusocial honeybee Apis mellifera and other Hymenoptera, we identify deeply conserved similarities, as well as novelties key to the biology of these organisms. Some honeybee genome features thought to underpin advanced eusociality are also present in bumblebees, indicating an earlier evolution in the bee lineage. Xenobiotic detoxification and immune genes are similarly depauperate in bumblebees and honeybees, and multiple categories of genes linked to social organization, including development and behavior, show high conservation. Key differences identified include a bias in bumblebee chemoreception towards gustation from olfaction, and striking differences in microRNAs, potentially responsible for gene regulation underlying social and other traits. Conclusions: These two bumblebee genomes provide a foundation for post-genomic research on these key pollinators and insect societies. Overall, gene repertoires suggest that the route to advanced eusociality in bees was mediated by many small changes in many genes and processes, and not by notable expansion or depauperation
    corecore