322 research outputs found

    Clustering the annotation space of proteins

    BACKGROUND: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. RESULTS: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families agreeing on average in more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and the reasons reported. We demonstrate examples for each of these cases, and thoroughly discuss an example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. The CLAN algorithm is available from the authors and the CLAN database is accessible at CONCLUSIONS: CLAN creates refined function-and-sequence specific protein families that can be used for identification and annotation of unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities on annotation and sequence levels.

    Probabilistic annotation of protein sequences based on functional classifications

    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
    Background: One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on the identification of clusters of homologous proteins and the mapping of new proteins onto these clusters, which are expected to share functional characteristics. Results: Here, we invert the logic of this process by mapping sequences directly to a functional classification instead of mapping functions to a sequence clustering. In this mode, the starting point is a database of proteins labelled according to a functional classification scheme, and the subsequent use of sequence similarity allows the membership of new proteins in these functional classes to be defined. In this framework, we define the Correspondence Indicators as measures of the relationship between sequence and function, and further formulate two Bayesian approaches to estimate the probability that a sequence of unknown function belongs to a functional class. This approach allows the parametrisation of different sequence search strategies and provides a direct measure of annotation error rates. We validate this approach with a database of enzymes labelled by their corresponding four-digit EC numbers and analyse specific cases. Conclusion: The performance of this method is significantly higher than that of the simple strategy of transferring the annotation from the highest-scoring BLAST match, and the method is expected to find applications in automated functional annotation pipelines.
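The abstract above does not give the two Bayesian formulations or the definition of the Correspondence Indicators, so the following is only a generic illustration of the underlying idea: estimating the probability that a query belongs to a functional class from the labels of its significant sequence hits. It is a plain naive-Bayes sketch, not the authors' method. The function name `class_posteriors`, the parameter `p_same` (the assumed chance that a hit carries the query's true class), and the example EC labels and priors are all hypothetical.

```python
def class_posteriors(hit_labels, priors, p_same=0.9):
    """Posterior probability of each functional class given hit labels.

    Assumptions (illustrative only, not taken from the paper):
    - hits are conditionally independent given the query's true class;
    - a hit carries the true class with probability p_same, and any
      other class with equal probability (1 - p_same) / (K - 1).
    """
    classes = list(priors)
    if len(classes) < 2:
        raise ValueError("need at least two candidate classes")
    p_diff = (1.0 - p_same) / (len(classes) - 1)

    unnormalised = {}
    for c in classes:
        likelihood = 1.0
        for label in hit_labels:
            likelihood *= p_same if label == c else p_diff
        unnormalised[c] = priors[c] * likelihood

    total = sum(unnormalised.values())
    return {c: value / total for c, value in unnormalised.items()}


# Hypothetical example: three significant hits, two candidate EC classes.
priors = {"1.1.1.1": 0.5, "2.7.1.1": 0.5}
hits = ["1.1.1.1", "1.1.1.1", "2.7.1.1"]
posteriors = class_posteriors(hits, priors, p_same=0.9)
```

Unlike transferring the label of the single best hit, a posterior of this kind exposes how strongly the evidence supports each class, which is what makes an explicit annotation error rate possible.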

    Measuring genome conservation across taxa: divided strains and united kingdoms

    Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distances between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation represents a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms exhibits a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility for a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server: <>

    Disease association and comparative genomics of compositional bias in human proteins [version 2; peer review: 2 approved]

    Background: The evolutionary rate of disordered protein regions varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of compositional bias, indicative of disorder, across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The UniProt Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the human genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease association in the human genome reveals a significant bias towards biased regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated proteins with compositional bias across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of compositional bias, disease association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale follow-up projects encompassing the entire human genome and all known gene-disease associations.

    Tumorigenic Properties of Iron Regulatory Protein 2 (IRP2) Mediated by Its Specific 73-Amino Acids Insert

    Iron regulatory proteins, IRP1 and IRP2, bind to mRNAs harboring iron responsive elements and control their expression. IRPs may also perform additional functions. Thus, IRP1 exhibited apparent tumor suppressor properties in a tumor xenograft model. Here we examined the effects of IRP2 in a similar setting. Human H1299 lung cancer cells or clones engineered for tetracycline-inducible expression of wild type IRP2, or the deletion mutant IRP2Δ73 (lacking a specific insert of 73 amino acids), were injected subcutaneously into nude mice. The induction of IRP2 profoundly stimulated the growth of tumor xenografts, and this response was blunted by addition of tetracycline in the drinking water of the animals, to turn off the IRP2 transgene. Interestingly, IRP2Δ73 failed to promote tumor growth above control levels. As expected, xenografts expressing the IRP2 transgene exhibited high levels of transferrin receptor 1 (TfR1); however, the expression of other known IRP targets was not affected. Moreover, these xenografts manifested increased c-MYC levels and ERK1/2 phosphorylation. A microarray analysis identified distinct gene expression patterns between control tumors and tumors containing IRP2 or IRP1 transgenes. By contrast, gene expression profiles of control and IRP2Δ73-related tumors were more similar, consistent with their growth phenotype. Collectively, these data demonstrate an apparent pro-oncogenic activity of IRP2 that depends on its specific 73-amino-acid insert, and provide further evidence for a link between IRPs and cancer biology.

    Emergence, development and diversification of the TGF-β signalling pathway within the animal kingdom

    Background: The question of how genomic processes, such as gene duplication, give rise to co-ordinated organismal properties, such as emergence of new body plans, organs and lifestyles, is of importance in developmental and evolutionary biology. Herein, we focus on the diversification of the transforming growth factor-β (TGF-β) pathway, one of the fundamental and versatile metazoan signal transduction engines. Results: After an investigation of 33 genomes, we show that the emergence of the TGF-β pathway coincided with appearance of the first known animal species. The primordial pathway repertoire consisted of four Smads and four receptors, similar to those observed in the extant genome of the early diverging tablet animal (Trichoplax adhaerens). We subsequently retrace duplications in ancestral genomes on the lineage leading to humans, as well as lineage-specific duplications, such as those which gave rise to novel Smads and receptors in teleost fishes. We conclude that the diversification of the TGF-β pathway can be parsimoniously explained according to the 2R model, with additional rounds of duplications in teleost fishes. Finally, we investigate duplications followed by accelerated evolution which gave rise to an atypical TGF-β pathway in free-living bacterial feeding nematodes of the genus Rhabditis. Conclusion: Our results challenge the view of well-conserved developmental pathways. The TGF-β signal transduction engine has expanded through gene duplication, continually adopting new functions, as animals grew in anatomical complexity, colonized new environments, and developed an active immune system.