73 research outputs found

    Protein Family Expansions and Biological Complexity

    During the course of evolution, new proteins are produced largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated with the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

    Network rewiring is an important mechanism of gene essentiality change.

    Gene essentiality changes are crucial for organismal evolution. However, it is unclear how essentiality of orthologs varies across species. We investigated the underlying mechanism of gene essentiality changes between yeast and mouse based on the framework of network evolution and comparative genomic analysis. We found that yeast nonessential genes become essential in mouse when their network connections rapidly increase through engagement in protein complexes. The increased interactions allowed the previously nonessential genes to become members of vital pathways. By accounting for changes in gene essentiality, we firmly reestablished the centrality-lethality rule, which proposes a relationship between essential genes and network hubs. Furthermore, we discovered that the number of connections associated with essential and nonessential genes depends on whether they were essential in ancestral species. Our study describes for the first time how network evolution occurs to change gene essentiality.

    Family-specific scaling laws in bacterial genomes

    Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view of the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes. Comment: 41 pages, 16 figures
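
    As a rough illustration of what a family-specific scaling law means, a power law of the form count = A * N^beta (family size against total gene number N) can be estimated by ordinary least squares on log-transformed values. This is only a minimal sketch and not the authors' procedure; the function name `fit_power_law` and the toy numbers are assumptions made up for the example.

```python
import math

def fit_power_law(genome_sizes, family_counts):
    """Estimate (A, beta) in family_counts ~ A * genome_sizes**beta
    by ordinary least squares in log-log space."""
    xs = [math.log(n) for n in genome_sizes]
    ys = [math.log(c) for c in family_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    # Intercept of the regression line is log(A).
    log_a = mean_y - beta * mean_x
    return math.exp(log_a), beta

# Hypothetical toy data: a family whose size grows quadratically with gene number.
genome_sizes = [1000, 2000, 4000, 8000]
family_counts = [1e-5 * n ** 2 for n in genome_sizes]
prefactor, exponent = fit_power_law(genome_sizes, family_counts)
```

    With real data the points scatter around the line, so the fitted exponent would normally be reported together with an uncertainty estimate.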

    The SUPERFAMILY database in 2007: families and functions

    The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert-curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from . The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.

    SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.

    Inferring PDZ Domain Multi-Mutant Binding Preferences from Single-Mutant Data

    Many important cellular protein interactions are mediated by peptide recognition domains. The ability to predict a domain's binding specificity directly from its primary sequence is essential to understanding the complexity of protein-protein interaction networks. One such recognition domain is the PDZ domain, functioning in scaffold proteins that facilitate formation of signaling networks. Predicting the PDZ domain's binding specificity was a part of the DREAM4 Peptide Recognition Domain challenge, the goal of which was to describe, as position weight matrices, the specificity profiles of five multi-mutant ERBB2IP-1 domains. We developed a method that derives multi-mutant binding preferences by generalizing the effects of single point mutations on the wild-type domain's binding specificities. Our approach, trained on publicly available ERBB2IP-1 single-mutant phage display data, combined linear regression-based prediction for ligand positions whose specificity is determined by few PDZ positions, and single-mutant position weight matrix averaging for all other ligand columns. The success of our method as the winning entry of the DREAM4 competition, as well as its superior performance over a general PDZ-ligand binding model, demonstrates the advantages of training a model on a well-selected domain-specific data set.
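
    The averaging step mentioned in the abstract above can be pictured as follows: for one ligand position, the residue-frequency columns observed for each relevant single mutant are averaged and renormalized to give the predicted column for the multi-mutant. This is a minimal sketch under that reading, not the authors' implementation; the function name `average_pwm_column` and the example frequencies are hypothetical.

```python
def average_pwm_column(columns):
    """Average several PWM columns (dicts mapping residue -> probability)
    and renormalize so the result sums to 1."""
    residues = set()
    for col in columns:
        residues.update(col)
    # Residues absent from a column are treated as probability 0 there.
    averaged = {
        r: sum(col.get(r, 0.0) for col in columns) / len(columns)
        for r in residues
    }
    total = sum(averaged.values())
    return {r: p / total for r, p in averaged.items()}

# Hypothetical columns for the same ligand position in two single mutants.
single_mutant_columns = [
    {"V": 0.6, "L": 0.4},
    {"V": 0.2, "L": 0.6, "I": 0.2},
]
predicted = average_pwm_column(single_mutant_columns)
```

    A full position weight matrix would apply the same step column by column; weighting mutants unequally is a possible variation that this sketch does not attempt.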

    Genetic identity and differential gene expression between Trichomonas vaginalis and Trichomonas tenax

    Background: Trichomonas vaginalis is a human urogenital pathogen responsible for trichomonosis, the number-one, non-viral sexually transmitted disease (STD) worldwide, while T. tenax is a commensal of the human oral cavity, found particularly in patients with poor oral hygiene and advanced periodontal disease. The extent of genetic identity between T. vaginalis and its oral commensal counterpart is unknown. Results: Genes that were differentially expressed in T. vaginalis were identified by screening three independent subtraction cDNA libraries enriched for T. vaginalis genes. The same thirty randomly selected cDNA clones encoding proteins with specific functions associated with colonization were identified from each of the subtraction cDNA libraries. In addition, a T. vaginalis cDNA expression library was screened with patient sera that were first pre-adsorbed with an extract of T. tenax antigens, and seven specific cDNA clones were identified from this cDNA library. Interestingly, some of the clones identified by the subtraction cDNA screening were also obtained from the cDNA expression library with the pre-adsorbed sera. Notably, clones identified by both procedures were found to be up-regulated in expression in T. vaginalis upon contact with vaginal epithelial cells, suggesting a role for these gene products in host colonization. Semi-quantitative RT-PCR analysis of select clones showed that the genes were not unique to T. vaginalis and that these genes were also present in T. tenax, albeit at very low levels of expression. Conclusion: These results suggest that T. vaginalis and T. tenax have remarkable genetic identity and that T. vaginalis has higher levels of gene expression when compared to that of T. tenax. The data may suggest that T. tenax could be a variant of T. vaginalis.

    Neural immunoglobulin superfamily interaction networks

    The immunoglobulin superfamily (IgSF) encompasses hundreds of cell surface proteins containing multiple immunoglobulin-like (Ig) domains. Among these are neural IgCAMs, which are cell adhesion molecules that mediate interactions between cells in the nervous system. IgCAMs in some vertebrate IgSF subfamilies bind to each other homophilically and heterophilically, forming small interaction networks. In Drosophila, a global ‘interactome’ screen identified two larger networks in which proteins in one IgSF subfamily selectively interact with proteins in a different subfamily. One of these networks, the ‘Dpr-ome’, includes 30 IgSF proteins, each of which is expressed in a unique subset of neurons. Recent evidence shows that one interacting protein pair within the Dpr-ome network is required for development of the brain and neuromuscular system.

    Structural Disorder in Eukaryotes

    Based on early bioinformatic studies on a handful of species, the frequency of structural disorder of proteins is generally thought to be much higher in eukaryotes than in prokaryotes. To refine this view, we present here a comparative prediction study and analysis of 194 fully described eukaryotic proteomes and 87 reference prokaryotes for structural disorder. We found that structural disorder does distinguish eukaryotes from prokaryotes, but its frequency spans a very wide range in each of the two superkingdoms, and these ranges largely overlap. The number of disordered binding regions and the number of different Pfam domain types also help to distinguish eukaryotes from prokaryotes. Unexpectedly, the highest levels, and highest variability, of predicted disorder are found in protists, i.e. single-celled eukaryotes, often surpassing more complex eukaryotic organisms, plants and animals. This trend contrasts with that of the number of domain types, which increases rather monotonically toward more complex organisms. The level of structural disorder appears to be strongly correlated with lifestyle, because some obligate intracellular parasites and endosymbionts have the lowest levels, whereas host-changing parasites have the highest level of predicted disorder. We conclude that protists have been the evolutionary hot-bed of experimentation with structural disorder, in a period when structural disorder was actively invented and the major functional classes of disordered proteins established.