58 research outputs found

    The SUPERFAMILY database in 2007: families and functions

    Get PDF
    The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from . The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment

    The protein structure initiative structural genomics knowledgebase

    Get PDF
    The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities

    DBD––taxonomically broad transcription factor predictions: new content and functionality

    Get PDF
    DNA-binding domain (DBD) is a database of predicted sequence-specific DNA-binding transcription factors (TFs) for all publicly available proteomes. The proteomes have increased from 150 in the initial version of DBD to over 700 in the current version. All predicted TFs must contain a significant match to a hidden Markov model representing a sequence-specific DNA-binding domain family. Access to TF predictions is provided through http://transcriptionfactor.org, where new search options are now provided such as searching by gene names in model organisms, searching for all proteins in a particular DBD family and specific organism. We illustrate the application of this type of search facility by contrasting trends of DBD family occurrence throughout the tree of life, highlighting the clear partition between eukaryotic and prokaryotic DBD expansions. The website content has been expanded to include dedicated pages for each TF containing domain assignment details, gene names, links to external databases and links to TFs with similar domain arrangements. We compare the increase in number of predicted TFs with proteome size in eukaryotes and prokaryotes. Eukaryotes follow a slower rate of increase in TFs than prokaryotes, which could be due to the presence of splice variants or an increase in combinatorial control

    MicrobesOnline: an integrated portal for comparative and functional genomics

    Get PDF
    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.United States. Dept. of Energy (Genomics: GTL program (grant DE-AC02-05CH11231)

    CentrosomeDB: a human centrosomal proteins database

    Get PDF
    Active research on the biology of the centrosome during the past decades has allowed the identification and characterization of many centrosomal proteins. Unfortunately, the accumulated data is still dispersed among heterogeneous sources of information. Here we present centrosome:db, which intends to compile and integrate relevant information related to the human centrosome. We have compiled a set of 383 likely human centrosomal genes and recorded the associated supporting evidences. Centrosome:db offers several perspectives to study the human centrosome including evolution, function and structure. The database contains information on the orthology relationships with other species, including fungi, nematodes, arthropods, urochordates and vertebrates. Predictions of the domain organization of centrosome:db proteins are graphically represented at different sections of the database, including sets of alternative protein isoforms, interacting proteins, groups of orthologs and the homologs identified with blast. Centrosome:db also contains information related to function, gene–disease associations, SNPs and the 3D structure of proteins. Apart from important differences in the coverage of the set of centrosomal genes, our database differentiates from other similar initiatives in the way information is treated and analyzed. Centrosome:db is publicly available at http://centrosome.dacya.ucm.es

    The GTOP database in 2009: updated content and novel features to expand and deepen insights into protein structures and functions

    Get PDF
    The Genomes TO Protein Structures and Functions (GTOP) database (http://spock.genes.nig.ac.jp/~genome/gtop.html) freely provides an extensive collection of information on protein structures and functions obtained by application of various computational tools to the amino acid sequences of entirely sequenced genomes. GTOP contains annotations of 3D structures, protein families, functions, and other useful data of a protein of interest in user-friendly ways to give a deep insight into the protein structure. From the initial 1999 version, GTOP has been continually updated to reap the fruits of genome projects and augmented to supply novel information, in particular intrinsically disordered regions. As intrinsically disordered regions constitute a considerable fraction of proteins and often play crucial roles especially in eukaryotes, their assignments give important additional clues to the functionality of proteins. Additionally, we have incorporated the following features into GTOP: a platform independent structural viewer, results of HMM searches against SCOP and Pfam, secondary structure predictions, color display of exon boundaries in eukaryotic proteins, assignments of gene ontology terms, search tools, and master files

    Universal Features in the Genome-level Evolution of Protein Domains

    Get PDF
    Protein domains are found on genomes with notable statistical distributions, which bear a high degree of similarity. Previous work has shown how these distributions can be accounted for by simple models, where the main ingredients are probabilities of duplication, innovation, and loss of domains. However, no one so far has addressed the issue that these distributions follow definite trends depending on protein-coding genome size only. We present a stochastic duplication/innovation model, falling in the class of so-called Chinese Restaurant Processes, able to explain this feature of the data. Using only two universal parameters, related to a minimal number of domains and to the relative weight of innovation to duplication, the model reproduces two important aspects: (a) the populations of domain classes (the sets, related to homology classes, containing realizations of the same domain in different proteins) follow common power-laws whose cutoff is dictated by genome size, and (b) the number of domain families is universal and markedly sublinear in genome size. An important ingredient of the model is that the innovation probability decreases with genome size. We propose the possibility to interpret this as a global constraint given by the cost of expanding an increasingly complex interactome. Finally, we introduce a variant of the model where the choice of a new domain relates to its occurrence in genomic data, and thus accounts for fold specificity. Both models have general quantitative agreement with data from hundreds of genomes, which indicates the coexistence of the well-known specificity of proteomes with robust self-organizing phenomena related to the basic evolutionary ``moves'' of duplication and innovation

    Data growth and its impact on the SCOP database: new developments

    Get PDF
    The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop

    SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

    Get PDF
    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamiles, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site
    corecore