
58 research outputs found

The SUPERFAMILY database in 2007: families and functions

Author: Chothia Cyrus
Gough Julian
Madera Martin
Vogel Christine
Wilson Derek
Publication venue: Oxford University Press
Publication date: 10/11/2006

The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from . The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment

CiteSeerX

Crossref

PubMed Central

Explore Bristol Research

The protein structure initiative structural genomics knowledgebase

Author: A. Kouranov
Ashburner
Benson
Berman
Corpet
F. Kiefer
H. M. Berman
Haft
J. D. Westbrook
J. Kopp
J. L. Baer
K. Arnold
Kopp
Kouranov
L. Bordoli
L. G. Carter
Lo Conte
M. J. Gabanyi
M. Podvinec
Orengo
P. D. Adams
Pieper
R. Nair
R. Shah
Sonnhammer
T. Schwede
W. Minor
W. Tao
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2009

The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities

DBD––taxonomically broad transcription factor predictions: new content and functionality

Author: Adryan
Amoutzias
Balaji
Baldauf
Barrasa
Bulyk
Chao
Cohn
Coin
Derek Wilson
Drosophila Comparative Genome Sequencing and Analysis Consortium
Finn
Hermoso
Kummerfeld
Messina
Mott
Mulder
Murzin
Ohme-Takagi
Pérez-Rueda
Ranea
Robertson
Sarah A. Teichmann
Sarah K. Kummerfeld
van Nimwegen
Varodom Charoensawan
Wilson
Yang
Publication venue: Oxford University Press
Publication date: 01/01/2008

DNA-binding domain (DBD) is a database of predicted sequence-specific DNA-binding transcription factors (TFs) for all publicly available proteomes. The proteomes have increased from 150 in the initial version of DBD to over 700 in the current version. All predicted TFs must contain a significant match to a hidden Markov model representing a sequence-specific DNA-binding domain family. Access to TF predictions is provided through http://transcriptionfactor.org, where new search options are now provided such as searching by gene names in model organisms, searching for all proteins in a particular DBD family and specific organism. We illustrate the application of this type of search facility by contrasting trends of DBD family occurrence throughout the tree of life, highlighting the clear partition between eukaryotic and prokaryotic DBD expansions. The website content has been expanded to include dedicated pages for each TF containing domain assignment details, gene names, links to external databases and links to TFs with similar domain arrangements. We compare the increase in number of predicted TFs with proteome size in eukaryotes and prokaryotes. Eukaryotes follow a slower rate of increase in TFs than prokaryotes, which could be due to the presence of splice variants or an increase in combinatorial control

CiteSeerX

Crossref

PubMed Central

MicrobesOnline: an integrated portal for comparative and functional genomics

Author: A. P. Arkin
Alm
Badger
Berman
Bland
D. Chivian
E. J. Alm
G. D. Friedland
I. L. Dubchak
J. K. Baumohl
J. T. Bates
K. H. Huang
K. Keller
Kanehisa
Lowe
M. N. Price
M. P. Joachimiak
Mi
Nikolskaya
P. S. Dehal
P. S. Novichkov
Price
Tatusov
Publication venue: Oxford University Press (OUP)
Publication date: 01/09/2009

Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html. United States. Dept. of Energy (Genomics: GTL program, grant DE-AC02-05CH11231)

CentrosomeDB: a human centrosomal proteins database

Author: A. Pascual-Montano
Alexeyenko
Altschul
Attwood
Doxsey
Durinck
F. Abascal
J. Diez-Perez
J. M. Carazo
Lupas
McKusick
Mishra
R. Nogales-Cadenas
Rieder
Tagari
Publication venue: Oxford University Press
Publication date: 29/10/2008

Active research on the biology of the centrosome during the past decades has allowed the identification and characterization of many centrosomal proteins. Unfortunately, the accumulated data is still dispersed among heterogeneous sources of information. Here we present centrosome:db, which intends to compile and integrate relevant information related to the human centrosome. We have compiled a set of 383 likely human centrosomal genes and recorded the associated supporting evidence. Centrosome:db offers several perspectives to study the human centrosome including evolution, function and structure. The database contains information on the orthology relationships with other species, including fungi, nematodes, arthropods, urochordates and vertebrates. Predictions of the domain organization of centrosome:db proteins are graphically represented in different sections of the database, including sets of alternative protein isoforms, interacting proteins, groups of orthologs and the homologs identified with BLAST. Centrosome:db also contains information related to function, gene–disease associations, SNPs and the 3D structure of proteins. Apart from important differences in the coverage of the set of centrosomal genes, our database differs from other similar initiatives in the way information is treated and analyzed. Centrosome:db is publicly available at http://centrosome.dacya.ucm.es

Crossref

PubMed Central

Digital.CSIC

The GTOP database in 2009: updated content and novel features to expand and deepen insights into protein structures and functions

Author: Altschul
Dunker
H. Sugawara
Homma
K. Homma
K. Nishikawa
Kawabata
Marchler-Bauer
Minezaki
Pieper
S. Fukuchi
S. Sakamoto
Sayle
T. Gojobori
Tompa
Ward
Wright
Y. Tateno
Publication venue: Oxford University Press

The Genomes TO Protein Structures and Functions (GTOP) database (http://spock.genes.nig.ac.jp/~genome/gtop.html) freely provides an extensive collection of information on protein structures and functions obtained by application of various computational tools to the amino acid sequences of entirely sequenced genomes. GTOP contains annotations of 3D structures, protein families, functions, and other useful data of a protein of interest in user-friendly ways to give a deep insight into the protein structure. From the initial 1999 version, GTOP has been continually updated to reap the fruits of genome projects and augmented to supply novel information, in particular intrinsically disordered regions. As intrinsically disordered regions constitute a considerable fraction of proteins and often play crucial roles especially in eukaryotes, their assignments give important additional clues to the functionality of proteins. Additionally, we have incorporated the following features into GTOP: a platform independent structural viewer, results of HMM searches against SCOP and Pfam, secondary structure predictions, color display of exon boundaries in eukaryotic proteins, assignments of gene ontology terms, search tools, and master files

Crossref

PubMed Central

Universal Features in the Genome-level Evolution of Protein Domains

Author: Alessandro L. Sellerio
Bruno Bassetti
Marco Cosentino Lagomarsino
Philip D. Heijning
Publication date: 11/07/2008

Protein domains are found on genomes with notable statistical distributions, which bear a high degree of similarity. Previous work has shown how these distributions can be accounted for by simple models, where the main ingredients are probabilities of duplication, innovation, and loss of domains. However, no one so far has addressed the issue that these distributions follow definite trends depending on protein-coding genome size only. We present a stochastic duplication/innovation model, falling in the class of so-called Chinese Restaurant Processes, able to explain this feature of the data. Using only two universal parameters, related to a minimal number of domains and to the relative weight of innovation to duplication, the model reproduces two important aspects: (a) the populations of domain classes (the sets, related to homology classes, containing realizations of the same domain in different proteins) follow common power-laws whose cutoff is dictated by genome size, and (b) the number of domain families is universal and markedly sublinear in genome size. An important ingredient of the model is that the innovation probability decreases with genome size. We propose the possibility to interpret this as a global constraint given by the cost of expanding an increasingly complex interactome. Finally, we introduce a variant of the model where the choice of a new domain relates to its occurrence in genomic data, and thus accounts for fold specificity. Both models have general quantitative agreement with data from hundreds of genomes, which indicates the coexistence of the well-known specificity of proteomes with robust self-organizing phenomena related to the basic evolutionary "moves" of duplication and innovation
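The duplication/innovation dynamics described in this abstract can be illustrated with a short simulation. The sketch below is not the authors' code: it assumes the standard two-parameter (Pitman-Yor) form of the Chinese Restaurant Process, and the names `simulate_crp`, `alpha` and `theta` and their default values are hypothetical choices made for illustration. At each step a domain is added either to a new class (innovation) or to an existing class chosen in proportion to its size minus `alpha` (duplication), so the innovation probability falls as the genome grows.

```python
import random


def simulate_crp(n_domains, alpha=0.3, theta=10.0, seed=0):
    """Grow a toy genome one domain at a time and return domain class sizes.

    Assumed (illustrative) rule, the standard two-parameter CRP:
      innovation:  P(new class)  = (theta + alpha * F) / (n + theta)
      duplication: P(class i)    = (size_i - alpha)    / (n + theta)
    where n is the current number of domains and F the number of classes.
    Requires 0 <= alpha < 1 and theta > 0.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[i] = number of domains currently in class i
    n = 0       # total number of domains placed so far
    for _ in range(n_domains):
        n_classes = len(sizes)
        p_new = (theta + alpha * n_classes) / (n + theta)
        if rng.random() < p_new:
            # Innovation: a domain of a previously unseen class appears.
            sizes.append(1)
        else:
            # Duplication: an existing class gains one more member.
            weights = [s - alpha for s in sizes]
            i = rng.choices(range(n_classes), weights=weights)[0]
            sizes[i] += 1
        n += 1
    return sizes


if __name__ == "__main__":
    for genome_size in (200, 2000):
        classes = simulate_crp(genome_size)
        print(genome_size, "domains ->", len(classes), "classes")
```

Under these assumptions the number of classes grows more slowly than the number of domains, which is the qualitative behaviour the abstract reports; the parameter values here are arbitrary and are not fitted to any genome data.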

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

Springer - Publisher Connector

PubMed Central

Nature Precedings

Data growth and its impact on the SCOP database: new developments

Author: A. Andreeva
A. G. Murzin
Altschul
Andreeva
Berman
C. Chothia
Chandonia
D. Howorth
Finn
J.-M. Chandonia
Lo Conte
Moroz
Murzin
S. E. Brenner
T. J. P. Hubbard
Wheeler
Yooseph
Publication venue: Oxford University Press
Publication date: 13/11/2007

The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop

King's Research Portal

SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

Author: Altschul
Andreeva
Ashburner
Attwood
Benson
Berman
Brinkrolf
Bru
Chandonia
Charles Talbot
Chothia
Christine Vogel
Cyrus Chothia
Derek Wilson
Dowell
Eddy
Eichinger
Finn
Haft
Hubbard
Hulo
Julian Gough
Karplus
Letunic
Loewenstein
Madera
Martin Madera
Mi
Mulder
Pereira-Leal
Ralph Pethica
Ranea
Rasteiro
Rost
Stein
Swarbreck
Virel
Vogel
Wang
Wilson
Wu
Yang
Yeats
Yiduo Zhou
Publication venue: Oxford University Press
Publication date: 01/11/2008

SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the FTP site

CiteSeerX

Crossref

PubMed Central

Explore Bristol Research