72 research outputs found

Protein Family Expansions and Biological Complexity

Authors: Christine Vogel, Cyrus Chothia, Philip Bourne
Publication venue: Public Library of Science
Publication date: 01/05/2006

During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated with the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation, and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

Available from: Public Library of Science (PLOS), Crossref, Directory of Open Access Journals, PubMed Central, Texas ScholarWorks

The SUPERFAMILY database in 2007: families and functions

Authors: Cyrus Chothia, Julian Gough, Martin Madera, Christine Vogel, Derek Wilson
Publication venue: Oxford University Press
Publication date: 10/11/2006

The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert-curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from . The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.

Available from: CiteSeerX, Crossref, PubMed Central, Explore Bristol Research

3D complex: a structural classification of protein complexes.

Authors: Cyrus Chothia, Emmanuel D. Levy, Jose B. Pereira-Leal, Sarah A. Teichmann
Publication venue: PLoS Comput Biol
Publication date: 01/01/2006

Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four or fewer subunits, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.

Available from: CiteSeerX, Public Library of Science (PLOS), Access to Research and Communications Annals, Directory of Open Access Journals, PubMed Central, Apollo (Cambridge)

SCOP, Structural Classification of Proteins Database: Applications to Evaluation of the Effectiveness of Sequence Alignment Methods and Statistics of Protein Structural Data

Authors: Alexey G. Murzin, Bart Ailey, Cyrus Chothia, Steven E. Brenner, Tim J. P. Hubbard
Publication venue: International Union of Crystallography (IUCr)

Available from: Crossref

SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

Authors: Charles Talbot, Christine Vogel, Cyrus Chothia, Derek Wilson, Julian Gough, Martin Madera, Ralph Pethica, Yiduo Zhou
Publication venue: Oxford University Press
Publication date: 01/11/2008

SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.

Available from: CiteSeerX, Crossref, PubMed Central, Explore Bristol Research

Genome3D: exploiting structure to help users understand their sequences.

Authors: Antonina Andreeva, Tom L. Blundell, Daniel W. A. Buchan, Cyrus Chothia, Domenico Cozzetto, José M. Dana, Ioannis Filippis, Julian Gough, David T. Jones, Lawrence A. Kelley, Gerard J. Kleywegt, Tony E. Lewis, Federico Minneci, Jaina Mistry, Alexey G. Murzin, Matt E. Oates, Bernardo Ochoa-Montaño, Christine Orengo, Marco Punta, Owen J. L. Rackham, Ian Sillitoe, Jonathan Stahlhacke, Michael J. E. Sternberg, Sameer Velankar
Publication venue: Nucleic Acids Res
Publication date: 27/10/2014

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.

Available from: Goldsmiths Research Online, Southampton (e-Prints Soton), Crossref, PubMed Central, UCL Discovery, Spiral - Imperial College Digital Repository, Apollo (Cambridge)

Exploration of Uncharted Regions of the Protein Universe

Determination of the first protein structures from hundreds of families of unknown function has shown that divergence, rather than novelty, is the dominant force that shapes the evolution of the protein universe.

Available from: Public Library of Science (PLOS), Crossref, Directory of Open Access Journals, PubMed Central, eScholarship - University of California

The Evolution of Protein Repertoires

Author: Cyrus Chothia
Publication venue: Portland Press Ltd.

Available from: Crossref