Search CORE

492 research outputs found

VIDA: a virus database system for the organization of animal virus genome open reading frames

Author: Alba MM
Kellam P
Lee D
Martin N
Orengo CA
Pearl FMG
Shepherd AJ
Publication venue: OXFORD UNIV PRESS
Publication date: 01/01/2001
Field of study

VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships, Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus-database/ VIDA.html

UCL Discovery

PubMed Central

The history of the CATH structural classification of protein domains

Author: Dawson N
Orengo C
Sillitoe I
Thornton J
Publication venue
Publication date: 01/12/2015
Field of study

This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families

Elsevier - Publisher Connector

UCL Discovery

PubMed Central

CATH functional families predict functional sites in proteins

Author: Das S
Orengo C
Scholes HM
Sen N
Publication venue
Publication date: 02/11/2020
Field of study

MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly-available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. AVAILABILITY: https://github.com/UCL/cath-funsite-predictor. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

UCL Discovery

Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds

Author: Bordin N
Lees JG
Orengo C
Sillitoe I
Publication venue: Frontiers Media SA
Publication date: 01/05/2021
Field of study

This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators

UCL Discovery

The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors

Author: Bordin N
Orengo C
Varadi M
Velankar S
Publication venue: 'Elsevier BV'
Publication date: 01/04/2023
Field of study

The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction

UCL Discovery

Computational approaches to predict protein functional families and functional sites.

Author: Abbasian M
Orengo CA
Rauer C
Sen N
Waman VP
Publication venue
Publication date: 01/10/2021
Field of study

Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features

UCL Discovery

MACiE: a database of enzyme reaction mechanisms.

Author: Bartlett
Berman
D. E. Almonacid
G. J. Bartlett
G. L. Holliday
J. B. O. Mitchell
J. M. Thornton
Kanehisa
N. M. O'Boyle
Nagano
Orengo
P. Murray-Rust
Porter
Schomburg
Publication venue: 'Indiana University Center for Genomics and Bioinformatics (CGB)'
Publication date: 27/09/2005
Field of study

SUMMARY: MACiE (mechanism, annotation and classification in enzymes) is a publicly available web-based database, held in CMLReact (an XML application), that aims to help our understanding of the evolution of enzyme catalytic mechanisms and also to create a classification system which reflects the actual chemical mechanism (catalytic steps) of an enzyme reaction, not only the overall reaction. AVAILABILITY: http://www-mitchell.ch.cam.ac.uk/macie/.EPSRC (G.L.H. and J.B.O.M.), the BBSRC (G.J.B. and J.M.T.—CASE studentship in association with Roche Products Ltd; N.M.O.B. and J.B.O.M.—grant BB/C51320X/1), the Chilean Government’s Ministerio de Planificacio´n y Cooperacio´n and Cambridge Overseas Trust (D.E.A.) for funding and Unilever for supporting the Centre for Molecular Science Informatics.application note restricted to 2 printed pages web site: http://www-mitchell.ch.cam.ac.uk/macie

Crossref

PubMed Central

Apollo (Cambridge)

University of St. Andrews - Pure

An optimized TOPS+ comparison method for enhanced TOPS models

Author: A Brazma
A Harrison
A Harrison
CA Orengo
CA Orengo
CA Orengo
CJ van Rijsbergen
D Gilbert
D Gilbert
D Westhead
David Gilbert
G Valiente
Gabriel Valiente
GJ Barton
GM Torrance
HM Berman
HM Grindley
I Koch
I Michalopoulos
IN Shindyalov
J Handl
J Viksna
K Mizuguchi
L Holm
LP Chew
M Veeramalai
M Veeramalai
M Veeramalai
Mallika Veeramalai
N Krasnogor
RB Russell
S Goldsmith-Fischman
SB Needleman
SS Krishna
T Madej
T Madej
TF Smith
VI Levenshtein
WR Taylor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

This article has been made available through the Brunel Open Access Publishing Fund.Background Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised TOPS+ method as advanced TOPS+ comparison method i.e. advTOPS+. Results We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first class objects, and describing interactions between SSEs, and SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. Conclusions Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta; a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.This article is available through the Brunel Open Access Publishing Fun

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

Extending CATH: increasing coverage of the protein structure universe and linking structure with function

Author: A. B. Clegg
A. L. Cuff
Ashburner
Bairoch
Buchan
C. A. Orengo
Chandonia
Cuff
D. Jones
Dessailly
Grabowski
Hendrickson
I. Sillitoe
J. Thornton
Kanehisa
M. Pellegrini-Calace
N. Furnham
Neumann
Orengo
Orengo
R. Rentzsch
Rahman
Redfern
Ruepp
T. Lewis
Taylor
Todd
Publication venue: Oxford University Press
Publication date: 19/11/2010
Field of study

CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework

Crossref

LSHTM Research Online

PubMed Central

Gene3D: comprehensive structural and functional annotation of genomes

Author: A. Reid
Altschul
Ashburner
Berman
C. Orengo
C. Yeats
Eddy
Finn
Gough
Guldener
J. Lees
Kanehisa
Kersey
Lee
Marsden
Mulder
Murzin
N. Martin
P. Kellam
Rattei
Rost
Ruepp
Tatusov
Tian
X. Liu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 27/10/2007
Field of study

Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk

Crossref

UCL Discovery

PubMed Central

Birkbeck Institutional Research Online

The Australian National University

Spiral - Imperial College Digital Repository