Diversity in protein domain superfamilies
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces, which expands the repertoire of molecules they interact with and the actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a 'domain grammar of function' whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modify function.
Characterising functional diversity in protein domain superfamilies and metagenomes
The majority of CATH domain structure superfamilies have small populations and are conserved in sequence and function. However, previous studies have shown that 4% are highly populated and functionally diverse. Previous analyses of some of these showed that relatives with different functions tend to exploit different functional sites to perform their function. In this work, functional site diversity was explored with a much larger dataset of superfamilies, by examining residues involved in protein interfaces and catalytic sites. This was done using a novel protocol to map sites across each superfamily. Functional site locations were shown to be least diverse for catalytic sites and most diverse for protein-protein binding sites. However, although protein interaction sites can vary considerably, in 79% of superfamilies analysed there is a common protein interface site, used by at least 80% of the functionally diverse relatives.
By contrast with protein interactions, enzyme superfamilies tend to use the same active site in functionally diverse relatives. However, sometimes the nature and location of catalytic residues vary. We examined changes in catalytic machinery over one hundred enzyme superfamilies by considering physicochemical properties and sequence/structure positions. Reaction mechanisms were also compared to explore how enzyme chemistry has evolved between functionally diverse relatives and how changes in chemistry relate to changes in catalytic residues. A complex relationship was found and several examples are discussed to illustrate the different trends identified.
In the final chapter, we assigned metagenome sequences to functional families in CATH and used KEGG pathway annotations to identify differences in the functional abilities of two metagenome environments, the human tongue and gut. The Bacteroidetes, Firmicutes, and Proteobacteria phyla dominate both microbiomes. Enriched functional terms in the tongue and gut environments suggested an enrichment of bacterial cell wall building proteins in the mouth and an enrichment of denitrifying enzymes in the gut.
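The "common protein interface site, used by at least 80% of the functionally diverse relatives" criterion in the abstract above can be illustrated with a minimal sketch. It assumes that interface residues have already been mapped onto shared alignment columns for each relative; the function name `common_sites`, the relative labels and the column numbers are hypothetical, and this is not the mapping protocol used in the thesis.

```python
from collections import Counter


def common_sites(sites_by_relative, min_fraction=0.8):
    """Return alignment columns used as interface sites by at least
    `min_fraction` of the relatives (hypothetical helper)."""
    n_relatives = len(sites_by_relative)
    if n_relatives == 0:
        return set()
    # Count, for each alignment column, how many relatives use it.
    counts = Counter(
        column
        for columns in sites_by_relative.values()
        for column in set(columns)
    )
    return {col for col, c in counts.items() if c / n_relatives >= min_fraction}


# Toy, made-up data: relative -> alignment columns in its protein interface.
example = {
    "relA": {10, 11, 12, 40},
    "relB": {10, 11, 12},
    "relC": {10, 11, 55},
    "relD": {10, 11, 12, 70},
    "relE": {10, 12, 90},
}

shared = common_sites(example)
print(sorted(shared))
```

In this toy case, columns 10, 11 and 12 are each used by at least four of the five relatives, so they form the shared site; raising `min_fraction` to 1.0 would leave only column 10.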
Functional classification of CATH superfamilies: a domain-based approach for protein function annotation
Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterised. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional subclassification of CATH superfamilies. The superfamilies are subclassified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer.
CATH FunFHMMer web server: protein functional annotations using functional family assignments
The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced present new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further, as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins, where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer
Variants within TSC2 exons 25 and 31 are very unlikely to cause clinically diagnosable tuberous sclerosis
Inactivating mutations in TSC1 and TSC2 cause tuberous sclerosis complex (TSC). The 2012 international consensus meeting on TSC diagnosis and management agreed that the identification of a pathogenic TSC1 or TSC2 variant establishes a diagnosis of TSC, even in the absence of clinical signs. However, exons 25 and 31 of TSC2 are subject to alternative splicing. No variants causing clinically diagnosed TSC have been reported in these exons, raising the possibility that such variants would not cause TSC. We present truncating and in-frame variants in exons 25 and 31 in three individuals unlikely to fulfil TSC diagnostic criteria and examine the importance of these exons in TSC using different approaches. Amino acid conservation analysis suggests significantly less conservation in these exons compared to the majority of TSC2 exons, and TSC2 expression data demonstrate that the majority of TSC2 transcripts lack exons 25 and/or 31 in many human adult tissues. In vitro assay of both exons shows that neither exon is essential for TSC complex function. Our evidence suggests that variants in TSC2 exons 25 or 31 are very unlikely to cause classical TSC, although a role for these exons in tissue- or stage-specific development cannot be excluded.
CATH: an expanded resource to predict protein function through structure and sequence
The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site
Functional innovation from changes in protein domains and their combinations
Domains are the functional building blocks of proteins. In this work we discuss how domains can contribute to the evolution of new functions. Domains themselves can evolve through various mechanisms, altering their intrinsic function. Domains can also facilitate functional innovations by combining with other domains to make novel proteins. We discuss the mechanisms by which domains and domain combinations support functional innovations. We highlight interesting examples where changes in domain combination promote changes at the domain level.
Gene3D: expanding the utility of domain assignments
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20 000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline of predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools in the Gene3D website.
Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year
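The abstract above mentions resolving overlaps between domain annotations from several sources to build composite multi-domain architectures, without describing the procedure. As an illustration only, the sketch below uses a simple greedy strategy (keep the highest-scoring hit, discard lower-scoring hits that overlap it). The greedy rule, the `Hit` record, the `score` field, and the family labels are all assumptions made for this example and are not taken from Gene3D.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A hypothetical domain hit on a protein sequence (1-based, inclusive)."""
    start: int
    end: int
    family: str
    score: float  # assumed: higher means a more confident assignment


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def resolve_overlaps(hits, max_overlap=0):
    """Greedily keep the best-scoring hits whose overlap with every
    already-kept hit is at most `max_overlap` residues."""
    accepted = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    # Report the architecture in sequence order.
    return sorted(accepted, key=lambda h: h.start)


# Made-up hits from three sources on one sequence.
hits = [
    Hit(1, 100, "cath_example_1", 80.0),
    Hit(90, 200, "pfam_example_1", 60.0),
    Hit(105, 210, "superfamily_example_1", 50.0),
    Hit(220, 300, "pfam_example_2", 40.0),
]

architecture = [h.family for h in resolve_overlaps(hits)]
print(architecture)
```

Here the second hit is dropped because it shares residues 90 to 100 with the higher-scoring first hit, which lets the lower-scoring third hit fill the region instead. A greedy rule like this is easy to reason about but is not guaranteed to maximise total coverage; a weighted interval scheduling approach would be one alternative.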
SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals
SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to infections in humans and other mammals. To enter host cells, the viral spike protein (S-protein) binds to its receptor, ACE2, and is then processed by TMPRSS2. Whilst receptor binding contributes to the viral host range, S-protein:ACE2 complexes from other animals have not been investigated widely. To predict infection risks, we modelled S-protein:ACE2 complexes from 215 vertebrate species, calculated changes in the energy of the complex caused by mutations in each species, relative to human ACE2, and correlated these changes with COVID-19 infection data. We also analysed structural interactions to better understand the key residues contributing to affinity. We predict that mutations are more detrimental in ACE2 than TMPRSS2. Finally, we demonstrate phylogenetically that human SARS-CoV-2 strains have been isolated in animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but few fish, birds or reptiles. Susceptible animals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance