12 research outputs found

    Three-dimensional Structure Databases of Biological Macromolecules

    Databases of three-dimensional structures of proteins (and their associated molecules) provide:
    (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata, for instance information about provenance, details of data collection and interpretation, and validation of results.
    (b) Information-retrieval tools that allow users to search for entries of interest and provide access to them.
    (c) Links among databases, especially to databases of amino-acid and genetic sequences and of protein function, and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction.
    (d) Collections of predicted three-dimensional structures of proteins. These will become increasingly important following the breakthrough in structure prediction achieved by AlphaFold2.
    The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including computer-graphic representations. Their collective and individual websites serve as hubs of the structural biology community, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.
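    Item (a) describes archives of atomic coordinates. As a minimal sketch, assuming the legacy fixed-column PDB file format (the wwPDB's primary archive format is now PDBx/mmCIF, and the largest structures are distributed only in that format), coordinate records could be read as below. The `Atom` class and the function names are illustrative only and are not part of any wwPDB tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    serial: int    # atom serial number
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "ALA"
    chain_id: str  # chain identifier
    res_seq: int   # residue sequence number
    x: float       # orthogonal coordinates in angstroms
    y: float
    z: float
    element: str   # element symbol (may be empty in old files)


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record of a legacy PDB-format file."""
    # Column ranges of the PDB format (1-based, inclusive) converted to
    # 0-based Python slices, e.g. columns 31-38 (x) -> line[30:38].
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21:22].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),  # empty string if the line is shorter
    )


def read_atoms(lines: Iterable[str]) -> List[Atom]:
    """Keep only coordinate records; HEADER, REMARK, TER, END, etc. are skipped."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]
```

    Fixed-width slicing is used rather than whitespace splitting because adjacent PDB fields (for example, large negative coordinates) can run together without a separating space.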

    Computational approaches to predict protein functional families and functional sites.

    Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, so theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and in using evolutionary conservation to detect functional determinants, for example catalytic-, binding- and specificity-determining residues, which are important for delineating functional families. We also briefly review other features exploited for functional-site detection and new machine-learning strategies for combining multiple features.
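    One common way to turn "evolutionary conservation" into a per-residue score is the Shannon entropy of each column of a multiple sequence alignment. The sketch below uses that generic measure as a stand-in; it is not necessarily the scoring scheme used by the authors, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return float("nan")  # column contains only gaps or unknown symbols
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def conservation_scores(alignment: List[str]) -> List[float]:
    """Per-column score in [0, 1]: 1 = fully conserved, 0 = maximally variable."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(len(AMINO_ACIDS))  # entropy of a uniform 20-letter column
    scores = []
    for i in range(length):
        h = column_entropy("".join(seq[i] for seq in alignment))
        scores.append(float("nan") if math.isnan(h) else 1.0 - h / max_entropy)
    return scores
```

    Highly scoring columns are candidate functional determinants. Normalizing by log2(20) assumes the 20 standard amino acids, and gaps are ignored rather than treated as a 21st symbol, which is only one of several reasonable conventions.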

    Structural and energetic analyses of SARS-CoV-2 N-terminal domain characterise sugar binding pockets and suggest putative impacts of variants on COVID-19 transmission

    Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, is an ongoing pandemic that imposes a significant health and socioeconomic burden. Variants of concern (VOCs) have emerged affecting transmissibility, disease severity and re-infection risk. Studies suggest that the N-terminal domain (NTD) of the spike protein may have a role in facilitating virus entry via sialic-acid receptor binding. Furthermore, most VOCs include novel NTD variants. Despite global sequence and structure similarity, most sialic-acid binding pockets in the NTD vary across coronaviruses. Our work suggests ongoing evolutionary tuning of the sugar-binding pockets, and recent analyses have shown that NTD insertions in VOCs tend to lie close to loops. We extended the structural characterisation of these sugar-binding pockets and explored whether variants could enhance sialic-acid binding. We found that recent NTD insertions in VOCs (i.e., Gamma, Delta and Omicron variants) and emerging variants of interest (VOIs) (i.e., Iota, Lambda and Theta variants) frequently lie close to sugar-binding pockets. For some variants, including the recent Omicron VOC, we find increases in predicted sialic-acid binding energy compared to the original SARS-CoV-2, which may contribute to increased transmission. These binding observations are supported by molecular dynamics (MD) simulations. We examined the similarity of the NTD across Betacoronaviruses to determine whether the sugar-binding pockets are sufficiently similar to be exploited in drug design. Whilst most pockets are too structurally variable, we detected a previously unknown, highly structurally conserved pocket that can be investigated in pursuit of a generic pan-Betacoronavirus drug. Our structure-based analyses help rationalise the effects of VOCs and provide hypotheses for experiments. Our findings suggest a strong need for experimental monitoring of changes in the NTD of VOCs.
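    A statement such as "insertions frequently lie close to sugar-binding pockets" needs an operational definition of "close". A minimal sketch, assuming closeness means the smallest interatomic distance between a residue and any pocket atom falls under a cutoff; the 8 Å default and the residue labels are hypothetical choices, not values taken from the study.

```python
import math
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in angstroms


def min_distance(atoms_a: Iterable[Coord], atoms_b: Iterable[Coord]) -> float:
    """Smallest Euclidean distance between any atom of set A and any atom of set B."""
    b = list(atoms_b)  # materialize so it can be scanned once per atom of A
    return min(math.dist(p, q) for p in atoms_a for q in b)


def residues_near_pocket(residue_atoms: Dict[str, List[Coord]],
                         pocket_atoms: List[Coord],
                         cutoff: float = 8.0) -> List[str]:
    """Labels of residues whose closest atom lies within `cutoff` of any pocket atom."""
    return [label for label, coords in residue_atoms.items()
            if min_distance(coords, pocket_atoms) <= cutoff]
```

    The brute-force double loop is adequate for a single domain such as the NTD; for whole-proteome scans a spatial index (for example a k-d tree) would be the usual replacement.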

    CATH: increased structural coverage of functional space

    CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data such as predicted sequence domains and functionally coherent sequence subsets (Functional Families, or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with the addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
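    The structural-domain figures are internally consistent. Assuming the +15% is measured relative to the domain count of the preceding release, that count and the growth rate work out as:

```latex
N_{\text{prev}} = 500{,}238 - 65{,}351 = 434{,}887,
\qquad
\frac{65{,}351}{434{,}887} \approx 0.150 = 15\%.
```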