
    CATH: increased structural coverage of functional space

    CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data such as predicted sequence domains and functionally coherent sequence subsets (Functional Families, or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data: it adds 65,351 fully classified domain structures (+15%), for a total of 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5,481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.

    Rationalization of the pKa values of alcohols and thiols using atomic charge descriptors and its application to the prediction of amino acid pKa's

    In a first step toward the development of an efficient and accurate protocol to estimate amino acid pKa values in proteins, we present in this work how to reproduce the pKa values of alcohol- and thiol-based residues (namely tyrosine, serine, and cysteine) in aqueous solution from the experimental pKa values of phenols, alcohols, and thiols. Our protocol is based on the linear relationship between the computed atomic charges of the anionic form of the molecules (phenolates, alkoxides, or thiolates) and their respective experimental pKa values. It is tested with different environment approaches (gas phase or continuum solvent-based approaches), with five distinct atomic charge models (Mulliken, Löwdin, NPA, Merz-Kollman, and CHelpG), and with nine different DFT functionals combined with 16 different basis sets. Moreover, we analyze the capability of semiempirical methods (AM1, RM1, PM3, and PM6) to predict the pKa values of thiols, phenols, and alcohols. From our benchmarks, the best combination to reproduce experimental pKa values is to compute NPA atomic charges using the CPCM model at the B3LYP/3-21G level for alcohols (R² = 0.995) and at the M06-2X/6-311G level for thiols (R² = 0.986). The applicability of the suggested protocol is tested on tyrosine and cysteine, and precise pKa predictions are obtained. The stability of the amino acid pKa values with respect to geometrical changes is also tested by MM-MD and DFT-MD calculations. Given its accuracy and high computational efficiency, this atomic-charge-based approach is a promising method for predicting amino acid pKa values in a protein environment.
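    The protocol rests on a linear calibration, pKa ≈ a·q + b, where q is the computed atomic charge on the anionic site (O or S). A minimal sketch of that calibration step follows, assuming an ordinary least-squares fit (the abstract does not state the fitting procedure). The function names `fit_linear` and `predict_pka` are hypothetical, and the charge/pKa pairs in the usage lines are synthetic placeholders, not data from the study. In practice q would come from a quantum-chemistry calculation (e.g. NPA charges with a CPCM solvent model).

```python
from statistics import mean


def fit_linear(charges, pkas):
    """Ordinary least-squares fit of pKa = slope * q + intercept.

    Returns (slope, intercept, r2). This is an illustrative sketch,
    not the authors' implementation.
    """
    if len(charges) != len(pkas) or len(charges) < 2:
        raise ValueError("need at least two paired (charge, pKa) points")
    q_mean, p_mean = mean(charges), mean(pkas)
    # Spread of the charges around their mean.
    s_qq = sum((q - q_mean) ** 2 for q in charges)
    if s_qq == 0:
        raise ValueError("charges must not all be identical")
    # Covariance-like term between charges and pKa values.
    s_qp = sum((q - q_mean) * (p - p_mean) for q, p in zip(charges, pkas))
    slope = s_qp / s_qq
    intercept = p_mean - slope * q_mean
    # Coefficient of determination R^2 = 1 - SS_res / SS_tot.
    ss_res = sum((p - (slope * q + intercept)) ** 2 for q, p in zip(charges, pkas))
    ss_tot = sum((p - p_mean) ** 2 for p in pkas)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r2


def predict_pka(charge, slope, intercept):
    """Apply a fitted calibration line to a new computed charge."""
    return slope * charge + intercept


if __name__ == "__main__":
    # Synthetic, exactly linear placeholder data (hypothetical values).
    q_train = [-0.70, -0.75, -0.80, -0.85]
    pka_train = [8.0, 10.0, 12.0, 14.0]
    a, b, r2 = fit_linear(q_train, pka_train)
    print(f"slope={a:.3f} intercept={b:.3f} R2={r2:.3f}")
    print(f"predicted pKa at q=-0.78: {predict_pka(-0.78, a, b):.2f}")
```

Keeping the calibration as a plain linear fit mirrors why the approach is cheap: once the line is fitted on reference compounds, each new prediction costs one charge calculation and one multiply-add.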