10 research outputs found

    A New Definition And Classification Of Antibody Complementarity Determining Regions: Unsupervised Learning Of Protein Backbone Conformations Informs Antibody Structural Bioinformatics And Design

    One of the main challenges in modern molecular biology is to establish general, robust, and precise descriptions of the relationship between structural features of molecules (DNA, RNA, proteins, and glycans) and the sequence of their constituent chemical building blocks (nucleotides, amino acids, monosaccharides). In his 1951 Nobel lecture, Linus Pauling predicted that chemistry of the future would rely upon these descriptions to solve problems in biological medicine relevant to human health. As of July 8, 2021, X-ray crystallography, NMR, and cryo-EM have solved 179,842 molecular structures, which have been deposited in the Protein Data Bank (PDB) along with their associated sequences. Antibodies are the largest such family of deposited protein structures in the PDB, and their importance to human health and research in molecular biology is widely acknowledged. In this work, I first show the development and validation of unsupervised learning software to cluster protein backbone conformations (clustering of backbones for Ramachandran analysis, or COBRA). I then describe the application of this software to the wealth of antibody data in the PDB to provide a novel, electron-density-validated classification of the antibody complementarity determining regions (CDRs). I compare this new classification to previous classifications of the CDRs to show the improved association between the sequences and structures of the CDRs, the ability to robustly separate various CDR families, and the ability to assess confidence in the quality of CDR families using electron density as support.
In addition to providing a new classification of the antibody CDRs by clustering their backbone conformations, I provide an expanded definition of the antibody binding region by defining, naming, and classifying an antibody V-region segment named the “DE loop”, which resembles the other six CDRs in sequence and structural variability, ability to bind antigen, and ability to stabilize antibodies, but is not currently recognized as a canonical member of the CDRs. Finally, I show examples implementing these analyses in RosettaAntibodyDesign (RAbD) software to design antibodies towards SARS-CoV-2 Spike Protein Type 1 (S1) Receptor Binding Domain (RBD), and show the experimental data for the generated antibody designs.

    Candidate Variants in DNA Replication and Repair Genes in Early-Onset Renal Cell Carcinoma Patients Referred for Germline Testing

    Background: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. Methods: Here, we analyzed biospecimens from 22 eoRCC patients who were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. Results: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood mononuclear cells (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-strand breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. Conclusions: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.

    Enhancements to the Rosetta Energy Function Enable Improved Identification of Small Molecules that Inhibit Protein-Protein Interactions.

    Protein-protein interactions are among today's most exciting and promising targets for therapeutic intervention. To date, identifying small molecules that selectively disrupt these interactions has proven particularly challenging for virtual screening tools, since these have typically been optimized to perform well on more "traditional" drug discovery targets. Here, we test the performance of the Rosetta energy function for identifying compounds that inhibit protein interactions, when these active compounds have been hidden amongst pools of "decoys." Through this virtual screening benchmark, we gauge the effect of two recent enhancements to the functional form of the Rosetta energy function: the new "Talaris" update and the "pwSHO" solvation model. Finally, we conclude by developing and validating a new weight set that maximizes Rosetta's ability to pick out the active compounds in this test set. Looking collectively over the course of these enhancements, we find a marked improvement in Rosetta's ability to identify small-molecule inhibitors of protein-protein interactions.

    Overview of the virtual screening benchmark.

    No full text
    A target protein is provided in complex with a known small-molecule inhibitor acting at this protein interaction site. A collection of 2500 diverse “decoy” ligands has also been docked to this site. The benchmark entails scoring each of the 2501 complexes and determining the rank of the native ligand relative to the decoy compounds. This experiment was carried out for each of 18 non-redundant protein targets (Table 1).
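The ranking step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `native_rank`, the toy scores, and the convention that lower scores are better (as for an energy function) are all assumptions.

```python
from typing import Sequence


def native_rank(native_score: float, decoy_scores: Sequence[float]) -> int:
    """Rank of the native ligand among all scored complexes (1 = best).

    Assumption: lower scores are better, as with an energy function.
    Decoys that tie with the native ligand are not counted against it.
    """
    return 1 + sum(1 for s in decoy_scores if s < native_score)


# Toy data: 5 hypothetical decoy scores instead of the 2500 in the benchmark.
decoys = [-8.2, -11.5, -7.9, -10.1, -9.4]
rank = native_rank(-10.8, decoys)
print(rank)  # only -11.5 scores better than -10.8, so this prints 2
```

In the benchmark itself this would be repeated once per protein target, giving one rank per target for each scoring function.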

    Summary of the comparisons made in this study.

    No full text
    Among the 18 protein targets used as test cases, those for which the rankings by both scoring functions were within 10% of one another were considered to be “ties”. The reported p-values were calculated by applying the Wilcoxon Signed-Rank test to the difference in the log of the rankings, over all test cases. This non-parametric statistical test has the advantage of taking into account the degree to which a given method “wins” each test case, not just the number of “wins”.
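The comparison described in this caption can be sketched in plain Python. Several details are assumptions rather than the study's procedure: "within 10%" is read as a difference of at most 10% of the larger rank, the p-value uses the normal approximation (no continuity or tie-variance correction) rather than an exact test, and all function names and rank values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def classify(rank_a: int, rank_b: int, tol: float = 0.10) -> str:
    """Label one test case as a win for A, a win for B, or a tie.

    Assumption: a tie means the ranks differ by at most `tol` times the
    larger rank. A lower rank is better.
    """
    if abs(rank_a - rank_b) <= tol * max(rank_a, rank_b):
        return "tie"
    return "A" if rank_a < rank_b else "B"


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_log_ranks(ranks_a: Sequence[int],
                       ranks_b: Sequence[int]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test on log(rank_a) - log(rank_b).

    Returns (W_plus, p), where p comes from the normal approximation.
    """
    diffs = [math.log(a) - math.log(b) for a, b in zip(ranks_a, ranks_b)]
    diffs = [d for d in diffs if d != 0.0]  # zero differences carry no sign
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    signed = zip(average_ranks([abs(d) for d in diffs]), diffs)
    w_plus = sum(r for r, d in signed if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical ranks of the active compound for six targets.
ranks_a = [3, 40, 12, 500, 7, 95]
ranks_b = [10, 42, 150, 800, 7, 300]
labels = [classify(a, b) for a, b in zip(ranks_a, ranks_b)]
w_plus, p = wilcoxon_log_ranks(ranks_a, ranks_b)
print(labels, w_plus, round(p, 3))
```

Working on the log of the ranks means that improving from rank 100 to rank 10 counts the same as improving from rank 1000 to rank 100. With only a handful of test cases an exact test would be preferable to the normal approximation used here.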

    Baseline performance of the Rosetta energy function prior to recent enhancements.

    No full text
    Each plot compares the performance of two different scoring functions for identifying the active compounds in our virtual screening benchmark. Each of the 18 protein targets corresponds to a single point (green dots); the rank of the active compound (relative to 2500 diverse “decoy” compounds) by each scoring function is indicated. The orange dotted line indicates a ranking of 25, corresponding to the top 1% of the decoy set. (A) FRED’s Chemgauss4 energy function outperforms Rosetta’s original energy function intended for protein-only systems, score12, but not at a statistically significant threshold (p = 0.085). (B) The variant of the score12 energy function that was developed specifically for modeling protein-ligand interactions, score12_ligand, offers improved performance over score12 (p = 0.001). (C) FRED’s Chemgauss4 energy function performs at a similar level to score12_ligand (p = 0.239). All p-values are calculated by applying the Wilcoxon Signed-Rank test to the logs of the ranks (see Methods).

    Protein targets used in this study.

    No full text
    These targets correspond to non-redundant protein-protein interaction sites for which a crystal structure or an NMR structure has been solved in complex with a small-molecule inhibitor. For this study, three protein targets have been removed from our previously reported set [29] (see Methods).

    Recent enhancements to the functional form of the Rosetta energy function enable improved performance.

    No full text
    (A) Rosetta’s “Talaris” energy function [19, 20, 30] includes updates to the functional form of the hydrogen bond term and of the electrostatic term. These changes lead to improved performance relative to score12, at a statistically significant level (p = 0.013). (B) Replacing Rosetta’s default model of polar solvation, EEF1, with a newly developed model, pwSHO [31], leads to further improvement at a statistically significant level (p = 0.0004).