155 research outputs found

    PISCES: recent improvements to a PDB sequence culling server

    PISCES is a database server for producing lists of sequences from the Protein Data Bank (PDB) using a number of entry- and chain-specific criteria and mutual sequence identity. Our goal in culling the PDB is to provide the longest list possible of the highest resolution structures that fulfill the sequence identity and structural quality cut-offs. The new PISCES server uses a combination of PSI-BLAST and structure-based alignments to determine sequence identities. Structure alignment produces more complete alignments and therefore more accurate sequence identities than PSI-BLAST. PISCES now allows a user to cull the PDB by-entry in addition to the standard culling by individual chains. In this scenario, a list will contain only entries that do not have a chain that has a sequence identity to any chain in any other entry in the list over the sequence identity cut-off. PISCES also provides fully annotated sequences including gene name and species. The server allows a user to cull an input list of entries or chains, so that other criteria, such as function, can be used. Results from a search on the re-engineered RCSB's site for the PDB can be entered into the PISCES server by a single click, combining the powerful searching abilities of the PDB with PISCES's utilities for sequence culling. The server's data are updated weekly. The server is available at
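    The culling idea described above (prefer the highest-resolution structures, and reject any chain too similar to one already accepted) can be sketched as a greedy pass. This is only an illustration under stated assumptions, not the PISCES implementation: the function name `cull_chains`, the chain identifiers, and the precomputed pairwise identity table are all hypothetical, and a greedy pass does not guarantee the longest possible list.

```python
from typing import Dict, FrozenSet, List


def cull_chains(resolution: Dict[str, float],
                identity: Dict[FrozenSet[str], float],
                max_identity: float) -> List[str]:
    """Greedy culling sketch (hypothetical helper, not the PISCES algorithm).

    Chains are visited from best (lowest) resolution to worst. A chain is
    kept only if its sequence identity to every already-kept chain is at
    or below `max_identity` (percent). Missing pairs count as 0% identity.
    """
    kept: List[str] = []
    # Ties in resolution are broken by chain name to keep the result stable.
    for chain in sorted(resolution, key=lambda c: (resolution[c], c)):
        if all(identity.get(frozenset((chain, k)), 0.0) <= max_identity
               for k in kept):
            kept.append(chain)
    return kept


# Toy input: made-up chain names, resolutions (angstroms), identities (%).
resolution = {"1abcA": 1.5, "2xyzA": 2.0, "3pqrB": 1.8}
identity = {
    frozenset(("1abcA", "2xyzA")): 90.0,
    frozenset(("1abcA", "3pqrB")): 20.0,
    frozenset(("2xyzA", "3pqrB")): 25.0,
}
culled = cull_chains(resolution, identity, max_identity=30.0)
print(culled)
```

    In this toy case the 2.0 angstrom chain is dropped because it is 90% identical to the better-resolved chain that was already kept.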

    Charge Asymmetry in the Proteins of the Outer Membrane


    Accurate Structural Correlations from Maximum Likelihood Superpositions

    The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method (“PCA plots”) for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology.
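    The PCA step mentioned above can be illustrated with a small sketch. It assumes an ensemble that has already been superposed and flattened into a matrix of coordinates, and it substitutes the ordinary sample correlation matrix for the maximum likelihood estimate the abstract describes, so it shows the eigen-decomposition only. The function name `correlation_pca` and the synthetic data are hypothetical.

```python
import numpy as np


def correlation_pca(ensemble: np.ndarray):
    """PCA of a correlation matrix (illustrative sketch).

    `ensemble` has shape (n_models, n_features); each row is one already
    superposed structure flattened into a coordinate vector. The sample
    correlation matrix is used here as a stand-in for a maximum
    likelihood estimate.
    """
    corr = np.corrcoef(ensemble, rowvar=False)   # (n_features, n_features)
    evals, evecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(evals)[::-1]              # reorder to descending
    return evals[order], evecs[:, order]


# Synthetic data: two strongly coupled coordinates plus one independent one.
rng = np.random.default_rng(0)
shared = rng.normal(size=200)
ensemble = np.column_stack([
    shared + 0.1 * rng.normal(size=200),
    shared + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
evals, evecs = correlation_pca(ensemble)
print(evals)        # variance carried by each mode, largest first
print(evecs[:, 0])  # loadings of the dominant correlation mode
```

    The loadings of the dominant mode are the kind of per-position values that could be color-coded onto a structure, in the spirit of the PCA plots the abstract mentions.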

    Matt: Local Flexibility Aids Protein Multiple Structure Alignment

    Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these “bent” alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, but outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of α-helices and β-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.

    Rapid calcium-dependent activation of Aurora-A kinase

    Oncogenic hyperactivation of the mitotic kinase Aurora-A (AurA) in cancer is associated with genomic instability. Increasing evidence indicates that AurA also regulates critical processes in normal interphase cells, but the source of such activity has been obscure. We report here that multiple stimuli causing release of Ca2+ from intracellular endoplasmic reticulum stores rapidly and transiently activate AurA, without requirement for second messengers. This activation is mediated by direct Ca2+-dependent calmodulin (CaM) binding to multiple motifs on AurA. On the basis of structure–function analysis and molecular modelling, we map two primary regions of CaM-AurA interaction to unfolded sequences in the AurA N- and C-termini. This unexpected mechanism for AurA activation provides a new context for evaluating the function of AurA and its inhibitors in normal and cancerous cells

    Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

    Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp

    Exploring Co-occurring POLE Exonuclease and Non-exonuclease Domain Mutations and Their Impact on Tumor Mutagenicity

    POLE driver mutations in the exonuclease domain (ExoD driver) are prevalent in several cancers, including colorectal cancer and endometrial cancer, leading to dramatically ultra-high tumor mutation burden (TMB). To understand whether POLE mutations that are not classified as drivers (POLE Variant) contribute to mutagenesis, we assessed TMB in 447 POLE-mutated colorectal cancers, endometrial cancers, and ovarian cancers classified as TMB-high >= 10 mutations/Mb (mut/Mb) or TMB-low <10 mut/Mb. TMB was significantly highest in tumors with POLE ExoD driver plus POLE Variant (colorectal cancer and endometrial cancer, P < 0.001; ovarian cancer, P < 0.05). TMB increased with additional POLE variants (P < 0.001), but plateaued at 2, suggesting an association between the presence of these variants and TMB. Integrated analysis of AlphaFold2 POLE models and quantitative stability estimates predicted the impact of multiple POLE variants on POLE functionality. The prevalence of immunogenic neoepitopes was notably higher in the POLE ExoD driver plus POLE Variant tumors. Overall, this study reveals a novel correlation between POLE variants in POLE ExoD-driven tumors, and ultra-high TMB. Currently, only select pathogenic ExoD mutations with a reliable association with ultra-high TMB inform clinical practice. Thus, these findings are hypothesis-generating, require functional validation, and could potentially inform tumor classification, treatment responses, and clinical outcomes. Significance: Somatic POLE ExoD driver mutations cause proofreading deficiency that induces high TMB. This study suggests a novel modifier role for POLE variants in POLE ExoD-driven tumors, associated with ultra-high TMB. These data, in addition to future functional studies, may inform tumor classification, therapeutic response, and patient outcomes
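    The TMB grouping used above reduces to a simple threshold. The sketch below assumes the usual definition of TMB as mutations per megabase of sequenced territory; the function names and the example numbers are hypothetical and are not taken from the study.

```python
def tumor_mutation_burden(n_mutations: int, megabases_covered: float) -> float:
    """Mutations per megabase (mut/Mb) over the sequenced territory."""
    if megabases_covered <= 0:
        raise ValueError("megabases_covered must be positive")
    return n_mutations / megabases_covered


def classify_tmb(tmb: float, threshold: float = 10.0) -> str:
    """Apply the abstract's cut-off: TMB-high at >= 10 mut/Mb, else TMB-low."""
    return "TMB-high" if tmb >= threshold else "TMB-low"


# Made-up example: 30 mutations called over a 1.5 Mb panel.
tmb = tumor_mutation_burden(30, 1.5)
print(tmb, classify_tmb(tmb))
```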

    Lipid Exchange Mechanism of the Cholesteryl Ester Transfer Protein Clarified by Atomistic and Coarse-grained Simulations

    Cholesteryl ester transfer protein (CETP) transports cholesteryl esters, triglycerides, and phospholipids between different lipoprotein fractions in blood plasma. The inhibition of CETP has been shown to be a sound strategy to prevent and treat the development of coronary heart disease. We employed molecular dynamics simulations to unravel the mechanisms associated with the CETP-mediated lipid exchange. To this end we used both atomistic and coarse-grained models, whose results were consistent with each other. We found CETP to bind to the surface of high density lipoprotein (HDL)-like lipid droplets through its charged and tryptophan residues. Upon binding, CETP rapidly (in about 10 ns) induced the formation of a small hydrophobic patch on the phospholipid surface of the droplet, opening a route from the core of the lipid droplet to the binding pocket of CETP. This was followed by a conformational change of helix X of CETP to an open state, in which we found the accessibility of cholesteryl esters to the C-terminal tunnel opening of CETP to increase. Furthermore, in the absence of helix X, cholesteryl esters rapidly diffused into CETP through the C-terminal opening. The results provide compelling evidence that helix X acts as a lid which conducts lipid exchange by alternating between the open and closed states. The findings have potential for the design of novel molecular agents to inhibit the activity of CETP.

    Candidate Variants in DNA Replication and Repair Genes in Early-Onset Renal Cell Carcinoma Patients Referred for Germline Testing

    Background: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. Methods: Here, we analyzed biospecimens from 22 eoRCC patients that were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. Results: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood monocytes (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. Conclusions: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.