295 research outputs found

    The protein common interface database (ProtCID)—a comprehensive database of interactions of homologous proteins in multiple crystal forms

    The protein common interface database (ProtCID) is a database that contains clusters of similar homodimeric and heterodimeric interfaces observed in multiple crystal forms (CFs). Such interfaces, especially of homologous but non-identical proteins, have been associated with biologically relevant interactions. In ProtCID, protein chains in the Protein Data Bank (PDB) are grouped based on their PFAM domain architectures. For a single PFAM architecture, all the dimers present in each CF are constructed and compared with those in other CFs that contain the same domain architecture. Interfaces occurring in two or more CFs comprise an interface cluster in the database. The same process is used to compare heterodimers of chains with different domain architectures. By examining interfaces that are shared by many homologous proteins in different CFs, we find that the PDB and the Protein Interfaces, Surfaces, and Assemblies (PISA) are not always consistent in their annotations of biological assemblies in a homologous family. Our data therefore provide an independent check on publicly available annotations of the structures of biological interactions for PDB entries. Common interfaces may also be useful in studies of protein evolution. Coordinates for all interfaces in a cluster are downloadable for further analysis. ProtCID is available at http://dunbrack2.fccc.edu/protcid
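
    The rule that an interface cluster consists of similar interfaces seen in two or more CFs can be illustrated with a minimal sketch. This is not ProtCID's implementation: the single-linkage grouping, the function name `cluster_interfaces`, and the `similar` predicate are all hypothetical stand-ins, since the abstract does not say how interface similarity is scored.

```python
from collections import defaultdict


def cluster_interfaces(interfaces, similar):
    """Group similar interfaces and keep groups spanning >= 2 crystal forms.

    interfaces: list of (interface_id, crystal_form_id) tuples.
    similar:    hypothetical callable(id_a, id_b) -> bool; the real
                similarity measure is not given in the abstract.
    """
    # Union-find over interface ids (single linkage is an assumption).
    parent = {iid: iid for iid, _ in interfaces}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Only compare interfaces that come from different crystal forms.
    for i, (a, cf_a) in enumerate(interfaces):
        for b, cf_b in interfaces[i + 1:]:
            if cf_a != cf_b and similar(a, b):
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for iid, cf in interfaces:
        groups[find(iid)].append((iid, cf))

    # A cluster must be observed in at least two distinct crystal forms.
    return [sorted(members) for members in groups.values()
            if len({cf for _, cf in members}) >= 2]
```

    An interface that has no similar counterpart in another CF never forms a cluster here, which mirrors the "two or more CFs" criterion stated in the abstract.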

    Simultaneous alignment and folding of protein sequences

    Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm’s complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise sequence alignment tools in the most difficult low sequence homology case and improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families. partiFold-Align is available at http://partiFold.csail.mit.edu

    Specialized dynamical properties of promiscuous residues revealed by simulated conformational ensembles

    The ability to interact with different partners is one of the most important features in proteins. Proteins that bind a large number of partners (hubs) have often been associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the protein isolated state. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and to full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with a different degree of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter both on a local and on a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionarily related proteins can be modulated by the fine-tuning of the interface dynamics. The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein-protein interaction prediction and design methods. © 2013 American Chemical Society

    Assessment of predicted enzymatic activity of α‐N‐acetylglucosaminidase variants of unknown significance for CAGI 2016

    The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α‐N‐acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population‐scale analysis of disease epidemiology and rare variant association analysis.
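
    For readers unfamiliar with the metric, Pearson's correlation coefficient between predicted and measured activities can be computed as in the sketch below. The function name `pearson_r` is illustrative, and the numbers in the usage lines are made up for demonstration; they are not CAGI data.

```python
import math


def pearson_r(predicted, observed):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


# Made-up relative activities (fraction of wild type), for illustration only.
predicted = [0.10, 0.45, 0.80, 0.95]
observed = [0.05, 0.60, 0.70, 1.00]
r = pearson_r(predicted, observed)
```

    The same function applied to two sets of predictions, rather than predictions and measurements, gives the method-to-method correlation that the abstract compares against.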

    A Generic Program for Multistate Protein Design

    Some protein design tasks cannot be modeled by the traditional single state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
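
    The idea of scoring one sequence across several states, with some states to be stabilized and others to be destabilized, can be sketched as follows. The fitness function here (worst positive-state energy minus best negative-state energy) is an assumption chosen for illustration, not the function used in the paper, and the names `multistate_fitness` and `pick_sequence` and all energy values are hypothetical.

```python
def multistate_fitness(energies, positive, negative):
    """Score one sequence across states; lower is better.

    energies: dict mapping state name -> energy of the sequence on that state.
    positive: states that should be low in energy (desired).
    negative: states that should be high in energy (undesired).
    Assumed form: worst (highest) positive energy minus best (lowest)
    negative energy, so the gap to the undesired states is rewarded.
    """
    worst_positive = max(energies[s] for s in positive)
    best_negative = min(energies[s] for s in negative)
    return worst_positive - best_negative


def pick_sequence(candidates, positive, negative):
    """Return the candidate sequence with the lowest multistate fitness."""
    return min(
        candidates,
        key=lambda seq: multistate_fitness(candidates[seq], positive, negative),
    )


# Hypothetical energies for a homodimer-to-heterodimer redesign.
candidates = {
    "seqA": {"hetero": -10.0, "homoA": -9.0, "homoB": -8.0},
    "seqB": {"hetero": -9.0, "homoA": 2.0, "homoB": 5.0},
}
best = pick_sequence(candidates, positive=["hetero"], negative=["homoA", "homoB"])
```

    In this toy example `seqA` has the lowest heterodimer energy, yet `seqB` is selected because it also penalizes both homodimer states, which is the point of including negative design.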

    A Computational Assay of Estrogen Receptor alpha Antagonists Reveals the Key Common Structural Traits of Drugs Effectively Fighting Refractory Breast Cancers

    Somatic mutations of the Estrogen Receptor alpha (ER alpha) occur with an up to 40% incidence in ER sensitive breast cancer (BC) patients undergoing prolonged endocrine treatments. These polymorphisms are implicated in acquired resistance, disease relapse, and increased mortality rates, hence representing a current major clinical challenge. Here, multi-microsecond (12.5 μs) molecular dynamics simulations revealed that recurrent ER alpha polymorphisms (i.e. L536Q, Y537S, Y537N, D538G) (mER alpha) are constitutively active in their apo form and that they prompt the selection of an agonist (active)-like conformation even upon antagonist binding. Interestingly, our simulations rationalize, for the first time, the efficacy profile of (pre)clinically used Selective Estrogen Receptor Modulators/Downregulators (SERMs/SERDs) against these variants, elucidating, at an atomistic level of detail, the key common structural traits needed by drugs able to effectively fight refractory BC types. This knowledge represents a key advancement for mechanism-based therapeutics targeting resistant ER alpha isoforms, potentially allowing the community to move a step closer to 'precision medicine' calibrated on patients' genetic profiles and disease progression.

    Systematic evaluation of underlying defects in DNA repair as an approach to case-only assessment of familial prostate cancer

    Risk assessment for prostate cancer is challenging due to its genetic heterogeneity. In this study, our goal was to develop an operational framework to select and evaluate gene variants that may contribute to familial prostate cancer risk. Drawing on orthogonal sources, we developed a candidate list of genes relevant to prostate cancer, then analyzed germline exomes from 12 case-only prostate cancer patients from high-risk families to identify patterns of protein-damaging gene variants. We described an average of 5 potentially disruptive variants in each individual and annotated them in the context of public databases representing human variation. Novel damaging variants were found in several genes of relevance to prostate cancer. Almost all patients had variants associated with defects in DNA damage response. Many also had variants linked to androgen signaling. Treatment of primary T-lymphocytes from these prostate cancer patients versus controls with DNA damaging agents showed elevated levels of the DNA double-strand break (DSB) marker γH2AX (p < 0.05), supporting the idea of an underlying defect in DNA repair. This work suggests the value of focusing on underlying defects in DNA damage in familial prostate cancer risk assessment and demonstrates an operational framework for exome sequencing in case-only prostate cancer genetic evaluation.

    Algorithm for backrub motions in protein design

    Motivation: The Backrub is a small but kinematically efficient side-chain-coupled local backbone motion frequently observed in atomic-resolution crystal structures of proteins. A backrub shifts the Cα–Cβ orientation of a given side-chain by rigid-body dipeptide rotation plus smaller individual rotations of the two peptides, with virtually no change in the rest of the protein. Backrubs can therefore provide a biophysically realistic model of local backbone flexibility for structure-based protein design. Previously, however, backrub motions were applied via manual interactive model-building, so their incorporation into a protein design algorithm (a simultaneous search over mutation and backbone/side-chain conformation space) was infeasible.
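
    The rigid-body rotation at the core of a backrub can be sketched as a rotation of atom coordinates about an axis through two anchor points. The sketch assumes the axis runs between the flanking Cα atoms of the dipeptide, which the abstract does not state explicitly, and it omits the smaller compensating peptide rotations. The function name `rotate_about_axis` and the coordinates used in the example are hypothetical.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate a 3D point about the line axis_start -> axis_end.

    Uses Rodrigues' rotation formula. For a backrub-like move, the axis
    endpoints would be the two anchor atoms (assumed here to be the
    flanking C-alpha atoms) and `point` any atom between them.
    """
    # Unit vector along the rotation axis.
    k = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in k))
    if norm == 0:
        raise ValueError("axis endpoints must differ")
    k = [c / norm for c in k]

    # Express the point relative to the axis origin.
    v = [p - s for p, s in zip(point, axis_start)]
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # k x v and k . v
    cross = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    dot = sum(kc * vc for kc, vc in zip(k, v))

    rotated = [
        v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t)
        for i in range(3)
    ]
    # Translate back to the original frame.
    return [r + s for r, s in zip(rotated, axis_start)]


# Rotate a hypothetical atom by 10 degrees about a hypothetical axis.
moved = rotate_about_axis((1.5, 1.0, 0.0), (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), 10.0)
```

    Atoms lying on the axis, including the two anchors, are left unchanged by this rotation, which is consistent with the abstract's statement that the rest of the protein is virtually unaffected.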

    A Mathematical Framework for Protein Structure Comparison

    Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that, of the many different distances used to measure the similarity between protein structures, none is a proper distance when protein structures of different sequences are compared. Statistical approaches based on those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariance can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal the population-specific variations and be helpful in improving structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set.