485 research outputs found

    Exploring Conformational Landscapes and Cryptic Binding Pockets in Distinct Functional States of the SARS-CoV-2 Omicron BA.1 and BA.2 Trimers: Mutation-Induced Modulation of Protein Dynamics and Network-Guided Prediction of Variant-Specific Allosteric Binding Sites

    A significant body of experimental structures of SARS-CoV-2 spike trimers for the BA.1 and BA.2 variants revealed a considerable plasticity of the spike protein and the emergence of druggable binding pockets. Understanding the interplay of conformational dynamics changes induced by the Omicron variants and the identification of cryptic dynamic binding pockets in the S protein is of paramount importance as exploring broad-spectrum antiviral agents to combat the emerging variants is imperative. In the current study, we explore conformational landscapes and characterize the universe of binding pockets in multiple open and closed functional spike states of the BA.1 and BA.2 Omicron variants. By using a combination of atomistic simulations, a dynamics network analysis, and an allostery-guided network screening of binding pockets in the conformational ensembles of the BA.1 and BA.2 spike conformations, we identified all experimentally known allosteric sites and discovered significant variant-specific differences in the distribution of binding sites in the BA.1 and BA.2 trimers. This study provided a structural characterization of the predicted cryptic pockets and captured the experimentally known allosteric sites, revealing the critical role of conformational plasticity in modulating the distribution and cross-talk between functional binding sites. We found that mutational and dynamic changes in the BA.1 variant can induce the remodeling and stabilization of a known druggable pocket in the N-terminal domain, while this pocket is drastically altered and may no longer be available for ligand binding in the BA.2 variant. Our results predicted the experimentally known allosteric site in the receptor-binding domain that remains stable and ranks as the most favorable site in the conformational ensembles of the BA.2 variant but could become fragmented and less probable in BA.1 conformations. 
We also uncovered several cryptic pockets formed at the inter-domain and inter-protomer interface, including functional regions of the S2 subunit and stem helix region, which are consistent with the known role of pocket residues in modulating conformational transitions and antibody recognition. The results of this study are particularly significant for understanding the dynamic and network features of the universe of available binding pockets in spike proteins, as well as the effects of the Omicron-variant-specific modulation of preferential druggable pockets. The exploration of predicted druggable sites can present a new and previously underappreciated opportunity for therapeutic interventions for Omicron variants through the conformation-selective and variant-specific targeting of functional sites involved in allosteric changes

    SyNDock: N Rigid Protein Docking via Learnable Group Synchronization

    The regulation of various cellular processes heavily relies on the protein complexes within a living cell, necessitating a comprehensive understanding of their three-dimensional structures to elucidate the underlying mechanisms. While neural docking techniques have exhibited promising outcomes in binary protein docking, the application of advanced neural architectures to multimeric protein docking remains uncertain. This study introduces SyNDock, an automated framework that swiftly assembles precise multimeric complexes within seconds, showcasing performance that can potentially surpass or be on par with recent advanced approaches. SyNDock possesses several appealing advantages not present in previous approaches. Firstly, SyNDock formulates multimeric protein docking as a problem of learning global transformations to holistically depict the placement of chain units of a complex, enabling a learning-centric solution. Secondly, SyNDock proposes a trainable two-step SE(3) algorithm, involving initial pairwise transformation and confidence estimation, followed by global transformation synchronization. This enables effective learning for assembling the complex in a globally consistent manner. Lastly, extensive experiments conducted on our proposed benchmark dataset demonstrate that SyNDock outperforms existing docking software in crucial performance metrics, including accuracy and runtime. For instance, it achieves a 4.5% improvement in performance and a remarkable millionfold acceleration in speed

    Protein complexes in cells by AI-assisted structural proteomics

    Abstract Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate‐based approach to systematically model novel protein assemblies. Here, we use a combination of in‐cell crosslinking mass spectrometry and co‐fractionation mass spectrometry (CoFrac‐MS) to identify protein–protein interactions in the model Gram‐positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold‐Multimer and, after controlling for the false‐positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein–protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria

    What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds

    Driven by the development and upscaling of fast genome sequencing and assembly pipelines, the number of protein-coding sequences deposited in public protein sequence databases is increasing exponentially. Recently, the dramatic success of deep learning-based approaches applied to protein structure prediction has done the same for protein structures. We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover most of the catalogued natural proteins, including those difficult to annotate for function or putative biological role based on standard, homology-based approaches. In this work, we quantified how much of such "dark matter" of the natural protein universe was structurally illuminated by AlphaFold2 and modelled this diversity as an interactive sequence similarity network that can be navigated at https://uniprot3d.org/atlas/AFDB90v4 . In the process, we discovered multiple novel protein families by searching for novelties from sequence, structure, and semantic perspectives. We added a number of them to Pfam, and experimentally demonstrate that one of these belongs to a novel superfamily of toxin-antitoxin systems, TumE-TumA. This work highlights the role of large-scale, evolution-driven protein comparison efforts in combination with structural similarities, genomic context conservation, and deep-learning based function prediction tools for the identification of novel protein families, aiding not only annotation and classification efforts but also the curation and prioritisation of target proteins for experimental characterisation

    3D Convolutional Neural Networks for Identifying Protein Interfaces

    Protein interaction is a fundamental part of nearly all biochemical processes, and proteins have evolved specific surface regions for molecular recognition and interaction. These regions differ from the remaining surface in amino acid composition, geometry and chemical properties. Detecting protein interfaces can lead to a better understanding of protein interactions, benefiting fields such as drug design and metabolic engineering. Most existing interface predictors use structured data, that is, clearly defined data types usually obtained from data sets. However, proteins are very complex molecules, and no single property can distinguish the interface from the rest of the protein surface for all types of proteins. Deep learning is therefore an adequate approach, able to capture features from unstructured data such as images, texts, sensor data and volumes. Here, the aim was to identify interface regions in known protein spatial structures together with their biochemical properties by exploring new applications of 3D convolutional neural networks. To this end, several state-of-the-art convolutional neural network architectures were explored in order to find one that suits this problem and also performs well. Other state-of-the-art machine learning predictors are also considered to identify the best biochemical properties to be added as new channels. Afterward, the interface predictions will be compared with the ground truth, obtained by calculating the distances between atoms of the different chains of the protein complexes.
    [Translated from the Portuguese abstract] Interaction between proteins is fundamental to all biological and biochemical processes. Proteins contain specific regions that enable molecular recognition and, consequently, interactions with other molecules. These regions are usually structurally different from the rest of the molecule, being characterized by a different amino acid composition, chemical properties and geometry. Detecting protein interfaces can help in understanding how proteins interact and, in turn, benefit drug design and metabolic engineering. Interface predictors mostly use structured data, that is, well-defined data usually obtained from databases. However, proteins are complex molecules, which makes their interfaces hard to distinguish, since no single specific property applies to all of them. Deep learning is therefore a fundamental tool, because it uses features from unstructured data such as the spatial information of the protein, images, texts, sensor data or volumes. The main goal of this project is to identify interface regions from known three-dimensional protein structures together with the spatial distribution of their properties, using convolutional neural networks. In this work, deep learning algorithms were studied to find the neural network best suited to the problem with the best performance. Other prediction algorithms were considered to identify the best biochemical properties to use as new input channels. The model's predictions were then compared with the real interfaces, obtained by calculating the distances between atoms of different chains of the same complex
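
    The ground-truth step described in this abstract (labelling interface residues from the distances between atoms of different chains) can be illustrated with a minimal sketch. The function name `interface_residues`, the input layout (one coordinate row and one residue ID per atom) and the 5.0 Å cutoff are assumptions made for illustration; the abstract does not state the threshold or the implementation the authors used.

```python
import numpy as np

def interface_residues(coords_a, res_ids_a, coords_b, res_ids_b, cutoff=5.0):
    """Label residues having any atom within `cutoff` angstroms of the other chain.

    coords_*  : (n_atoms, 3) atom coordinates for one chain
    res_ids_* : residue ID of each atom, same length as coords_*
    cutoff    : hypothetical distance threshold (not taken from the abstract)
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    # All pairwise atom-atom distances between the two chains, shape (n_a, n_b).
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    close = dist <= cutoff
    # A residue is at the interface if at least one of its atoms is close.
    iface_a = sorted({res_ids_a[i] for i in np.flatnonzero(close.any(axis=1))})
    iface_b = sorted({res_ids_b[j] for j in np.flatnonzero(close.any(axis=0))})
    return iface_a, iface_b

# Toy example: one atom per residue, two residues per chain.
chain_a = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
chain_b = [[3.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
print(interface_residues(chain_a, [1, 2], chain_b, [1, 2]))  # ([1], [1])
```

    The dense distance matrix is adequate for small complexes; for large assemblies a spatial index such as a k-d tree would avoid the quadratic memory cost.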

    Frustration in Biomolecules

    Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with finite interaction codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of "frustration" in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and how structure connects to function. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We then discuss how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding, how a large part of the biological functions of proteins are related to subtle local frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. We hope to illustrate how frustration is a fundamental concept in relating function to structural biology. Comment: 97 pages, 30 figures

    DEEP LEARNING METHODS FOR PREDICTION OF AND ESCAPE FROM PROTEIN RECOGNITION

    Protein interactions drive diverse processes essential to living organisms, and thus numerous biomedical applications center on understanding, predicting, and designing how proteins recognize their partners. While unfortunately the number of interactions of interest still vastly exceeds the capabilities of experimental determination methods, computational methods promise to fill the gap. My thesis pursues the development and application of computational methods for several protein interaction prediction and design tasks. First, to improve protein-glycan interaction specificity prediction, I developed GlyBERT, which learns biologically relevant glycan representations encapsulating the components most important for glycan recognition within their structures. GlyBERT encodes glycans with a branched biochemical language and employs an attention-based deep language model to embed the correlation between local and global structural contexts. This approach enables the development of predictive models from limited data, supporting applications such as lectin binding prediction. Second, to improve protein-protein interaction prediction, I developed a unified geometric deep neural network, ‘PInet’ (Protein Interface Network), which leverages the best properties of both data- and physics-driven methods, learning and utilizing models capturing both geometrical and physicochemical molecular surface complementarity. In addition to obtaining state-of-the-art performance in predicting protein-protein interactions, PInet can serve as the backbone for other protein-protein interaction modeling tasks such as binding affinity prediction. Finally, I turned from prediction to design, addressing two important tasks in the context of antibody-antigen recognition. The first problem is to redesign a given antigen to evade antibody recognition, e.g., to help biotherapeutics avoid pre-existing immunity or to focus vaccine responses on key portions of an antigen. 
The second problem is to design a panel of variants of a given antigen to use as “bait” in experimental identification of antibodies that recognize different parts of the antigen, e.g., to support classification of immune responses or to help select among different antibody candidates. I developed a geometry-based algorithm to generate variants to address these design problems, seeking to maximize utility subject to experimental constraints. During the design process, the algorithm accounts for and balances the effects of candidate mutations on antibody recognition and on antigen stability. In retrospective case studies, the algorithm demonstrated promising precision, recall, and robustness of finding good designs. This work represents the first algorithm to systematically design antigen variants for characterization and evasion of polyclonal antibody responses

    Insights to Protein Pathogenicity from the Lens of Protein Evolution

    As protein sequences evolve, differences in selective constraints may lead to outcomes ranging from sequence conservation to structural and functional divergence. Evolutionary protein family analysis can illuminate which protein regions are likely to diverge or remain conserved in sequence, structure, and function. Moreover, nonsynonymous mutations in pathogens may result in the emergence of protein regions that affect the behavior of pathogenic proteins within a host and host response. I aimed to gain insight on pathogenic proteins from cancer and viruses using an evolutionary perspective. First, I examined p53, a conformationally flexible, multifunctional protein mutated in ~50% of human cancers. Multifunctional proteins may experience rapid sequence divergence given trade-offs between functions, while proteins with important functions may be more constrained. How, then, does a protein like p53 evolve? I assessed the evolutionary dynamics of structural and regulatory properties in the p53 family, revealing paralog-specific patterns of functional divergence. I also studied flaviviruses, like Dengue and Zika virus, whose conformational flexibility contributes to antibody-dependent enhancement (ADE). ADE has long complicated vaccine development for these viruses, making antiviral drug development an attractive alternative. I identified fitness-critical sites conserved in sequence and structure in the proteome of flaviviruses with the potential to act as broadly neutralizing antiviral drug target sites. I later developed Epitopedia, a computational method for epitope-based prediction of molecular mimicry. Molecular mimicry occurs when regions of antigenic proteins resemble protein regions from the host or other pathogens, leading to antibody cross-reactivity at these sites which can result in autoimmunity or have a protective effect. I applied Epitopedia to the antigenic Spike protein from SARS-CoV-2, the causative agent of COVID-19. 
Molecular mimicry may explain the varied symptoms and outcomes seen in COVID-19 patients. I found instances of molecular mimicry in Spike associated with COVID-19-related blood-clotting disorders and cardiac disease, with implications for disease treatment and vaccine design.

    Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

    Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally-efficient principles: (a) amino acid interchangeability privileges their secondary structural similarity, and (b) the amino acid mutability depends directly on its biosynthetic energy cost, and inversely on its frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)

    A structural biology community assessment of AlphaFold2 applications

    Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research