130 research outputs found
DeepRank-GNN-esm: A graph neural network for scoring protein-protein models using protein language model
Motivation: Protein-protein interactions (PPIs) play critical roles in numerous cellular processes. By modelling the 3D structures of the corresponding protein complexes, valuable insights can be obtained, providing, e.g., starting points for drug and protein design. One challenge in the modelling process is, however, the identification of near-native models from the large pool of generated models. To this end we have previously developed DeepRank-GNN, a graph neural network that integrates structural and sequence information to enable effective pattern learning at PPI interfaces. Its main features are related to the Position Specific Scoring Matrices (PSSMs), which are computationally expensive to generate, significantly limiting the algorithm's usability. Results: We introduce here DeepRank-GNN-esm, which includes as additional features protein language model embeddings from the ESM-2 model. We show that the ESM-2 embeddings can actually replace the PSSM features at no cost in performance, or even with better performance, on two PPI-related tasks: scoring docking poses and detecting crystal artifacts. This new DeepRank version thus bypasses the need to generate PSSMs, greatly improving the usability of the software and opening new application opportunities for systems for which PSSM profiles cannot be obtained or are irrelevant (e.g. antibody-antigen complexes)
Improving the Quality of Co-evolution Intermolecular Contact Prediction with DisVis
The steep rise in available protein sequences and structures has paved the way for bioinformatics approaches to predict residue-residue interactions in protein complexes. Multiple sequence alignments are commonly used in intermolecular contact predictions to identify co-evolving residues. These contacts, however, often include false positives (FPs), which may impair their use to predict three-dimensional structures of biomolecular complexes and affect the accuracy of the generated models. Previously, we have developed DisVis to identify false positive data in mass spectrometry cross-linking data. DisVis makes it possible to assess the accessible interaction space between two proteins consistent with a set of distance restraints. Here, we investigate whether a similar approach could be applied to co-evolution predicted contacts in order to improve their precision prior to using them for modelling complexes. In this work we analyze co-evolution contact predictions with DisVis in order to identify putative FPs for a set of 26 protein-protein complexes. Next, the DisVis-reranked and the original co-evolution contacts are used to model the complexes with our integrative docking software HADDOCK using different filtering scenarios. Our results show that HADDOCK is robust with respect to the precision of the predicted contacts, due to the random removal of 50% of the contacts during docking and the use of DisVis filtering for low-precision contact data. DisVis can thus have a beneficial effect on low-quality data, but overall HADDOCK can accommodate FP restraints without negatively impacting the quality of the resulting models. Other more precision-sensitive docking protocols might, however, benefit from the increased precision of the predicted contacts after DisVis filtering
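The random contact removal mentioned in this abstract can be illustrated with a minimal Python sketch. It is not HADDOCK's actual implementation: the function name `sample_restraints`, the contact pairs and the seed are hypothetical, and the sketch only shows how each docking trial could see a different random half of the predicted contacts, so that no single false positive is enforced in every trial.

```python
import random

def sample_restraints(contacts, keep_fraction=0.5, rng=None):
    """Return a random subset of predicted contacts for one docking trial.

    Hypothetical helper: keeps roughly `keep_fraction` of the contacts,
    chosen without replacement.
    """
    if not contacts:
        return []
    rng = rng or random.Random()
    n_keep = max(1, round(len(contacts) * keep_fraction))
    return rng.sample(contacts, n_keep)

# Hypothetical predicted contacts: (residue in protein A, residue in protein B)
contacts = [(12, 45), (13, 47), (30, 88), (54, 10), (71, 23), (90, 5)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
trials = [sample_restraints(contacts, rng=rng) for _ in range(4)]
for i, subset in enumerate(trials, start=1):
    print(f"trial {i}: {sorted(subset)}")
```

Each trial here uses three of the six contacts, so a false positive is absent from a share of the trials while true contacts still recur across many of them.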
Novel insights into guide RNA 5ʹ-Nucleoside/Tide binding by human argonaute 2
The human Argonaute 2 (hAgo2) protein is a key player of RNA interference (RNAi). Upon complex formation with small non-coding RNAs, the protein initially interacts with the 5ʹ-end of a given guide RNA through multiple interactions within the MID domain. This interaction has been reported to show a strong bias for U and A over C and G at the 5ʹ-position. Performing molecular dynamics simulations of binary hAgo2/OH–guide–RNA complexes, we show that hAgo2 is a highly flexible protein capable of binding to guide strands with all four possible 5ʹ-bases. Especially, in the case of C and G this is associated with rather large individual conformational rearrangements affecting the MID, PAZ and even the N-terminal domains to different degrees. Moreover, a 5ʹ-G induces domain motions in the protein, which trigger a previously unreported interaction between the 5ʹ-base and the L2 linker domain. Combining our in silico analyses with biochemical studies of recombinant hAgo2, we find that, contrary to previous observations, hAgo2 is capable of functionally accommodating guide strands regardless of the 5ʹ-base
Using machine-learning-driven approaches to boost hot-spot's knowledge
Understanding protein–protein interactions (PPIs) is fundamental to describe and to characterize the formation of biomolecular assemblies, and to establish the energetic principles underlying biological networks. One key aspect of these interfaces is the existence and prevalence of hot-spot (HS) residues that, upon mutation to alanine, negatively impact the formation of such protein–protein complexes. HS have been widely considered in research, both in case studies and in a few large-scale predictive approaches. This review aims to present the current knowledge on PPIs, providing a detailed understanding of the microspecifications of the residues involved in those interactions and the characteristics of those defined as HS through a thorough assessment of related field-specific methodologies. We explore recent accurate artificial intelligence-based techniques, which are progressively replacing well-established classical energy-based methodologies. This article is categorized under: Data Science > Databases and Expert Systems; Structure and Mechanism > Computational Biochemistry and Biophysics; Molecular and Statistical Mechanics > Molecular Interactions
Cyclization and Docking Protocol for Cyclic Peptide-Protein Modeling Using HADDOCK2.4
Cyclic peptides are an emerging class of therapeutic molecules, with over 40 cyclic peptide drugs currently in clinical use. Their mode of action is, however, not fully understood, impeding rational drug design. Computational techniques could positively impact their design, but modeling them and their interactions remains challenging due to their cyclic nature and their flexibility. This study presents a step-by-step protocol for generating cyclic peptide conformations and docking them to their protein target using HADDOCK2.4. A dataset of 30 cyclic peptide-protein complexes was used to optimize both cyclization and docking protocols. The protocol supports peptides cyclized via an N- and C-terminus peptide bond and/or a disulfide bond. An ensemble of cyclic peptide conformations is then used in HADDOCK to dock them onto their target protein using knowledge of the binding site on the protein side to drive the modeling. The presented protocol predicts at least one acceptable model according to the Critical Assessment of Prediction of Interaction (CAPRI) criteria for each complex of the dataset when the top 10 HADDOCK-ranked single structures are considered (100% success rate top 10) both in the bound and unbound docking scenarios. Moreover, its performance in both bound and fully unbound docking is similar to the state-of-the-art software in the field, Autodock CrankPep. The presented cyclization and docking protocol should make HADDOCK a valuable tool for rational cyclic peptide-based drug design and high-throughput screening
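The "success rate top 10" quoted in this abstract is the fraction of complexes for which at least one model of acceptable or better quality appears among the 10 best-ranked models. A minimal Python sketch of that calculation follows. The function name `top_n_success_rate`, the complex names and the per-model labels are hypothetical illustration data, not results from the study, and the label set assumes the usual CAPRI quality classes (incorrect, acceptable, medium, high).

```python
def top_n_success_rate(ranked_quality, n=10,
                       acceptable=("acceptable", "medium", "high")):
    """Fraction of complexes with at least one acceptable-or-better model
    among the first `n` models of their ranked list."""
    hits = sum(
        any(label in acceptable for label in labels[:n])
        for labels in ranked_quality.values()
    )
    return hits / len(ranked_quality)

# Hypothetical data: for each complex, model quality labels in ranked order
ranked_quality = {
    "complex_A": ["incorrect", "acceptable", "incorrect"],
    "complex_B": ["incorrect"] * 12 + ["medium"],  # first hit at rank 13
    "complex_C": ["high", "incorrect"],
}

rate_top10 = top_n_success_rate(ranked_quality, n=10)
rate_top15 = top_n_success_rate(ranked_quality, n=15)
print(f"top 10: {rate_top10:.2f}, top 15: {rate_top15:.2f}")
```

In this toy data two of the three complexes have a hit in the top 10, and all three do once the cutoff is widened to 15, which shows how strongly the metric depends on the chosen rank cutoff.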
ARCTIC-3D: automatic retrieval and clustering of interfaces in complexes from 3D structural information
The formation of a stable complex between proteins lies at the core of a wide variety of biological processes and has been the focus of countless experiments. The huge amount of information contained in the protein structural interactome in the Protein Data Bank can now be used to characterise and classify the existing biological interfaces. We here introduce ARCTIC-3D, a fast and user-friendly data mining and clustering software to retrieve data and rationalise the interface information associated with the protein input data. We demonstrate its use by various examples ranging from showing the increased interaction complexity of eukaryotic proteins, 20% of which on average have more than 3 different interfaces compared to only 10% for prokaryotes, to associating different functions to different interfaces. In the context of modelling biomolecular assemblies, we introduce the concept of “recognition entropy”, related to the number of possible interfaces of the components of a protein-protein complex, which we demonstrate to correlate with the modelling difficulty in classical docking approaches. The identified interface clusters can also be used to generate various combinations of interface-specific restraints for integrative modelling. The ARCTIC-3D software is freely available at github.com/haddocking/arctic3d and can be accessed as a web-service at wenmr.science.uu.nl/arctic3d
ARCTIC-3D: Automatic Retrieval and ClusTering of Interfaces in Complexes from 3D structural information
The formation of a stable complex between proteins lies at the core of a wide variety of biological processes and has been the focus of countless experiments. The huge amount of information contained in the protein structural interactome in the Protein Data Bank can now be used to characterise and classify the existing biological interfaces. We here introduce ARCTIC-3D, a fast and user-friendly data mining and clustering software to retrieve data and rationalise the interface information associated with the protein input data. We demonstrate its use by various examples ranging from showing the increased interaction complexity of eukaryotic proteins, 20% of which on average have more than 3 different interfaces compared to only 10% for prokaryotes, to associating different functions to different interfaces. In the context of modelling biomolecular assemblies, we introduce the concept of “recognition entropy”, related to the number of possible interfaces of the components of a protein-protein complex, which we demonstrate to correlate with the modelling difficulty. The identified interface clusters can also be used to generate various combinations of interface-specific restraints for integrative modelling. The ARCTIC-3D software is freely available at https://github.com/haddocking/arctic3d and can be accessed as a web-service at https://wenmr.science.uu.nl/arctic-3
Towards the accurate modelling of antibody-antigen complexes from sequence using machine learning and information-driven docking
Antibody-antigen complex modelling is an important step in computational workflows for therapeutic antibody design. While experimentally determined structures of both antibody and the cognate antigen are often not available, recent advances in machine learning-driven protein modelling have enabled accurate prediction of both antibody and antigen structures. Here, we analyse the ability of protein-protein docking tools to use machine learning generated input structures for information-driven docking. We find that HADDOCK can generate accurate models of antibody-antigen complexes using an ensemble of antibody structures generated by machine learning tools and AlphaFold2 predicted antigen structures. Targeted docking using knowledge of the complementarity-determining regions on the antibody and some information about the targeted epitope allows the generation of high-quality models of the complex with reduced sampling, resulting in a computationally cheap protocol that outperforms the ZDOCK baseline. The data set used to benchmark the docking protocols in this study is available at github.com/haddocking/ai-antibodies. The docking models will be deposited at data.sbgrid.org/labs/32/ upon acceptance
Molecular Insights Into Binding and Activation of the Human KCNQ2 Channel by Retigabine
Voltage-gated potassium channels of the Kv7.x family are involved in a plethora of biological processes across many tissues in animals, and their malfunction could lead to several pathologies ranging from diseases caused by neuronal hyperexcitability, such as epilepsy, or traumatic injuries and painful diabetic neuropathy to autoimmune disorders. Among the members of this family, the Kv7.2 channel can form hetero-tetramers together with Kv7.3, forming the so-called M-channels, which are primary regulators of intrinsic electrical properties of neurons and of their responsiveness to synaptic inputs. Here, prompted by the similarity between the M-current and that in Kv7.2 alone, we perform a computational characterization of this channel in its different conformational states and in complex with the modulator retigabine. After validation of the structural models of the channel by comparison with experimental data, we investigate the effect of retigabine binding on the two extreme states of Kv7.2 (resting-closed and activated-open). Our results suggest that binding, so far structurally characterized only in the intermediate activated-closed state, is also possible in the other two functional states. Moreover, we show that some effects of this binding, such as increased flexibility of voltage sensing domains and propensity of the pore for open conformations, are virtually independent of the conformational state of the protein. Overall, our results provide new structural and dynamic insights into the functioning and the modulation of Kv7.2 and related channels
Unveiling the interaction of vanadium compounds with human serum albumin by using 1H STD NMR and computational docking studies
The binding of the V(V) oxidation products of two vanadium(IV) compounds, [VO(dmpp)2] and [VO(maltolato)2], which have shown promising anti-diabetic properties, to human serum albumin (HSA) in aqueous aerobic solution has been studied by 1H saturation transfer difference (STD) NMR spectroscopy and computational docking studies. Group epitope mapping and docking simulations indicate a preference of HSA binding to the 1:1 [VO2(dmpp)(OH)(H2O)]- and 1:2 [VO2(maltol)2]- vanadium(V) species. By using known HSA binders, competition NMR experiments revealed that both complexes preferentially bind to drug site I. Docking simulations carried out with HADDOCK together with restraints derived from the STD results led to three-dimensional models that are in agreement with the NMR spectroscopic data, providing useful information on molecular interaction modes. These results indicate that the combination of STD NMR and data-driven docking is a good tool for elucidating the interactions in protein-vanadium compounds and thus for clarifying the mechanism of drug delivery, as vanadium compounds have shown potential therapeutic properties. 1H STD NMR analysis complemented by HADDOCK studies has revealed that the [VO2(dmpp)(H2O)(OH)]- species, resulting from the oxidation of the potential insulin mimetic VO(dmpp)2, binds preferentially to HSA site I. These findings corroborate the involvement of this serum protein in the transport of vanadium species in the blood stream and their delivery to target cells