78 research outputs found

    A geometric knowledge-based coarse-grained scoring potential for structure prediction evaluation

    Knowledge-based protein folding potentials have proven successful in recent years. Based on statistics of observed interatomic distances, they generally encode pairwise contact information. In this study we present a method that derives multi-body contact potentials from measurements of surface areas using coarse-grained protein models. The measurements are made using a newly implemented geometric construction: the arrangement of circles on a sphere. This construction allows the definition of residue covering areas, which are used as parameters to build functions able to distinguish native structures from decoys. These functions, encoding up to 5-body contacts, are evaluated on a reference set of 66 structures and its 45,000 decoys, and also on the frequently used lattice ssfit set from the Decoys 'R' Us database. We show that the most relevant information for discrimination resides in 2- and 3-body contacts. The potentials we have obtained can be used to evaluate putative structural models; they could also lead to different types of structure refinement techniques that use multi-body interactions.
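The covering-area idea can be illustrated with a small sketch. It assumes each residue is represented by a single sphere (the radii are hypothetical) and uses Monte Carlo sampling of the residue surface instead of the exact arrangement of circles on a sphere described above. The histogram of area covered by exactly k neighbours is an illustrative multi-body descriptor, not the authors' definition. `cap_area` gives the exact area of one cap (one circle on the sphere), and `covering_histogram` estimates how the surface splits among regions covered by 0, 1, 2, ... neighbours.

```python
import math
import random


def cap_area(r_i, r_j, d):
    """Exact area of the part of sphere i's surface that lies inside sphere j.

    r_i, r_j: sphere radii; d: distance between the two centres.
    """
    if d >= r_i + r_j or d + r_j <= r_i:
        return 0.0  # disjoint spheres, or sphere j buried inside sphere i
    if d + r_i <= r_j:
        return 4.0 * math.pi * r_i ** 2  # sphere i entirely inside sphere j
    a = (d * d + r_i * r_i - r_j * r_j) / (2.0 * d)  # centre i to cutting plane
    return 2.0 * math.pi * r_i * (r_i - a)  # Archimedes: cap area = 2*pi*r*h


def covering_histogram(center, radius, neighbours, n_samples=20000, seed=0):
    """Monte Carlo estimate of the surface area of one residue sphere covered by
    exactly k neighbour spheres, for each observed k (hypothetical descriptor).

    neighbours: list of (centre, radius) pairs, centres given as 3-tuples.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        # uniform direction on the sphere: normalised 3D Gaussian vector
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        point = [center[ax] + radius * v[ax] / norm for ax in range(3)]
        # number of neighbour spheres containing this surface point
        k = sum(1 for c, r in neighbours if math.dist(point, c) < r)
        counts[k] = counts.get(k, 0) + 1
    total = 4.0 * math.pi * radius ** 2
    return {k: total * n / n_samples for k, n in sorted(counts.items())}


print(cap_area(1.0, 1.0, 1.0))  # equals math.pi
print(covering_histogram((0.0, 0.0, 0.0), 1.0, [((1.0, 0.0, 0.0), 1.0)]))
```

For a unit sphere with one unit-radius neighbour at distance 1.0, the exact cap area is π, and the Monte Carlo estimate for k = 1 should be close to it. The exact arrangement construction avoids the sampling error entirely; sampling is used here only because it keeps the sketch short.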

    Characterizing RNA ensembles from NMR data with kinematic models

    Functional mechanisms of biomolecules often manifest themselves precisely in transient conformational substates. Researchers have long sought to structurally characterize dynamic processes in non-coding RNA, combining experimental data with computer algorithms. However, adequate exploration of conformational space for these highly dynamic molecules, starting from static crystal structures, remains challenging. Here, we report a new conformational sampling procedure, KGSrna, which can efficiently probe the native ensemble of RNA molecules in solution. We found that KGSrna ensembles accurately represent the conformational landscapes of 3D RNA encoded by NMR proton chemical shifts. KGSrna resolves motionally averaged NMR data into structural contributions; when coupled with residual dipolar coupling data, a KGSrna ensemble revealed a previously uncharacterized transient excited state of the HIV-1 trans-activation response element stem-loop. Ensemble-based interpretations of averaged data can aid in formulating and testing dynamic, motion-based hypotheses of functional mechanisms in RNAs, with broad implications for RNA engineering and therapeutic intervention.
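A minimal sketch of why an ensemble can fit motionally averaged data better than one static structure: an observable predicted for each conformer (for example, a proton chemical shift from some external predictor, which is an assumption here and not part of the original) is averaged over the ensemble and compared with experiment. The function names, the uniform default weights and the toy values are hypothetical; KGSrna's actual kinematic sampling and scoring are not reproduced.

```python
import math


def ensemble_average(per_conformer, weights=None):
    """Average predicted observables over an ensemble.

    per_conformer: one list of predicted values per conformer.
    weights: optional population weights (assumed to sum to 1); uniform if None.
    """
    n = len(per_conformer)
    if weights is None:
        weights = [1.0 / n] * n
    m = len(per_conformer[0])
    return [sum(w * conf[j] for w, conf in zip(weights, per_conformer)) for j in range(m)]


def rmsd(pred, obs):
    """Root-mean-square deviation between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Toy example: two conformers, two nuclei (values in ppm, hypothetical).
conformers = [[7.0, 8.0], [8.0, 9.0]]
observed = [7.5, 8.5]
print(rmsd(conformers[0], observed))                 # 0.5
print(rmsd(ensemble_average(conformers), observed))  # 0.0
```

Here neither conformer alone matches the observed values (RMSD 0.5 ppm each), while their equal-weight average matches exactly, which is the situation motionally averaged NMR data can produce.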

    A Collaborative Filtering Approach for Protein-Protein Docking Scoring Functions

    A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex formed in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that, using Voronoi constructions and a well-chosen set of parameters, an accurate scoring function can be designed and optimized. However, to perform large-scale in silico exploration of the interactome, a near-native solution has to be found among the ten best-ranked solutions. None of the existing scoring functions can yet guarantee this.
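The top-10 requirement can be made concrete with a small evaluation sketch. It assumes lower scores are better and calls a candidate near-native when its RMSD to the native complex is below a cutoff; the 5 Å default is a hypothetical choice, not a criterion taken from the abstract. The collaborative filtering approach named in the title is not reproduced here.

```python
def near_native_in_top(scores, rmsds, top_n=10, rmsd_cutoff=5.0):
    """Return True if one of the top_n best-ranked candidates is near-native.

    scores: one score per candidate (lower = better, an assumption here).
    rmsds: RMSD (in angstroms) of each candidate to the native complex.
    rmsd_cutoff: hypothetical near-native threshold.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return any(rmsds[i] < rmsd_cutoff for i in ranked[:top_n])


def success_rate(benchmark, top_n=10, rmsd_cutoff=5.0):
    """Fraction of complexes, given as (scores, rmsds) pairs, with a near-native hit in the top_n."""
    hits = sum(near_native_in_top(s, r, top_n, rmsd_cutoff) for s, r in benchmark)
    return hits / len(benchmark)
```

Success rate over a benchmark, rather than the rank of the best solution for a single complex, is what matters for interactome-scale use: every complex without a hit in the top ten is effectively a failed prediction.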

    Community-Wide Assessment of Protein-Interface Modeling Suggests Improvements to Design Methodology

    The CAPRI and CASP prediction experiments have demonstrated the power of community-wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community-wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to form in experiments, suggesting that important physical chemistry may be missing from the energy calculations. Twenty-eight research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the non-polar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were, on average, structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a non-binder.
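One common way to quantify how well a single metric separates two sets, such as the natural and designed complexes above, is the area under the ROC curve (AUC), computed here in its Mann-Whitney form. The sketch assumes that higher metric values are expected for natural complexes, and the values below are hypothetical. An AUC of 0.5 means no discrimination and 1.0 means perfect separation. This is an illustration, not the assessment protocol used in the study.

```python
def auc(natural, designed):
    """Mann-Whitney estimate of P(metric(natural) > metric(designed)).

    Compares every natural/designed pair; ties count one half.
    """
    wins = 0.0
    for x in natural:
        for y in designed:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(natural) * len(designed))


# Hypothetical per-complex values of some metric.
natural = [0.9, 0.7, 0.8]
designed = [0.4, 0.75, 0.3]
print(round(auc(natural, designed), 3))  # 0.889
```

A metric that partially distinguishes the sets, as reported for the electrostatics and solvation terms, would give an AUC somewhere between 0.5 and 1.0.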

    Geometric and statistical methods for the analysis and prediction of structural interactions of biomolecules

    The biological function of macromolecules such as proteins and nucleic acids relies heavily on their interactions with their partners. Predicting how molecules interact and how they can form large assemblies acting as nanomachines is essential for our understanding of biology, but also for therapeutics and nanotechnology design. Blind challenges in biology, such as the CAPRI worldwide docking experiment, have shown that in silico studies and simulations, mainly using physics-based potentials and techniques, can give structural insights at atomic detail. However, their accuracy can be very limited, particularly in predicting the native molecular structure of proteins, RNAs and complexes. Using simple geometric coarse-grained modelling and machine learning strategies, such as genetic algorithms and support vector machines, we have shown that the scoring of putative complex structures can be greatly improved, reaching the accuracy needed for experiment design and analysis, at least in a semi-rigid-body context. Since these proof-of-concept studies, most docking prediction strategies have adopted machine learning for scoring optimization. Predicting the structure of molecular partners and how they deform upon binding is also key to better predictions, in particular for non-coding RNAs, which are essential for targeting oncogenes. Our work on RNA structure prediction has shown that data-based parameterization of energy functions and statistical techniques greatly improve prediction accuracy. Reaching the stage of large assemblies also requires assessing the dynamics of molecules from partial experimental data. We developed an efficient sampling technique based on inverse kinematics that does not rely on constraint counting and implicitly computes the rigidity of the molecule.
Paired with experimental data, it offers an integrative view of the dynamics of non-coding RNAs in biological processes. Combined with clustering techniques, it allows efficient and flexible cross-docking analysis of protein-RNA complexes.

    Using Voronoi tessellation to study protein-protein complexes

    The function of a protein often depends on its interaction with one or more partners. Yet studying the three-dimensional structure of these complexes, which often cannot be done experimentally, would allow many cellular processes to be understood. This work has two parts. The first concerns the development of a scoring function for protein-protein docking, and the second the crystallographic structure study of a tetrameric protein, Paramecium bursaria Chlorella virus thymidylate synthase X, a potential antibacterial target. Docking of protein-protein complexes consists of two successive steps: first a large number of putative conformations are generated, then a scoring function is applied to rank them. This scoring function has to take into account both the geometric complementarity of the two molecules and the physico-chemical properties of the interacting surfaces. We addressed the second step of this problem by developing a fast and reliable scoring function based on the Voronoi tessellation of the three-dimensional structure of the proteins. Voronoi and Laguerre tessellations have been shown to be good mathematical models of protein structure. In particular, this formalization gives a good description of the packing and structural properties of the residues, including their packing at the interface between two proteins. It thus becomes possible to measure a set of parameters on protein-protein complexes of known structure and on decoys. These parameters are the frequencies of residues and residue pairs at the interface, the volumes of Voronoi cells, the distances between residues in contact at the interface, the interface area and the number of interface residues. They were used as input to statistical machine learning procedures (logistic regression, support vector machines (SVM) and genetic algorithms), which led to efficient scoring functions able to separate native structures from decoys.
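The logistic-regression step can be sketched in plain Python. The two features (hypothetically, a standardized interface area and a standardized number of interface residues) and the toy data are assumptions made for illustration; the actual parameter set, training data and optimizer used in the thesis are not reproduced here. Training uses batch gradient descent on the log loss, and the learned linear score is then used to rank candidates.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Linear score: higher means more native-like."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the logistic (log) loss.

    X: list of feature vectors; y: 1 for native complexes, 0 for decoys.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # derivative of log loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical standardized features: [interface area, interface residue count].
natives = [[1.0, 0.8], [0.9, 1.0], [1.2, 0.9]]
decoys = [[-1.0, -0.7], [-0.8, -1.1], [-1.1, -0.9]]
w, b = train_logistic(natives + decoys, [1] * len(natives) + [0] * len(decoys))
ranked = sorted(natives + decoys, key=lambda x: score(w, b, x), reverse=True)
print(ranked[:3])  # the three natives should come first
```

Features should be standardized before training so that no single parameter, such as a raw interface area in square angstroms, dominates the gradient.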
In the second part, I describe the experimental determination of the three-dimensional structure of thymidylate synthase X (ThyX), an interesting antibacterial target. Thymidylate synthase X is a recently discovered flavoprotein. It plays a key role in dTMP synthesis in most prokaryotes but is absent from higher eukaryotes. This protein catalyses methyl transfer from tetrahydrofolate to dUMP, using FAD as a cofactor and NADPH as a substrate. The three-dimensional structure of the ThyX homotetramer with its cofactor FAD was solved at 2.4 Å by molecular replacement. As in the previously solved Thermotoga maritima and Mycobacterium tuberculosis ThyX structures, the monomer contains a core of β-sheets and two α-helices at its extremity. The active site lies at the interface between three monomers, with the isoalloxazine moiety of FAD accessible to the solvent and close to a long flexible loop. FAD binding in this structure differs slightly from previously observed structures in the conformation of its adenine moiety. This structure, together with site-directed mutagenesis experiments by our collaborators, revealed residues that play a key role in catalysis.

    Using Voronoi tessellation to study protein-protein complexes

    This work has two parts. The first concerns the development of a scoring function for protein-protein docking, and the second the structure determination by X-ray crystallography of thymidylate synthase X from Paramecium bursaria Chlorella virus. Protein-protein docking consists of two successive steps: first a large number of putative conformations are generated, then a scoring function is applied to rank them. This scoring function has to take into account both the geometric complementarity of the two molecules and the physico-chemical properties of the interacting surfaces. We addressed the second step of this problem by developing a fast and reliable scoring function based on the Voronoi tessellation of the three-dimensional structure of the proteins. With this construction, it is possible to measure a set of parameters on protein-protein complexes of known structure and on decoys. These parameters were used as input to statistical machine learning procedures, which led to efficient scoring functions able to separate native structures from decoys. In the second part, I describe the experimental determination of the three-dimensional structure of thymidylate synthase X, an interesting antibacterial target. The structure of the ThyX homotetramer with its cofactor FAD was solved at 2.4 Å by molecular replacement. This structure, together with site-directed mutagenesis experiments by our collaborators, revealed residues that play a key role in catalysis.

    ESBTL: efficient PDB parser and data structure for the structural and geometric analysis of biological macromolecules.

    The ever-increasing amount of structural biology data calls for robust and efficient analysis software. Easy Structural Biology Template Library (ESBTL) is a lightweight C++ library that handles PDB data and provides a data structure suitable for geometric constructions and analyses. The parser and data model provided by this ready-to-use, include-only library allow adequate treatment of usually discarded information (insertion code, atom occupancy, etc.) while still detecting badly formatted files. The template-based design allows rapid development of new computational structural biology applications and is fully compatible with the new remediated PDB archive format. It also keeps the code easy to use while being versatile enough for advanced user developments. AVAILABILITY: ESBTL is freely available under the GNU General Public License from http://esbtl.sf.net. The website provides the source code, examples, code snippets and documentation.
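ESBTL itself is a C++ library, and the sketch below is not its API. It only illustrates, in Python, what keeping "usually discarded information" involves for the fixed-column PDB ATOM/HETATM record: the alternate location indicator (column 17), the insertion code (column 27) and the occupancy (columns 55-60) are retained, and malformed records raise an error instead of being silently skipped. The `Atom` class and `parse_atom_line` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Atom:
    record: str
    serial: int
    name: str
    alt_loc: str                # column 17, often discarded by simple parsers
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str                 # column 27, insertion code
    x: float
    y: float
    z: float
    occupancy: Optional[float]  # columns 55-60; None if the field is blank
    element: str


def _optional_float(field):
    """Convert a fixed-width field to float, or None if it is blank."""
    field = field.strip()
    return float(field) if field else None


def parse_atom_line(line):
    """Parse one fixed-column ATOM/HETATM record; raise ValueError if malformed."""
    record = line[0:6]
    if record not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record: %r" % record)
    if len(line) < 54:
        raise ValueError("record too short to hold coordinates")
    try:
        return Atom(
            record=record.strip(),
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            alt_loc=line[16].strip(),
            res_name=line[17:20].strip(),
            chain_id=line[21].strip(),
            res_seq=int(line[22:26]),
            i_code=line[26].strip(),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            occupancy=_optional_float(line[54:60]),
            element=line[76:78].strip(),
        )
    except ValueError as exc:
        raise ValueError("badly formatted record: %r" % line) from exc
```

Because an insertion code distinguishes residues that share a sequence number, a residue key such as `(chain_id, res_seq, i_code)` avoids silently merging two residues, which is one of the pitfalls the abstract alludes to.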