12 research outputs found

    Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study

    Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. Two combined predictors were then built: a simple consensus score generated from the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
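
As an illustration of the evaluation described in this abstract, the sketch below combines several interface scores into a consensus and measures discrimination with the area under the ROC curve. The abstract does not say how the consensus was computed, so averaging rank-normalised scores is an assumption, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    (physiological, non-physiological) pairs ranked correctly, ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_normalise(scores: Sequence[float]) -> List[float]:
    """Map raw scores to [0, 1] by rank so that scores on different
    scales become comparable; tied values share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return [r / (n - 1) if n > 1 else 0.5 for r in ranks]


def consensus_score(score_table: Dict[str, List[float]]) -> List[float]:
    """Assumed consensus: mean of the rank-normalised scores, one score per group."""
    normalised = [rank_normalise(s) for s in score_table.values()]
    return [sum(column) / len(column) for column in zip(*normalised)]


# Toy data (hypothetical): 1 = physiological, 0 = non-physiological.
labels = [1, 1, 1, 0, 0, 0]
table = {
    "group_a": [0.9, 0.4, 0.8, 0.5, 0.2, 0.1],
    "group_b": [0.3, 0.9, 0.7, 0.2, 0.6, 0.1],
}
for name, scores in table.items():
    print(name, round(roc_auc(labels, scores), 3))
print("consensus", round(roc_auc(labels, consensus_score(table)), 3))
```

On this toy data each individual score misranks one pair while the consensus separates the two classes, which is the kind of gain the abstract reports for combined scores.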

    Development of an in silico method to characterize the interaction potential of protein surfaces in a crowded environment

    In the crowded cell, proteins interact with their functional partners but also with a large number of non-functional partners that compete with the functional ones. The goal of this thesis is to characterize the physical properties and the evolution of protein surfaces in order to understand how selection pressure acts on proteins, shaping their interactions and regulating this severe competition. To do this, I developed a docking-based framework to characterize the propensity of protein surfaces to interact with other proteins. Because molecular cartography enables the visualization and comparison of protein surface properties, I implemented a new theoretical framework that represents interaction energy landscapes as 2-D energy maps. These maps summarize the propensity of a protein surface to interact with other proteins and are of practical use for identifying the surface regions most prone to interact with other proteins. This framework showed that protein surfaces harbor regions with different levels of interaction propensity (hot, intermediate, and cold regions, corresponding to favorable, intermediate, and unfavorable interaction regions, respectively). A large part of this thesis consisted in characterizing the physico-chemical properties and the evolution of these regions. The other part consisted in applying the methodology to several systems: homomeric complexes, cytosolic proteins from S. cerevisiae, and families of interologs. This work opens the way to numerous practical applications in structural bioinformatics, such as binding site prediction, functional annotation, and the design of new interactions. To conclude, the strategy implemented in this work enables the exploration of the propensity of a protein to interact with hundreds of protein partners, and thus the investigation of the behavior of a protein in a crowded environment. This goes beyond the classical use of "binary" docking, because our strategy provides a systemic view of protein interactions at atomic resolution.

    Développement d'une méthode in silico pour caractériser le potentiel d'interaction des surfaces protéiques dans un environnement encombré

    In the crowded cell, proteins interact with their functional partners but also with a large number of non-functional partners that compete with the functional ones. The goal of this thesis is to characterize the physical properties and the evolution of protein surfaces in order to understand how selection pressure acts on proteins, shaping their interactions and regulating this severe competition. To do this, I developed a docking-based framework to characterize the propensity of protein surfaces to interact with other proteins. Because molecular cartography enables the visualization and comparison of protein surface properties, I implemented a new theoretical framework that represents interaction energy landscapes as 2-D energy maps. These maps summarize the propensity of a protein surface to interact with other proteins and are of practical use for identifying the surface regions most prone to interact with other proteins. This framework showed that protein surfaces harbor regions with different levels of interaction propensity (hot, intermediate, and cold regions, corresponding to favorable, intermediate, and unfavorable interaction regions, respectively). A large part of this thesis consisted in characterizing the physico-chemical properties and the evolution of these regions. The other part consisted in applying the methodology to several systems: homomeric complexes, cytosolic proteins from S. cerevisiae, and families of interologs. This work opens the way to numerous practical applications in structural bioinformatics, such as binding site prediction, functional annotation, and the design of new interactions. To conclude, the strategy implemented in this work enables the exploration of the propensity of a protein to interact with hundreds of protein partners, and thus the investigation of the behavior of a protein in a crowded environment. This goes beyond the classical use of "binary" docking, because our strategy provides a systemic view of protein interactions at atomic resolution.

    SURFMAP: A Software for Mapping in Two Dimensions Protein Surface Features

    Molecular cartography using two-dimensional (2D) representation of protein surfaces has been shown to be very promising for protein surface analysis. Here, we present SURFMAP, a free standalone and easy-to-use software that enables the fast and automated 2D projection of either predefined features of protein surface (i.e., electrostatic potential, hydrophobicity, stickiness, and surface relief) or any descriptor encoded in the temperature factor column of a PDB file. SURFMAP proposes three different "equal-area" projections that have the advantage of preserving the area measures. It provides the user with (i) 2D maps that enable the easy and visual analysis of protein surface features of interest and (ii) maps in a text file format allowing the fast and straightforward quantitative comparison of 2D maps of homologous proteins.
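
To make the idea of an equal-area 2D projection concrete, the sketch below maps surface points to spherical angles around a centre and then applies the sinusoidal projection, a standard equal-area map projection. This is not SURFMAP's code or interface: the abstract does not name its three projections, so the choice of the sinusoidal projection, the function names, and the sample points are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def to_spherical(point: Point, centre: Point) -> Tuple[float, float]:
    """Return (longitude, latitude) in radians of a surface point
    as seen from the centre (e.g. the protein's centre of mass)."""
    dx, dy, dz = (p - c for p, c in zip(point, centre))
    radius = math.sqrt(dx * dx + dy * dy + dz * dz)
    if radius == 0.0:
        raise ValueError("point coincides with the centre")
    latitude = math.asin(dz / radius)    # in [-pi/2, pi/2]
    longitude = math.atan2(dy, dx)       # in [-pi, pi]
    return longitude, latitude


def sinusoidal(longitude: float, latitude: float) -> Tuple[float, float]:
    """Sinusoidal projection: x = lon * cos(lat), y = lat. Areas are preserved."""
    return longitude * math.cos(latitude), latitude


def project_surface(
    points: Sequence[Point], values: Sequence[float], centre: Point
) -> List[Tuple[float, float, float]]:
    """Project each surface point and carry its feature value along,
    e.g. a descriptor read from the temperature factor column of a PDB file."""
    projected = []
    for point, value in zip(points, values):
        x, y = sinusoidal(*to_spherical(point, centre))
        projected.append((x, y, value))
    return projected


# Hypothetical surface points with one feature value each.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
values = [0.2, -0.5, 1.3]
for x, y, value in project_surface(points, values, centre=(0.0, 0.0, 0.0)):
    print(f"x={x:.3f} y={y:.3f} value={value}")
```

Binning the projected points on a regular (x, y) grid and averaging the values per cell would then give a 2D map that can be written to a text file and compared between homologous proteins.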

    Protein interaction energy landscapes are shaped by functional and also non-functional partners

    In the crowded cell, a strong selective pressure operates on the proteome to limit the competition between functional and non-functional protein-protein interactions. We developed an original theoretical framework to investigate how this competition constrains the behavior of proteins with respect to their partners or random encounters. Our framework relies on a two-dimensional (2D) representation of interaction energy landscapes: 2D energy maps that summarize the spatial distribution of the interaction propensity of a protein surface for another protein. We mapped the interaction propensity of protein surfaces in interaction with functional and arbitrary partners and asked whether the distribution of this propensity is conserved during evolution. To this end, we performed several thousand cross-docking simulations to systematically characterize the energy landscapes of 103 proteins interacting with different sets of homologs, corresponding to their functional partner's family or to arbitrary protein families. We then systematically compared the energy maps resulting from the docking of each protein with the different protein families of the dataset. Strikingly, we show that the interaction propensity not only of the binding sites but also of the rest of the surface is conserved for docking partners belonging to the same protein family. Interestingly, this observation holds for docked proteins corresponding to true but also arbitrary partners. Our theoretical framework enables the characterization of the energy behavior of a protein in interaction with hundreds of proteins and opens the way for the characterization of the behavior of proteins in a specific environment.
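
The comparison of 2D energy maps mentioned in this abstract can be sketched as follows. The abstract does not state which similarity measure was used, so the Pearson correlation between the flattened grids is an assumption chosen for illustration, and the map values are invented.

```python
import math
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def flatten(grid: Grid) -> List[float]:
    """Turn a 2D energy map (rows of cells) into a flat list of cell values."""
    return [value for row in grid for value in row]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sized value lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("a constant map has no defined correlation")
    return covariance / math.sqrt(var_a * var_b)


def map_similarity(map_a: Grid, map_b: Grid) -> float:
    """Similarity of two energy maps defined on the same grid (assumed metric)."""
    return pearson(flatten(map_a), flatten(map_b))


# Hypothetical docking energies per map cell (lower = more favorable).
receptor_vs_homolog_1 = [[-5.0, -1.0], [0.0, 2.0]]
receptor_vs_homolog_2 = [[-9.0, -2.5], [0.5, 3.0]]
print(round(map_similarity(receptor_vs_homolog_1, receptor_vs_homolog_2), 3))
```

A value close to 1 for partners from the same family would correspond to the conserved interaction propensity the abstract describes.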

    CC+: A Searchable Database of Validated Coiled Coils in PDB Structures and AlphaFold2 Models

    α-Helical coiled coils are common tertiary and quaternary elements of protein structure. In coiled coils, two or more α helices wrap around each other to form bundles. This apparently simple structural motif can generate many architectures and topologies. Coiled coil-forming sequences can be predicted from heptad repeats of hydrophobic and polar residues, hpphppp, although this is not always reliable. Alternatively, coiled-coil structures can be identified using the program SOCKET, which finds knobs-into-holes (KIH) packing between side chains of neighboring helices. SOCKET also classifies coiled-coil architecture and topology, thus allowing sequence-to-structure relationships to be garnered. In 2009, we used SOCKET to create a relational database of coiled-coil structures, CC+, from the RCSB Protein Data Bank (PDB). Here, we report an update of CC+ following an update of SOCKET (to Socket2) and the recent explosion of structural data and the success of AlphaFold2 in predicting protein structures from genome sequences. With the most-stringent SOCKET parameters, CC+ contains ≈12,000 coiled-coil assemblies from experimentally determined structures, and ≈120,000 potential coiled-coil structures within single-chain models predicted by AlphaFold2 across 48 proteomes. CC+ allows these and other less-stringently defined coiled coils to be searched at various levels of structure, sequence, and side-chain interactions. The identified coiled coils can be viewed directly from CC+ using the Socket2 application, and their associated data can be downloaded for further analyses. CC+ is available freely at http://coiledcoils.chm.bris.ac.uk/CCPlus/Home.html. It will be updated automatically. We envisage that CC+ could be used to understand coiled-coil assemblies and their sequence-to-structure relationships, and to aid protein design and engineering.
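
The heptad-repeat idea in this abstract (hpphppp, with h hydrophobic and p polar) can be sketched as a simple sequence scan. This is neither SOCKET nor an established predictor: the hydrophobic residue set, the function names, and the example sequences are assumptions, and, as the abstract notes, sequence-based prediction of this kind is not always reliable.

```python
from typing import Tuple

HEPTAD = "hpphppp"
# Assumed hydrophobic set; other classifications of residues exist.
HYDROPHOBIC = set("AILMFVWY")


def heptad_fraction(sequence: str, offset: int = 0) -> float:
    """Fraction of residues whose hydrophobic/polar class matches the
    heptad pattern when the pattern is shifted by `offset` positions."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    hits = 0
    for i, residue in enumerate(seq):
        expects_hydrophobic = HEPTAD[(i + offset) % 7] == "h"
        if (residue in HYDROPHOBIC) == expects_hydrophobic:
            hits += 1
    return hits / len(seq)


def best_register(sequence: str) -> Tuple[int, float]:
    """Try all seven pattern offsets and return (offset, match fraction)
    for the best one; the lowest offset wins when fractions are tied."""
    scored = [(heptad_fraction(sequence, offset), offset) for offset in range(7)]
    fraction, offset = max(scored, key=lambda pair: pair[0])
    return offset, fraction


# Idealised example: three perfect heptads, then the same repeat shifted by one.
print(best_register("LKELEDK" * 3))
print(best_register("KLKELED" * 3))
```

Structure-based detection of knobs-into-holes packing, as done by SOCKET, avoids the ambiguity of such sequence patterns when a structure or model is available.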

    An atlas of protein homo-oligomerization across domains of life

    Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in humans. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes.

    Meet-U: Educating through research immersion.

    We present a new educational initiative called Meet-U that aims to train students for collaborative work in computational biology and to bridge the gap between education and research. Meet-U mimics the setup of collaborative research projects and takes advantage of the most popular tools for collaborative work and of cloud computing. Students are grouped in teams of 4-5 people and have to carry out a project from A to Z that answers a challenging question in biology. Meet-U promotes "coopetition," as the students collaborate within and across the teams and are also in competition with each other to develop the best final product. Meet-U fosters interactions between different actors of education and research through the organization of a meeting day, open to everyone, where the students present their work to a jury of researchers and jury members give research seminars. This unique combination of education and research is strongly motivating for the students and provides a formidable opportunity for a scientific community to unite and increase its visibility. We report on our experience with Meet-U in two French universities with master's students in bioinformatics and modeling, with protein-protein docking as the subject of the course. Meet-U is easy to implement and can be straightforwardly transferred to other fields and/or universities. All the information and data are available at www.meet-u.org.
