21 research outputs found

    Sampling of conformational ensemble for virtual screening using molecular dynamics simulations and normal mode analysis

    Aim: Molecular dynamics simulations and normal mode analysis are well-established approaches to generate receptor conformational ensembles (RCEs) for ligand docking and virtual screening. Here, we report new fast molecular dynamics-based and normal mode analysis-based protocols, combined with a classification of pocket conformations, to generate RCEs efficiently. Materials & methods: We assessed our protocols on two well-characterized protein targets: dihydrofolate reductase (DHFR), which shows local active-site flexibility, and CDK2, which undergoes large collective movements. The performance of the RCEs was validated by their ability to distinguish known ligands of DHFR and CDK2 from a dataset of diverse chemical decoys. Results & discussion: Our results show that different simulation protocols can efficiently generate RCEs, depending on the kind of protein flexibility involved.
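
    The validation step above (separating known ligands from decoys) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each compound receives its best (lowest) docking score over all ensemble conformations, and performance is summarized by the ROC AUC. The scores, compound names and the helpers `ensemble_score` and `roc_auc` are hypothetical.

```python
from typing import Dict, List, Sequence


def ensemble_score(scores_per_conformer: Sequence[float]) -> float:
    """Best (lowest) docking score over all receptor conformations."""
    return min(scores_per_conformer)


def roc_auc(active_scores: List[float], decoy_scores: List[float]) -> float:
    """Probability that a random active scores better (lower) than a random decoy.

    Ties count as one half; this equals the normalized Mann-Whitney U statistic.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical docking scores (kcal/mol) against a three-conformation ensemble.
actives: Dict[str, List[float]] = {
    "active_1": [-9.1, -7.4, -8.0],
    "active_2": [-6.2, -8.7, -7.9],
}
decoys: Dict[str, List[float]] = {
    "decoy_1": [-6.5, -7.0, -6.1],
    "decoy_2": [-8.8, -5.9, -6.0],
    "decoy_3": [-5.5, -6.3, -6.8],
}

# Whole ensemble versus the first conformation alone.
auc = roc_auc([ensemble_score(s) for s in actives.values()],
              [ensemble_score(s) for s in decoys.values()])
single = roc_auc([s[0] for s in actives.values()],
                 [s[0] for s in decoys.values()])
print(f"ensemble AUC = {auc:.2f}, single-conformer AUC = {single:.2f}")
```

    With these made-up numbers the ensemble raises the AUC because active_2 only scores well against the second conformation; real benchmarks would use many more compounds and possibly other metrics such as enrichment factors.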

    Algorithmic challenges for biomolecular simulations and protein design

    Computational protein design (CPD) is a method to modify proteins and give them new properties, using their 3D structures and molecular modelling. To make the method more predictive, the models need continual improvement. In this thesis, we addressed the problem of explicitly representing the flexibility of the protein backbone. We developed a "multi-state" design approach based on a small, discrete library of backbone conformations defined ahead of time. In a Monte Carlo framework, given the rugged protein energy landscape, large backbone motions can only be accepted if precautions are taken. Thus, to explore these conformations along with side-chain mutations and motions, we introduced a new type of move into an existing Monte Carlo method. The move is a "hybrid" one: the backbone changes its conformation, then a short Monte Carlo relaxation of the side chains alone is performed, followed by an acceptance test. To obtain Boltzmann sampling of states, the acceptance probability must have a specific form, which involves a path integral that is difficult to calculate in practice. Two approximate forms are explored in detail: the first is based on a single relaxation path, or "generating path" (Single Path Approximation, SPA); the second is more complex and relies on a collection of paths obtained by permuting the elementary steps of the generating path (Permuted Path Approximation, PPA). These approximations are tested in depth and compared on two proteins. Free energy differences between the backbone conformations are computed using three different approaches, which move the system reversibly from one conformation to another but follow very different routes. The methods agree well across a wide range of parameterizations, indicating that the free energy behaves as a state function, as it should, and strongly suggesting that Boltzmann sampling is achieved. The sampling method is then applied to a loop in the active site of the tyrosyl-tRNA synthetase enzyme, allowing us to identify sequences that favor either an open or a closed conformation of the loop, so that, in principle, the loop conformation can be controlled or redesigned. Finally, we describe preliminary work to make the protein backbone fully flexible, moving within a continuous rather than a discrete space. This new conformational space requires a complete reorganization of the energy calculation and Monte Carlo simulation scheme, substantially increases simulation cost, and requires much more aggressive parallelization of our software.
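
    Why the exact acceptance probability involves a sum over paths can be seen from the generic Metropolis-Hastings condition. This is textbook Metropolis-Hastings written for the hybrid move, not necessarily the exact form derived in the thesis. Write the old state as o = (b, r), with backbone b and side-chain rotamers r, and the new state as n = (b', r'). The move reaches n through any sequence of intermediate rotamer states produced by m relaxation steps on backbone b', each step drawn from a side-chain transition kernel T_{b'}; the reverse move relaxes on backbone b. Both generation probabilities are therefore sums over all relaxation paths, with β = 1/k_BT:

```latex
P_{\mathrm{gen}}(o \to n) = p_{\mathrm{hop}}(b \to b') \sum_{r_1,\dots,r_{m-1}} \prod_{i=0}^{m-1} T_{b'}(r_i \to r_{i+1}), \qquad r_0 = r,\; r_m = r'

P_{\mathrm{gen}}(n \to o) = p_{\mathrm{hop}}(b' \to b) \sum_{s_1,\dots,s_{m-1}} \prod_{i=0}^{m-1} T_{b}(s_i \to s_{i+1}), \qquad s_0 = r',\; s_m = r

\mathrm{acc}(o \to n) = \min\!\left(1,\ \frac{e^{-\beta E(b',\,r')}\, P_{\mathrm{gen}}(n \to o)}{e^{-\beta E(b,\,r)}\, P_{\mathrm{gen}}(o \to n)}\right)
```

    The number of terms in each sum grows combinatorially with m, which is what makes the exact test impractical; a single-path estimate in the spirit of the SPA would keep only the term for the generating path.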

    A Hybrid Monte Carlo Scheme for Multibackbone Protein Design

    International audience
    Multistate protein design explores side-chain mutations while the backbone samples a small, predetermined library of conformations. To achieve Boltzmann sampling of sequences and conformations, we use a hybrid Monte Carlo (MC) scheme: a trial hop between backbone models is followed by a short MC segment in which side-chain rotamers adjust to the new backbone, before a Metropolis-like acceptance test is applied. The theoretical form of the acceptance test and a practical approximation to it are derived. We then compute backbone conformational free energies for two proteins (an SH2 and an SH3 domain) using different routes and protocols, and verify that, for simple test problems, the free energy behaves like a state function, a hallmark of Boltzmann sampling.
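
    The idea of a backbone hop combined with side-chain relaxation can be tried on a toy model. The sketch below does not reproduce the paper's acceptance test. Instead it swaps in the standard nonequilibrium candidate Monte Carlo (NCMC) rule: the hop is switched on gradually by linearly interpolating the two backbone energy functions, a Metropolis rotamer step is taken at each stage, and the move is accepted with probability min(1, exp(-βW)), where W is the accumulated switching work. Because the protocol is palindromic (relax, switch, relax, ..., switch, relax), this rule preserves the Boltzmann distribution, which the script checks by comparing sampled and exact backbone populations. The energy table, temperature and all function names are hypothetical.

```python
import math
import random

# Hypothetical toy system: two backbone models b in {0, 1} and one side chain
# with three rotamers r in {0, 1, 2}. ENERGY[b][r] is in kcal/mol.
ENERGY = [[0.0, 1.0, 2.5],
          [1.5, -0.5, 0.8]]
KT = 0.596  # kcal/mol, roughly 300 K (assumed)
BETA = 1.0 / KT
N_ROT = len(ENERGY[0])


def mixed_energy(b_from, b_to, lam, r):
    """Energy interpolated between backbone b_from (lam=0) and b_to (lam=1)."""
    return (1.0 - lam) * ENERGY[b_from][r] + lam * ENERGY[b_to][r]


def rotamer_step(rng, energy_of, r):
    """One Metropolis rotamer move with a symmetric (uniform) proposal."""
    trial = rng.randrange(N_ROT)
    delta = energy_of(trial) - energy_of(r)
    if delta <= 0.0 or rng.random() < math.exp(-BETA * delta):
        return trial
    return r


def hybrid_backbone_move(rng, b, r, n_switch=4):
    """NCMC-style backbone hop with side-chain relaxation along the switch."""
    b_new = 1 - b
    lambdas = [k / n_switch for k in range(n_switch + 1)]
    # Palindromic protocol: relax, switch, relax, ..., switch, relax.
    r_cur = rotamer_step(rng, lambda x: mixed_energy(b, b_new, lambdas[0], x), r)
    work = 0.0
    for k in range(1, n_switch + 1):
        # Work of changing lambda at fixed rotamer.
        work += (mixed_energy(b, b_new, lambdas[k], r_cur)
                 - mixed_energy(b, b_new, lambdas[k - 1], r_cur))
        lam = lambdas[k]
        r_cur = rotamer_step(rng, lambda x: mixed_energy(b, b_new, lam, x), r_cur)
    if work <= 0.0 or rng.random() < math.exp(-BETA * work):
        return b_new, r_cur
    return b, r  # rejection restores the starting state


def sampled_population_b1(n_steps=100_000, seed=7):
    """Fraction of MC steps spent on backbone 1."""
    rng = random.Random(seed)
    b, r = 0, 0
    visits_b1 = 0
    for _ in range(n_steps):
        if rng.random() < 0.5:
            b, r = hybrid_backbone_move(rng, b, r)
        else:
            r = rotamer_step(rng, lambda x: ENERGY[b][x], r)
        visits_b1 += b
    return visits_b1 / n_steps


def exact_population_b1():
    """Exact Boltzmann population of backbone 1, summed over rotamers."""
    z = [sum(math.exp(-BETA * e) for e in row) for row in ENERGY]
    return z[1] / (z[0] + z[1])


if __name__ == "__main__":
    print(f"sampled p(b=1) = {sampled_population_b1():.3f}")
    print(f"exact   p(b=1) = {exact_population_b1():.3f}")
```

    The backbone free energy difference then follows from the populations as ΔG = -kT ln(p1/p0); agreement between routes (here, exact summation versus sampling) is the kind of state-function check the abstract describes.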

    Probing the stereospecificity of tyrosyl- and glutaminyl-tRNA synthetase with molecular dynamics

    International audience
    The stereospecificity of aminoacyl-tRNA synthetases helps exclude D-amino acids from protein synthesis and could perhaps be engineered to allow controlled D-aminoacylation of tRNA. We use molecular dynamics simulations to probe the stereospecificity of the class I tyrosyl- and glutaminyl-tRNA synthetases (TyrRS, GlnRS), including the wild-type enzymes and three point mutants suggested by three different protein design methods. L/D binding free energy differences are obtained by alchemically and reversibly transforming the ligand from L to D in simulations of the protein-ligand complex. The D81Q mutation in Escherichia coli TyrRS is homologous to the D81R mutant shown earlier to have inverted stereospecificity. D81Q is predicted to rotate the ligand backbone and to increase, not decrease, the L-Tyr preference. The E36Q mutation in Methanococcus jannaschii TyrRS has a predicted L/D binding free energy difference ΔΔG of just 0.5 ± 0.9 kcal/mol, compared to 3.1 ± 0.8 kcal/mol for the wild-type enzyme (favoring L-Tyr). The ligand ammonium position is preserved in the D-Tyr complex, while the carboxylate is shifted. Wild-type GlnRS has a similar preference for L-glutaminyl adenylate; the R260Q mutant has an increased preference, even though Arg260 makes a large contribution to the wild-type ΔΔG value.
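
    Two steps are implicit in the L/D comparison above. First, in the thermodynamic cycle ΔΔG = ΔG_complex(L→D) - ΔG_solution(L→D), the solution leg vanishes because enantiomers have identical free energies in an achiral solvent, so only the complex needs to be simulated. Second, with ΔΔG = ΔG_bind(D) - ΔG_bind(L) (positive values favor L, as in the wild-type value quoted above), the binding constants satisfy K_L/K_D = exp(ΔΔG/RT). The sketch below applies this conversion to the quoted values, assuming T = 300 K (the temperature is not stated) and treating the ± values as simple ranges rather than formal confidence intervals.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def l_over_d_ratio(ddg_kcal: float, temperature_k: float = 300.0) -> float:
    """Return K_L / K_D for ddG = dG_bind(D) - dG_bind(L), in kcal/mol.

    Positive ddG favors the L ligand. T = 300 K is an assumed default.
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


# ddG values and uncertainties quoted in the abstract (kcal/mol).
for label, ddg, err in [("M. jannaschii TyrRS wild type", 3.1, 0.8),
                        ("M. jannaschii TyrRS E36Q", 0.5, 0.9)]:
    low = l_over_d_ratio(ddg - err)
    mid = l_over_d_ratio(ddg)
    high = l_over_d_ratio(ddg + err)
    print(f"{label}: K_L/K_D about {mid:.1f} (range {low:.1f} to {high:.1f})")
```

    Under these assumptions the E36Q range includes a ratio of 1 (no preference), whereas the wild-type range stays well above it.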

    A multiparameter optimization in middle‐down analysis of monoclonal antibodies by LC–MS/MS

    International audience
    In antibody-based drug research, complete characterization of antibody proteoforms, covering both the amino acid sequence and all post-translational modifications, remains a major concern. The usual mass spectrometry-based approach to this goal is bottom-up proteomics, which relies on digesting the antibodies and therefore does not allow the diversity of proteoforms to be assessed. Middle-down and top-down approaches have recently emerged as attractive alternatives, but they are not yet mastered, and thus not used routinely, by many analytical chemistry laboratories. The work described here aims to provide guidelines for achieving the best sequence coverage when fragmenting intact light and heavy chains, generated by simple reduction of intact antibodies, using Orbitrap mass spectrometry. Three parameters were found to be crucial: the use of an electron-based activation technique, the multiplexed selection of precursor ions of different charge states, and the combination of replicates.
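
    Why combining replicates (or precursor charge states) raises coverage can be made concrete with a common definition of sequence coverage in middle-down and top-down MS: the fraction of inter-residue bonds explained by at least one assigned fragment ion. The sketch below assumes that definition and uses hypothetical cleavage-site sets for a toy 10-residue chain; real input would come from a fragment-assignment tool, which is not shown.

```python
from typing import Iterable, Set


def sequence_coverage(chain_length: int, cleavage_sites: Iterable[Set[int]]) -> float:
    """Fraction of the chain_length - 1 inter-residue bonds observed in any run.

    Bond i lies between residues i and i + 1 (1-based). Each set holds the bonds
    explained by fragment ions in one run (a replicate or a precursor charge state).
    """
    observed = set().union(*cleavage_sites)
    n_bonds = chain_length - 1
    return len({i for i in observed if 1 <= i <= n_bonds}) / n_bonds


# Hypothetical cleavage sites from three runs on a 10-residue chain.
replicate_a = {1, 2, 3, 7}
replicate_b = {2, 3, 4, 8}
replicate_c = {5, 7, 9}

print(f"single run: {sequence_coverage(10, [replicate_a]):.2f}")
print(f"combined:   {sequence_coverage(10, [replicate_a, replicate_b, replicate_c]):.2f}")
```

    Taking the union is what makes partially overlapping runs complementary: each run contributes the bonds the others missed.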

    Phylogenetic analysis of Harmonin homology domains

    International audience
    Background: Harmonin homology domains (HHDs) are recently identified orphan domains of about 70 residues, folded into a compact five-alpha-helix bundle, that have proved functionally versatile: they can bind a partner directly as well as regulate the affinity and specificity of adjacent domains for their own targets. Given their small size and rather simple fold, HHDs appear to be convenient modules for regulating protein-protein interactions in various biological contexts. Surprisingly, only nine HHDs have been detected, in six proteins that are mainly expressed in sensory neurons.
    Results: Here, we built a profile hidden Markov model to screen the entire UniProtKB for new HHD-containing proteins. Every hit was manually annotated using a clustering approach, confirming that only a few proteins contain HHDs. We report the phylogenetic coverage of each protein and build a phylogenetic tree to trace the evolution of HHDs. We suggest that HHDs share an ancestor with Paired Amphipathic Helix (PAH) domains, a four-helix bundle that partially shares their fold and functional properties. We characterized the amino acid sequences of the various HHDs using pairwise BLASTP scoring coupled with community clustering, and manually assessed sequence features within each family. These sequence features were analyzed using reported structures as well as homology models to highlight the structural motifs underlying the HHD fold. We show that functional divergence is carried by subtle sequence differences that automated approaches failed to detect.
    Conclusions: We provide the first HHD databases, including sequences and conservation, phylogenetic trees, and a list of HHD variants found in the auditory system, all available to the community. This case study highlights surprising phylogenetic properties of orphan domains and will assist further studies of HHDs. We unveil the involvement of HHDs in various binding interfaces using conservation across families and a new protein-protein surface predictor. Finally, we discuss the functional consequences of three identified pathogenic HHD variants involved in Hoyeraal-Hreidarsson syndrome and of three newly reported pathogenic variants identified in patients with Usher syndrome.
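
    Grouping sequences by pairwise BLASTP scores can be framed as a graph problem: sequences are nodes, and pairs scoring at or above a threshold are joined by edges. The sketch below uses plain connected components (single linkage) as a simpler stand-in for the community clustering mentioned above, which would additionally split weakly connected groups. The sequence identifiers, bit scores and threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cluster_by_score(pair_scores: Dict[Tuple[str, str], float],
                     threshold: float) -> List[Set[str]]:
    """Connected components of the graph whose edges are pairs scoring >= threshold."""
    graph: Dict[str, Set[str]] = {}
    for (a, b), score in pair_scores.items():
        # Register both nodes even when the pair falls below the threshold.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    clusters: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical BLASTP bit scores between five HHD sequences.
scores = {
    ("hhd_A", "hhd_B"): 95.0,
    ("hhd_B", "hhd_C"): 71.0,
    ("hhd_A", "hhd_C"): 40.0,
    ("hhd_D", "hhd_E"): 88.0,
    ("hhd_C", "hhd_D"): 22.0,
}
print(cluster_by_score(scores, threshold=60.0))
```

    Lowering the threshold merges families through single weak links (here, at 20 all five sequences join one cluster), which is the chaining behavior that community detection methods are designed to resist.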

    InDeep: 3D fully convolutional neural networks to assist in silico drug design on protein–protein interactions

    International audience
    Motivation: Protein-protein interactions (PPIs) are key elements in numerous biological pathways and the subject of a growing number of drug discovery projects, including projects against infectious diseases. Designing drugs against PPI targets remains difficult and requires extensive effort to qualify a given interaction as an eligible target. To this end, besides the evident need to determine the role of PPIs in disease-associated pathways and to characterize them experimentally as therapeutic targets, predicting their capacity to be bound by other protein partners or modulated by future drugs is of primary importance. Results: We present InDeep, a tool for predicting functional binding sites within proteins that could host either protein epitopes or future drugs. Leveraging deep learning on a curated dataset of PPIs, the tool provides enhanced functional binding site predictions on either experimental structures or molecular dynamics trajectories. In our benchmark, InDeep outperforms state-of-the-art ligandable binding site predictors on PPI targets as well as on conventional targets. This offers new opportunities to assist drug design projects on PPIs by identifying pertinent binding pockets at or near PPI interfaces.
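
    3D convolutional networks of this kind take a voxel grid as input. The sketch below shows one common way to build such a grid, by spreading each atom as a Gaussian density into a per-atom-type channel. The channel set, grid size, 1 Å spacing and Gaussian width are assumptions for illustration, not InDeep's actual featurization.

```python
import numpy as np

CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}  # hypothetical atom-type channels


def voxelize(coords, elements, center, size=16, spacing=1.0, sigma=1.0):
    """Return a (channels, size, size, size) grid of Gaussian atom densities.

    coords: (n_atoms, 3) coordinates in angstroms; elements: element symbols.
    The grid is a cube of size**3 voxels centered on `center`.
    """
    coords = np.asarray(coords, dtype=float)
    grid = np.zeros((len(CHANNELS), size, size, size))
    # Voxel-center coordinates along one axis, relative to the box center.
    axis = (np.arange(size) - (size - 1) / 2.0) * spacing
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    for xyz, element in zip(coords - np.asarray(center, dtype=float), elements):
        channel = CHANNELS.get(element)
        if channel is None:
            continue  # skip atom types without a channel (e.g. hydrogens)
        d2 = (gx - xyz[0]) ** 2 + (gy - xyz[1]) ** 2 + (gz - xyz[2]) ** 2
        grid[channel] += np.exp(-d2 / (2.0 * sigma ** 2))
    return grid


# Three hypothetical atoms near the origin; the hydrogen is ignored.
grid = voxelize([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.4, 0.0]],
                ["C", "N", "H"], center=[0.0, 0.0, 0.0])
print(grid.shape)  # (4, 16, 16, 16)
```

    Applying the same function frame by frame would give the per-snapshot inputs needed for predictions along molecular dynamics trajectories.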

    MinOmics, an Integrative and Immersive Tool for Multi-Omics Analysis

    International audience
    Proteomic and transcriptomic technologies have produced massive biological datasets whose interpretation requires sophisticated computational strategies. Efficient and intuitive real-time analysis remains challenging. We use proteomic data on 1417 proteins of the green microalga Chlamydomonas reinhardtii to investigate the physicochemical parameters governing the selectivity of three cysteine-based redox post-translational modifications (PTMs): glutathionylation (SSG), nitrosylation (SNO), and disulphide bonds (SS) reduced by thioredoxins. We aim to understand the underlying molecular mechanisms and structural determinants by integrating redox proteome data from the gene level to the structural level. Our interactive visual analytics approach, on an 8.3 m² display wall with 25-megapixel resolution, features stereoscopic three-dimensional (3D) representations rendered by UnityMol WebGL. Virtual reality headsets complement the range of usage configurations for fully immersive tasks. Our experiments confirm that fast access to a rich cross-linked database is necessary for immersive analysis of structural data. We emphasize the possibility of displaying complex data structures and relationships in 3D, which is intrinsic to molecular structure visualization but less common in omics-network analysis. Our setup is powered by MinOmics, an integrated analysis pipeline and visualization framework dedicated to multi-omics analysis. MinOmics integrates data from various sources into a materialized physical repository. We evaluate its performance, which was a design criterion of the framework.
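
    The idea of a materialized repository, as opposed to re-querying each source on demand, can be sketched with Python's built-in sqlite3 module: records from separate sources are loaded once, and the joined result is stored as a physical, indexed table that later queries read directly. The table names, columns and rows below are hypothetical placeholders, not the MinOmics schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical sources: protein annotations and redox PTM sites.
cur.execute("CREATE TABLE protein (protein_id TEXT PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE ptm_site (protein_id TEXT, cys_position INTEGER, ptm TEXT)")
cur.executemany("INSERT INTO protein VALUES (?, ?)",
                [("P1", "protein one"), ("P2", "protein two"), ("P3", "protein three")])
cur.executemany("INSERT INTO ptm_site VALUES (?, ?, ?)",
                [("P1", 45, "SSG"), ("P1", 45, "SNO"), ("P2", 112, "SS"), ("P3", 8, "SNO")])

# Materialize the integrated view once, as a physical table with an index.
cur.execute("""
    CREATE TABLE integrated AS
    SELECT p.protein_id, p.name, s.cys_position, s.ptm
    FROM protein AS p JOIN ptm_site AS s ON p.protein_id = s.protein_id
""")
cur.execute("CREATE INDEX idx_integrated_ptm ON integrated (ptm)")
conn.commit()

# Later queries read the materialized table instead of re-joining the sources.
snos = cur.execute(
    "SELECT protein_id, cys_position FROM integrated WHERE ptm = ? ORDER BY protein_id",
    ("SNO",)).fetchall()
print(snos)  # [('P1', 45), ('P3', 8)]
```

    The trade-off is that the materialized table must be rebuilt when a source changes, in exchange for fast reads, which matters when an interactive display has to respond in real time.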
