10 research outputs found

    Robust structure-based resonance assignment for functional protein studies by NMR

    High-throughput functional protein NMR studies, such as studies of protein interactions or dynamics, require an automated approach for the assignment of the protein backbone. With the growing number of available protein 3D structures, a new class of automated approaches, called structure-based assignment, has recently been developed. Structure-based approaches primarily use NMR input data that are not based on J-coupling and for which connections between residues are not limited by through-bond magnetization transfer efficiency. We present here a robust structure-based assignment approach using mainly HN–HN NOE networks, as well as 1H–15N residual dipolar couplings and chemical shifts. The NOEnet complete search algorithm is robust against assignment errors, even for sparse input data. Instead of a unique and partly erroneous assignment solution, NOEnet gives an optimal assignment ensemble with an accuracy equal or close to 100%. We show that even low-precision assignment ensembles give enough information for functional studies, such as modeling of protein complexes. Finally, combining NOEnet with a small number of ambiguous J-coupling sequential connectivities yields a high-precision assignment ensemble. NOEnet will be available at: http://www.icsn.cnrs-gif.fr/download/nmr

    NOEnet–Use of NOE networks for NMR resonance assignment of proteins with known 3D structure

    Motivation: A prerequisite for any protein study by NMR is the assignment of the resonances from the 15N−1H HSQC spectrum to their corresponding atoms of the protein backbone. Usually, this assignment is obtained by analyzing triple resonance NMR experiments. An alternative assignment strategy exploits the information given by an already available 3D structure of the same or a homologous protein. Up to now, the algorithms that have been developed around the structure-based assignment strategy have the important drawbacks that they cannot guarantee a high assignment accuracy near to 100%

    Developing a scoring function for NMR structure-based assignments using machine learning

    Determining the assignment of signals received from the experiments (peaks) to specific nuclei of the target molecule in Nuclear Magnetic Resonance (NMR) spectroscopy is an important challenge. Nuclear Vector Replacement (NVR) ([2, 3]) is a framework for structure-based assignments which combines multiple types of NMR data such as chemical shifts, residual dipolar couplings, and NOEs. NVR-BIP [1] is a tool which utilizes a scoring function with a binary integer programming (BIP) model to perform the assignments. In this paper, support vector machines (SVM) and boosting are employed to combine the terms in NVR-BIP's scoring function by viewing the assignment as a classification problem. The assignment accuracies obtained using this approach show that boosting improves the assignment accuracy of NVR-BIP on our data set when RDCs are not available and outperforms SVMs. With RDCs, boosting and SVMs offer mixed results

    Fast and Robust Mathematical Modeling of NMR Assignment Problems

    NMR spectroscopy is used not only for protein structure determination, but also for drug screening and studies of dynamics and interactions. In both cases, one of the main bottleneck steps is backbone assignment. When a homologous structure is available, it can accelerate assignment. Such structure-based methods are the focus of this thesis. This thesis aims for fast and robust methods for NMR assignment problems; in particular, structure-based backbone assignment and chemical shift mapping. For speed, we identified situations where the number of 15N-labeled experiments for structure-based assignment can be reduced; in particular, when a homologous assignment or chemical shift mapping information is available. For robustness, we modeled and directly addressed the errors. Binary integer linear programming, a well-studied method in operations research, was used to model the problems and provide practically efficient solutions with optimality guarantees. Our approach improved on the most robust method for structure-based backbone assignment on 15N-labeled data by improving the accuracy by 10% on average on 9 proteins, and then by handling typing errors, which had previously been ignored. We show that such errors can have a large impact on the accuracy, decreasing it from 95% or greater to between 40% and 75%. On automatically picked peaks, which are much noisier than manually picked peaks, we achieved an accuracy of 97% on ubiquitin. In chemical shift mapping, the peak tracking is often done manually because the problem is inherently visual. We developed a computer vision approach for tracking the peak movements with an average accuracy of over 95% on three proteins, with fewer than 1.5 residues predicted per peak. One of the proteins tested is larger than any tested by existing automated methods, and it has more titration peak lists. We then combined peak tracking with backbone assignment to take into account contact information, which resulted in an average accuracy of 94% on one-to-one assignments for these three proteins. Finally, we applied peak tracking and backbone assignment to protein-ligand docking to illustrate the potential for fast 3D complex determination
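
    The binary integer formulation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical score matrix in which score[p][r] is the cost of assigning peak p to residue r, and it enumerates every one-to-one assignment instead of calling an integer programming solver, so it only scales to toy sizes. The function name and the numbers are illustrative and do not come from the thesis.

```python
from itertools import permutations

def solve_assignment(score):
    """Minimise sum(score[p][r] * x[p][r]) over binary x with exactly
    one residue per peak and one peak per residue (brute force)."""
    n = len(score)
    best_perm, best_val = None, float("inf")
    # Each permutation encodes one feasible binary matrix x:
    # x[p][perm[p]] = 1 and every other entry is 0.
    for perm in permutations(range(n)):
        val = sum(score[p][perm[p]] for p in range(n))
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val

# Hypothetical 3-peak, 3-residue score matrix (lower is better).
score = [[0.2, 1.5, 2.0],
         [1.1, 0.3, 1.8],
         [2.2, 1.4, 0.1]]
assignment, total = solve_assignment(score)
print(assignment, round(total, 3))
```

    Exhaustive enumeration returns a provably optimal assignment, which is the property an exact integer programming solver provides at realistic problem sizes.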

    A Tabu search approach for the nuclear magnetic resonance protein structure based assignment problem

    Nuclear Magnetic Resonance (NMR) Spectroscopy is an experimental technique which exploits the magnetic properties of specific nuclei and enables the study of proteins in solution. The key bottleneck of NMR studies is to map the NMR peaks to corresponding nuclei, also known as the assignment problem. Structure Based Assignment (SBA) is an approach to solve this computationally challenging problem by using prior information about the protein obtained from a homologous structure. [17] used the Nuclear Vector Replacement (NVR) [29] framework to model SBA as a binary integer programming problem (NVR-BIP). In this thesis, we prove that this problem is NP-hard and propose a tabu search algorithm (NVR-TS) equipped with a guided perturbation mechanism to efficiently solve it. NVR-TS uses a quadratic penalty relaxation of NVR-BIP where the violations in the Nuclear Overhauser Effect constraints are penalized in the objective function. Experimental results indicate that our algorithm finds the optimal solution on NVR-BIP's data set which consists of 7 proteins with 25 templates (31 to 126 residues). Furthermore, for two additional large proteins, MBP and EIN (348 and 243 residues, respectively) which NVR-BIP failed to solve, it achieves 91% and 41% assignment accuracies. The executable and the input files are available for download at http://people.sabanciuniv.edu/catay/NVR-TS/NVR-TS.html
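
    The idea of a tabu search over assignments with penalized NOE violations can be sketched as follows. This is a generic, minimal tabu search and not the NVR-TS implementation: the swap neighbourhood, the tabu tenure, the penalty weight, and the toy data are all assumptions, and the guided perturbation mechanism described in the abstract is omitted.

```python
from itertools import combinations

def objective(perm, cost, noes, far, penalty):
    """Score of an assignment perm[peak] = residue (lower is better)."""
    linear = sum(cost[p][r] for p, r in enumerate(perm))
    # Each violated NOE pair adds a penalty. Written with binary variables
    # this term is a product x[p][r] * x[q][s], hence a quadratic penalty.
    violations = sum(1 for p, q in noes if far[perm[p]][perm[q]])
    return linear + penalty * violations

def tabu_search(cost, noes, far, penalty=10.0, tenure=3, iters=50):
    n = len(cost)
    current = list(range(n))  # start from the identity assignment
    best, best_val = current[:], objective(current, cost, noes, far, penalty)
    tabu_until = {}  # swap (p, q) -> first iteration at which it is allowed again
    for it in range(iters):
        candidates = []
        for p, q in combinations(range(n), 2):
            neighbour = current[:]
            neighbour[p], neighbour[q] = neighbour[q], neighbour[p]
            val = objective(neighbour, cost, noes, far, penalty)
            is_tabu = tabu_until.get((p, q), 0) > it
            # Aspiration: a tabu swap is still allowed if it beats the best so far.
            if not is_tabu or val < best_val:
                candidates.append((val, (p, q), neighbour))
        if not candidates:
            break
        # Take the best admissible swap, even if it worsens the current solution.
        val, move, current = min(candidates, key=lambda c: (c[0], c[1]))
        tabu_until[move] = it + 1 + tenure
        if val < best_val:
            best, best_val = current[:], val
    return best, best_val

# Hypothetical toy instance: 4 peaks, 4 residues.
cost = [[4, 1, 3, 2],
        [2, 0, 5, 3],
        [3, 2, 2, 1],
        [1, 3, 4, 0]]
noes = [(0, 1), (2, 3)]  # peak pairs with an observed NOE
far = [[False, False, True, True],   # far[r][s]: residues too distant for an NOE
       [False, False, False, True],
       [True, False, False, False],
       [True, True, False, False]]
best, best_val = tabu_search(cost, noes, far)
print(best, best_val)
```

    Accepting the best non-tabu swap even when it is worse than the current solution is what lets the search leave local optima; unlike an exact solver, it gives no optimality guarantee.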

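    Développement de stratégies de marquage isotopique des groupements méthyles pour l'étude d'assemblages protéiques de grande taille par RMN (Development of isotopic labeling strategies for methyl groups for the study of large protein assemblies by NMR)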

    Solution NMR spectroscopy was long limited to small biological objects. It is now widely recognized that specific isotope labeling of methyl groups in a perdeuterated protein has significantly extended the frontier of this technique: proteins as large as 1 MDa have recently been investigated by NMR. However, this strategy has an important drawback, namely the drastically reduced number of protonated probes. This thesis develops new methodologies to cope with this scarcity of structural information, relying on the simultaneous (combinatorial) labeling of several methyl groups to increase the number of probes. For optimized combinatorial labeling, the set of amino acids to label simultaneously, the precursors, and the protocol for their incorporation have to be chosen carefully. In this work, a new protocol was introduced for the scrambling-free, optimized isotopic labeling of AbId1(LV)proS methyl groups. Compared with the “standard AbId1LV” labeling scheme, the proposed pattern halves the number of Leu and Val NMR signals and enhances the intensity of detectable long-range nOes by a factor of 4. The protocol also suppresses spurious correlations, which are especially harmful for structural studies based on the detection and analysis of nOes. To make efficient use of the high-quality NMR spectra obtained with this protocol, the methyl group signals must be assigned. Two strategies were proposed. The first is suitable for systems whose molecular weight does not exceed 100 kDa (e.g. MSG). It relies on isotopically linearized precursors (with different isotope topologies to discriminate each methyl group) to assign the isoleucine, leucine and valine methyl groups in a regio- and stereo-specific manner in a single step, using an optimized “out and back” 13C-TOCSY pulse sequence. The second, adapted to supra-molecular proteins (> 100 kDa), optimizes the previously reported SeSAM approach (Sequence-Specific Assignment of Methyl groups by Mutagenesis). Thanks to an enriched culture medium developed for the specific labeling of Ala, the minimal required culture volume was significantly decreased, enabling protein expression in 24-well plates and parallel purification in 96-well plates, and thereby shortening the overall sample preparation time. This improved SeSAM version was estimated to allow the assignment of ca. 100 methyl cross-peaks in 2 weeks, including 4 days of NMR time, with less than 2 k€ of isotopic materials. To illustrate the value of selectively protonated methyl groups, used singly or in combination, several applications were presented, notably the real-time NMR study of the self-assembly process of a ~0.5 MDa supra-molecular protein (PhTET-2). The use of combinatorial labeling to detect long-range nOes up to 10 Å in an 82 kDa protein and up to 8 Å in a 0.5 MDa protein was also investigated. The same approach was used to filter inter-monomeric long-range nOes, which are particularly important for structure calculation, in the symmetrical, homo-oligomeric PhTET-2 protein.

    New Approaches to Protein NMR Automation

    The three-dimensional structure of a protein molecule is the key to understanding its biological and physiological properties. A major problem in bioinformatics is to efficiently determine the three-dimensional structures of query proteins. Protein NMR structure determination is one of the main experimental methods and is comprised of: (i) protein sample production and isotope labelling, (ii) collecting NMR spectra, and (iii) analysis of the spectra to produce the protein structure. In protein NMR, the three-dimensional structure is determined by exploiting a set of distance restraints between spatially proximate atoms. Currently, no practical automated protein NMR method exists that is without human intervention. We first propose a complete automated protein NMR pipeline, which can efficiently be used to determine the structures of moderate sized proteins. Second, we propose a novel and efficient semidefinite programming-based (SDP) protein structure determination method. The proposed automated protein NMR pipeline consists of three modules: (i) an automated peak picking method, called PICKY, (ii) a backbone chemical shift assignment method, called IPASS, and (iii) a protein structure determination method, called FALCON-NMR. When tested on four real protein data sets, this pipeline can produce structures with reasonable accuracies, starting from NMR spectra. This general method can be applied to other macromolecule structure determination methods. For example, a promising application is RNA NMR-assisted secondary structure determination. In the second part of this thesis, due to the shortcomings of FALCON-NMR, we propose a novel SDP-based protein structure determination method from NMR data, called SPROS. Most of the existing prominent protein NMR structure determination methods are based on molecular dynamics coupled with a simulated annealing schedule. In these methods, an objective function representing the error between observed and given distance restraints is minimized; these objective functions are highly non-convex and difficult to optimize. Euclidean distance geometry methods based on SDP provide a natural formulation for realizing a three-dimensional structure from a set of given distance constraints. However, the complexity of the SDP solvers increases cubically with the input matrix size, i.e., the number of atoms in the protein, and the number of constraints. In fact, the complexity of SDP solvers is a major obstacle in their applicability to the protein NMR problem. To overcome these limitations, the SPROS method models the protein molecule as a set of intersecting two- and three-dimensional cliques. We adapt and extend a technique called semidefinite facial reduction for the SDP matrix size reduction, which makes the SDP problem size approximately one quarter of the original problem. The reduced problem is solved nearly one hundred times faster and is more robust against numerical problems. Reasonably accurate results were obtained when SPROS was applied to a set of 20 real protein data sets
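
    As a rough consistency check on the scaling figures in this abstract, one can assume (the abstract does not state this) that "problem size" refers to the matrix dimension n and that solver cost T grows purely as the cube of n. A fourfold reduction in n would then give:

```latex
\frac{T(n)}{T(n/4)} \approx \frac{n^{3}}{(n/4)^{3}} = 4^{3} = 64
```

    This is the same order of magnitude as the reported near-hundredfold speed-up; the remaining gap would have to come from factors this estimate ignores, such as the number of constraints.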