
    Graph algorithms for NMR resonance assignment and cross-link experiment planning

    The study of three-dimensional protein structures produces insights into protein function at the molecular level. Graphs provide a natural representation of protein structures and associated experimental data, and enable the development of graph algorithms to analyze the structures and data. This thesis develops such graph representations and algorithms for two novel applications: structure-based NMR resonance assignment and disulfide cross-link experiment planning for protein fold determination. The first application seeks to identify correspondences between spectral peaks in NMR data and backbone atoms in a structure (from X-ray crystallography or homology modeling) by computing correspondences between a contact graph representing the structure and an analogous but very noisy and ambiguous graph representing the data. The assignment then supports further NMR studies of protein dynamics and protein-ligand interactions. A hierarchical grow-and-match algorithm was developed for smaller assignment problems, ensuring completeness of assignment, while a random graph approach was developed for larger problems, provably determining unique matches in polynomial time with high probability. Test results show that our algorithms are robust to typical levels of structural variation, noise, and missing data, and achieve very good overall assignment accuracy. The second application aims to rapidly determine the overall organization of secondary structure elements of a target protein by probing it with a set of planned disulfide cross-links. A set of informative pairs of secondary structure elements is selected from graphs representing topologies of predicted structure models. For each pair in this "fingerprint", a set of informative disulfide probes is selected from graphs representing residue proximity in the models. Information-theoretic planning algorithms were developed to maximize information gain while minimizing experimental complexity, and Bayes error plan assessment frameworks were developed to characterize the probability of making correct decisions given experimental data. Evaluation of the approach on a number of structure prediction case studies shows that the optimized plans have low risk of error while testing only a very small portion of the quadratic number of possible cross-link candidates.
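The planning idea can be illustrated with a minimal greedy sketch. It assumes, more simply than the abstract describes, that each candidate structure model deterministically predicts whether each disulfide probe will cross-link and that experiments are noise-free; under that assumption, the expected information gain of a probe set equals the entropy of its joint outcome across models. The model names, probe names, and functions below are hypothetical, and the sketch does not reproduce the thesis's planning algorithms or its Bayes error assessment.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def joint_outcome_entropy(prior, predictors):
    """Entropy of the joint probe outcome, assuming noise-free, deterministic predictions.

    prior: dict mapping model name -> prior probability.
    predictors: list of dicts, each mapping model name -> True if that model
    predicts the probe forms a disulfide cross-link.
    """
    mass = {}
    for model, p in prior.items():
        pattern = tuple(pred[model] for pred in predictors)
        mass[pattern] = mass.get(pattern, 0.0) + p
    return entropy(mass.values())


def plan_probes(prior, probes, k):
    """Greedily choose up to k probes, each step maximising the joint outcome entropy."""
    chosen = []
    for _ in range(k):
        best, best_h = None, -1.0
        for name, predicts in probes.items():
            if name in chosen:
                continue
            h = joint_outcome_entropy(prior, [probes[c] for c in chosen] + [predicts])
            if h > best_h + 1e-12:  # ties keep the earlier probe
                best, best_h = name, h
        if best is None:
            break
        chosen.append(best)
    return chosen


# Hypothetical example: four equally likely fold models and four candidate probes.
prior = {"M1": 0.25, "M2": 0.25, "M3": 0.25, "M4": 0.25}
probes = {
    "A": {"M1": True, "M2": True, "M3": False, "M4": False},
    "B": {"M1": True, "M2": False, "M3": False, "M4": False},
    "C": {"M1": True, "M2": False, "M3": True, "M4": False},
    "D": {"M1": True, "M2": True, "M3": False, "M4": False},  # redundant with A
}
plan = plan_probes(prior, probes, 2)  # ["A", "C"]: two probes separate all four models
```

Probe D is never picked after A because it repeats A's outcome pattern and adds no information; the planners described above additionally trade information against experimental complexity and account for the risk of decision errors.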

    Contact replacement for NMR resonance assignment

    Motivation: Complementing its traditional role in structural studies of proteins, nuclear magnetic resonance (NMR) spectroscopy is playing an increasingly important role in functional studies. NMR dynamics experiments characterize motions involved in target recognition, ligand binding, etc., while NMR chemical shift perturbation experiments identify and localize protein–protein and protein–ligand interactions. The key bottleneck in these studies is to determine the backbone resonance assignment, which allows spectral peaks to be mapped to specific atoms. This article develops a novel approach to address that bottleneck, exploiting an available X-ray structure or homology model to assign the entire backbone from a set of relatively fast and cheap NMR experiments.

    NOEnet–Use of NOE networks for NMR resonance assignment of proteins with known 3D structure

    Motivation: A prerequisite for any protein study by NMR is the assignment of the resonances from the 15N–1H HSQC spectrum to their corresponding atoms of the protein backbone. Usually, this assignment is obtained by analyzing triple resonance NMR experiments. An alternative assignment strategy exploits the information given by an already available 3D structure of the same or a homologous protein. Up to now, the algorithms developed around the structure-based assignment strategy have had the important drawback that they cannot guarantee an assignment accuracy near 100%.

    Algorithms for automated assignment of solution-state and solid-state protein NMR spectra.

    Protein nuclear magnetic resonance spectroscopy (protein NMR) is an invaluable analytical technique for studying protein structure, function, and dynamics. There are two major types of NMR spectroscopy used to investigate protein structure: solution-state and solid-state NMR. Solution-state NMR spectroscopy is typically applied to small and medium-sized proteins that are soluble in water, whereas solid-state NMR spectroscopy is suitable for proteins that are insoluble in water. In the vast majority of NMR-based protein studies, the first step after experiment optimization is the assignment of protein resonances, that is, the association of chemical shift values with specific atoms in the protein. Depending on the quality of the spectra, manual resonance assignment can take an experienced NMR spectroscopist weeks or even months of effort. The resonance assignment processes for solution-state and solid-state protein NMR studies are conceptually similar, but differ because they use different NMR experiments and different resonances to group peaks into spin systems. Currently, there is a shortage of robust, effective software tools for solid-state protein resonance assignment, and there is no general software that can perform both solution-state and solid-state protein resonance assignment in a reliable, automated fashion. Hence, the motivation of this research is to design and implement algorithms and software tools that automate the resonance assignment problem. As a result, several algorithms and software packages that aid important steps in the protein resonance assignment process were developed. For example, the nmrstarlib software package can access and utilize data deposited in the NMR-STAR format; the core of this library is a lexical analyzer for NMR-STAR syntax that acts as a generator-based state machine for token processing. The jpredapi software package provides an easy-to-use API to submit jobs to, and retrieve results from, the JPred secondary structure prediction server. The single peak list and pairwise peak list registration algorithms address multiple sources of variance within a single peak list and between different peak lists, and can calculate the match tolerance values necessary for spin system grouping. The single peak list and pairwise peak list grouping algorithms are based on the well-known DBSCAN clustering algorithm and are designed to group peaks into spin systems within a single peak list as well as between different peak lists.
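The grouping step can be sketched with a plain DBSCAN implementation in which two peaks count as neighbours when they agree within a per-dimension match tolerance, here a (1H, 15N) box. This is a minimal sketch, not the described grouping algorithms: the tolerance values and example peaks are hypothetical and set by hand where the registration algorithms would compute them, and `neighbours` and `dbscan` are illustrative names rather than part of nmrstarlib or any other package.

```python
from collections import deque


def neighbours(peaks, i, tol):
    """Indices of all peaks (including i) inside the tolerance box around peak i."""
    return [
        j
        for j, q in enumerate(peaks)
        if all(abs(a - b) <= t for a, b, t in zip(peaks[i], q, tol))
    ]


def dbscan(peaks, tol, min_samples):
    """Label each peak with a spin-system index; -1 marks unassigned noise peaks."""
    labels = [None] * len(peaks)  # None means not yet visited
    cluster = 0
    for i in range(len(peaks)):
        if labels[i] is not None:
            continue
        seeds = neighbours(peaks, i, tol)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border peak
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border peak: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(peaks, j, tol)
            if len(nbrs) >= min_samples:  # core peak: expand the cluster through it
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical (1H ppm, 15N ppm) peaks pooled from several peak lists.
peaks = [
    (8.10, 120.1), (8.12, 120.3), (8.11, 120.2),  # one amide seen three times
    (7.50, 110.0), (7.52, 110.2),                 # a second amide
    (9.00, 130.0),                                # isolated peak
]
labels = dbscan(peaks, tol=(0.03, 0.4), min_samples=2)  # [0, 0, 0, 1, 1, -1]
```

Using a tolerance box rather than a single Euclidean radius keeps each nucleus on its own ppm scale, one simple way to handle dimensions whose shifts vary by very different amounts.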

    Robust structure-based resonance assignment for functional protein studies by NMR

    High-throughput functional protein NMR studies, such as studies of protein interactions or dynamics, require an automated approach for assigning the protein backbone. With the availability of a growing number of protein 3D structures, a new class of automated approaches, called structure-based assignment, has been developed quite recently. Structure-based approaches primarily use NMR input data that are not based on J-coupling and for which connections between residues are not limited by the efficiency of through-bond magnetization transfer. We present here a robust structure-based assignment approach using mainly HN–HN NOE networks, as well as 1H–15N residual dipolar couplings and chemical shifts. The NOEnet complete search algorithm is robust against assignment errors, even for sparse input data. Instead of a single, partly erroneous assignment solution, NOEnet gives an optimal assignment ensemble with an accuracy equal or close to 100%. We show that even low-precision assignment ensembles give enough information for functional studies, such as modeling of protein complexes. Finally, combining NOEnet with a small number of ambiguous J-coupling sequential connectivities yields a high-precision assignment ensemble. NOEnet will be available at: http://www.icsn.cnrs-gif.fr/download/nmr
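The idea of returning an assignment ensemble instead of a single answer can be sketched as a complete backtracking search: enumerate every one-to-one mapping of spin systems (peaks) to residues in which each observed HN–HN NOE lands on a pair of residues that are in contact in the structure. This is a minimal sketch under strong assumptions (every NOE is a true contact, no RDC or chemical shift scoring, no tolerance for missing or spurious NOEs) and is not the NOEnet algorithm; all names and data are hypothetical.

```python
def enumerate_assignments(peaks, noe_edges, residues, contacts, candidates=None):
    """Return every injective peak -> residue mapping that maps each NOE edge onto a contact.

    candidates optionally restricts each peak to a subset of residues
    (e.g. from residue-type information); by default every residue is allowed.
    """
    contact_set = {frozenset(c) for c in contacts}
    noe_partners = {p: set() for p in peaks}
    for a, b in noe_edges:
        noe_partners[a].add(b)
        noe_partners[b].add(a)
    candidates = candidates or {}
    solutions = []

    def extend(i, mapping, used):
        if i == len(peaks):
            solutions.append(dict(mapping))  # a complete, consistent assignment
            return
        p = peaks[i]
        for r in candidates.get(p, residues):
            if r in used:
                continue
            # Every NOE partner assigned so far must be in contact with residue r.
            if all(frozenset((r, mapping[q])) in contact_set
                   for q in noe_partners[p] if q in mapping):
                mapping[p] = r
                used.add(r)
                extend(i + 1, mapping, used)
                del mapping[p]  # backtrack
                used.discard(r)

    extend(0, {}, set())
    return solutions


# Hypothetical toy case: residues 1-4 in a chain of contacts, three peaks linked by NOEs.
residues = [1, 2, 3, 4]
contacts = [(1, 2), (2, 3), (3, 4)]
peaks = ["a", "b", "c"]
noes = [("a", "b"), ("b", "c")]
ensemble = enumerate_assignments(peaks, noes, residues, contacts)
narrowed = enumerate_assignments(peaks, noes, residues, contacts, candidates={"a": [1, 2]})
```

The unrestricted search returns four equally consistent mappings; restricting peak a to residues 1 or 2 halves the ensemble, showing how even one extra restraint raises its precision.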

    Fast and Robust Mathematical Modeling of NMR Assignment Problems

    NMR spectroscopy is used not only for protein structure determination, but also for drug screening and for studies of dynamics and interactions. In both cases, one of the main bottleneck steps is backbone assignment. When a homologous structure is available, it can accelerate assignment. Such structure-based methods are the focus of this thesis, which aims for fast and robust methods for NMR assignment problems, in particular structure-based backbone assignment and chemical shift mapping. For speed, we identified situations where the number of 15N-labeled experiments needed for structure-based assignment can be reduced, in particular when a homologous assignment or chemical shift mapping information is available. For robustness, we modeled and directly addressed the errors. Binary integer linear programming, a well-studied method in operations research, was used to model the problems and provide practically efficient solutions with optimality guarantees. Our approach improved on the most robust existing method for structure-based backbone assignment on 15N-labeled data, first by increasing the accuracy by 10% on average on 9 proteins, and then by handling typing errors, which had previously been ignored. We show that such errors can have a large impact, decreasing the accuracy from 95% or greater to between 40% and 75%. On automatically picked peaks, which are much noisier than manually picked peaks, we achieved an accuracy of 97% on ubiquitin. In chemical shift mapping, peak tracking is often done manually because the problem is inherently visual. We developed a computer vision approach for tracking peak movements, with an average accuracy of over 95% on three proteins and fewer than 1.5 residues predicted per peak. One of the proteins tested is larger than any tested by existing automated methods, and it has more titration peak lists. We then combined peak tracking with backbone assignment to take contact information into account, which resulted in an average accuracy of 94% on one-to-one assignments for these three proteins. Finally, we applied peak tracking and backbone assignment to protein-ligand docking to illustrate the potential for fast 3D complex determination.
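A generic binary integer linear program for structure-based backbone assignment illustrates the kind of model described, under assumptions the abstract does not state: x_pr = 1 assigns peak p to residue r, s_pr is a hypothetical match score (for example from chemical shift or residue-type agreement), and y_{pq,rs} rewards, with a hypothetical weight w > 0, an NOE between peaks p and q that maps onto a contact between residues r and s (contacts taken as ordered pairs). The thesis's actual formulation, including its treatment of typing errors, may differ.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{p \in P}\sum_{r \in R} s_{pr}\,x_{pr}
  \;+\; w \sum_{(p,q) \in E_{\mathrm{NOE}}}\ \sum_{(r,s) \in E_{\mathrm{contact}}} y_{pq,rs} \\
\text{s.t.}\quad & \sum_{r \in R} x_{pr} \le 1 \qquad \forall p \in P \\
& \sum_{p \in P} x_{pr} \le 1 \qquad \forall r \in R \\
& y_{pq,rs} \le x_{pr}, \quad y_{pq,rs} \le x_{qs} \qquad \forall (p,q) \in E_{\mathrm{NOE}},\ (r,s) \in E_{\mathrm{contact}} \\
& x_{pr},\, y_{pq,rs} \in \{0,1\}
\end{aligned}
```

Because w > 0 and the objective is maximised, the two upper bounds on y_{pq,rs} suffice to linearise the product x_pr x_qs; with w = 0 the model reduces to a maximum-weight bipartite matching between peaks and residues.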

    Advances in Nuclear Magnetic Resonance for Drug Discovery

    Background: Drug discovery is a complex and unpredictable endeavor with a high failure rate. Current trends in the pharmaceutical industry have exacerbated these challenges and are contributing to the dramatic decline in productivity observed over the last decade. Industrializing science by forcing the drug discovery process to adhere to assembly-line protocols imposes unnecessary restrictions, such as short project timelines. Recent advances in nuclear magnetic resonance are responding to these self-imposed limitations and are providing opportunities to increase the success rate of drug discovery. Objective/Method: A review of recent advancements in NMR technology that have the potential to significantly impact and benefit the drug discovery process is presented. These include fast NMR data collection protocols and high-throughput protein structure determination, rapid protein-ligand co-structure determination, lead discovery using fragment-based NMR affinity screens, NMR metabolomics to monitor in vivo efficacy and toxicity of lead compounds, and the identification of new therapeutic targets through the functional annotation of proteins by FASTNMR. Conclusion: NMR is a critical component of the drug discovery process, and the versatility of the technique enables it to continually expand and evolve its role. NMR is expected to maintain this growth over the next decade with advancements in automation, speed of structure calculation, in-cell imaging techniques, and the expansion of NMR-amenable targets.

    Molecular-level organization of coassembled β-sheet peptide nanofibers

    Functional biomaterials that recapitulate the complexity and sophistication of biological systems can be difficult to access with current techniques. One promising route towards building biomaterials with controlled nanoscale organization is coassembling β-sheet peptides. Coassembling β-sheet peptide designs are dominated by the concept of charge complementarity, in which the two peptide sequences are modified to include charged amino acids, giving each sequence either an overall positive or an overall negative charge. Electrostatic repulsion prevents self-assembly, while attraction between oppositely charged peptides promotes β-sheet assembly. While previous studies have assessed the secondary structure of nanofibers fabricated from charge-complementary peptides, there is no detailed molecular-level description of how these peptide strands arrange within the nanofiber. Consequently, we lack an understanding of how these peptides coassemble and how to design sequences that form a specific coassembled structure. In this thesis, we investigate the molecular-level organization within coassembling β-sheet peptide nanofibers by a combination of experimental and computational techniques. Results reveal that a significant number of structural defects form, highlighting the challenge in designing coassembling β-sheet peptides and providing insights into future designs. (Ph.D. thesis)

    New Approaches to Protein NMR Automation

    The three-dimensional structure of a protein molecule is the key to understanding its biological and physiological properties. A major problem in bioinformatics is to efficiently determine the three-dimensional structures of query proteins. Protein NMR structure determination is one of the main experimental methods and comprises: (i) protein sample production and isotope labelling, (ii) collecting NMR spectra, and (iii) analysis of the spectra to produce the protein structure. In protein NMR, the three-dimensional structure is determined by exploiting a set of distance restraints between spatially proximate atoms. Currently, no practical protein NMR method exists that runs fully automatically, without human intervention. First, we propose a complete automated protein NMR pipeline, which can be used to efficiently determine the structures of moderately sized proteins. Second, we propose a novel and efficient semidefinite programming (SDP)-based protein structure determination method. The proposed automated protein NMR pipeline consists of three modules: (i) an automated peak picking method, called PICKY, (ii) a backbone chemical shift assignment method, called IPASS, and (iii) a protein structure determination method, called FALCON-NMR. When tested on four real protein data sets, this pipeline can produce structures with reasonable accuracies, starting from NMR spectra. This general method can be applied to other macromolecule structure determination methods; for example, a promising application is NMR-assisted RNA secondary structure determination. In the second part of this thesis, motivated by the shortcomings of FALCON-NMR, we propose a novel SDP-based protein structure determination method from NMR data, called SPROS. Most of the existing prominent protein NMR structure determination methods are based on molecular dynamics coupled with a simulated annealing schedule. In these methods, an objective function representing the error between the distances in the model structure and the given distance restraints is minimized; these objective functions are highly non-convex and difficult to optimize. Euclidean distance geometry methods based on SDP provide a natural formulation for realizing a three-dimensional structure from a set of given distance constraints. However, the complexity of SDP solvers grows cubically with the input matrix size (i.e., the number of atoms in the protein) and with the number of constraints, which is a major obstacle to their applicability to the protein NMR problem. To overcome these limitations, the SPROS method models the protein molecule as a set of intersecting two- and three-dimensional cliques. We adapt and extend a technique called semidefinite facial reduction to reduce the SDP matrix size, which shrinks the SDP problem to approximately one quarter of its original size. The reduced problem is solved nearly one hundred times faster and is more robust against numerical problems. Reasonably accurate results were obtained when SPROS was applied to a set of 20 real protein data sets.
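The textbook Euclidean distance geometry SDP behind such methods can be written as follows, assuming n atoms with coordinates x_1, ..., x_n in R^3, a set E of atom pairs with exact restraint distances d_ij, and a Gram matrix G with G_ij = x_i · x_j. This is a generic feasibility form for illustration; SPROS's actual objective, handling of distance bounds, and clique-based facial reduction are not reproduced here.

```latex
\begin{aligned}
\text{find}\quad & G \in \mathbb{S}^{n} \\
\text{s.t.}\quad & G_{ii} + G_{jj} - 2\,G_{ij} = d_{ij}^{2} \qquad \forall (i,j) \in E \\
& G\,\mathbf{1} = \mathbf{0}, \qquad G \succeq 0
\end{aligned}
```

The first constraint is the identity ||x_i - x_j||^2 = G_ii + G_jj - 2G_ij, and G1 = 0 fixes the centroid at the origin. A three-dimensional structure corresponds to rank(G) ≤ 3; the SDP relaxation drops this non-convex rank constraint, and coordinates are typically recovered from the top three eigenpairs of G. Facial reduction uses groups of atoms with known relative geometry to restrict G to the form G = V Z V^T, with a fixed n × k matrix V (k < n) and a smaller variable Z ⪰ 0, which is what shrinks the problem the solver has to handle.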