1,035 research outputs found

    Automated protein structure calculation from NMR data

    Get PDF
    Current software is almost at the stage to permit completely automatic structure determination of small proteins of < 15 kDa, from NMR spectra to structure validation with minimal user interaction. This goal is welcome, as it makes structure calculation more objective and therefore more easily validated, without any loss in the quality of the structures generated. Moreover, it releases expert spectroscopists to carry out research that cannot be automated. It should not take much further effort to extend automation to ca 20 kDa. However, there are technological barriers to further automation, of which the biggest are identified as: routines for peak picking; adoption and sharing of a common framework for structure calculation, including the assembly of an automated and trusted package for structure validation; and sample preparation, particularly for larger proteins. These barriers should be the main target for development of methodology for protein structure determination, particularly by structural genomics consortia

    NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction

    Get PDF
    BACKGROUND: NMR chemical shift prediction plays an important role in various applications in computational biology. Among others, structure determination, structure optimization, and the scoring of docking results can profit from efficient and accurate chemical shift estimation from a three-dimensional model. A variety of NMR chemical shift prediction approaches have been presented in the past, but nearly all of these rely on laborious manual data set preparation and the training itself is not automatized, making retraining the model, e.g., if new data is made available, or testing new models a time-consuming manual chore. RESULTS: In this work, we present the framework NightShift (NMR Shift Inference by General Hybrid Model Training), which enables automated data set generation as well as model training and evaluation of protein NMR chemical shift prediction. In addition to this main result – the NightShift framework itself – we describe the resulting, automatically generated, data set and, as a proof-of-concept, a random forest model called Spinster that was built using the pipeline. CONCLUSION: By demonstrating that the performance of the automatically generated predictors is at least en par with the state of the art, we conclude that automated data set and predictor generation is well-suited for the design of NMR chemical shift estimators. The framework can be downloaded from https://bitbucket.org/akdehof/nightshift. It requires the open source Biochemical Algorithms Library (BALL), and is available under the conditions of the GNU Lesser General Public License (LGPL). We additionally offer a browser-based user interface to our NightShift instance employing the Galaxy framework via https://ballaxy.bioinf.uni-sb.de/

    Structural Biology by NMR: Structure, Dynamics, and Interactions

    Get PDF
    The function of bio-macromolecules is determined by both their 3D structure and conformational dynamics. These molecules are inherently flexible systems displaying a broad range of dynamics on time-scales from picoseconds to seconds. Nuclear Magnetic Resonance (NMR) spectroscopy has emerged as the method of choice for studying both protein structure and dynamics in solution. Typically, NMR experiments are sensitive both to structural features and to dynamics, and hence the measured data contain information on both. Despite major progress in both experimental approaches and computational methods, obtaining a consistent view of structure and dynamics from experimental NMR data remains a challenge. Molecular dynamics simulations have emerged as an indispensable tool in the analysis of NMR data

    Semidefinite Programming Approach for the Quadratic Assignment Problem with a Sparse Graph

    Full text link
    The matching problem between two adjacency matrices can be formulated as the NP-hard quadratic assignment problem (QAP). Previous work on semidefinite programming (SDP) relaxations to the QAP have produced solutions that are often tight in practice, but such SDPs typically scale badly, involving matrix variables of dimension n2n^2 where n is the number of nodes. To achieve a speed up, we propose a further relaxation of the SDP involving a number of positive semidefinite matrices of dimension O(n)\mathcal{O}(n) no greater than the number of edges in one of the graphs. The relaxation can be further strengthened by considering cliques in the graph, instead of edges. The dual problem of this novel relaxation has a natural three-block structure that can be solved via a convergent Augmented Direction Method of Multipliers (ADMM) in a distributed manner, where the most expensive step per iteration is computing the eigendecomposition of matrices of dimension O(n)\mathcal{O}(n). The new SDP relaxation produces strong bounds on quadratic assignment problems where one of the graphs is sparse with reduced computational complexity and running times, and can be used in the context of nuclear magnetic resonance spectroscopy (NMR) to tackle the assignment problem.Comment: 31 page

    Towards Automating Protein Structure Determination from NMR Data

    Get PDF
    Nuclear magnetic resonance (NMR) spectroscopy technique is becoming exceedingly significant due to its capability of studying protein structures in solution. However, NMR protein structure determination has remained a laborious and costly process until now, even with the help of currently available computer programs. After the NMR spectra are collected, the main road blocks to the fully automated NMR protein structure determination are peak picking from noisy spectra, resonance assignment from imperfect peak lists, and structure calculation from incomplete assignment and ambiguous nuclear Overhauser enhancements (NOE) constraints. The goal of this dissertation is to propose error-tolerant and highly-efficient methods that work well on real and noisy data sets of NMR protein structure determination and the closely related protein structure prediction problems. One major contribution of this dissertation is to propose a fully automated NMR protein structure determination system, AMR, with emphasis on the parts that I contributed. AMR only requires an input set with six NMR spectra. We develop a novel peak picking method, PICKY, to solve the crucial but tricky peak picking problem. PICKY consists of a noise level estimation step, a component forming step, a singular value decomposition-based initial peak picking step, and a peak refinement step. The first systematic study on peak picking problem is conducted to test the performance of PICKY. An integer linear programming (ILP)-based resonance assignment method, IPASS, is then developed to handle the imperfect peak lists generated by PICKY. IPASS contains an error-tolerant spin system forming method and an ILP-based assignment method. The assignment generated by IPASS is fed into the structure calculation step, FALCON-NMR. FALCON-NMR has a threading module, an ab initio module, an all-atom refinement module, and an NOE constraints-based decoy selection module. The entire system, AMR, is successfully tested on four out of five real proteins with practical NMR spectra, and generates 1.25A, 1.49A, 0.67A, and 0.88A to the native reference structures, respectively. Another contribution of this dissertation is to propose novel ideas and methods to solve three protein structure prediction problems which are closely related to NMR protein structure determination. We develop a novel consensus contact prediction method, which is able to eliminate server correlations, to solve the protein inter-residue contact prediction problem. We also propose an ultra-fast side chain packing method, which only uses local backbone information, to solve the protein side chain packing problem. Finally, two complementary local quality assessment methods are proposed to solve the local quality prediction problem for comparative modeling-based protein structure prediction methods

    Algorithms for automated assignment of solution-state and solid-state protein NMR spectra.

    Get PDF
    Protein nuclear magnetic resonance spectroscopy (Protein NMR) is an invaluable analytical technique for studying protein structure, function, and dynamics. There are two major types of NMR spectroscopy that are used for investigation of protein structure – solution-state and solid-state NMR. Solution-based NMR spectroscopy is typically applied to proteins of small and medium size that are soluble in water. Solid-state NMR spectroscopy is amenable for proteins that are insoluble in water. In the vast majority NMR-based protein studies, the first step after experiment optimization is the assignment of protein resonances via the association of chemical shift values to specific atoms in a protein macromolecule. Depending on the quality of the spectra, a manual protein resonance assignment process often requires a considerable amount of time, from weeks to months-worth of effort even, by an experienced NMR spectroscopist . The resonance assignment processes for solution-state and solid-state protein NMR studies are conceptually similar, but have distinct differences due to the utilization of different NMR experiments and to the use of different resonances for grouping peaks into spin systems. Currently, there is a shortage of robust, effective software tools that can perform solid-state protein resonance assignment and there is no general software that can perform both solution-state and solid-state protein resonance assignment in a reliable, automated fashion. Hence, the motivation of this research is to design and implement algorithms and software tools that will automate the resonance assignment problem. As a result of this research, several algorithms and software packages that aid several important steps in the protein resonance assignment process were developed. For example, the nmrstarlib software package can access and utilize data deposited in the NMR-STAR format; the core of this library is the lexical analyzer for NMR-STAR syntax that acts as a generator-based state-machine for token processing. The jpredapi software package provides an easy-to-use API to submit and retrieve results from secondary structure prediction server. The single peak list and pairwise peak list registration algorithms address the problem of multiple sources of variance within single peak list and between different peak lists and is capable of calculating the match tolerance values necessary for spin system grouping. The single peak list and pairwise peak list grouping algorithms are based on the well-known DBSCAN clustering algorithm and are designed to group peaks into spin systems within single peak list as well as between different peak lists
    • …
    corecore