556 research outputs found

    Graph algorithms for NMR resonance assignment and cross-link experiment planning

    The study of three-dimensional protein structures produces insights into protein function at the molecular level. Graphs provide a natural representation of protein structures and associated experimental data, and enable the development of graph algorithms to analyze the structures and data. This thesis develops such graph representations and algorithms for two novel applications: structure-based NMR resonance assignment and disulfide cross-link experiment planning for protein fold determination. The first application seeks to identify correspondences between spectral peaks in NMR data and backbone atoms in a structure (from X-ray crystallography or homology modeling), by computing correspondences between a contact graph representing the structure and an analogous but very noisy and ambiguous graph representing the data. The assignment then supports further NMR studies of protein dynamics and protein-ligand interactions. A hierarchical grow-and-match algorithm was developed for smaller assignment problems, ensuring completeness of assignment, while a random graph approach was developed for larger problems, provably determining unique matches in polynomial time with high probability. Test results show that our algorithms are robust to typical levels of structural variation, noise, and missing data, and achieve very good overall assignment accuracy. The second application aims to rapidly determine the overall organization of secondary structure elements of a target protein by probing it with a set of planned disulfide cross-links. A set of informative pairs of secondary structure elements is selected from graphs representing topologies of predicted structure models. For each pair in this "fingerprint", a set of informative disulfide probes is selected from graphs representing residue proximity in the models. Information-theoretic planning algorithms were developed to maximize information gain while minimizing experimental complexity, and Bayes error plan assessment frameworks were developed to characterize the probability of making correct decisions given experimental data. Evaluation of the approach on a number of structure prediction case studies shows that the optimized plans have low risk of error while testing only a very small portion of the quadratic number of possible cross-link candidates.
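
To make the idea of information gain in the abstract above concrete, here is a minimal sketch, not the thesis's actual algorithm. It assumes a noise-free binary experiment (a cross-link either forms or does not) and a prior probability over candidate structure models; the names `entropy`, `information_gain`, `prior`, and `predicts_link` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(prior, predicts_link):
    """Expected entropy reduction over models from one binary probe.

    prior:         dict mapping model name -> prior probability
    predicts_link: dict mapping model name -> True if that model predicts
                   the cross-link forms (assumed noise-free, illustrative)
    """
    h_prior = entropy(prior.values())
    expected_posterior = 0.0
    for outcome in (True, False):
        # Probability of observing this outcome under the prior.
        mass = sum(p for m, p in prior.items() if predicts_link[m] == outcome)
        if mass == 0:
            continue
        # Posterior over the models consistent with the outcome.
        posterior = [p / mass for m, p in prior.items()
                     if predicts_link[m] == outcome]
        expected_posterior += mass * entropy(posterior)
    return h_prior - expected_posterior
```

Under these assumptions, a probe that splits the candidate models evenly is the most informative, and a probe on which all models agree yields no information.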

    Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures

    Structural genomics projects have revealed structures for a large number of proteins of unknown function. Understanding the interactions between these proteins and their ligands would provide an initial step in their functional characterization. Binding site identification methods are a fast and cost-effective way to facilitate the characterization of functionally important protein regions. In this review we describe our recently developed methods for binding site identification in the context of existing methods. The advantage of energy-based approaches is emphasized, since they provide flexibility in the identification and characterization of different types of binding site.

    PAR-3D: a server to predict protein active site residues

    PAR-3D (http://sunserver.cdfd.org.in:8080/protease/PAR_3D/index.html) is a web-based tool that exploits the fact that the relative juxtaposition of active site residues is a conserved feature in functionally related protein families. The server uses previously calculated and stored values of geometrical parameters of a set of known proteins (training set) to predict active site residues in a query protein structure. PAR-3D stores motifs for different classes of proteases, the ten glycolytic pathway enzymes and metal-binding sites. The server accepts structures in the PDB format. The first step in the prediction is the extraction of probable active site residues from the query structure. The spatial arrangement of the probable active site residues is then determined in terms of geometrical parameters. These are compared with the stored geometries of the different motifs. The server's speed and efficiency make it a beneficial tool for structural genomics projects, especially when the biochemical function of the protein has not been characterized.
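
The comparison step described above can be illustrated with a small sketch. This is not PAR-3D's implementation: the abstract does not say which geometrical parameters are used, so the sketch assumes pairwise distances between residue positions, residues listed in the same order in both sets, and a hypothetical tolerance of 1.0 Å. The function names are illustrative.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All inter-point distances, in a fixed order, for a list of (x, y, z)."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def matches_motif(candidate, motif, tolerance=1.0):
    """True if every pairwise distance agrees with the motif within tolerance.

    candidate, motif: lists of (x, y, z) positions, one per residue, given
    in the same residue order (an assumption of this sketch).
    """
    if len(candidate) != len(motif):
        return False
    return all(
        abs(dc - dm) <= tolerance
        for dc, dm in zip(pairwise_distances(candidate),
                          pairwise_distances(motif))
    )
```

Because only internal distances are compared, the check does not depend on where the query structure sits in space or how it is oriented.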

    Graph-based Approaches to Protein Structure- and Function Prediction


    Computational Molecular Biology

    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

    A comprehensive analysis of 40 blind protein structure predictions

    BACKGROUND: We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. RESULTS: For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the C(α) atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å C(α) RMSD for 60–100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å C(α) RMSD for residues 1–80 for T110/rbfa. CONCLUSIONS: The areas of protein structure prediction that work well, and areas that need improvement, are discernable by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.
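
For readers unfamiliar with the accuracy measure quoted above, the following is a minimal sketch of a C(α) RMSD calculation. It assumes the two coordinate sets are already superimposed and paired residue by residue; the optimal superposition step normally performed before reporting RMSD is not shown. The function name `rmsd` and its arguments are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired coordinate lists.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples in ångströms,
    assumed to be already superimposed (no alignment is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

A value of 0 means the paired atoms coincide exactly; larger values mean the model deviates further from the experimental structure on average.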

    Knowledge-based potentials in protein fold recognition

    An accurate potential function is essential for the protein folding problem and for structure prediction. Two different types of potential energy functions are currently in use. The first type is based on the laws of physics, and the second type is referred to as statistical potentials or knowledge-based potentials. In the latter type, the energy function is extracted from statistical analysis of experimental data on known protein structures. As the number of known three-dimensional protein structures increases, this approach is growing rapidly. There are various forms of knowledge-based potentials, depending on how the statistics are calculated and how proteins are modeled. In this review, we explain how knowledge-based potentials are extracted from known protein structures and briefly compare many of the potentials in theory.
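
As an illustration of how an energy can be extracted from structural statistics, the sketch below uses one common form, the inverse Boltzmann relation E = -ln(observed / expected), in units of kT. It is not taken from the review: the reference state (random pairing from overall residue-type frequencies), the function name `contact_potential`, and the input format are all assumptions made for this example.

```python
import math
from collections import Counter


def contact_potential(contacts):
    """Contact energies (in kT units) from a list of observed contact pairs.

    contacts: list of (type_a, type_b) residue-type pairs seen in contact
    in known structures. The expected frequency assumes residue types pair
    at random according to their overall frequency among contacts.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    type_counts = Counter(t for pair in contacts for t in pair)
    n_pairs = len(contacts)
    n_types = 2 * n_pairs
    energies = {}
    for (a, b), n_obs in pair_counts.items():
        freq_a = type_counts[a] / n_types
        freq_b = type_counts[b] / n_types
        # Unordered pairs of distinct types can arise in two orders.
        expected = freq_a * freq_b * (1 if a == b else 2)
        observed = n_obs / n_pairs
        energies[(a, b)] = -math.log(observed / expected)
    return energies
```

Pairs seen more often than the reference state predicts receive negative (favorable) energies, and pairs seen at exactly the expected rate receive zero. Pairs never observed are simply absent from the result in this sketch.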