
    A Tabu search approach for the nuclear magnetic resonance protein structure based assignment problem

    Nuclear Magnetic Resonance (NMR) Spectroscopy is an experimental technique that exploits the magnetic properties of specific nuclei and enables the study of proteins in solution. The key bottleneck of NMR studies is mapping the NMR peaks to their corresponding nuclei, which is known as the assignment problem. Structure Based Assignment (SBA) is an approach to this computationally challenging problem that uses prior information about the protein obtained from a homologous structure. [17] used the Nuclear Vector Replacement (NVR) [29] framework to model SBA as a binary integer programming problem (NVR-BIP). In this thesis, we prove that this problem is NP-hard and propose a tabu search algorithm (NVR-TS), equipped with a guided perturbation mechanism, to solve it efficiently. NVR-TS uses a quadratic penalty relaxation of NVR-BIP in which violations of the Nuclear Overhauser Effect (NOE) constraints are penalized in the objective function. Experimental results indicate that our algorithm finds the optimal solution on NVR-BIP's data set, which consists of 7 proteins with 25 templates (31 to 126 residues). Furthermore, for two additional large proteins, MBP and EIN (348 and 243 residues, respectively), which NVR-BIP failed to solve, it achieves assignment accuracies of 91% and 41%, respectively. The executable and the input files are available for download at http://people.sabanciuniv.edu/catay/NVR-TS/NVR-TS.html
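The quadratic penalty idea behind NVR-TS can be illustrated with a toy tabu search. This is a minimal sketch, not the authors' NVR-TS: it assumes a square peak-by-residue cost matrix, a hypothetical list of NOE peak pairs, a residue distance matrix taken from the template, swap moves, and a fixed tabu tenure; the guided perturbation mechanism is omitted. An NOE counts as violated when its two peaks are assigned to residues farther apart than the threshold, and each violation adds `mu` to the objective, which corresponds to the product term x[p][r]·x[q][s] in 0/1 variables.

```python
from itertools import combinations


def objective(perm, cost, noes, dist, threshold, mu):
    """Assignment cost plus a penalty of `mu` per violated NOE.

    perm[p] is the residue assigned to peak p. The NOE (p, q) is violated when
    residues perm[p] and perm[q] are more than `threshold` apart in the template;
    in 0/1 variables this is the quadratic term x[p][r] * x[q][s].
    """
    base = sum(cost[p][perm[p]] for p in range(len(perm)))
    violations = sum(1 for p, q in noes if dist[perm[p]][perm[q]] > threshold)
    return base + mu * violations


def tabu_search(cost, noes, dist, threshold, mu=10.0, tenure=5, max_iters=200):
    """Toy tabu search over one-to-one peak-residue assignments.

    Hypothetical parameters; no guided perturbation. A move swaps the
    residues assigned to two peaks.
    """
    n = len(cost)
    current = list(range(n))  # start from the identity assignment
    best = current[:]
    best_val = objective(current, cost, noes, dist, threshold, mu)
    tabu = {}  # swap (p, q) -> iteration from which it is allowed again
    for it in range(max_iters):
        move, move_val = None, float("inf")
        for p, q in combinations(range(n), 2):
            cand = current[:]
            cand[p], cand[q] = cand[q], cand[p]
            val = objective(cand, cost, noes, dist, threshold, mu)
            is_tabu = tabu.get((p, q), -1) > it
            # Aspiration criterion: a tabu move is accepted if it beats the best so far.
            if (not is_tabu or val < best_val) and val < move_val:
                move, move_val = (p, q), val
        if move is None:  # every move is tabu and none qualifies for aspiration
            break
        p, q = move
        current[p], current[q] = current[q], current[p]
        tabu[move] = it + tenure
        if move_val < best_val:
            best, best_val = current[:], move_val
    return best, best_val
```

On a 4-peak toy instance whose cost matrix rewards the assignment [2, 0, 3, 1] and whose NOEs are consistent with it, this search reaches that assignment with objective 0 after three swaps.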

    Automating the usage of unambiguous noes in nuclear vector replacement for NMR protein structure-based assignments

    Proteins perform various functions and tasks in living organisms. The structure of a protein is essential for identifying its function; therefore, determining protein structure is of utmost importance. Nuclear Magnetic Resonance (NMR) is one of the experimental methods used to determine protein structure. The key bottleneck in NMR protein structure determination is assigning NMR peaks to their corresponding nuclei, which is known as the assignment problem. Many laboratories still perform this assignment manually. In this thesis, we have developed methodologies and software to automate this process. Structure Based Assignment (SBA) is an approach to this computationally challenging problem that uses prior information about the protein obtained from a template structure. NVR-BIP is an approach that uses the Nuclear Vector Replacement (NVR) framework to model SBA as a binary integer programming problem. NVR-TS is a tabu search algorithm equipped with a guided perturbation mechanism to handle proteins with more residues. NVR-ACO is an ant colony optimization approach, inspired by the behavior of real ants, that minimizes the peak-nuclei matching cost. One of the inputs used by these approaches is Nuclear Overhauser Effect (NOE) data. An NOE is an interaction observed between two protons when they are close in space. These protons can be amide protons (HN), protons attached to the alpha-carbon atom in the protein backbone (HA), or side chain protons. NVR uses only backbone protons. In the previous approaches based on the NVR framework, the proton type was not distinguished in the NOEs, and only the HN coordinates were used to incorporate the NOEs into the computation. In this thesis, we fix this problem and use both the HA and HN coordinates, together with the corresponding distances, in our computations.
In addition, in previous studies in this context the NOE distance threshold was tuned manually for each protein, which limits the applicability of the methodology to novel proteins. In this thesis, we set the threshold uniformly for all proteins by extracting the NOE upper bound distances from the data. Furthermore, for Maltose Binding Protein (MBP), we extract the NOE upper bound distances directly from the NMR peak intensity values and test this protein on real NMR data. We tested our approach on NVR-ACO's data set and compared it with NVR-BIP, NVR-TS, and NVR-ACO. The experimental results show that the proposed approach improves the assignment accuracies significantly. In particular, we achieved 100% assignment accuracy on EIN and 80% assignment accuracy on MBP, compared with 83% and 73%, respectively, obtained by the previous approaches.
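The two changes described above, using the actual proton type (HN or HA) of each NOE and deriving distance bounds from the data, can be sketched as follows. The intensity-to-distance conversion uses the standard isolated spin-pair approximation (NOE intensity proportional to r^-6), calibrated against a reference peak of known distance; this calibration, the `slack` tolerance, and the coordinate dictionary layout are assumptions for illustration, not necessarily the procedure used in the thesis.

```python
import math


def noe_upper_bound(intensity, ref_intensity, ref_distance, slack=0.5):
    """Distance bound (angstroms) estimated from an NOE peak intensity.

    Assumes the isolated spin-pair approximation, I proportional to r**-6,
    calibrated on a reference peak of known distance:
        r = ref_distance * (ref_intensity / intensity) ** (1/6)
    `slack` is a hypothetical tolerance that turns the estimate into an upper bound.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0) + slack


def noe_satisfied(coords, res_a, atom_a, res_b, atom_b, upper):
    """Check one NOE against template coordinates using the real proton types.

    coords maps (residue, atom) -> (x, y, z), with atom "HN" or "HA"
    (hypothetical layout). Substituting the HN coordinate for every proton
    can misjudge NOEs that actually involve HA protons.
    """
    return math.dist(coords[(res_a, atom_a)], coords[(res_b, atom_b)]) <= upper
```

For example, if the HA protons of two residues are 2.5 Å apart but their HN protons are 6 Å apart, an HA-HA NOE with a 3.5 Å bound is satisfied, whereas checking it with HN coordinates would wrongly reject it.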

    An improved formalism for assigning proteins using nuclear vector replacement framework

    The printed copy of the thesis is available at the İstanbul Şehir University Library. Proteins are macromolecules that perform crucial functions in all biological processes in living systems. To understand the function of a protein, it is necessary to determine its structure. There are various techniques for obtaining structural information, and Nuclear Magnetic Resonance (NMR) Spectroscopy is one of the most important. In this technique, an essential step is the backbone resonance assignment, and Structure Based Assignment (SBA) is a method that solves this problem with the help of a template structure. NVR is an NMR protein SBA program that takes as input 15N and HN chemical shifts and unambiguous NOEs, as well as RDCs, HD-exchange and TOCSY data. Running NVR requires a sequence of steps to obtain the data files from the NMR data and the template structure. In this study, the process of preparing these data files is simplified and automated, which is an important practical step toward running NVR on novel proteins. A method for identifying NH2 peaks among the HSQC peaks is developed. Finally, rather than computing a single assignment, an ensemble of assignments is computed. Using this ensemble, a degree of reliability is obtained for individual peak-amino acid assignments, and assignment accuracy is improved. The results show that these improvements bring NVR closer to being a useful and practical tool that can handle the input data automatically and analyze the reliability of its assignments.
    Contents:
    Declaration of Authorship; Abstract; Öz (Turkish abstract); Acknowledgments; List of Figures; List of Tables; Abbreviations
    1 Introduction
    2 Literature Review
      2.1 Related Work
      2.2 NVR Framework
        2.2.1 Mathematical Formulation
        2.2.2 Template Selection
    3 Methodology
      3.1 Automatization of the Data Preparation
      3.2 Distinguishing NH2 peaks
        3.2.1 Experimental Analysis of Distinguishing NH2 peaks
      3.3 Providing a Measure of Reliability of Assignments
    4 Results
      4.1 Automatization of the Data Preparation Results
        4.1.1 Test Results on Two Novel Proteins
      4.2 Distinguishing NH2 peaks Results
      4.3 Reliability Results
    5 Conclusion
    A NH2 Removal Scores of Randomly Selected Proteins
    Bibliography
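Deriving a per-assignment reliability from an ensemble can be illustrated with majority voting: for each peak, count how often each residue is chosen across the ensemble and report the most frequent residue with its agreement fraction. This is a hypothetical sketch of the general idea; the function name and input format are assumptions, and the thesis's exact reliability measure may differ.

```python
from collections import Counter, defaultdict


def assignment_reliability(ensemble):
    """Majority-vote reliability over an ensemble of assignments.

    ensemble: list of dicts, each mapping peak -> residue for one run
    (hypothetical format). Returns peak -> (most frequent residue, fraction
    of runs that chose it). Runs that leave a peak unassigned count as
    disagreement, because the fraction is taken over all runs.
    """
    votes = defaultdict(Counter)
    for assignment in ensemble:
        for peak, residue in assignment.items():
            votes[peak][residue] += 1
    runs = len(ensemble)
    result = {}
    for peak, counter in votes.items():
        residue, count = counter.most_common(1)[0]
        result[peak] = (residue, count / runs)
    return result
```

Peaks whose agreement falls below a chosen cutoff (a free parameter in this sketch) could then be flagged as unreliable rather than reported as firm assignments.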

    Semidefinite Programming Approach for the Quadratic Assignment Problem with a Sparse Graph

    The matching problem between two adjacency matrices can be formulated as the NP-hard quadratic assignment problem (QAP). Previous work on semidefinite programming (SDP) relaxations of the QAP has produced solutions that are often tight in practice, but such SDPs typically scale badly, involving matrix variables of dimension $n^2$, where $n$ is the number of nodes. To achieve a speed-up, we propose a further relaxation of the SDP involving positive semidefinite matrices of dimension $\mathcal{O}(n)$, whose number is no greater than the number of edges in one of the graphs. The relaxation can be further strengthened by considering cliques in the graph instead of edges. The dual problem of this novel relaxation has a natural three-block structure that can be solved in a distributed manner via a convergent Alternating Direction Method of Multipliers (ADMM), where the most expensive step per iteration is computing the eigendecomposition of matrices of dimension $\mathcal{O}(n)$. The new SDP relaxation produces strong bounds on quadratic assignment problems in which one of the graphs is sparse, with reduced computational complexity and running times, and can be used in nuclear magnetic resonance (NMR) spectroscopy to tackle the assignment problem. Comment: 31 pages
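For tiny graphs, the QAP objective that these relaxations bound can be evaluated exactly by enumerating all $n!$ permutations, which also shows why exact search does not scale. The sketch below is an illustration, not the paper's method: it maximizes the edge overlap $\sum_{i,j} A_{ij} B_{\pi(i)\pi(j)} = \langle A, P B P^\top \rangle$ between adjacency matrices $A$ and $B$, where $P$ is the permutation matrix of $\pi$; the function names are hypothetical.

```python
from itertools import permutations


def qap_objective(A, B, perm):
    """Edge overlap sum_{i,j} A[i][j] * B[perm[i]][perm[j]].

    For symmetric adjacency matrices each undirected edge is counted twice.
    """
    n = len(A)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))


def best_matching(A, B):
    """Exhaustive search over all n! permutations: exact, but only feasible for tiny n."""
    best_perm, best_val = None, float("-inf")
    for perm in permutations(range(len(A))):
        val = qap_objective(A, B, perm)
        if val > best_val:  # keep the first permutation attaining the maximum
            best_perm, best_val = perm, val
    return best_perm, best_val
```

Matching the path 0-1-2 onto the path 0-2-1 returns the permutation (0, 2, 1) with overlap 4, i.e. both edges preserved. SDP relaxations avoid this enumeration by optimizing over a lifted matrix variable indexed by (node, node) pairs, which is where the $n^2$ dimension comes from.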

    05441 Abstracts Collection -- Managing and Mining Genome Information: Frontiers in Bioinformatics

    From 30.10.05 to 04.11.05, the Dagstuhl Seminar 05441 "Managing and Mining Genome Information: Frontiers in Bioinformatics" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    Algorithms for Protein Structure Prediction

    Operations research: from computational biology to sensor network

    In this dissertation we discuss the deployment of combinatorial optimization methods for modeling and solving real-life problems, with a particular emphasis on two biological problems arising from a common scenario: the reconstruction of the three-dimensional shape of a biological molecule from Nuclear Magnetic Resonance (NMR) data. The first topic is the 3D assignment pathway problem (APP) for an RNA molecule. We prove that APP is NP-hard and give a formulation of it based on edge-colored graphs. Since the interactions between consecutive nuclei in the NMR spectrum differ according to the type of residue along the RNA chain, each color in the graph represents a type of interaction. Thus, the sequence of interactions can be represented as the problem of finding a longest (Hamiltonian) path whose edges follow a given order of colors (i.e., the orderly colored longest path). We introduce three alternative IP formulations of APP, obtained from a max-flow problem on a directed graph with packing constraints over the partitions, and compare them with one another. Since the last two models work on cyclic graphs, for them we propose an algorithm that combines solving their relaxation with the separation of cycle inequalities in a Branch & Cut scheme. The second topic is the discretizable distance geometry problem (DDGP), a formulation over a discrete search space of the well-known distance geometry problem (DGP). The DGP consists in seeking an embedding in space of an undirected graph, given a set of Euclidean distances between certain pairs of vertices. The DGP has two important applications: (i) finding the three-dimensional conformation of a molecule from a subset of interatomic distances, called the Molecular Distance Geometry Problem, and (ii) the Sensor Network Localization Problem.
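The orderly colored longest path can be made concrete with a brute-force depth-first search on a small edge-colored graph. This hypothetical sketch assumes an undirected graph given as (u, v, color) triples and searches for a Hamiltonian path whose k-th edge has the k-th color of the given order; it is exponential in the worst case and is not the IP or Branch & Cut approach of the dissertation.

```python
def orderly_colored_path(n, edges, color_order):
    """Hamiltonian path v0, ..., v_{n-1} whose k-th edge has color color_order[k].

    Hypothetical brute-force illustration: `edges` is a list of (u, v, color)
    triples of an undirected graph on vertices 0..n-1. Returns None if no
    such path exists.
    """
    adj = {v: [] for v in range(n)}
    for u, v, c in edges:
        adj[u].append((v, c))
        adj[v].append((u, c))

    def extend(path, visited):
        if len(path) == n:  # every vertex used: Hamiltonian path found
            return path
        wanted = color_order[len(path) - 1]  # color required for the next edge
        for nxt, c in adj[path[-1]]:
            if c == wanted and nxt not in visited:
                visited.add(nxt)
                found = extend(path + [nxt], visited)
                if found is not None:
                    return found
                visited.remove(nxt)  # backtrack
        return None

    for start in range(n):
        result = extend([start], {start})
        if result is not None:
            return result
    return None
```

With edges 0-1 (r), 1-2 (g), 2-3 (b), 0-2 (b), 1-3 (r), the color order r, g, b yields the path [0, 1, 2, 3], while b, g, r yields [0, 2, 1, 3]; the same vertices are visited, but the color order forces a different path.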
We describe a Branch & Prune (BP) algorithm tailored to this problem, and two versions of it that solve the DDGP in the protein modeling and the sensor network localization frameworks, respectively. BP is an exact and exhaustive combinatorial algorithm that examines all valid embeddings of a given weighted graph G=(V,E,d), under the hypothesis that a given order on V exists. By comparing the two versions of BP with well-known algorithms, we are able to show the efficiency of BP in both contexts, provided that the order imposed on V is maintained.
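The Branch & Prune idea can be sketched in the plane. This is a simplified 2D illustration under stated assumptions, not the dissertation's implementation: vertices 0 and 1 are fixed, every later vertex i has known distances to vertices i-1 and i-2 (so it lies on the intersection of two circles, giving at most two branches), and any further known distance to an earlier vertex is used to prune. In three dimensions, as in molecular problems, each vertex instead needs three reference vertices and the branches come from intersecting spheres. The function names and the distance dictionary format are hypothetical.

```python
import math


def circle_intersections(p, q, rp, rq, tol=1e-9):
    """Points at distance rp from p and rq from q in the plane (0, 1 or 2 points)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d < tol:
        return []
    a = (rp * rp - rq * rq + d * d) / (2 * d)  # offset of the chord's midpoint from p
    h2 = rp * rp - a * a
    if h2 < -tol:
        return []  # circles do not meet
    h = math.sqrt(max(h2, 0.0))
    mx, my = p[0] + a * dx / d, p[1] + a * dy / d
    if h < tol:
        return [(mx, my)]  # tangent circles: a single candidate
    ox, oy = -dy / d * h, dx / d * h
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


def branch_and_prune(n, dist, tol=1e-6):
    """All planar embeddings consistent with `dist` (hypothetical 2D variant).

    dist maps (i, j) with i < j to a distance. Assumes (0, 1) and, for every
    i >= 2, (i - 2, i) and (i - 1, i) are known (the discretization order).
    """
    solutions = []

    def feasible(x, i, pos):
        # Prune: every known distance from vertex i to an earlier vertex must hold.
        return all(abs(math.dist(x, pos[j]) - d) <= tol
                   for (j, k), d in dist.items() if k == i)

    def branch(pos):
        i = len(pos)
        if i == n:
            solutions.append(list(pos))
            return
        for x in circle_intersections(pos[i - 2], pos[i - 1],
                                      dist[(i - 2, i)], dist[(i - 1, i)]):
            if feasible(x, i, pos):
                pos.append(x)
                branch(pos)
                pos.pop()

    branch([(0.0, 0.0), (dist[(0, 1)], 0.0)])  # fix the first two vertices
    return solutions
```

Because fixing the first two vertices determines the frame only up to reflection, solutions come in mirror-image pairs; for five points with two extra pruning distances, the search returns exactly one such pair.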