    New Approaches to Protein NMR Automation

    The three-dimensional structure of a protein molecule is the key to understanding its biological and physiological properties. A major problem in bioinformatics is to efficiently determine the three-dimensional structures of query proteins. Protein NMR structure determination is one of the main experimental methods and comprises: (i) protein sample production and isotope labelling, (ii) collecting NMR spectra, and (iii) analysis of the spectra to produce the protein structure. In protein NMR, the three-dimensional structure is determined by exploiting a set of distance restraints between spatially proximate atoms. Currently, no practical automated protein NMR method exists that works without human intervention. We first propose a complete automated protein NMR pipeline, which can efficiently be used to determine the structures of moderate-sized proteins. Second, we propose a novel and efficient semidefinite programming (SDP)-based protein structure determination method. The proposed automated protein NMR pipeline consists of three modules: (i) an automated peak picking method, called PICKY, (ii) a backbone chemical shift assignment method, called IPASS, and (iii) a protein structure determination method, called FALCON-NMR. When tested on four real protein data sets, this pipeline can produce structures with reasonable accuracies, starting from NMR spectra. This general approach can be applied to structure determination for other macromolecules. For example, a promising application is RNA NMR-assisted secondary structure determination. In the second part of this thesis, due to the shortcomings of FALCON-NMR, we propose a novel SDP-based protein structure determination method from NMR data, called SPROS. Most of the existing prominent protein NMR structure determination methods are based on molecular dynamics coupled with a simulated annealing schedule.
In these methods, an objective function representing the error between observed and given distance restraints is minimized; these objective functions are highly non-convex and difficult to optimize. Euclidean distance geometry methods based on SDP provide a natural formulation for realizing a three-dimensional structure from a set of given distance constraints. However, the complexity of the SDP solvers increases cubically with the input matrix size, i.e., the number of atoms in the protein, and the number of constraints. In fact, the complexity of SDP solvers is a major obstacle to their applicability to the protein NMR problem. To overcome these limitations, the SPROS method models the protein molecule as a set of intersecting two- and three-dimensional cliques. We adapt and extend a technique called semidefinite facial reduction to reduce the SDP matrix size, which makes the SDP problem size approximately one quarter of the original problem. The reduced problem is solved nearly one hundred times faster and is more robust against numerical problems. Reasonably accurate results were obtained when SPROS was applied to a set of 20 real protein data sets.
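
As a rough illustration of why SDP is described as a natural formulation here, the following is the standard textbook form of the distance geometry SDP, not necessarily the exact model used in SPROS. The symbols $x_i$, $X$, $G$, $E$, $\ell_{ij}$, $u_{ij}$, $U$ and $Z$ are notation assumed for this sketch. With atom positions $x_1, \dots, x_n \in \mathbb{R}^3$ stacked as the rows of $X$ and Gram matrix $G = X X^{\top}$, every squared distance is linear in $G$:

$$
\|x_i - x_j\|^2 = G_{ii} + G_{jj} - 2\,G_{ij}.
$$

Dropping the non-convex condition $\operatorname{rank}(G) \le 3$ gives a convex feasibility problem over the restrained pairs $E$ with lower and upper bounds $\ell_{ij}, u_{ij}$:

$$
\text{find } G \succeq 0 \quad \text{such that} \quad G\mathbf{1} = 0, \qquad \ell_{ij}^2 \;\le\; G_{ii} + G_{jj} - 2\,G_{ij} \;\le\; u_{ij}^2 \quad \text{for all } (i,j) \in E.
$$

Facial reduction, in this general setting, uses substructures whose internal distances are fully known (cliques) to restrict $G$ to a face of the semidefinite cone, $G = U Z U^{\top}$ with $Z \succeq 0$ of smaller order, so the solver works with $Z$ instead of the full $n \times n$ matrix.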

    Euclidean distance geometry and applications

    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of the most important applications: molecular conformation, localization of sensor networks and statics. Comment: 64 pages, 21 figures

    Operations research: from computational biology to sensor network

    In this dissertation we discuss the deployment of combinatorial optimization methods for modeling and solving real-life problems, with particular emphasis on two biological problems arising from a common scenario: the reconstruction of the three-dimensional shape of a biological molecule from Nuclear Magnetic Resonance (NMR) data. The first topic is the 3D assignment pathway problem (APP) for an RNA molecule. We prove that APP is NP-hard, and show a formulation of it based on edge-colored graphs. Taking into account that interactions between consecutive nuclei in the NMR spectrum are different according to the type of residue along the RNA chain, each color in the graph represents a type of interaction. Thus, we can represent the sequence of interactions as the problem of finding a longest (Hamiltonian) path whose edges follow a given order of colors (i.e., the orderly colored longest path). We introduce three alternative IP formulations of APP obtained with a max flow problem on a directed graph with packing constraints over the partitions, which have been compared among themselves. Since the last two models work on cyclic graphs, for them we proposed an algorithm based on the solution of their relaxation combined with the separation of cycle inequalities in a Branch & Cut scheme. The second topic is the discretizable distance geometry problem (DDGP), which is a formulation on a discrete search space of the well-known distance geometry problem (DGP). The DGP consists in finding an embedding in space of an undirected graph, given a set of Euclidean distances between certain pairs of vertices. DGP has two important applications: (i) finding the three-dimensional conformation of a molecule from a subset of interatomic distances, called the Molecular Distance Geometry Problem, and (ii) the Sensor Network Localization Problem.
We describe a Branch & Prune (BP) algorithm tailored for this problem, and two versions of it solving the DDGP both in protein modeling and in sensor network localization frameworks. BP is an exact and exhaustive combinatorial algorithm that examines all the valid embeddings of a given weighted graph G=(V,E,d), under the hypothesis of existence of a given order on V. By comparing the two versions of BP to well-known algorithms we are able to prove the efficiency of BP in both contexts, provided that the order imposed on V is maintained.
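
The branch-and-prune idea can be sketched in a few lines. The sketch below is a simplified planar (2D) illustration, not the dissertation's algorithm: it assumes exact distances, a vertex order in which each vertex i >= 2 has known distances to vertices i-1 and i-2, and the hypothetical names `circle_intersections` and `branch_and_prune`. Each new vertex has at most two candidate positions (the branching step), and any further known distance to an earlier vertex is used to discard candidates (the pruning step).

```python
import math


def circle_intersections(p, r1, q, r2, tol=1e-9):
    """Return the 0, 1 or 2 points at distance r1 from p and r2 from q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    # No intersection: coincident centers, circles too far apart, or nested.
    if d < tol or d > r1 + r2 + tol or d < abs(r1 - r2) - tol:
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)  # offset of the chord from p
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))   # half-length of the chord
    mx, my = p[0] + a * dx / d, p[1] + a * dy / d
    if h < tol:
        return [(mx, my)]
    ox, oy = -dy * h / d, dx * h / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


def branch_and_prune(n, dist, tol=1e-6):
    """Enumerate all planar embeddings of vertices 0..n-1.

    dist maps (i, j) with i < j to a known exact distance. Assumed order:
    (0, 1) is known and, for every i >= 2, (i-2, i) and (i-1, i) are known.
    Vertex 0 is fixed at the origin and vertex 1 on the positive x-axis,
    which removes translations and rotations.
    """
    solutions = []
    coords = [(0.0, 0.0), (dist[(0, 1)], 0.0)]

    def feasible(i, cand):
        # Pruning: check every extra known distance to an earlier vertex.
        for j in range(i - 2):
            d = dist.get((j, i))
            if d is not None and abs(math.dist(cand, coords[j]) - d) > tol:
                return False
        return True

    def place(i):
        if i == n:
            solutions.append(list(coords))
            return
        # Branching: at most two positions are consistent with the two
        # distances to the immediate predecessors.
        for cand in circle_intersections(coords[i - 2], dist[(i - 2, i)],
                                         coords[i - 1], dist[(i - 1, i)]):
            if feasible(i, cand):
                coords.append(cand)
                place(i + 1)
                coords.pop()

    place(2)
    return solutions
```

Without any pruning distances the search tree doubles at every vertex from the third one onward; with enough extra distances only the mirror-image pair of embeddings typically survives. The 3D case replaces circle intersections with sphere intersections, and interval (inexact) distances require sampling candidates rather than computing two exact points.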

    Eigenvector Synchronization, Graph Rigidity and the Molecule Problem

    The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend previous work and propose the 3D-ASAP algorithm, for the graph realization problem in $\mathbb{R}^3$, given a sparse and noisy set of distance measurements. 3D-ASAP is a divide-and-conquer, non-incremental and non-iterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1-hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noise-free case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch there corresponds an element of the Euclidean group Euc(3) of rigid transformations in $\mathbb{R}^3$, and the goal is to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3D-SP-ASAP, a faster version of 3D-ASAP, which uses a spectral partitioning algorithm as a preprocessing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3D-ASAP and 3D-SP-ASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably to similar state-of-the-art localization algorithms. Comment: 49 pages, 8 figures
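
To make "agree up to some unknown rigid motion" concrete, here is a minimal sketch of aligning a single patch to reference coordinates. It is a planar (2D) simplification using complex numbers, not the 3D-ASAP synchronization procedure itself, and the names `align_patch` and `apply_motion` are hypothetical. For each choice of reflection, the least-squares rotation is the unit complex number in the direction of the correlation between the centered point sets, and the translation then matches the centroids.

```python
import cmath


def align_patch(patch, target):
    """Least-squares rigid motion mapping `patch` onto `target` in the plane.

    Points are complex numbers x + iy, listed in corresponding order.
    Returns (reflect, rotation, translation, residual), where `rotation`
    is a unit complex number and `reflect` says whether the patch is
    mirrored (complex conjugation) before rotating.
    """
    n = len(patch)
    c_tgt = sum(target) / n
    best = None
    for reflect in (False, True):
        src = [z.conjugate() for z in patch] if reflect else list(patch)
        c_src = sum(src) / n
        # Correlation of the centered sets; its argument is the optimal angle.
        corr = sum((z - c_src).conjugate() * (w - c_tgt)
                   for z, w in zip(src, target))
        rot = corr / abs(corr) if abs(corr) > 0 else 1.0
        shift = c_tgt - rot * c_src
        resid = sum(abs(rot * z + shift - w) ** 2 for z, w in zip(src, target))
        if best is None or resid < best[3]:
            best = (reflect, rot, shift, resid)
    return best


def apply_motion(motion, points):
    """Apply a motion returned by align_patch to a list of complex points."""
    reflect, rot, shift, _ = motion
    return [rot * (z.conjugate() if reflect else z) + shift for z in points]


# Demo: a patch that was mirrored, rotated by 0.7 rad and shifted.
patch = [0j, 1 + 0j, 1 + 2j, -1 + 0.5j]
target = [cmath.exp(0.7j) * z.conjugate() + (3 - 2j) for z in patch]
motion = align_patch(patch, target)
aligned = apply_motion(motion, patch)
```

In three dimensions the same least-squares problem is usually solved with a singular value decomposition of the 3x3 correlation matrix. Aligning many overlapping patches at once, as the abstract describes, additionally requires that the per-patch motions be globally consistent.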

    A distributed SDP-based algorithm for large noisy anchor-free graph realization

    An algorithm to enumerate all possible protein conformations verifying a set of distance constraints

    Background: The determination of protein structures satisfying distance constraints is an important problem in structural biology. Whereas the most common method currently employed is simulated annealing, there have been other methods previously proposed in the literature. Most of them, however, are designed to find one solution only. Results: In order to explore exhaustively the feasible conformational space, we propose here an interval Branch-and-Prune algorithm (iBP) to solve the Distance Geometry Problem (DGP) associated with protein structure determination. This algorithm is based on a discretization of the problem obtained by recursively constructing a search space having the structure of a tree, and by verifying whether the generated atomic positions are feasible or not by making use of pruning devices. The pruning devices used here are directly related to features of protein conformations. Conclusions: We describe the new algorithm iBP, which generates protein conformations satisfying distance constraints and would potentially allow a systematic exploration of the conformational space. The algorithm iBP has been applied to three α-helical peptides.