
    Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures

    BACKGROUND: Hidden Markov Models (HMMs) have proven very useful in computational biology for applications such as sequence pattern matching, gene finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or the aspects of structure that can be represented by character strings). RESULTS: We develop an HMM formalism that explicitly uses 3D coordinates in its match states. The match states are modeled by 3D Gaussian distributions centered on the mean coordinate position of each alpha carbon in a large structural alignment. The transition probabilities depend on the spread of the neighboring match states and on the number of gaps found in the structural alignment. We also develop methods for aligning query structures against 3D HMMs and scoring the result probabilistically. For 1D HMMs these tasks are accomplished by the Viterbi and forward algorithms. However, these algorithms do not work in unmodified form for the 3D problem because of the non-local nature of structural alignment, so we develop extensions of them for the 3D case. Several applications of 3D HMMs to protein structure classification are reported. A good separation of scores for different fold families suggests that the described construct is quite useful for protein structure analysis. CONCLUSION: We have created a rigorous 3D HMM representation for protein structures and implemented a complete set of routines for building 3D HMMs in C and Perl. The code is freely available from , and at this site we also have a simple prototype server to demonstrate the features of the described approach.
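The Gaussian match-state emission described above can be illustrated with a short sketch. It is not the authors' C/Perl code and makes several assumptions: each match state has an isotropic Gaussian with a single spread `sigma`, the query has already been superimposed into the model's coordinate frame, and the alignment is given as hypothetical `(query_index, state_index)` pairs. Transition probabilities and the authors' 3D extensions of the Viterbi and forward algorithms are omitted.

```python
import math

def log_gaussian_3d(x, mean, sigma):
    """Log-density of an isotropic 3D Gaussian N(mean, sigma^2 * I) at point x."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -1.5 * math.log(2.0 * math.pi * sigma ** 2) - sq_dist / (2.0 * sigma ** 2)

def emission_score(query, means, sigmas, alignment):
    """Sum of match-state log-emissions over aligned alpha-carbon positions.

    query     -- list of (x, y, z) alpha-carbon coordinates, assumed already superimposed
    means     -- mean (x, y, z) position of each match state
    sigmas    -- spread of each match state's Gaussian (assumed isotropic)
    alignment -- hypothetical list of (query_index, state_index) pairs
    """
    return sum(log_gaussian_3d(query[i], means[k], sigmas[k]) for i, k in alignment)

# Usage: a residue sitting on its state mean scores higher than one displaced by 2 Å.
means = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
sigmas = [1.0, 1.0]
close = emission_score([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)], means, sigmas, [(0, 0), (1, 1)])
far = emission_score([(0.0, 0.0, 0.0), (5.8, 0.0, 0.0)], means, sigmas, [(0, 0), (1, 1)])
print(close > far)  # True
```

Working in log space keeps scores for long chains from underflowing, and it lets per-residue terms simply be added along an alignment.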

    Improving model construction of profile HMMs for remote homology detection through structural alignment

    BACKGROUND: Remote homology detection is a challenging problem in bioinformatics. Arguably, profile hidden Markov models (pHMMs) are one of the most successful approaches to this problem: pHMM packages have a relatively low computational cost and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could improve the performance of pHMMs trained on proteins in the Twilight Zone, since structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Here, we assess the impact of using structural alignments on pHMM performance. RESULTS: We used the SCOP database for our experiments. Structural alignments were obtained with the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained with CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over superfamilies. Performance was evaluated with ROC curves and paired two-tailed t-tests. CONCLUSION: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher-quality pHMMs. On the other hand, the sensitivity of these tools is still quite low in these low-identity regions. Our results suggest a number of possible directions for improvement in this area.
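Whichever tool produces the alignment, a pHMM is then estimated from it. The sketch below shows one common, simplified construction step and is an assumption, not the procedure used in the study: columns in which fewer than half of the sequences have a gap become match states, and match emissions are estimated from residue counts plus a uniform pseudocount. Real packages such as HMMER use sequence weighting and Dirichlet-mixture priors instead.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of alignment columns treated as match states (gap fraction below threshold)."""
    n_seqs = len(alignment)
    cols = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[j] in "-.")
        if gaps / n_seqs < max_gap_fraction:
            cols.append(j)
    return cols

def match_emissions(alignment, column, pseudocount=1.0):
    """Emission probabilities for one match column: (count + pseudocount) / normalizer."""
    counts = Counter(seq[column] for seq in alignment if seq[column] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

# Usage on a toy alignment; column 2 is mostly gaps, so it becomes an insert column.
aln = ["AC-D", "AC-E", "GCKD"]
cols = match_columns(aln)
probs = match_emissions(aln, 0)
print(cols, round(probs["A"], 4))  # [0, 1, 3] 0.1304
```

Swapping a sequence alignment for a structural one changes only the input `aln`; the point of the study is that this input changes which columns and residues the model learns from.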

    Exploring RNA and protein 3D structures by geometric algorithms

    Many problems in RNA and protein structure are related to their specific geometric properties, and geometric algorithms can be used to explore possible solutions to these problems. This dissertation investigates the geometric properties of RNA and protein structures and explores three ways in which geometric algorithms can aid their study. Determining accurate structures: Accurate details in RNA structures are important for understanding RNA function, but the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (overlaps of 0.4 Å or more). I developed a program called RNABC (RNA Backbone Correction) that searches for alternative clash-free conformations with acceptable geometry. It rebuilds a suite (the unit from sugar to sugar) by anchoring the phosphorus and base positions, which are clearest in crystallographic electron density, and reconstructing the other atoms using forward kinematics and conjugate gradient methods. Two tests show that RNABC improves backbone conformations for most problem suites in S-motifs and for many of the worst problem suites identified by members of the Richardson lab. Displaying structural commonalities: Structure alignment commonly uses root-mean-square distance (RMSD) to measure structural similarity. I first extend RMSD to weighted RMSD (wRMSD) for multiple structures and show that using wRMSD with multiplicative weights implies that the average is a consensus structure. Although I show that finding the optimal translations and rotations for minimizing wRMSD cannot be decoupled for multiple structures, I develop a near-linear iterative algorithm that converges to a local minimum of wRMSD. Finally, I propose a heuristic algorithm that iteratively reassigns weights to reduce the effect of outliers and find well-aligned positions that determine structurally conserved regions. Distinguishing local structural features: Identifying common motifs (fragments of structures common to a group of molecules) is one way to further our understanding of the structure and function of molecules. I apply a graph database mining technique to identify RNA tertiary motifs. I abstract RNA molecules as labeled graphs, use a frequent subgraph mining algorithm to derive tertiary motifs, and present an iterative structure alignment algorithm to classify tertiary motifs and generate consensus motifs. Tests on ribosomal and transfer RNA families show that this method can identify most known RNA tertiary motifs in these families and suggest candidates for novel tertiary motifs.
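The wRMSD idea for multiple structures can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's algorithm. The structures are taken as already superimposed, so the iterative rotation and translation search is omitted. Each residue position j carries one weight shared by all structures (a multiplicative weighting). The normalization by the number of structures and the total weight is a choice made here. The usage check shows the property behind the consensus claim: for fixed weights, the per-position average minimizes the weighted sum of squared deviations, so moving a consensus point away from the average increases wRMSD.

```python
import math

def average_structure(structures):
    """Per-position mean of already-superimposed structures (lists of (x, y, z))."""
    m = len(structures)
    return [tuple(sum(s[j][d] for s in structures) / m for d in range(3))
            for j in range(len(structures[0]))]

def wrmsd(structures, weights, center):
    """Weighted RMSD of all structures to a center structure.

    Each position j has one weight shared by every structure; the normalization
    (number of structures times total weight) is an assumption of this sketch.
    """
    m = len(structures)
    total = 0.0
    for s in structures:
        for j, w in enumerate(weights):
            total += w * sum((s[j][d] - center[j][d]) ** 2 for d in range(3))
    return math.sqrt(total / (m * sum(weights)))

# Usage: three toy three-residue structures with a variable last position.
structs = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.2, 0.1, 0.0), (3.7, 0.3, 0.0), (7.9, 0.0, 0.4)],
    [(-0.1, 0.0, 0.2), (3.9, -0.2, 0.1), (9.0, 0.5, 0.0)],
]
weights = [1.0, 1.0, 0.5]  # hypothetical: down-weight the variable last position
avg = average_structure(structs)
shifted = [avg[0], avg[1], (avg[2][0] + 0.3, avg[2][1], avg[2][2])]
print(wrmsd(structs, weights, avg) < wrmsd(structs, weights, shifted))  # True
```

Lowering a position's weight, as the outlier-reweighting heuristic does, reduces how much a poorly aligned position contributes to the score.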

    Graph based pattern discovery in protein structures

    The rapidly growing body of 3D protein structure data provides new opportunities to study the relation between protein structure and protein function. Local structure patterns in proteins have been the focus of recent efforts to link structural features found in proteins to protein function. In addition, structure patterns have demonstrated value in applications such as predicting protein-protein interactions, engineering proteins, and designing novel medicines. My thesis introduces graph-based representations of protein structure and new subgraph mining algorithms to identify recurring structure patterns common to a set of proteins. These techniques enable families of proteins exhibiting similar function to be analyzed for structural similarity. Previous approaches to protein local structure pattern discovery operate in a pairwise fashion and have prohibitive computational cost when scaled to families of proteins. The graph mining strategy is robust in the face of errors in the structures and errors in the set of proteins thought to share a function. Two collaborations with domain experts at the UNC School of Pharmacy and the UNC Medical School demonstrate the utility of these techniques: the first predicts the function of several newly characterized protein structures, and the second identifies conserved structural features in evolutionarily related proteins.
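As a minimal illustration of graph-based pattern discovery, the sketch below builds a hypothetical residue contact graph. Residues are labeled by amino-acid type and joined when their representative coordinates lie within an assumed 8 Å cutoff. It then counts how many proteins contain each labeled edge. Single-edge patterns are only the first level of frequent subgraph mining, because full miners grow frequent patterns edge by edge into larger subgraphs. The representation and algorithms in the thesis may differ.

```python
import math
from collections import Counter
from itertools import combinations

def contact_edges(residues, cutoff=8.0):
    """Labeled edges of a residue contact graph.

    residues -- list of (label, (x, y, z)) pairs, e.g. amino-acid type and C-alpha position
    cutoff   -- hypothetical contact distance in angstroms
    Returns a set of sorted label pairs, so each edge pattern counts once per protein.
    """
    edges = set()
    for (la, pa), (lb, pb) in combinations(residues, 2):
        if math.dist(pa, pb) <= cutoff:
            edges.add(tuple(sorted((la, lb))))
    return edges

def frequent_edge_patterns(proteins, min_support, cutoff=8.0):
    """Labeled edge patterns present in at least min_support proteins."""
    support = Counter()
    for residues in proteins:
        support.update(contact_edges(residues, cutoff))
    return {pattern: n for pattern, n in support.items() if n >= min_support}

# Usage: a HIS-ASP contact recurs in two of three toy proteins.
proteins = [
    [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.0, 0.0, 0.0)), ("GLY", (50.0, 0.0, 0.0))],
    [("HIS", (0.0, 0.0, 0.0)), ("ASP", (0.0, 4.0, 0.0)), ("SER", (0.0, 0.0, 5.0))],
    [("HIS", (0.0, 0.0, 0.0)), ("SER", (0.0, 0.0, 20.0))],
]
print(frequent_edge_patterns(proteins, min_support=2))  # {('ASP', 'HIS'): 2}
```

Because support is counted over a whole set of proteins at once rather than pair by pair, this style of mining avoids the pairwise comparisons that the abstract identifies as the scaling bottleneck.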

    Modeling high-frequency multi-sensor time series with missing values: application to forecasting phytoplankton blooms in rivers and coastal marine ecosystems

    Because of growing interest in environmental issues, and to identify the direct and indirect effects of anthropogenic activities on ecosystems, environmental monitoring programs increasingly rely on high-resolution, autonomous, multi-sensor instrumented stations. These systems operate in harsh environments, and measurements must be interrupted for calibration, servicing, or sensor failures. Consequently, the data may be noisy, missing, or out of range, and the raw data require pre-processing or filtering to complete and validate them before any further investigation. In this context, the objective of this work is to design a robust automated numerical system able to handle such volumes of data in order to further knowledge of water quality, and more specifically of the determinism and dynamics of phytoplankton. The main phase is the methodological development of phytoplankton bloom forecasting models that give end users well-adapted protocols. We propose a hybrid hidden Markov model (HMM) to detect and forecast environmental states (identification of the main phytoplankton bloom stages and the associated hydrological conditions). The added value of our approach is the hybridization of the HMM with a spectral clustering algorithm, so that all HMM parameters (the states, their characterization, and their dynamics) are built by unsupervised learning. This approach was applied to three real databases: the first from the MAREL Carnot marine instrumented station (Ifremer) (2005-2009), the second from a Ferry Box system deployed in the eastern English Channel in 2012, and the third from a fixed freshwater station on the river Deûle in 2009 (Artois Picardie Water Agency, AEAP). This work is part of a collaboration between IFREMER, LISIC/ULCO and the Artois Picardie Water Agency to develop optimized systems for studying the effects of anthropogenic activities on the functioning of aquatic ecosystems, in a regional context of massive blooms of the harmful alga Phaeocystis globosa.
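The detection and forecasting step of an HMM can be sketched with the normalized forward algorithm. It gives the probability of each hidden environmental state given the observations so far, and one multiplication by the transition matrix then gives a one-step-ahead state forecast. This is a generic illustration, not the thesis's hybrid model. The spectral clustering that learns the states without supervision is not shown, and the two states, the discretized observations, and all probabilities below are hypothetical.

```python
def forward_filter(obs, init, trans, emit):
    """Normalized forward algorithm: P(state_t | obs_1..obs_t) for every t.

    obs   -- sequence of discrete observation symbols (integers)
    init  -- initial state probabilities
    trans -- trans[r][s] = P(state s at t+1 | state r at t)
    emit  -- emit[s][o] = P(observation o | state s)
    """
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    filtered = [alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o] for s in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow on long series
        filtered.append(alpha)
    return filtered

def forecast_next_state(filtered_last, trans):
    """One-step-ahead state distribution from the latest filtered distribution."""
    n = len(filtered_last)
    return [sum(filtered_last[r] * trans[r][s] for r in range(n)) for s in range(n)]

# Hypothetical example: state 0 = "no bloom", state 1 = "bloom";
# observation 0 = low chlorophyll reading, 1 = high chlorophyll reading.
init = [0.9, 0.1]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.1, 0.9]]
filtered = forward_filter([0, 1, 1], init, trans, emit)
print([round(p, 3) for p in filtered[-1]])                              # [0.293, 0.707]
print([round(p, 3) for p in forecast_next_state(filtered[-1], trans)])  # [0.405, 0.595]
```

With multi-sensor data, each observation would typically be a vector handled by a continuous emission model. The discrete symbols here only keep the sketch short.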

    Protein Flexibility in Structure-Based Drug Design: Method Development and Novel Mechanisms for Inhibiting HIV-1 Protease.

    Structure-based drug design (SBDD) has emerged as an important tool in drug discovery research. Traditionally, SBDD is based on a static crystal structure of the target protein. However, a protein in solution exists as an ensemble of energetically accessible conformations and is best described when all states are represented. Upon ligand binding, further conformational changes in the receptor can be induced. While ligand flexibility can be accurately reproduced, replicating the innumerable degrees of freedom of the protein is impractical due to limitations in computational power. Previously, Carlson et al. developed a robust method to generate receptor-based pharmacophore models from an ensemble of protein conformations. The use of multiple protein structures (MPS) allows the range of conformations accessible to the protein to be sampled, and hence simulates the inherent flexibility of a binding site in a computationally feasible manner. Small-molecule probes are used to map energetically favorable regions of each protein active site, and the MPS are then overlaid to identify the most important, chemically relevant features conserved across the conformations. Here, we have refined the MPS method by developing techniques to optimize different steps in the procedure. First, we outline tools to properly overlay flexible proteins based on the rigid regions of the structure by incorporating a Gaussian weight into a standard RMSD alignment: atoms that barely move between the two conformations receive greater weight than those with a large displacement. Using HIV-1 protease (HIV-1p) as a test case, we next examine the use of various sources of MPS: snapshots of an apo structure across a molecular dynamics simulation, a bound NMR ensemble, and a collection of bound crystal structures. Finally, we implement a simple ranking metric in the MPS method to quantify ligand overlap with a contour-based representation of the pharmacophore model; overlap in a region of the active site dense with pharmacophore spheres results in a higher ranking of a ligand pose. The refined MPS method and other computational techniques are then applied to study HIV-1p and to investigate a novel inhibition mechanism that works by modulating its conformational behavior.
    Ph.D., Medicinal Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/57666/2/kdamm_1.pd
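The Gaussian-weighted overlay can be sketched as an iterated weighted superposition. Each round fits with the current weights, measures each atom's displacement d_i, and sets w_i = exp(-d_i^2 / c). Repeating until the weights settle lets rigid atoms dominate the fit while mobile atoms are progressively ignored. This is an illustrative NumPy sketch built on the weighted Kabsch algorithm. The constant `c`, the iteration limit, and the convergence test are assumptions, not the parameters of the published method.

```python
import numpy as np

def weighted_superpose(mobile, target, w):
    """Rigidly move `mobile` onto `target`, minimizing weighted RMSD (weighted Kabsch)."""
    w = w / w.sum()
    mobile_center = w @ mobile
    target_center = w @ target
    P = mobile - mobile_center
    Q = target - target_center
    H = (P * w[:, None]).T @ Q                # 3x3 weighted covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # flip sign to avoid an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T + target_center

def gaussian_weighted_fit(mobile, target, c=2.0, max_iter=100, tol=1e-6):
    """Iteratively reweight atoms by exp(-d^2 / c) so rigid regions drive the overlay."""
    w = np.ones(len(mobile))
    for _ in range(max_iter):
        fitted = weighted_superpose(mobile, target, w)
        d2 = np.sum((fitted - target) ** 2, axis=1)
        new_w = np.exp(-d2 / c)
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:
            break
    return weighted_superpose(mobile, target, w), w

# Usage: one "flexible" atom is displaced by 5 Å before a rigid-body motion.
rng = np.random.default_rng(0)
target = rng.normal(scale=5.0, size=(12, 3))
moved = target.copy()
moved[0] += [5.0, 0.0, 0.0]
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
mobile = moved @ rot.T + [10.0, -3.0, 2.0]
fitted, w = gaussian_weighted_fit(mobile, target)
print(w[0] < 0.01, w[1:].min() > 0.99)  # True True
```

An unweighted fit would spread the displaced atom's error over all atoms. With Gaussian weights, that atom's weight collapses, and the remaining atoms superimpose almost exactly.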