    Graph theoretic methods for the analysis of structural relationships in biological macromolecules

    Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecule, such as carbohydrate and nucleic acid structures.
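
The clique detection step named in this abstract can be illustrated with a minimal sketch of the basic Bron-Kerbosch recursion (without pivoting). The toy graph and the function name `bron_kerbosch` are assumptions made for illustration; the abstract does not say how the authors build their graphs or which variant of the algorithm they use.

```python
def bron_kerbosch(adj):
    """Return all maximal cliques of an undirected graph.

    adj maps each node to the set of its neighbours.
    This is the basic recursion, without pivoting.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-explored nodes
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy graph: a triangle A-B-C plus a pendant edge C-D.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
cliques = bron_kerbosch(adj)
print(cliques)
```

In a structure-comparison setting, the nodes would typically be pairs of matching elements from two structures, so that a clique corresponds to a mutually consistent set of matches; that construction is not shown here.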

    De novo sequencing of proteins by mass spectrometry

    Introduction: Proteins are crucial for every cellular activity, and unraveling their sequence and structure is a crucial step toward fully understanding their biology. Early methods of protein sequencing were mainly based on the use of enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and with the expansion of the information available for each protein, various databases containing this sequence information were formed. Areas covered: De novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques are covered, along with the various software currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for interpretation of protein sequence data. Expert opinion: As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, it should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which then would be able to provide improved identification.

    Phylogenetic Inference and Neanderthal Mitochondrial DNA: Comparison of Parsimony and Distance Models

    Recently, mtDNA was successfully extracted and sequenced from the Neanderthal type specimen (Krings et al., 1997, 1999). Researchers attempted to determine the genetic relationship between the Neanderthal specimen and modern human populations using phylogenetic analysis and concluded that the variation existing between the Neanderthal specimen and the modern lineages falls outside the range of variation of modern human populations. Using molecular mutation rate assumptions, it has been concluded that the Neanderthal line diverged from the line leading to modern humans hundreds of thousands of years before earlier estimates. This suggests that Neanderthals went extinct without contributing genes to the lineage of modern humans. There are many techniques that can be used in the phylogenetic analysis of molecular data. There is much discussion over the merits of individual techniques and which techniques are best suited for different analyses. I will examine these debates within the framework of the Krings et al. studies and late hominid evolution. Similar analysis was done on the Neanderthal sequences using distance and parsimony methods. A unique database of contemporary human sequences was used. The goals are to test the validity of the results published by Krings et al., and to gain a clearer understanding of the processes of phylogenetic analysis and a greater appreciation of the significance and impact of its results on the field of hominid evolution.
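
As a rough illustration of what a distance method starts from, the sketch below computes the uncorrected p-distance (the proportion of differing sites) between two aligned sequences. The function name `p_distance` and the toy sequences are assumptions for illustration; the abstract does not state which distance model the author used, and real analyses usually apply a substitution-model correction on top of this raw proportion.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped sites
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Hypothetical toy alignment, not real mtDNA data.
seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
names = sorted(seqs)
matrix = {(i, j): p_distance(seqs[i], seqs[j]) for i in names for j in names}
print(matrix[("s1", "s2")])
```

A pairwise matrix of this kind is the input that tree-building procedures such as neighbor joining operate on.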

    SAR by MS for Functional Genomics (Structure-Activity Relation by Mass Spectrometry)

    Large-scale functional genomics will require fast, high-throughput experimental techniques, coupled with sophisticated computer algorithms for data analysis and experiment planning. In this paper, we introduce a combined experimental-computational protocol called Structure-Activity Relation by Mass Spectrometry (SAR by MS), which can be used to elucidate the function of protein-DNA or protein-protein complexes. We present algorithms for SAR by MS and analyze their complexity. Carefully-designed Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI TOF) and Electrospray Ionization (ESI) assays require only femtomolar samples, take only microseconds per spectrum to record, enjoy a resolution of up to one dalton in $10^6$, and (in the case of MALDI) can operate on protein complexes up to a megadalton in mass. Hence, the technique is attractive for high-throughput functional genomics. In SAR by MS, first, selected residues or nucleosides are 2H-, 13C-, and/or 15N-labeled. Second, the complex is crosslinked. Third, the complex is cleaved with proteases and/or endonucleases. Depending on the binding mode, some cleavage sites will be shielded by the crosslinking. Finally, a mass spectrum of the resulting fragments is obtained and analyzed. The last step is the Data Analysis phase, in which the mass signatures are interpreted to obtain constraints on the functional binding mode. Experiment Planning entails deciding what labeling strategy and cleaving agents to employ, so as to minimize mass degeneracy and spectral overlap, in order that the constraints derived in data analysis yield a small number of binding hypotheses. A number of combinatorial and algorithmic questions arise in deriving algorithms for both Experiment Planning and Data Analysis. We explore the complexity of these problems, obtaining upper and lower bounds. Experimental results are reported from an implementation of our algorithms.

    Formal methods applied to the analysis of phylogenies: Phylogenetic model checking

    Phylogenetic trees are useful abstractions for modeling and characterizing the evolution of a set of species or populations over time. Proposing, verifying, and generalizing hypotheses about an inferred phylogenetic tree plays an important role in the study and understanding of evolutionary relationships. Currently, one of the main scientific goals is to extract or discover the implicit biological messages and the underlying structural properties of the phylogeny. For example, integrating genetic information into a phylogeny helps to discover genes conserved in all or part of the tree, to identify covarying positions in the DNA, or to estimate divergence dates between species. Consequently, trees help to understand the mechanism that governs evolutionary drift. Today, the wide range of heterogeneous methods and tools for phylogenetic analysis obscures and hinders their use, as does the tight coupling between the specification of properties and the algorithms used to evaluate them (mainly ad hoc scripts). This problem is the starting point of this thesis, which analyzes as a solution the possibility of introducing a formal hypothesis-verification framework that, automatically and modularly, checks the truth of such properties, defined in a generic and independent language (in an associated formal logic), on one of the many software tools prepared for this purpose. The main contribution of the thesis is the proposal of a formal framework for describing, verifying, and manipulating causal relationships between species independently of the code used to evaluate them.
To this end, we explore the features of model checking techniques, a paradigm in which a specification expressed in temporal logic is verified against a model of the system that represents an implementation at a certain level of detail. Having emerged from computer science, it has been applied successfully in industry for system modeling and verification. The specific contributions of the thesis are: A) The identification and interpretation of phylogenetic trees as models of evolution, adapted to the setting of model checking techniques. B) The definition of a temporal logic that captures the usual phylogenetic properties, together with a method for constructing properties. C) The classification of phylogenetic properties, identifying categories according to whether they focus on the structure of the tree, on the sequences, or are hybrid. D) The extension of the logics and models to cover quantitative properties of time, probability, and distance. E) The development of a framework for verifying Boolean, quantitative, and parametric properties. F) The establishment of principles for the symbolic manipulation of phylogenetic objects, e.g., clades. G) The exploitation of existing model checking tools, detecting their problems and shortcomings in the field of phylogeny and proposing improvements. H) The development of ad hoc techniques to reduce complexity along two fronts: distribution of computation and data, and the use of information systems. Points A-F focus on the conceptual contributions of our approach, while points G-H emphasize tools and implementation. The contents of the thesis have been validated by the scientific community through the following publications in international conferences and journals.
The introduction of model checking as a formal framework for analyzing biological properties (points A-C) led to the publication of our first conference paper [1]. In [2], we develop the verification of phylogenetic hypotheses on an example tree built from the relationships imposed by a set of proteins encoded by human mitochondrial DNA (mtDNA). In that example, we use an automatic, generic model checking tool (point G). The journal article [7] summarizes the basics of the earlier conference papers and extends the application of temporal logics to phylogenetic properties not considered until now. The papers cited here cover the contents presented in Parts I-II of the thesis. The enormous size of the trees and the considerable amount of information associated with the states (e.g., the DNA string) require special adaptations of model checking tools to maintain reasonable performance when verifying properties and to alleviate the state explosion problem (points G-H). The conference paper [3] presents the advantages of slicing the DNA associated with the states, partitioning the phylogeny into small subtrees, and distributing them across several machines. In addition, the original idea of sliced model checking is complemented by an external database for storing sequences. The journal article [4] brings together the notions introduced in [3] with the implementation and preliminary results presented in [5]. This topic corresponds to Part III of the thesis. Finally, the thesis reuses the extensions of temporal logics with explicit time and probabilities in order to manipulate and query the tree for quantitative information.
The conference paper [6] illustrates the need to introduce probabilities and discrete time for the phylogenetic analysis of a real phenotype, in this case the distribution ratio of lactose intolerance among several populations located at the leaves of the phylogeny. This corresponds to Chapter 13, which falls within Parts IV-V. Parts IV-V extend the concepts presented in that conference paper to other application domains, such as tree scoring, and to continuous time (points E-F). The introduction of parameters into phylogenetic hypotheses is left as future work.
References
[1] Roberto Blanco, Gregorio de Miguel Casado, José Ignacio Requeno, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. In Proceedings IEEE International Workshop on Mining and Management of Biological and Health Data, pages 152-157. IEEE, 2010.
[2] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Phylogenetic analysis using an SMV tool. In Miguel P. Rocha, Juan M. Corchado Rodríguez, Florentino Fdez-Riverola, and Alfonso Valencia, editors, Proceedings 5th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 93 of Advances in Intelligent and Soft Computing, pages 167-174. Springer, Berlin, 2011.
[3] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Sliced model checking for phylogenetic analysis. In Miguel P. Rocha, Nicholas Luscombe, Florentino Fdez-Riverola, and Juan M. Corchado Rodríguez, editors, Proceedings 6th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 154 of Advances in Intelligent and Soft Computing, pages 95-103. Springer, Berlin, 2012.
[4] José Ignacio Requeno and José Manuel Colom. Model checking software for phylogenetic trees using distribution and database methods. Journal of Integrative Bioinformatics, 10(3):229-233, 2013.
[5] José Ignacio Requeno and José Manuel Colom. Speeding up phylogenetic model checking. In Mohd Saberi Mohamad, Loris Nanni, Miguel P. Rocha, and Florentino Fdez-Riverola, editors, Proceedings 7th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 222 of Advances in Intelligent Systems and Computing, pages 119-126. Springer, Berlin, 2013.
[6] José Ignacio Requeno and José Manuel Colom. Timed and probabilistic model checking over phylogenetic trees. In Miguel P. Rocha et al., editors, Proceedings 8th International Conference on Practical Applications of Computational Biology and Bioinformatics, Advances in Intelligent and Soft Computing. Springer, Berlin, 2014.
[7] José Ignacio Requeno, Gregorio de Miguel Casado, Roberto Blanco, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(4):1058-1070, 2013.
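
The idea of verifying temporal-logic properties against a phylogenetic tree can be sketched in a few lines. The example below is a simplified illustration, not the thesis's framework or any real model checker: the `Node` class, the toy sequences, and the helper names `ag` and `ef` are all assumptions. It treats the tree as a transition system rooted at the ancestor and evaluates two CTL-style operators by recursion: "AG p" (p holds at this node and at every descendant) and "EF p" (p holds at this node or at some descendant).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A tree state labelled with a (hypothetical) aligned sequence."""
    name: str
    seq: str
    children: List["Node"] = field(default_factory=list)


def ag(node: Node, prop: Callable[[Node], bool]) -> bool:
    """AG prop: prop holds here and at every descendant."""
    return prop(node) and all(ag(child, prop) for child in node.children)


def ef(node: Node, prop: Callable[[Node], bool]) -> bool:
    """EF prop: prop holds here or at some descendant."""
    return prop(node) or any(ef(child, prop) for child in node.children)


# Hypothetical toy phylogeny; sequences are invented for illustration.
tree = Node("root", "ACGT", [
    Node("n1", "ACGA", [Node("leafA", "ACGA"), Node("leafB", "ACTA")]),
    Node("leafC", "ACGT"),
])

first_site_conserved = ag(tree, lambda n: n.seq[0] == "A")  # conserved in the whole tree
third_site_mutates = ef(tree, lambda n: n.seq[2] == "T")    # a G->T change appears somewhere
last_site_conserved = ag(tree, lambda n: n.seq[3] == "T")   # fails: n1 carries an A
print(first_site_conserved, third_site_mutates, last_site_conserved)
```

Because the property is passed in as a separate predicate, the specification stays decoupled from the traversal code, which is the separation the abstract argues for. A full model checker would additionally handle nested temporal operators and the conventional self-loops at leaves.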

    A supertree approach to shorebird phylogeny

    BACKGROUND: Order Charadriiformes (shorebirds) is an ideal model group in which to study a wide range of behavioural, ecological and macroevolutionary processes across species. However, comparative studies depend on phylogeny to control for the effects of shared evolutionary history. Although numerous hypotheses have been presented for subsets of the Charadriiformes, none to date include all recognised species. Here we use the matrix representation with parsimony method to produce the first fully inclusive supertree of Charadriiformes. We also provide preliminary estimates of ages for all nodes in the tree. RESULTS: Three main lineages are revealed: i) the plovers and allies; ii) the gulls and allies; and iii) the sandpipers and allies. The relative position of these clades is unresolved in the strict consensus tree but a 50% majority-rule consensus tree indicates that the sandpiper clade is sister group to the gulls and allies whilst the plover group is placed at the base of the tree. The overall topology is highly consistent with recent molecular hypotheses of shorebird phylogeny. CONCLUSION: The supertree hypothesis presented herein is (to our knowledge) the only complete phylogenetic hypothesis of all extant shorebirds. Despite concerns over the robustness of supertrees (see Discussion), we believe that it provides a valuable framework for testing numerous evolutionary hypotheses relating to the diversity of behaviour, ecology and life-history of the Charadriiformes.
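
The encoding step of matrix representation with parsimony can be sketched as follows. Each clade of each source tree becomes one binary character: taxa inside the clade are scored 1, taxa in that source tree but outside the clade are scored 0, and taxa absent from that source tree are scored as missing (?). The function name `mrp_matrix`, the input layout, and the four-taxon toy trees are assumptions for illustration; they are not the authors' data or software, and the all-zero outgroup row that is commonly added is omitted here.

```python
def mrp_matrix(source_trees):
    """Build an MRP character matrix.

    source_trees is a list of (taxa, clades) pairs, where taxa is the set
    of taxa in one source tree and clades is a list of sets, one per
    internal clade of that tree. Returns {taxon: character string}.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for taxa, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in taxa:
                    matrix[taxon] += "?"  # taxon not sampled in this source tree
                elif taxon in clade:
                    matrix[taxon] += "1"  # taxon belongs to the clade
                else:
                    matrix[taxon] += "0"  # sampled, but outside the clade
    return matrix


# Hypothetical source trees over taxa A-D.
tree1 = ({"A", "B", "C"}, [{"B", "C"}])  # (A,(B,C))
tree2 = ({"B", "C", "D"}, [{"B", "D"}])  # (C,(B,D))
matrix = mrp_matrix([tree1, tree2])
for taxon, row in matrix.items():
    print(taxon, row)
```

The resulting matrix would then be analysed with a standard parsimony search to obtain the supertree; that search is not shown here.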