473 research outputs found

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN
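
    The last step of the pipeline described above (vector data, then a distance matrix, then UPGMA) can be illustrated with a minimal sketch. The 2-bit nucleotide encoding and the Hamming distance used here are assumptions made for illustration only; the abstract does not specify the encoding, and the sketch does not compute the Iterative Canonical Form that BOOL-AN uses as its descriptor.

```python
from itertools import combinations

# Hypothetical 2-bit encoding of nucleotides (an assumption, not BOOL-AN's).
ENCODING = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode(seq):
    """Turn a DNA string into a flat tuple of bits."""
    return tuple(bit for base in seq for bit in ENCODING[base])


def hamming(u, v):
    """Count the positions where two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def upgma(names, vectors):
    """Cluster the vectors by UPGMA and return the tree as nested tuples."""
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): float(hamming(vectors[i], vectors[j]))
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Closest pair of clusters; ties are broken by the smaller ids.
        a, b = min((tuple(sorted(p)) for p in dist),
                   key=lambda p: (dist[frozenset(p)], p))
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # Size-weighted average of the two old distances.
            dist[frozenset((next_id, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b))
        del dist[frozenset((a, b))]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


if __name__ == "__main__":
    seqs = {"s1": "ACGT", "s2": "ACGA", "s3": "TTGA", "s4": "TTGC"}
    print(upgma(list(seqs), [encode(s) for s in seqs.values()]))
```

    Any other distance computed from the descriptor vectors could replace `hamming` without changing the clustering step.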

    Reconstructing phylogenies from nucleotide pattern probabilities: a survey and some new results

    The variations between homologous nucleotide sequences representative of various species are, in part, a consequence of the evolutionary history of these species. Determining the evolutionary tree from patterns in the sequences depends on inverting the stochastic processes governing the substitutions from their ancestral sequence. We present a number of recent (and some new) results which allow a tree to be reconstructed from the expected frequencies of patterns in its leaf colorations generated under various Markov models. We summarise recent work using Hadamard conjugation, which provides an analytic relation between the parameters of Kimura's 3ST model on a phylogenetic tree and the sequence patterns produced. We give two applications of the theory by describing new properties of the popular "maximum parsimony" method for tree reconstruction.
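
    As a hedged illustration of Hadamard conjugation, the sketch below uses the simpler two-state symmetric model rather than Kimura's 3ST model discussed in the abstract. It assumes the standard form s = H^{-1} exp(H q), where q holds the edge lengths indexed by splits (with q[0] set to minus their sum) and s holds the expected split pattern frequencies. The function names and the bitmask indexing of splits are choices made for this example.

```python
import math


def hadamard_transform(v):
    """Multiply a vector of length 2^k by the Sylvester Hadamard matrix."""
    out = list(v)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                x, y = out[i], out[i + h]
                out[i], out[i + h] = x + y, x - y
        h *= 2
    return out


def inverse_hadamard(v):
    """The inverse of the Sylvester matrix is itself divided by its order."""
    return [x / len(v) for x in hadamard_transform(v)]


def spectrum_from_edges(q):
    """Edge-length spectrum -> expected pattern frequencies: s = H^-1 exp(Hq)."""
    return inverse_hadamard([math.exp(x) for x in hadamard_transform(q)])


def edges_from_spectrum(s):
    """Pattern frequencies -> edge-length spectrum: q = H^-1 ln(Hs)."""
    return inverse_hadamard([math.log(x) for x in hadamard_transform(s)])


if __name__ == "__main__":
    # Three taxa; index i is a bitmask over taxa {1, 2}: 0b01 is the edge to
    # taxon 1, 0b10 the edge to taxon 2, 0b11 the edge to taxon 3.
    lengths = [0.1, 0.2, 0.3]
    q = [-sum(lengths)] + lengths
    s = spectrum_from_edges(q)
    print("pattern frequencies:", s)
    print("recovered edge lengths:", edges_from_spectrum(s)[1:])
```

    Because the transform is invertible, observed pattern frequencies can be mapped back to an estimated edge-length spectrum, which is what makes the relation useful for tree reconstruction.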

    Frontiers of Membrane Computing: Open Problems and Research Topics

    This is a list of open problems and research topics collected after the Twelfth Conference on Membrane Computing, CMC12 (Fontainebleau, France, 23-26 August 2011), meant initially as working material for the Tenth Brainstorming Week on Membrane Computing, Sevilla, Spain (January 30 - February 3, 2012). The list was circulated in several versions before the brainstorming and then modified according to the discussions held in Sevilla and the progress made during the meeting. In its present form, the list gives a picture of the key research directions currently active in membrane computing.

    Stochastic Models of Molecular Evolution


    Inferring phylogenetic trees under the general Markov model via a minimum spanning tree backbone

    Phylogenetic trees are models of the evolutionary relationships among species, with species typically placed at the leaves of trees. We address the following problems regarding the calculation of phylogenetic trees. (1) Leaf-labeled phylogenetic trees may not be appropriate models of evolutionary relationships among rapidly evolving pathogens, which may contain ancestor-descendant pairs. (2) The widely used models of gene evolution unrealistically assume that the base composition of DNA sequences does not evolve. Regarding problem (1), we present a method for inferring generally labeled phylogenetic trees that allow sampled species to be placed at non-leaf nodes of the tree. Regarding problem (2), we present a structural expectation maximization method (SEM-GM) for inferring leaf-labeled phylogenetic trees under the general Markov model (GM), which is the most complex model of DNA substitution that allows the evolution of base composition. To improve the scalability of SEM-GM we present a minimum spanning tree (MST) framework called MST-backbone. MST-backbone scales linearly with the number of leaves. However, the unrealistic location of the root as inferred on empirical data suggests that the GM model may be overtrained. MST-backbone was inspired by the topological relationship between MSTs and phylogenetic trees that was introduced by Choi et al. (2011). We discovered that the topological relationship does not necessarily hold if there is no unique MST. We propose so-called vertex-order based MSTs (VMSTs) that guarantee a topological relationship with phylogenetic trees.
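
    The remark about non-unique MSTs can be made concrete with a small sketch. The code below is plain Kruskal with ties broken by vertex order, applied to a hypothetical distance matrix chosen to contain tied weights. It is not the VMST construction of the thesis, which the abstract does not define; it only shows that a fixed vertex order turns an ambiguous choice into a deterministic one.

```python
def kruskal_mst(dist):
    """Return the MST edges of the complete graph given by a distance matrix.

    Edges are processed in order of (weight, u, v), so ties between equal
    weights are resolved by vertex order and the result is deterministic.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((dist[u][v], u, v)
                   for u in range(n) for v in range(u + 1, n))
    tree = []
    for _, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


if __name__ == "__main__":
    # Hypothetical distances with ties: edges (0,1), (0,2), (1,2) all weigh 1,
    # so both {(0,1),(0,2),(2,3)} and {(0,1),(1,2),(2,3)} are minimum.
    D = [[0, 1, 1, 2],
         [1, 0, 1, 2],
         [1, 1, 0, 1],
         [2, 2, 1, 0]]
    print(kruskal_mst(D))
```

    With a different labelling of the same four vertices, the tie-break would select a different minimum tree, which is why the choice of vertex order matters.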

    Algebraic tools in phylogenomics.

    In this interdisciplinary thesis we develop algebraic tools for problems in phylogenetics and genomics. To study the molecular evolution of species one often uses stochastic evolutionary models. The evolution is represented in a tree (called a phylogenetic tree) whose leaves represent current species and whose internal nodes correspond to their common ancestors. The length of a branch of the tree represents the number of mutations that have occurred between the two species adjacent to the branch. The evolution of DNA sequences in these species is then modeled with a hidden Markov process along the tree. If the Markov process is assumed to be continuous in time, it is usually assumed to be homogeneous as well and, if so, the model parameters are the entries of the instantaneous mutation rate matrix and the lengths of the branches. If the Markov process is discrete in time, then the model parameters are the conditional probabilities of nucleotide substitution along the tree and there is no assumption of homogeneity. The latter are the types of models we consider in this thesis, and they are therefore more general than the homogeneous continuous-time ones. From this perspective we study the basic problems of phylogenetics: given a set of DNA sequences, which evolutionary model best fits the data, and how can we efficiently infer the model parameters? Moreover, as we also show in this thesis, species may not have evolved along a single tree but along a mixture of trees, so these questions need to be addressed in this more general case. For continuous-time, homogeneous evolutionary models, several solutions to these questions have been proposed during the last decades. In this thesis we solve these two problems for discrete-time evolutionary models, using algebraic techniques from linear algebra, group theory, algebraic geometry and algebraic statistics. In addition, our solution to the first problem is also valid for phylogenetic mixtures. We have tested the methods proposed in this thesis on simulated data and on real data from the ENCODE Project (Encyclopedia Of DNA Elements). To test our methods, we also provide algorithms to generate sequences evolving under discrete-time models with a given expected number of mutations. Moreover, we have proved that these algorithms generate all possible sequences (for most models). Tests on simulated data show that the methods are very accurate, and our results on real data confirm previously formulated hypotheses. All the methods in this thesis have been implemented for an arbitrary number of species and are publicly available.

    Computing Implicitizations of Multi-Graded Polynomial Maps

    In this paper, we focus on computing the kernel of a map of polynomial rings $\varphi$. This core problem in symbolic computation is known as implicitization. While there are extremely effective Gröbner basis methods used to solve this problem, these methods can become infeasible as the number of variables increases. In the case when the map $\varphi$ is multigraded, we consider an alternative approach. We demonstrate how to quickly compute a matrix of maximal rank for which $\varphi$ has a positive multigrading. Then in each graded component we compute the minimal generators of the kernel in that multidegree with linear algebra. We have implemented our techniques in Macaulay2 and show that our implementation can compute many generators of low degree in examples where Gröbner techniques have failed. This includes several examples coming from phylogenetics where even a complete list of quadrics and cubics was unknown. When the multigrading refines total degree, our algorithm is embarrassingly parallel, and a fully parallelized version of our algorithm will be forthcoming in OSCAR. Comment: 16 pages, 2 figures. An implementation of our main algorithm can be found on our MathRepo page as well as our GitHub.
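
    The idea of finding kernel elements degree by degree with linear algebra can be sketched on a toy example. The code below is an illustrative assumption, not the authors' Macaulay2 implementation: it grades by total degree only, represents polynomials as dictionaries from exponent tuples to coefficients, and all function names are made up for this example. It expands the image of every degree-d monomial and takes the nullspace of the resulting coefficient matrix over the rationals.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}


def nullspace(rows, ncols):
    """Basis of {x : rows * x = 0}, by Gauss-Jordan elimination over Q."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c is a free variable
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for row, pc in zip(m, pivots):
            vec[pc] = -row[free]
        basis.append(vec)
    return basis


def kernel_in_degree(images, d):
    """Degree-d part of the kernel of x_i -> images[i].

    Returns the degree-d source monomials (as tuples of variable indices)
    and a basis of coefficient vectors of the kernel over those monomials.
    """
    one = {(0,) * len(next(iter(images[0]))): 1}
    monos = list(combinations_with_replacement(range(len(images)), d))
    imgs = []
    for mono in monos:
        img = one
        for i in mono:
            img = poly_mul(img, images[i])
        imgs.append(img)
    support = sorted({e for img in imgs for e in img})
    rows = [[img.get(e, 0) for img in imgs] for e in support]
    return monos, nullspace(rows, len(monos))


if __name__ == "__main__":
    # Toy map x0 -> s^2, x1 -> s*t, x2 -> t^2 (exponents are (deg_s, deg_t)).
    conic = [{(2, 0): 1}, {(1, 1): 1}, {(0, 2): 1}]
    monos, basis = kernel_in_degree(conic, 2)
    print(monos)
    print(basis)
```

    For the toy map, the single basis vector has a nonzero entry only on the monomials x0*x2 and x1^2, recovering the familiar relation between them. Each degree can be processed independently of the others, which is the property the abstract exploits for parallelism.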