    Correcting Gene Trees by Leaf Insertions: Complexity and Approximation

    Gene tree correction has recently gained interest in phylogenomics, as it gives insight into the evolution of gene families. Following some recent approaches based on leaf edit operations, we consider a variant of the problem where a gene tree is corrected by inserting leaves with labels in a multiset M. We show that the problem of deciding whether a gene tree can be corrected by inserting leaves with labels in M is NP-complete. Then, we consider an optimization variant of the problem that asks for the correction of a gene tree with leaves labeled by a multiset M′, with M′ ⊇ M, having minimum size. For this optimization variant of the problem, we present a factor 2 approximation algorithm.

    Improving the Approximation Ratio of the Maximum Agreement Forest (MAF) on k trees and Estimating the Approximation Ratio of the Acyclic-MAF on k trees

    Molecular phylogenetics has long been a well-established field of scientific research in which the structure of the phylogenetic tree is analysed to learn about the evolutionary process of the organism. In biology, leaf-labelled trees are widely used to describe evolutionary relationships. In this setting, the leaves of the tree correspond to extant species, and the internal vertices represent the ancestral species. However, for certain species, evolution is not completely tree-like. Reticulation events such as horizontal gene transfer (HGT), hybridization and recombination play a significant role in the evolution of the species. Suppose we have two phylogenetic trees, each of which is for a gene of the same set of species. Due to reticulate evolution the two gene trees, though related, appear different. As a result, instead of a tree-like structure, a phylogenetic network is widely viewed as the most suitable tool to represent reticulation. A phylogenetic network contains hybrid nodes for the species evolved from two parents. The distance between two phylogenetic trees can be computed with the help of a Maximum Agreement Forest (MAF) of those trees. The fewer components in the MAF, the greater the similarity between the two trees. The number of components in the agreement forest shows how many edges need to be cut from each of the two trees so that the resulting forests agree after all forced edge contractions. Recent research reveals that the MAF on k trees can be approximated within a ratio of 8. We give a better approximation ratio for the MAF on k trees and also provide an approximation ratio for the Maximum Acyclic Agreement Forest (MAAF) on k (≥ 2) trees.

    Cophylogenetic analysis of dated trees

    Parasites and the associations they form with their hosts are an important area of research due to the health risks which parasites pose to the human population. The associations parasites form with their hosts are responsible for a number of the worst emerging diseases impacting global health today, including Ebola, HIV, and malaria. Macro-scale coevolutionary research aims to analyse these associations to provide further insights into these deadly diseases. This approach, first considered by Fahrenholz in 1913, has been applied to hundreds of coevolutionary systems and remains the most robust means to infer the underlying relationships which form between coevolving species. While reconciling the coevolutionary relationships between a pair of evolutionary systems is NP-hard, it has been shown that if dating information exists there is a polynomial-time solution. These solutions, however, are computationally expensive, and are quickly becoming infeasible due to the rapid growth of phylogenetic data. If the rate of growth continues in line with the last three decades, the current means for analysing dated systems will become computationally infeasible. Within this thesis a collection of algorithms is introduced which aims to address this problem. This includes the introduction of the most efficient solution for analysing dated coevolutionary systems optimally, along with two linear-time heuristics which may be applied where traditional algorithms are no longer feasible, while still offering a high degree of accuracy (91%). Finally, this work integrates these incremental results into a single model which is able to handle widespread parasitism, the case where parasites infect multiple hosts. This proposed model reconciles two competing theories of widespread parasitism, while also providing an accuracy improvement of 21%, one of the largest single improvements provided in this field to date.
As such, the set of algorithms introduced within this thesis offers another step toward a unified coevolutionary analysis framework, consistent with Fahrenholz's original coevolutionary analysis model.

    Algorithmes de construction et correction d'arbres de gènes par la réconciliation (Algorithms for the construction and correction of gene trees by reconciliation)

    Genes encode the biological functions of all living organisms and are the basic molecular units of heredity. In order to explain the diversity of species that can be observed today, it is essential to understand how genes evolve. To do this, the past has to be recreated by inferring their phylogeny, i.e. a gene tree depicting the parental relationships between the coding regions of living beings. Traditional phylogenetic inference methods have been developed primarily to construct species trees and are based solely on DNA sequences. Genes, however, are rich in information, and only a few known reconstruction methods make use of their specific properties. In particular, the history of a gene family in terms of duplications and losses, obtained by the reconciliation of a gene tree with a species tree, may allow us to detect weaknesses in a tree and improve it. In this thesis, reconciliation is applied to the construction and correction of gene trees from three different angles: 1) We address the problem of resolving a non-binary gene tree. In particular, we present a linear-time algorithm that resolves a polytomy based on reconciliation. 2) We propose a new gene tree correction approach based on orthology and paralogy relations. Polynomial-time algorithms are presented for the following problems: modify a gene tree so that it contains a given set of orthologous genes, and validate a set of partial orthology and paralogy relations. 3) We show how reconciliation can be used to "combine" multiple gene trees. Specifically, we study the problem of choosing a gene supertree based on its reconciliation cost.
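As an illustration of what reconciling a gene tree with a species tree involves, the following is a minimal sketch of the standard LCA mapping, which is not necessarily the method used in the thesis. It assumes rooted binary trees written as nested tuples, with each gene-tree leaf labelled by the species it belongs to; the function names (`clades`, `lca`, `count_duplications`) are hypothetical. Each gene-tree node is mapped to the smallest species-tree clade containing all species below it, and a node is counted as a duplication when it maps to the same clade as one of its children. Losses are not counted in this sketch.

```python
def clades(tree):
    """Return the leaf sets of all nodes of a nested-tuple tree.

    The clade of `tree` itself is always the last element of the list.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]  # the child's own clade is last in its list
    result.append(leaves)
    return result


def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_clades):
    """Return (clade this node maps to, number of duplications in the subtree)."""
    if isinstance(gene_tree, str):
        # A leaf maps to the species it is labelled with.
        return frozenset([gene_tree]), 0
    child_maps = []
    dups = 0
    for child in gene_tree:
        mapped, d = count_duplications(child, species_clades)
        child_maps.append(mapped)
        dups += d
    here = lca(species_clades, frozenset().union(*child_maps))
    if here in child_maps:
        # Node and child map to the same species-tree node: duplication.
        dups += 1
    return here, dups


species_tree = (("A", "B"), "C")
sp_clades = clades(species_tree)

# A gene tree that agrees with the species tree needs no duplication.
print(count_duplications((("A", "B"), "C"), sp_clades)[1])
# A conflicting gene tree forces a duplication at its root.
print(count_duplications((("A", "C"), ("B", "C")), sp_clades)[1])
```

Representing species-tree nodes by their leaf sets keeps the sketch short at the cost of efficiency; it is meant only to make the duplication criterion concrete.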

    Placing problems from phylogenetics and (quantified) propositional logic in the polynomial hierarchy

    In this thesis, we consider the complexity of decision problems from two different areas of research and place them in the polynomial hierarchy: phylogenetics and (quantified) propositional logic. In phylogenetics, researchers study the evolutionary relationships between species. The evolution of a particular gene can often be represented by a single phylogenetic tree. However, in order to model non-tree-like events on a species level such as hybridization and lateral gene transfer, phylogenetic networks are used. They can be considered as a structure that embeds a whole set of phylogenetic trees which is called the display set of the network. There are many interesting questions revolving around display sets and one is often interested in the computational complexity of the considered problems for particular classes of networks. In this thesis, we present our results for different questions related to the display sets of two networks and place the corresponding decision problems in the polynomial hierarchy. Another interesting question concerns the reconstruction of networks: given a set T of phylogenetic trees, can we construct a phylogenetic network with certain properties that embeds all trees in T? For a class of networks that satisfies certain temporal properties, Humphries et al. (2013) established a characterization for when this is possible based on the existence of a particular structure for T, a so-called cherry-picking sequence. We obtain several complexity results for the existence of such a sequence: Deciding the existence of a cherry-picking sequence turns out to be NP-complete for each non-trivial number (i.e., at least two) of given trees. Thereby, we settle the open question stated by Humphries et al. (2013) on the complexity for the case |T| = 2. On the positive side, we identify a special case that we place in the complexity class P by exploring connections to automata theory. 
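To make the notion of a cherry-picking sequence concrete, here is a minimal verification sketch. It assumes the usual definition: an ordering x1, ..., xn of the common leaf set such that each xi with i < n is a leaf of a cherry (two leaves sharing a parent) in every tree once x1, ..., x(i-1) have been pruned. Trees are rooted binary trees written as nested tuples, and the function names are hypothetical. Checking a given sequence is easy; the hardness result above concerns deciding whether any such sequence exists.

```python
def leaves(tree):
    """Set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def prune(tree, leaf):
    """Delete `leaf` and suppress the resulting degree-2 vertex."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    left, right = (prune(child, leaf) for child in tree)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def in_cherry(tree, leaf):
    """True if `leaf` is one of two leaves that share a parent in `tree`."""
    if isinstance(tree, str):
        return False
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return leaf in (left, right)
    return in_cherry(left, leaf) or in_cherry(right, leaf)


def is_cherry_picking_sequence(trees, order):
    """Check `order` against every tree, pruning one leaf at a time."""
    trees = list(trees)
    taxa = set(order)
    if len(taxa) != len(order) or any(leaves(t) != taxa for t in trees):
        return False  # `order` must list the common leaf set exactly once
    for leaf in order[:-1]:
        if not all(in_cherry(t, leaf) for t in trees):
            return False
        trees = [prune(t, leaf) for t in trees]
    return True


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
print(is_cherry_picking_sequence([t1, t2], ["a", "b", "c", "d"]))
print(is_cherry_picking_sequence([t1, t2], ["c", "a", "b", "d"]))
```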
Regarding propositional logic, we present our complexity results for the classical satisfiability problem (and variants or quantified generalizations thereof) and place the considered variants in the polynomial hierarchy. A common theme is to consider bounded variable appearances in combination with other restrictions such as monotonicity of the clauses or planarity of the incidence graph. This research was inspired by the conjecture that Monotone 3-SAT remains NP-complete if each variable appears at most five times, which was stated in the scribe notes of a lecture held by Erik Demaine; we confirm this conjecture in an even more restricted setting where each variable appears exactly four times.
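The restrictions mentioned above can be illustrated with a small checker. This is a sketch under assumptions: clauses are lists of non-zero integers in the DIMACS style (a negative number is a negated variable), "monotone" means every clause contains only positive or only negative literals, and the helper names are hypothetical. The thesis may impose further conditions (for example, distinct variables within a clause) that are not modelled here.

```python
from collections import Counter


def is_monotone(clauses):
    """Every clause is all-positive or all-negative."""
    return all(
        all(lit > 0 for lit in clause) or all(lit < 0 for lit in clause)
        for clause in clauses
    )


def variable_occurrences(clauses):
    """Count how often each variable appears, regardless of sign."""
    return Counter(abs(lit) for clause in clauses for lit in clause)


def is_restricted_monotone_3sat(clauses, occurrences=4):
    """Monotone, exactly three literals per clause, and every variable
    appearing exactly `occurrences` times."""
    counts = variable_occurrences(clauses)
    return (
        is_monotone(clauses)
        and all(len(clause) == 3 for clause in clauses)
        and all(n == occurrences for n in counts.values())
    )


instance = [[1, 2, 3], [1, 2, 3], [-1, -2, -3], [-1, -2, -3]]
print(is_restricted_monotone_3sat(instance))
```

The checker only tests the syntactic shape of an instance; it says nothing about satisfiability.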

    Pertanika Journal of Science & Technology
