Reconstructing Gene Trees From Fitch's Xenology Relation
Two genes are xenologs in the sense of Fitch if they are separated by at
least one horizontal gene transfer event. Horizontal gene transfer is asymmetric
in the sense that the transferred copy is distinguished from the one that
remains within the ancestral lineage. Hence xenology is more precisely thought
of as a non-symmetric relation: y is xenologous to x if y has been
horizontally transferred at least once since it diverged from the least common
ancestor of x and y. We show that xenology relations are characterized by a
small set of forbidden induced subgraphs on three vertices. Furthermore, each
xenology relation can be derived from a unique least-resolved edge-labeled
phylogenetic tree. We provide a linear-time algorithm for the recognition of
xenology relations and for the construction of their least-resolved edge-labeled
phylogenetic trees. Finally, the fact that being a xenology relation is a
heritable graph property has far-reaching consequences for approximation
problems associated with xenology relations.
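The definition above can be illustrated with a minimal Python sketch. It assumes a rooted gene tree given as a child-to-parent map and a set of edges marked as horizontal transfers. The function names, the tree encoding, and the toy tree are hypothetical and chosen for illustration; this is a direct, quadratic reading of the definition, not the linear-time algorithm of the paper.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, `node` first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def xenology_relation(parent, transfer_edges, leaves):
    """Return all ordered pairs (x, y) such that y is xenologous to x.

    y is xenologous to x if the path from lca(x, y) down to y contains
    at least one edge (parent, child) listed in `transfer_edges`.
    """
    relation = set()
    for x in leaves:
        ancestors_of_x = set(path_to_root(x, parent))
        for y in leaves:
            if x == y:
                continue
            # Climb from y until we hit an ancestor of x; that node is lca(x, y).
            node, transferred = y, False
            while node not in ancestors_of_x:
                if (parent[node], node) in transfer_edges:
                    transferred = True
                node = parent[node]
            if transferred:
                relation.add((x, y))
    return relation


# Toy tree (hypothetical): root r has children u and c; u has children a and b.
# The edge (u, b) is a transfer edge, so b is the transferred copy.
parent = {"u": "r", "c": "r", "a": "u", "b": "u"}
relation = xenology_relation(parent, {("u", "b")}, ["a", "b", "c"])
print(sorted(relation))
```

In this toy tree b is xenologous to both a and c, while neither a nor c is xenologous to b, which shows why the relation is not symmetric.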
Phylogenetics from paralogs
Motivation: Sequence-based phylogenetic approaches rely heavily on initial data sets being composed of orthologous sequences only. Paralogs are treated as a dangerous nuisance that has to be detected and removed. Recent advances in mathematical phylogenetics, however, have indicated that gene duplications can also convey meaningful phylogenetic information, provided orthologs and paralogs can be distinguished with a degree of certainty.
Results: We demonstrate that plausible phylogenetic trees can be inferred from paralogy information only. To this end, tree-free estimates of orthology, the complement of paralogy, are first corrected to conform to cographs and then translated into equivalent event-labeled gene phylogenies. A certain subset of the triples displayed by these trees translates into constraints on the species trees. While the resolution is very poor for individual gene families, we observe that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees of several groups of eubacteria. The novel method introduced here relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as an Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements.
Availability: The ILP formulation is implemented in the software ParaPhylo using IBM ILOG CPLEX (TM) Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/paraphyl
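The step from a cograph to an event-labeled tree can be sketched in Python. The sketch rests on the standard fact that a graph is a cograph exactly when every induced subgraph with more than one vertex is either disconnected or has a disconnected complement. The function names and the input encoding (a dict of neighbor sets) are hypothetical, and labeling join nodes as "speciation" and union nodes as "duplication" is an assumption about how orthology edges map to events. This is not the ILP used by ParaPhylo and it performs no cograph editing; it only recognizes an orthology estimate that is already a cograph.

```python
def components(vertices, adj):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v] & vertices:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def cotree(vertices, adj):
    """Return a nested (label, children) cotree, or None if not a cograph."""
    vertices = set(vertices)
    if len(vertices) == 1:
        return next(iter(vertices))
    comps = components(vertices, adj)
    label = "duplication"  # disconnected: no orthology edges between parts
    if len(comps) == 1:
        # Connected, so try the complement graph restricted to `vertices`.
        co_adj = {v: vertices - adj[v] - {v} for v in vertices}
        comps = components(vertices, co_adj)
        label = "speciation"  # all orthology edges present between parts
        if len(comps) == 1:
            return None  # connected and co-connected: contains an induced P4
    children = [cotree(comp, adj) for comp in comps]
    if any(child is None for child in children):
        return None
    return (label, children)


# Hypothetical orthology estimates: genes a and b are paralogs, both orthologous to c.
orthology = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
print(cotree(orthology.keys(), orthology))

# A path a-b-c-d (an induced P4) is not a cograph, so no event-labeled tree exists.
p4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(cotree(p4.keys(), p4))
```

When the function returns None, the estimate would first have to be edited into a cograph, which is the NP-hard cograph editing problem named in the abstract.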
The Complexity of Rooted Phylogeny Problems
Several computational problems in phylogenetic reconstruction can be
formulated as restrictions of the following general problem: given a formula in
conjunctive normal form where the literals are rooted triples, is there a
rooted binary tree that satisfies the formula? If the formulas do not contain
disjunctions, the problem becomes the famous rooted triple consistency problem,
which can be solved in polynomial time by an algorithm of Aho, Sagiv,
Szymanski, and Ullman. If the clauses in the formulas are restricted to
disjunctions of negated triples, Ng, Steel, and Wormald showed that the problem
remains NP-complete. We systematically study the computational complexity of
the problem for all such restrictions of the clauses in the input formula. For
certain restricted disjunctions of triples we present an algorithm that has
sub-quadratic running time and is asymptotically as fast as the fastest known
algorithm for the rooted triple consistency problem. We also show that any
restriction of the general rooted phylogeny problem that does not fall into our
tractable class is NP-complete, using known results about the complexity of
Boolean constraint satisfaction problems. Finally, we present a pebble game
argument that shows that the rooted triple consistency problem (and also all
generalizations studied in this paper) cannot be solved by Datalog
- …
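For the disjunction-free case mentioned above, the rooted triple consistency problem, a minimal Python sketch in the style of the algorithm of Aho, Sagiv, Szymanski, and Ullman may help. It is the simple recursive version, not the sub-quadratic algorithm of the paper. The function name `build` and the encoding of a triple ab|c as the tuple (a, b, c) are assumptions made for illustration. The returned tree may have nodes with more than two children; any such tree can be refined to a binary one without violating the triples.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying all triples, or None if none exists.

    Each triple (a, b, c) means ab|c: lca(a, b) lies strictly below lca(a, c).
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Only triples whose three leaves are all present constrain this subtree.
    relevant = [t for t in triples if set(t) <= leaves]
    # Auxiliary graph: connect a and b for every triple ab|c.
    adj = {v: set() for v in leaves}
    for a, b, _ in relevant:
        adj[a].add(b)
        adj[b].add(a)
    # Connected components of the auxiliary graph become the child subtrees.
    seen, comps = set(), []
    for start in leaves:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(comp)
    if len(comps) == 1:
        return None  # the leaves cannot be split: the triples are inconsistent
    children = [build(comp, relevant) for comp in comps]
    if any(child is None for child in children):
        return None
    return tuple(children)


# Consistent input: ab|c and cd|a.
print(build("abcd", [("a", "b", "c"), ("c", "d", "a")]))

# Inconsistent input: ab|c and ac|b cannot hold in the same rooted tree.
print(build("abc", [("a", "b", "c"), ("a", "c", "b")]))
```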