Reconstructing (super)trees from data sets with missing distances: Not all is lost

Author: Dicks Jo L.
Huber Katharina T.
Kettleborough George
Roberts Ian N.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 06/02/2015

The wealth of phylogenetic information accumulated over many decades of biological research, coupled with recent technological advances in molecular sequence generation, presents significant opportunities for researchers to investigate relationships across and within the kingdoms of life. However, to make best use of this data wealth, several problems must first be overcome. One key problem is finding effective strategies to deal with missing data. Here, we introduce Lasso, a novel heuristic approach for reconstructing rooted phylogenetic trees from distance matrices with missing values, for datasets where a molecular clock may be assumed. Contrary to other phylogenetic methods on partial datasets, Lasso possesses desirable properties such as its reconstructed trees being both unique and edge-weighted. These properties are achieved by Lasso restricting its leaf set to a large subset of all possible taxa, which in many practical situations is the entire taxa set. Furthermore, the Lasso approach is distance-based, rendering it very fast to run and suitable for datasets of all sizes, including large datasets such as those generated by modern Next Generation Sequencing technologies. To better understand the performance of Lasso, we assessed it by means of artificial and real biological datasets, showing its effectiveness in the presence of missing data. Furthermore, by formulating the supermatrix problem as a particular case of the missing data problem, we assessed Lasso's ability to reconstruct supertrees. We demonstrate that, although not specifically designed for such a purpose, Lasso performs better than or comparably with five leading supertree algorithms on a challenging biological dataset. Finally, we make freely available a software implementation of Lasso so that researchers may, for the first time, perform both rooted tree and supertree reconstruction with branch lengths on their own partial datasets.
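
The abstract above assumes a molecular clock, under which tree distances are ultrametric: in every triple of taxa, the two largest pairwise distances are equal. The following is a minimal sketch of checking that condition on a partial distance matrix. It is not the Lasso algorithm; the function name `violating_triples`, the dictionary representation of the matrix, and the tolerance `tol` are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def violating_triples(
    taxa: Sequence[str],
    dist: Dict[FrozenSet[str], float],
    tol: float = 1e-9,
) -> List[Tuple[str, str, str]]:
    """Return the triples that break the ultrametric three-point condition.

    `dist` maps frozenset({x, y}) to a distance; a missing value is
    represented simply by the pair being absent (an assumed convention).
    Triples with any missing distance cannot be checked and are skipped.
    """
    bad = []
    for x, y, z in combinations(taxa, 3):
        pairs = [frozenset((x, y)), frozenset((x, z)), frozenset((y, z))]
        if not all(p in dist for p in pairs):
            continue  # at least one distance is missing
        _, middle, largest = sorted(dist[p] for p in pairs)
        # Ultrametric: the two largest distances in the triple must coincide.
        if abs(largest - middle) > tol:
            bad.append((x, y, z))
    return bad


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d"]
    dist = {
        frozenset(("a", "b")): 2.0,
        frozenset(("a", "c")): 6.0,
        frozenset(("b", "c")): 6.0,
        frozenset(("a", "d")): 6.0,
        frozenset(("c", "d")): 5.0,
        # the distance between "b" and "d" is missing
    }
    print(violating_triples(taxa, dist))
```

Only triples whose three distances are all present can be tested, so the check says nothing about the taxa pairs that are missing.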

University of East Anglia digital repository

Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees

Author: AD Gordon
AV Aho
C Scornavacca
D Bryant
D Lokshtanov
F Delsuc
J Felsenstein
M Frick
M Ng
M Steel
OR Bininda-Emonds
R Diestel
T Kloks
W Maddison
Publication date: 01/01/2016

In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species $X$; these relationships are often depicted via a phylogenetic tree -- a tree having its leaves univocally labeled by elements of $X$ and without degree-2 nodes -- called the "species tree". One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g. DNA sequences originating from some species in $X$), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The so-obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping -- but not identical -- sets of labels, is called "supertree". In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed-parameter tractable in the number of input trees $k$, by using their expressibility in Monadic Second Order Logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on $k$ of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time $2^{O(k^2)} \cdot n$, where $n$ is the total size of the input. Comment: 18 pages, 1 figure

arXiv.org e-Print Archive

A Practical Algorithm for Reconstructing Level-1 Phylogenetic Networks

Author: Huber Katharina T.
Kelk Steven
Suchecki Radoslaw
van Iersel Leo
Publication date: 21/10/2009

Recently, much attention has been devoted to the construction of phylogenetic networks, which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks - a type of network slightly more general than a phylogenetic tree - from triplets. Our algorithm has been made publicly available as the program LEV1ATHAN. It combines ideas from several known theoretical algorithms for phylogenetic tree and network reconstruction with two novel subroutines, namely an exponential-time exact algorithm and a greedy algorithm, both of which are of independent theoretical interest. Most importantly, LEV1ATHAN runs in polynomial time and always constructs a level-1 network. If the data is consistent with a phylogenetic tree, then the algorithm constructs such a tree. Moreover, if the input triplet set is dense and, in addition, is fully consistent with some level-1 network, it will find such a network. The potential of LEV1ATHAN is explored by means of an extensive simulation study and a biological data set. One of our conclusions is that LEV1ATHAN is able to construct networks consistent with a high percentage of input triplets, even when these input triplets are affected by a low to moderate level of noise.
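
The quality measure mentioned above, the percentage of input triplets a reconstruction is consistent with, can be sketched for the simpler case of a rooted tree. This is not LEV1ATHAN and does not handle level-1 networks. The nested-tuple tree encoding and the names `clusters` and `fraction_consistent` are assumptions made for illustration. A rooted triplet ab|c is displayed by a tree exactly when some subtree contains a and b but not c.

```python
from typing import FrozenSet, List, Sequence, Tuple, Union

Tree = Union[str, tuple]          # a leaf name, or a tuple of child subtrees
Triplet = Tuple[str, str, str]    # (a, b, c) stands for the rooted triplet ab|c


def clusters(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Collect the leaf set (cluster) of every node of `tree` into `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def fraction_consistent(tree: Tree, triplets: Sequence[Triplet]) -> float:
    """Fraction of the given triplets that are displayed by `tree`."""
    all_clusters: List[FrozenSet[str]] = []
    leaf_set = clusters(tree, all_clusters)

    def displayed(a: str, b: str, c: str) -> bool:
        if not {a, b, c} <= leaf_set:
            return False  # a triplet on unknown leaves cannot be displayed
        return any(a in cl and b in cl and c not in cl for cl in all_clusters)

    hits = sum(displayed(*t) for t in triplets)
    return hits / len(triplets)


if __name__ == "__main__":
    tree = ((("a", "b"), "c"), "d")
    triplets = [("a", "b", "c"), ("a", "c", "b"), ("a", "c", "d"), ("b", "d", "a")]
    print(fraction_consistent(tree, triplets))
```

For a network the test would instead ask whether some tree embedded in the network displays the triplet, which this sketch does not attempt.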

arXiv.org e-Print Archive

Maastricht University Research Portal

Repository TU/e

Adelaide Research & Scholarship

CWI's Institutional Repository

Pure OAI Repository

University of East Anglia digital repository

Identifying Mislabeled Training Data

Author: Brodley C. E.
Friedl M. A.
Publication venue: 'AI Access Foundation'
Publication date: 01/06/2011

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single-algorithm, majority-vote, and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority-vote filters are preferable for situations with an abundance of data.
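
The difference between the two filter types described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the filter classifiers have already produced a predicted label for every training instance, for example from held-out cross-validation folds. The function name `flag_mislabeled` and the `predictions[j][i]` layout are hypothetical.

```python
from typing import List, Sequence


def flag_mislabeled(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
    mode: str = "majority",
) -> List[int]:
    """Return the indices of training instances flagged as mislabeled.

    predictions[j][i] is the label that filter classifier j assigned to
    instance i (assumed to be obtained while instance i was held out).
    labels[i] is the label given in the training data.
    """
    if mode not in ("majority", "consensus"):
        raise ValueError("mode must be 'majority' or 'consensus'")
    n_filters = len(predictions)
    flagged = []
    for i, given in enumerate(labels):
        # Count how many filter classifiers disagree with the given label.
        errors = sum(1 for preds in predictions if preds[i] != given)
        if mode == "consensus":
            is_noise = errors == n_filters     # every classifier disagrees
        else:
            is_noise = errors > n_filters / 2  # more than half disagree
        if is_noise:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    labels = [0, 1, 1, 0]
    predictions = [
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [0, 1, 0, 0],
    ]
    print(flag_mislabeled(predictions, labels, mode="majority"))
    print(flag_mislabeled(predictions, labels, mode="consensus"))
```

The consensus rule can only flag a subset of what the majority rule flags, which matches the abstract's description of consensus filters as the more conservative choice.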

arXiv.org e-Print Archive

Crossref

Near-Optimal Algorithm for Constructing Greedy Consensus Tree

Author: Wu Hongxun
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server