1,529 research outputs found
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the 3*(n choose 4) weighted quartet topologies on n objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic have been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
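The objective that such a hill-climbing search compares between candidate trees can be sketched as follows. This is a minimal illustration, not the CompLearn implementation: the adjacency-dict tree encoding, the function names, the toy distance matrix, and the cost rule (topology uv|wx costs d(u,v) + d(w,x), with lower summed cost taken as better) are all assumptions made for the example.

```python
from collections import deque
from itertools import combinations


def leaf_path_lengths(tree, leaves):
    """Edge-count path length between every ordered pair of leaves (BFS per leaf)."""
    lengths = {}
    for start in leaves:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in tree[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for leaf in leaves:
            lengths[start, leaf] = seen[leaf]
    return lengths


def tree_cost(tree, leaves, cost):
    """Sum `cost` over the single topology that the tree embeds for each quartet."""
    length = leaf_path_lengths(tree, leaves)
    total = 0
    for u, v, w, x in combinations(sorted(leaves), 4):
        pairings = [((u, v), (w, x)), ((u, w), (v, x)), ((u, x), (v, w))]
        # When every internal node has degree 3, the embedded pairing is the one
        # whose two leaf-to-leaf paths are disjoint; it has the strictly smallest
        # summed path length of the three.
        embedded = min(pairings, key=lambda p: length[p[0]] + length[p[1]])
        total += cost(embedded)
    return total


# Hypothetical toy data: objects 0,1 are close, 2,3 are close, 4 is an outlier.
dist = [
    [0, 1, 4, 5, 6],
    [1, 0, 4, 5, 6],
    [4, 4, 0, 2, 6],
    [5, 5, 2, 0, 6],
    [6, 6, 6, 6, 0],
]


def pair_cost(pairing):
    # Assumed cost rule: uv|wx costs d(u,v) + d(w,x).
    (a, b), (c, d) = pairing
    return dist[a][b] + dist[c][d]


leaves = [0, 1, 2, 3, 4]
# Tree grouping (0,1) and (2,3); "a", "b", "c" are internal nodes.
good = {0: ["a"], 1: ["a"], 2: ["c"], 3: ["c"], 4: ["b"],
        "a": [0, 1, "b"], "b": ["a", 4, "c"], "c": ["b", 2, 3]}
# Same shape with leaves 1 and 2 swapped.
bad = {0: ["a"], 2: ["a"], 1: ["c"], 3: ["c"], 4: ["b"],
       "a": [0, 2, "b"], "b": ["a", 4, "c"], "c": ["b", 1, 3]}

print(tree_cost(good, leaves, pair_cost), tree_cost(bad, leaves, pair_cost))
```

Under these assumptions, a hill-climbing step would apply a random tree transformation and keep the result only when `tree_cost` does not get worse, which is what makes the approximation monotonic.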
A New Quartet Tree Heuristic for Hierarchical Clustering
We consider the problem of constructing an optimal-weight tree from the
3*(n choose 4) weighted quartet topologies on n objects, where optimality means
that the summed weight of the embedded quartet topologies is optimal (so it can
be the case that the optimal tree embeds all quartets as non-optimal
topologies). We present a heuristic for reconstructing the optimal-weight tree,
and a canonical manner to derive the quartet-topology weights from a given
distance matrix. The method repeatedly transforms a bifurcating tree, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. This contrasts with other heuristic search methods
from biological phylogeny, like DNAML or quartet puzzling, which repeatedly and
incrementally construct a solution from a random order of objects, and
subsequently add agreement values.
Comment: 22 pages, 14 figures
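To make the count of 3*(n choose 4) topologies concrete, here is a minimal sketch that enumerates every quartet topology for a distance matrix and assigns each one a cost. The rule used (topology uv|wx costs d(u,v) + d(w,x)) is an assumption chosen for illustration, as are the function name and the toy matrix; the abstract itself does not spell out the derivation.

```python
from itertools import combinations
from math import comb


def quartet_costs(dist):
    """Map every quartet topology to a cost derived from a symmetric distance matrix.

    Assumed rule (not stated in the abstract): uv|wx costs dist[u][v] + dist[w][x].
    """
    n = len(dist)
    costs = {}
    for u, v, w, x in combinations(range(n), 4):
        # Four objects can be split into two pairs in exactly three ways.
        for (a, b), (c, d) in (((u, v), (w, x)), ((u, w), (v, x)), ((u, x), (v, w))):
            costs[(a, b), (c, d)] = dist[a][b] + dist[c][d]
    return costs


# Hypothetical toy distance matrix on n = 5 objects.
dist = [
    [0, 1, 4, 5, 6],
    [1, 0, 4, 5, 6],
    [4, 4, 0, 2, 6],
    [5, 5, 2, 0, 6],
    [6, 6, 6, 6, 0],
]
costs = quartet_costs(dist)
print(len(costs), 3 * comb(len(dist), 4))
```

For the quartet {0, 1, 2, 3} in this toy matrix, the topology 01|23 gets the lowest of the three costs, matching the intuition that the two close pairs should sit on opposite sides of the split.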
Metaheuristic approaches for the quartet method of hierarchical clustering
Given a set of objects and their pairwise distances, we wish to determine a visual representation of the data. We use the quartet paradigm to compute a hierarchy of clusters of the objects. The method is based on an NP-hard graph optimization problem called the Minimum Quartet Tree Cost problem. This paper presents and compares several metaheuristic approaches to approximate the optimal hierarchy. The performance of the algorithms is tested through extensive computational experiments, and it is shown that the Reduced Variable Neighbourhood Search metaheuristic is the most effective approach to the problem, obtaining high-quality solutions in short computational running times.
Coevolutionary history of ecological replicates: comparing phylogenies of wing and body lice to Columbiform hosts
Book Chapter
Phylogenies depict the history of speciation for groups of organisms. Comparing the phylogenies of interacting groups can reveal instances of tandem speciation, or "cospeciation" (Brooks and McLennan, 1991; Hoberg et al., 1997; Paterson and Gray, 1997). Understanding the conditions under which cospeciation takes place is a challenging task. In the case of hosts and their parasites, cospeciation occurs when isolation of host populations also isolates the parasites on those hosts. Patterns of cospeciation can break down owing to dispersal of parasites among host populations, sympatric speciation of parasites on a single host population, or extinction of parasites on a host population (Page and Charleston, 1998). All else being equal, ecologically similar parasites living on the same host should respond to isolation of host populations in the same way, yielding similar coevolutionary histories. In this chapter we compare cospeciation events in two such "replicate" groups of lice living on the same hosts. If forces promoting speciation, such as host speciation, act on these parasites in similar ways, then we would expect cospeciation events to be correlated between these parasite groups. On the other hand, if the parasites respond to isolation differently, then cospeciation events should be independent in the two groups.
Phylogenetic signal and the utility of 12S and 16S mtDNA in frog phylogeny
Genes selected for a phylogenetic study need to contain conserved information that reflects the phylogenetic history at the specific taxonomic level of interest. Mitochondrial ribosomal genes have been used for a wide range of phylogenetic questions in general and in anuran systematics in particular. We checked the plausibility of phylogenetic reconstructions in anurans that were built from commonly used 12S and 16S rRNA gene sequences. For up to 27 species arranged in taxon sets of graded inclusiveness, we inferred phylogenetic hypotheses based on different a priori decisions, i.e. choice of alignment method and alignment parameters, including/excluding variable sites, choice of reconstruction algorithm and models of evolution. Alignment methods and parameters, as well as taxon sampling, all had notable effects on the results, leading to a large number of conflicting topologies. Very few nodes were supported in all of the analyses. Data sets in which fast evolving and ambiguously aligned sites had been excluded performed worse than the complete data sets. There was moderate support for the monophyly of the Discoglossidae, Pelobatoidea, Pelobatidae and Pipidae. The clade Neobatrachia was robustly supported and the intrageneric relationships within Bombina and Discoglossus were well resolved, indicating the usefulness of the genes for relatively recent phylogenetic events. Although 12S and 16S rRNA genes seem to carry some phylogenetic signal of deep (Mesozoic) splitting events, the signal was not strong enough to consistently resolve the inter-relationships of major clades within the Anura under varied methods and parameter settings.
Bilaterian Phylogeny Based on Analyses of a Region of the Sodium-potassium ATPase alpha-subunit Gene
Molecular investigations of deep-level relationships within and among the animal phyla have been hampered by a lack of slowly evolving genes that are amenable to study by molecular systematists. To provide new data for use in deep-level metazoan phylogenetic studies, primers were developed to amplify a 1.3-kb region of the alpha subunit of the nuclear-encoded sodium-potassium ATPase gene from 31 bilaterians representing several phyla. Maximum parsimony, maximum likelihood, and Bayesian analyses of these sequences (combined with ATPase sequences for 23 taxa downloaded from GenBank) yield congruent trees that corroborate recent findings based on analyses of other data sets (e.g., the 18S ribosomal RNA gene). The ATPase-based trees support monophyly for several clades (including Lophotrochozoa, a form of Ecdysozoa, Vertebrata, Mollusca, Bivalvia, Gastropoda, Arachnida, Hexapoda, Coleoptera, and Diptera) but do not support monophyly for Deuterostomia, Arthropoda, or Nemertea. Parametric bootstrapping tests reject monophyly for Arthropoda and Nemertea but are unable to reject deuterostome monophyly. Overall, the sodium-potassium ATPase alpha-subunit gene appears to be useful for deep-level studies of metazoan phylogeny.