573 research outputs found
A New Quartet Tree Heuristic for Hierarchical Clustering
We consider the problem of constructing an an optimal-weight tree from the
3*(n choose 4) weighted quartet topologies on n objects, where optimality means
that the summed weight of the embedded quartet topologiesis optimal (so it can
be the case that the optimal tree embeds all quartets as non-optimal
topologies). We present a heuristic for reconstructing the optimal-weight tree,
and a canonical manner to derive the quartet-topology weights from a given
distance matrix. The method repeatedly transforms a bifurcating tree, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. This contrasts to other heuristic search methods
from biological phylogeny, like DNAML or quartet puzzling, which, repeatedly,
incrementally construct a solution from a random order of objects, and
subsequently add agreement values.Comment: 22 pages, 14 figure
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the weighted quartet topologies on objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic has been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing,Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
Recommended from our members
Metaheuristic approaches for the quartet method of hierarchical clustering
Given a set of objects and their pairwise distances, we wish to determine a visual representation of the data. We use the quartet paradigm to compute a hierarchy of clusters of the objects. The method is based on an NP-hard graph optimization problem called the Minimum Quartet Tree Cost problem. This paper presents and compares several metaheuristic approaches to approximate the optimal hierarchy. The performance of the algorithms is tested through extensive computational experiments and it is shown that the Reduced Variable Neighbourhood Search metaheuristic is the most effective approach to the problem, obtaining high quality solutions in short computational running times
Automatic Music Composition using Answer Set Programming
Music composition used to be a pen and paper activity. These these days music
is often composed with the aid of computer software, even to the point where
the computer compose parts of the score autonomously. The composition of most
styles of music is governed by rules. We show that by approaching the
automation, analysis and verification of composition as a knowledge
representation task and formalising these rules in a suitable logical language,
powerful and expressive intelligent composition tools can be easily built. This
application paper describes the use of answer set programming to construct an
automated system, named ANTON, that can compose melodic, harmonic and rhythmic
music, diagnose errors in human compositions and serve as a computer-aided
composition tool. The combination of harmonic, rhythmic and melodic composition
in a single framework makes ANTON unique in the growing area of algorithmic
composition. With near real-time composition, ANTON reaches the point where it
can not only be used as a component in an interactive composition tool but also
has the potential for live performances and concerts or automatically generated
background music in a variety of applications. With the use of a fully
declarative language and an "off-the-shelf" reasoning engine, ANTON provides
the human composer a tool which is significantly simpler, more compact and more
versatile than other existing systems. This paper has been accepted for
publication in Theory and Practice of Logic Programming (TPLP).Comment: 31 pages, 10 figures. Extended version of our ICLP2008 paper.
Formatted following TPLP guideline
An approximate search engine for structure
As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art.
In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc.
Two types of similarity measures are investigated: (i) counting the mismatching paths in the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations to move a token from one node to another node in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms and applications of these algorithms to XML and phylogenetic data management are discussed.
The proposed techniques have been implemented into a phylogenetic search engine which is fully operational and is available on the World Wide Web. Experimental results on comparing the similarity measures with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures
A new quartet tree heuristic for hierarchical clustering
We present a new quartet tree heuristic for hierarchical clustering from weighted quartet topologies, and a standard manner to derive those from a given distance matrix. We do not assume that there is a true ternary tree that generated the quartet topologies or distances which we wish to recover as closely as possible. Our aim is to just model the input data as faithfully as possible by the quartet tree. Our method is capable of handling up to 60–80 objects in a matter of hours, while no existing quartet heuristic can directly compute a quartet tree of more than about 20–30 objects without running for years. The method is implemented and available as public software
- …