45,525 research outputs found
A New Quartet Tree Heuristic for Hierarchical Clustering
We consider the problem of constructing an an optimal-weight tree from the
3*(n choose 4) weighted quartet topologies on n objects, where optimality means
that the summed weight of the embedded quartet topologiesis optimal (so it can
be the case that the optimal tree embeds all quartets as non-optimal
topologies). We present a heuristic for reconstructing the optimal-weight tree,
and a canonical manner to derive the quartet-topology weights from a given
distance matrix. The method repeatedly transforms a bifurcating tree, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. This contrasts to other heuristic search methods
from biological phylogeny, like DNAML or quartet puzzling, which, repeatedly,
incrementally construct a solution from a random order of objects, and
subsequently add agreement values.Comment: 22 pages, 14 figure
On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa
Compatibility of phylogenetic trees is the most important concept underlying
widely-used methods for assessing the agreement of different phylogenetic trees
with overlapping taxa and combining them into common supertrees to reveal the
tree of life. The notion of ancestral compatibility of phylogenetic trees with
nested taxa was introduced by Semple et al in 2004. In this paper we analyze in
detail the meaning of this compatibility from the points of view of the local
structure of the trees, of the existence of embeddings into a common supertree,
and of the joint properties of their cluster representations. Our analysis
leads to a very simple polynomial-time algorithm for testing this
compatibility, which we have implemented and is freely available for download
from the BioPerl collection of Perl modules for computational biology.Comment: Submitte
Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots
Although taxonomy is often used informally to evaluate the results of
phylogenetic inference and find the root of phylogenetic trees, algorithmic
methods to do so are lacking. In this paper we formalize these procedures and
develop algorithms to solve the relevant problems. In particular, we introduce
a new algorithm that solves a "subcoloring" problem for expressing the
difference between the taxonomy and phylogeny at a given rank. This algorithm
improves upon the current best algorithm in terms of asymptotic complexity for
the parameter regime of interest; we also describe a branch-and-bound algorithm
that saves orders of magnitude in computation on real data sets. We also
develop a formalism and an algorithm for rooting phylogenetic trees according
to a taxonomy. All of these algorithms are implemented in freely-available
software.Comment: Version submitted to Algorithms for Molecular Biology. A number of
fixes from previous versio
Tree Contractions and Evolutionary Trees
An evolutionary tree is a rooted tree where each internal vertex has at least
two children and where the leaves are labeled with distinct symbols
representing species. Evolutionary trees are useful for modeling the
evolutionary history of species. An agreement subtree of two evolutionary trees
is an evolutionary tree which is also a topological subtree of the two given
trees. We give an algorithm to determine the largest possible number of leaves
in any agreement subtree of two trees T_1 and T_2 with n leaves each. If the
maximum degree d of these trees is bounded by a constant, the time complexity
is O(n log^2(n)) and is within a log(n) factor of optimal. For general d, this
algorithm runs in O(n d^2 log(d) log^2(n)) time or alternatively in O(n d
sqrt(d) log^3(n)) time
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the weighted quartet topologies on objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic has been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing,Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
An Interpretable Deep Hierarchical Semantic Convolutional Neural Network for Lung Nodule Malignancy Classification
While deep learning methods are increasingly being applied to tasks such as
computer-aided diagnosis, these models are difficult to interpret, do not
incorporate prior domain knowledge, and are often considered as a "black-box."
The lack of model interpretability hinders them from being fully understood by
target users such as radiologists. In this paper, we present a novel
interpretable deep hierarchical semantic convolutional neural network (HSCNN)
to predict whether a given pulmonary nodule observed on a computed tomography
(CT) scan is malignant. Our network provides two levels of output: 1) low-level
radiologist semantic features, and 2) a high-level malignancy prediction score.
The low-level semantic outputs quantify the diagnostic features used by
radiologists and serve to explain how the model interprets the images in an
expert-driven manner. The information from these low-level tasks, along with
the representations learned by the convolutional layers, are then combined and
used to infer the high-level task of predicting nodule malignancy. This unified
architecture is trained by optimizing a global loss function including both
low- and high-level tasks, thereby learning all the parameters within a joint
framework. Our experimental results using the Lung Image Database Consortium
(LIDC) show that the proposed method not only produces interpretable lung
cancer predictions but also achieves significantly better results compared to
common 3D CNN approaches
- …