An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings
A widely used method for determining the similarity of two labeled trees is
to compute a maximum agreement subtree of the two trees. Previous work on this
similarity measure is only concerned with the comparison of labeled trees of
two special kinds, namely, uniformly labeled trees (i.e., trees with all their
nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled
trees with distinct symbols for distinct leaves). This paper presents an
algorithm for comparing trees that are labeled in an arbitrary manner. In
addition to this generality, this algorithm is faster than the previous
algorithms.
Another contribution of this paper is on maximum weight bipartite matchings.
We show how to speed up the best known matching algorithms when the input
graphs are node-unbalanced or weight-unbalanced. Based on these enhancements,
we obtain an efficient algorithm for a new matching problem called the
hierarchical bipartite matching problem, which is at the core of our maximum
agreement subtree algorithm.
Comment: To appear in Journal of Algorithms
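As a point of reference for the matching problem mentioned above, here is a minimal brute-force sketch of maximum weight bipartite matching on a node-unbalanced graph (far fewer nodes on one side than the other). It is not the paper's algorithm and makes no attempt at the speedups the abstract describes. The function name `max_weight_matching` and the weight-matrix encoding (non-negative weights, 0 meaning "no edge") are assumptions made for illustration.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Brute-force maximum weight bipartite matching (illustration only).

    weights[i][j] >= 0 is the weight of the edge between left node i and
    right node j; 0 means there is no edge. Assumes the left side is the
    smaller one, i.e. len(weights) <= len(weights[0]).
    """
    n_left, n_right = len(weights), len(weights[0])
    best_value, best_assignment = 0, ()
    # Try every way of sending the left nodes to distinct right nodes.
    for assignment in permutations(range(n_right), n_left):
        value = sum(weights[i][j] for i, j in enumerate(assignment))
        if value > best_value:
            best_value, best_assignment = value, assignment
    # Drop pairs that correspond to missing (zero-weight) edges.
    matching = [(i, j) for i, j in enumerate(best_assignment) if weights[i][j] > 0]
    return best_value, matching


# Two left nodes, four right nodes: a small node-unbalanced instance.
value, matching = max_weight_matching([[3, 1, 0, 2], [2, 4, 0, 0]])
```

The enumeration grows factorially with the size of the smaller side, so it is useful only for checking a faster implementation on tiny inputs.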
Cavity Matchings, Label Compressions, and Unrooted Evolutionary Trees
We present an algorithm for computing a maximum agreement subtree of two
unrooted evolutionary trees. It takes O(n^{1.5} log n) time for trees with
unbounded degrees, matching the best known time complexity for the rooted case.
Our algorithm allows the input trees to be mixed trees, i.e., trees that may
contain directed and undirected edges at the same time. Our algorithm adopts a
recursive strategy exploiting a technique called label compression. The
backbone of this technique is an algorithm that computes the maximum weight
matchings over many subgraphs of a bipartite graph as fast as it takes to
compute a single matching.
Forest Density Estimation
We study graph estimation and density estimation in high dimensions, using a
family of density estimators based on forest structured undirected graphical
models. For density estimation, we do not assume the true distribution
corresponds to a forest; rather, we form kernel density estimates of the
bivariate and univariate marginals, and apply Kruskal's algorithm to estimate
the optimal forest on held out data. We prove an oracle inequality on the
excess risk of the resulting estimator relative to the risk of the best forest.
For graph estimation, we consider the problem of estimating forests with
restricted tree sizes. We prove that finding a maximum weight spanning forest
with restricted tree size is NP-hard, and develop an approximation algorithm
for this problem. Viewing the tree size as a complexity parameter, we then
select a forest using data splitting, and prove bounds on excess risk and
structure selection consistency of the procedure. Experiments with simulated
data and microarray data indicate that the methods are a practical alternative
to Gaussian graphical models.
Comment: Extended version of earlier paper titled "Tree density estimation"
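The forest-selection step can be illustrated with a minimal sketch of Kruskal's algorithm for a maximum weight spanning forest. The edge weights here are hypothetical placeholders for whatever pairwise scores are estimated on held-out data, and stopping at non-positive weights is an assumption of this sketch, not a detail stated in the abstract.

```python
def max_weight_forest(num_nodes, weighted_edges):
    """Kruskal's algorithm for a maximum weight spanning forest.

    weighted_edges is an iterable of (weight, u, v) tuples with nodes
    numbered 0..num_nodes-1. Edges with non-positive weight are never
    added (an assumption of this sketch), so the result may be a forest
    rather than a single spanning tree.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    # Consider edges from heaviest to lightest.
    for weight, u, v in sorted(weighted_edges, reverse=True):
        if weight <= 0:
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding the edge creates no cycle
            parent[root_u] = root_v
            forest.append((u, v, weight))
    return forest


# Hypothetical pairwise scores for four variables.
edges = [(0.9, 0, 1), (0.5, 1, 2), (0.4, 0, 2), (-0.1, 2, 3)]
forest = max_weight_forest(4, edges)
```

In this example node 3 stays isolated because its only candidate edge has a negative score, which is what makes the output a forest instead of a tree.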
Hierarchical and High-Girth QC LDPC Codes
We present a general approach to designing capacity-approaching high-girth
low-density parity-check (LDPC) codes that are friendly to hardware
implementation. Our methodology starts by defining a new class of
"hierarchical" quasi-cyclic (HQC) LDPC codes that generalizes the structure of
quasi-cyclic (QC) LDPC codes. Whereas the parity check matrices of QC LDPC
codes are composed of circulant sub-matrices, those of HQC LDPC codes are
composed of a hierarchy of circulant sub-matrices that are in turn constructed
from circulant sub-matrices, and so on, through some number of levels. We show
how to map any class of codes defined using a protograph into a family of HQC
LDPC codes. Next, we present a girth-maximizing algorithm that optimizes the
degrees of freedom within the family of codes to yield a high-girth HQC LDPC
code. Finally, we discuss how certain characteristics of a code protograph will
lead to inevitable short cycles, and show that these short cycles can be
eliminated using a "squashing" procedure that results in a high-girth QC LDPC
code, although not a hierarchical one. We illustrate our approach with designed
examples of girth-10 QC LDPC codes obtained from protographs of one-sided
spatially-coupled codes.
Comment: Submitted to IEEE Transactions on Information Theory
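To make the circulant block structure of a plain (non-hierarchical) QC LDPC parity check matrix concrete, here is a minimal sketch. It assumes the common special case in which every block is either a circulant permutation matrix or an all-zero block. The function names and the convention that -1 marks a zero block are illustrative choices, not taken from the paper.

```python
def circulant_permutation(size, shift):
    """Return the size x size identity matrix cyclically shifted by `shift` columns."""
    return [[1 if (row + shift) % size == col else 0 for col in range(size)]
            for row in range(size)]


def qc_parity_check(shifts, size):
    """Assemble a parity check matrix from a base matrix of circulant shifts.

    shifts[i][j] is the shift of block (i, j); -1 (a convention assumed
    here) stands for an all-zero block.
    """
    zero_block = [[0] * size for _ in range(size)]
    matrix = []
    for shift_row in shifts:
        blocks = [circulant_permutation(size, s) if s >= 0 else zero_block
                  for s in shift_row]
        # Concatenate the r-th rows of all blocks in this block row.
        for r in range(size):
            matrix.append([bit for block in blocks for bit in block[r]])
    return matrix


# A 2 x 2 base matrix expanded with 3 x 3 circulant blocks gives a 6 x 6 matrix.
H = qc_parity_check([[0, 1], [2, -1]], 3)
```

A hierarchical construction in the sense of the abstract would nest this idea, building each block from smaller circulant blocks; that nesting is not attempted here.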
Spanning Trees with Many Leaves in Graphs without Diamonds and Blossoms
It is known that graphs on n vertices with minimum degree at least 3 have
spanning trees with at least n/4+2 leaves and that this can be improved to
(n+4)/3 for cubic graphs without the diamond K_4-e as a subgraph. We generalize
the second result by proving that every graph with minimum degree at least 3,
without diamonds and certain subgraphs called blossoms, has a spanning tree
with at least (n+4)/3 leaves, and generalize this further by allowing vertices
of lower degree. We show that it is necessary to exclude blossoms in order to
obtain a bound of the form n/3+c.
We use the new bound to obtain a simple FPT algorithm, which decides in
O(m)+O^*(6.75^k) time whether a graph of size m has a spanning tree with at
least k leaves. This improves the best known time complexity for MAX LEAF
SPANNING TREE.
Comment: 25 pages, 27 figures
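The (n+4)/3 bound can be checked on small graphs with a brute-force sketch. It relies on the standard fact that, in a connected graph with at least three vertices, the maximum number of leaves in a spanning tree equals n minus the size of a smallest connected dominating set. This is an exponential-time illustration with hypothetical function names, not the FPT algorithm from the paper.

```python
from itertools import combinations


def is_connected(vertices, adj):
    """Check that `vertices` induces a connected subgraph of `adj`."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def max_leaf_spanning_tree(adj):
    """Maximum leaf count over all spanning trees of a connected graph (n >= 3).

    Searches for a smallest connected dominating set; its members are the
    internal vertices of an optimal tree, and every other vertex is a leaf.
    """
    n = len(adj)
    for size in range(1, n):
        for internal in combinations(adj, size):
            dominated = set(internal).union(*(adj[u] for u in internal))
            if len(dominated) == n and is_connected(internal, adj):
                return n - size
    return 0


# K_{3,3}: cubic and triangle-free, hence it contains no diamond K_4 - e.
k33 = {0: [3, 4, 5], 1: [3, 4, 5], 2: [3, 4, 5],
       3: [0, 1, 2], 4: [0, 1, 2], 5: [0, 1, 2]}
leaves = max_leaf_spanning_tree(k33)
```

For K_{3,3} the bound (n+4)/3 = 10/3 requires at least 4 leaves, and the search finds a spanning tree with exactly 4.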