Maximum agreement and compatible supertrees
Abstract. Given a set of leaf-labelled trees with identical leaf sets, the MAST problem, respectively MCT problem, consists of finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic, respectively compatible. In this paper, we propose extensions of these problems to the context of supertree inference, where input trees have non-identical leaf sets. This situation is of particular interest in phylogenetics. The resulting problems are called SMAST and SMCT.
A sufficient condition is given that identifies cases where these problems can be solved by resorting to MAST and MCT as subproblems. This condition is met, for instance, when only two input trees are considered. Then we give algorithms for SMAST and SMCT that benefit from the link with the subtree problems. These algorithms run in time linear to the time needed to solve MAST, respectively MCT, on an instance of the same or smaller size.
It is shown that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves.
SMAST is shown to be W[2]-hard when the considered parameter is the number of input leaves that have to be removed to obtain the agreement of the input trees. A similar result holds for SMCT. Moreover, the corresponding optimization problems, that is the complements of SMAST and SMCT, cannot be approximated in polynomial time within any constant factor, unless P=NP. These results also hold when the input trees have a bounded number of leaves.
The presented results apply to both collections of rooted and unrooted trees.
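To make the MAST definition concrete, here is a minimal brute-force sketch for rooted trees on identical leaf sets. It is not the paper's algorithm: it simply tries leaf subsets from largest to smallest, so it is exponential and only suitable for toy inputs. The nested-tuple tree encoding and the function names `restrict` and `brute_force_mast` are assumptions made for this illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def restrict(tree, keep):
    """Restrict a rooted tree to the leaves in `keep`.

    Unary nodes are contracted. The result is an order-independent
    canonical form (nested frozensets), so two restrictions are
    isomorphic as leaf-labelled rooted trees iff their forms are equal.
    Returns None when no leaf of `tree` is kept.
    """
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # contract the unary node
    return frozenset(kids)


def brute_force_mast(trees):
    """Return a largest leaf subset on which all input trees agree."""
    labels = sorted(leaves(trees[0]))
    for size in range(len(labels), -1, -1):
        for subset in combinations(labels, size):
            keep = set(subset)
            if len({restrict(t, keep) for t in trees}) == 1:
                return keep
    return set()


# Two rooted trees on the same four leaves that disagree on b and c.
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
agreement = brute_force_mast([t1, t2])
```

In this toy instance the two trees cannot agree on all four leaves, but dropping a single leaf is enough, so the returned set has three elements.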
COMPACT REPRESENTATIONS OF UNCERTAINTY IN CLUSTERING
Flat clustering and hierarchical clustering are two fundamental tasks, often used to discover meaningful structures in data, such as subtypes of cancer, phylogenetic relationships, taxonomies of concepts, and cascades of particle decays in particle physics. When multiple clusterings of the data are possible, it is useful to represent uncertainty in clustering through various probabilistic quantities, such as the distribution over partitions or tree structures, and the marginal probabilities of subpartitions or subtrees.
Many compact representations exist for structured prediction problems, enabling the efficient computation of probability distributions, e.g., a trellis structure and corresponding Forward-Backward algorithm for Markov models that model sequences. However, no such representation has been proposed for either flat or hierarchical clustering models. In this thesis, we present our work developing data structures and algorithms for computing probability distributions over flat and hierarchical clusterings, as well as for finding maximum a posteriori (MAP) flat and hierarchical clusterings, and various marginal probabilities, as given by a wide range of energy-based clustering models.
First, we describe a trellis structure that compactly represents distributions over flat or hierarchical clusterings. We also describe related data structures that represent approximate distributions. We then present algorithms that, using these structures, allow us to compute the partition function, MAP clustering, and the marginal probabilities of a cluster (and sub-hierarchy, in the case of hierarchical clustering) exactly. We also show how these and related algorithms can be used to approximate these values, and analyze the time and space complexity of our proposed methods. We demonstrate the utility of our approaches using various synthetic data of interest as well as in two real-world applications, namely particle physics at the Large Hadron Collider at CERN and in cancer genomics. We conclude with a brief discussion of future work.
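As a rough illustration of how a partition function over flat clusterings can be computed exactly without enumerating every partition, the sketch below uses dynamic programming over subsets of the data. It assumes an energy model in which the score of a clustering is the product of per-cluster potentials; the bitmask encoding and the names `partition_function` and `potential` are hypothetical choices for this example and are not taken from the thesis.

```python
from functools import lru_cache
from typing import Callable


def partition_function(n: int, potential: Callable[[int], float]) -> float:
    """Sum, over all flat clusterings of {0, ..., n-1}, of the product
    of cluster potentials.

    Clusters are encoded as bitmasks. The recursion fixes the lowest
    remaining element, enumerates every cluster that contains it, and
    reuses the memoized value for the elements left over.
    """

    @lru_cache(maxsize=None)
    def z(mask: int) -> float:
        if mask == 0:
            return 1.0  # the empty set has exactly one (empty) clustering
        low = mask & -mask   # lowest element still unassigned
        rest = mask ^ low    # all other unassigned elements
        total = 0.0
        sub = rest
        while True:
            cluster = sub | low
            total += potential(cluster) * z(mask ^ cluster)
            if sub == 0:
                break
            sub = (sub - 1) & rest  # next sub-mask of `rest`
        return total

    return z((1 << n) - 1)


# With a constant potential of 1, the sum simply counts set partitions.
count_3 = partition_function(3, lambda cluster: 1.0)
count_4 = partition_function(4, lambda cluster: 1.0)
```

With the constant potential the result is the number of set partitions (the Bell numbers), which gives a simple sanity check. Replacing the sum with a maximum would give the score of a MAP clustering under the same assumed factorization.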
Ferromagnetic Potts Model: Refined #BIS-hardness and Related Results
Recent results establish for 2-spin antiferromagnetic systems that the
computational complexity of approximating the partition function on graphs of
maximum degree D undergoes a phase transition that coincides with the
uniqueness phase transition on the infinite D-regular tree. For the
ferromagnetic Potts model we investigate whether analogous hardness results
hold. Goldberg and Jerrum showed that approximating the partition function of
the ferromagnetic Potts model is at least as hard as approximating the number
of independent sets in bipartite graphs (#BIS-hardness). We improve this
hardness result by establishing it for bipartite graphs of maximum degree D. We
first present a detailed picture for the phase diagram for the infinite
D-regular tree, giving a refined picture of its first-order phase transition
and establishing the critical temperature for the coexistence of the disordered
and ordered phases. We then prove for all temperatures below this critical
temperature that it is #BIS-hard to approximate the partition function on
bipartite graphs of maximum degree D. As a corollary, it is #BIS-hard to
approximate the number of k-colorings on bipartite graphs of maximum degree D
when k <= D/(2 ln D).
The #BIS-hardness result for the ferromagnetic Potts model uses random
bipartite regular graphs as a gadget in the reduction. The analysis of these
random graphs relies on recent connections between the maxima of the
expectation of their partition function, attractive fixpoints of the associated
tree recursions, and induced matrix norms. We extend these connections to
random regular graphs for all ferromagnetic models and establish the Bethe
prediction for every ferromagnetic spin system on random regular graphs. We
also prove for the ferromagnetic Potts model that the Swendsen-Wang algorithm
is torpidly mixing on random D-regular graphs at the critical temperature for
large q.
Comment: To appear in SIAM J. Computing
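For readers unfamiliar with the quantity being approximated, the following sketch computes a Potts partition function by brute force on a tiny graph. It assumes one common parametrization, in which each monochromatic edge contributes a weight B (B > 1 being the ferromagnetic regime); the function name and argument layout are hypothetical, and the exponential-time enumeration is only meant to define the quantity, not to reflect any method from the paper.

```python
from itertools import product
from typing import Sequence, Tuple


def potts_partition_function(
    n: int, edges: Sequence[Tuple[int, int]], q: int, B: float
) -> float:
    """Sum over all q-colorings sigma of B ** (number of monochromatic edges).

    Vertices are 0..n-1. Runs in time q**n, so it is only usable on
    very small graphs.
    """
    total = 0.0
    for sigma in product(range(q), repeat=n):
        mono = sum(1 for u, v in edges if sigma[u] == sigma[v])
        total += B ** mono
    return total


triangle = [(0, 1), (1, 2), (0, 2)]
z_ferro = potts_partition_function(3, triangle, q=3, B=2.0)
z_uniform = potts_partition_function(3, triangle, q=3, B=1.0)
z_proper = potts_partition_function(3, triangle, q=3, B=0.0)
```

Two sanity checks follow from the definition: with B = 1 every coloring has weight 1, so the sum is q^n, and with B = 0 only proper colorings survive, so the sum counts proper q-colorings of the graph.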