Complexity of Splits Reconstruction for Low-Degree Trees

Author: Gaspers Serge
Liedloff Mathieu
Stein Maya
Suchan Karol
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Given a vertex-weighted tree T, the split of an edge xy in T is min{s_x(xy), s_y(xy)} where s_u(uv) is the sum of all weights of vertices that are closer to u than to v in T. Given a set of weighted vertices V and a multiset of splits S, we consider the problem of constructing a tree on V whose splits correspond to S. The problem is known to be NP-complete, even when all vertices have unit weight and the maximum vertex degree of T is required to be no more than 4. We show that the problem is strongly NP-complete when T is required to be a path, the problem is NP-complete when all vertices have unit weight and the maximum degree of T is required to be no more than 3, and it remains NP-complete when all vertices have unit weight and T is required to be a caterpillar with unbounded hair length and maximum degree at most 3. We also design polynomial time algorithms for the variant where T is required to be a path and the number of distinct vertex weights is constant, and the variant where all vertices have unit weight and T has a constant number of leaves. The latter algorithm is not only polynomial when the number of leaves, k, is a constant, but also fixed-parameter tractable when parameterized by k. Finally, we briefly discuss the problem when the vertex weights are not given but can be freely chosen by an algorithm. The considered problem is related to building libraries of chemical compounds used for drug design and discovery. In these inverse problems, the goal is to generate chemical compounds having desired structural properties, as there is a strong correlation between structural properties, such as the Wiener index, which is closely connected to the considered problem, and biological activity.
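
The split definition in the abstract above can be illustrated for the forward direction only (computing the splits of a given tree, not the NP-complete reconstruction problem). The following is a minimal sketch; the function name `edge_splits` and the input layout (a weight dictionary plus an edge list) are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict, deque

def edge_splits(weights, edges):
    """Return {edge: split} for a vertex-weighted tree.

    weights: dict mapping vertex -> weight (hypothetical input layout)
    edges:   list of (x, y) pairs that form a tree on those vertices
    """
    adj = defaultdict(list)
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)

    total = sum(weights.values())
    root = next(iter(weights))

    # Breadth-first search to fix a parent for every vertex.
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)

    # Accumulate subtree weights from the leaves towards the root.
    subtree = dict(weights)
    for u in reversed(order):
        if parent[u] is not None:
            subtree[parent[u]] += subtree[u]

    # Removing edge xy leaves the child's subtree on one side and the
    # rest of the tree on the other; the split is the lighter side.
    splits = {}
    for x, y in edges:
        child = y if parent.get(y) == x else x
        side = subtree[child]
        splits[(x, y)] = min(side, total - side)
    return splits
```

For a unit-weight path a, b, c, d, `edge_splits({"a": 1, "b": 1, "c": 1, "d": 1}, [("a", "b"), ("b", "c"), ("c", "d")])` gives splits 1, 2 and 1 for the three edges, so the multiset S would be {1, 1, 2}.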

arXiv.org e-Print Archive

CiteSeerX

Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.

Author: Chan Michelle M
Hussmann Jeffrey A
Jones Matthew G
Khodaverdian Alex
Quinn Jeffrey J
Wang Robert
Weissman Jonathan S
Xu Chenling
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia

eScholarship - University of California

Circumstances in which parsimony but not compatibility will be provably misleading

Author: Scotland Robert W.
Steel Mike
Publication date: 01/01/2015

Phylogenetic methods typically rely on an appropriate model of how data evolved in order to infer an accurate phylogenetic tree. For molecular data, standard statistical methods have provided an effective strategy for extracting phylogenetic information from aligned sequence data when each site (character) is subject to a common process. However, for other types of data (e.g. morphological data), characters can be too ambiguous, homoplastic or saturated to develop models that are effective at capturing the underlying process of change. To address this, we examine the properties of a classic but neglected method for inferring splits in an underlying tree, namely, maximum compatibility. By adopting a simple and extreme model in which each character either fits perfectly on some tree, or is entirely random (but it is not known which class any character belongs to) we are able to derive exact and explicit formulae regarding the performance of maximum compatibility. We show that this method is able to identify a set of non-trivial homoplasy-free characters, when the number n of taxa is large, even when the number of random characters is large. By contrast, we show that a method that makes more uniform use of all the data (maximum parsimony) can provably estimate trees in which none of the original homoplasy-free characters support splits. Comment: 37 pages, 2 figures
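
As a small illustration of what compatibility means, the sketch below tests whether two characters can both fit a single tree without homoplasy. It assumes binary (two-state) characters, which the abstract does not state, and uses the standard pairwise criterion that two binary characters are compatible exactly when at most three of the four state combinations occur across the taxa. The function name `compatible` is hypothetical, and this is not the paper's analysis of maximum compatibility.

```python
def compatible(char_a, char_b):
    """Pairwise compatibility test for two binary characters.

    char_a, char_b: equal-length sequences of states ('0'/'1'),
    one state per taxon (an assumed encoding).
    """
    if len(char_a) != len(char_b):
        raise ValueError("characters must cover the same taxa")
    # Collect the distinct state pairs observed across taxa.
    observed = set(zip(char_a, char_b))
    # All four combinations 00, 01, 10, 11 together rule out a common tree.
    return len(observed) < 4
```

For example, `compatible("0011", "0001")` is true because the combination 01 never occurs, while `compatible("0011", "0101")` is false because all four combinations appear.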

arXiv.org e-Print Archive

CiteSeerX

Dynamic Ordered Sets with Exponential Search Trees

Author: Andersson Arne
Thorup Mikkel
Publication date: 01/01/2002

We introduce exponential search trees as a novel technique for converting static polynomial space search structures for ordered sets into fully-dynamic linear space data structures. This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and updating a dynamic set of n integer keys in linear space. Here searching an integer y means finding the maximum key in the set which is smaller than or equal to y. This problem is equivalent to the standard textbook problem of maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, 2nd ed., MIT Press, 2001). The best previous deterministic linear space bound was O(log n/loglog n), due to Fredman and Willard from STOC 1990. No better deterministic search bound was known using polynomial space. We also get the following worst-case linear space trade-offs between the number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however, not likely to be optimal. Our results are generalized to finger searching and string searching, providing optimal results for both in terms of n. Comment: Revision corrects some typos and states things better for applications in a subsequent paper
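
The search operation defined in the abstract above (find the maximum key smaller than or equal to y) can be made concrete with a baseline sketch. Note the assumption: this uses plain binary search over a static sorted list, which takes O(log n) comparisons per query, and is not an exponential search tree and does not achieve the bounds stated in the abstract. The function name `predecessor` is chosen for illustration.

```python
from bisect import bisect_right

def predecessor(sorted_keys, y):
    """Return the largest key <= y, or None if every key exceeds y.

    sorted_keys: a list of integers in ascending order (static baseline,
    not the dynamic structure described in the paper).
    """
    # bisect_right returns the number of keys that are <= y.
    count = bisect_right(sorted_keys, y)
    return sorted_keys[count - 1] if count > 0 else None
```

With keys `[2, 5, 9]`, `predecessor(keys, 6)` returns 5, `predecessor(keys, 5)` returns 5, and `predecessor(keys, 1)` returns `None`.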

arXiv.org e-Print Archive

CiteSeerX