Complexity of Splits Reconstruction for Low-Degree Trees
Given a vertex-weighted tree T, the split of an edge xy in T is min{s_x(xy),
s_y(xy)} where s_u(uv) is the sum of all weights of vertices that are closer to
u than to v in T. Given a set of weighted vertices V and a multiset of splits
S, we consider the problem of constructing a tree on V whose splits correspond
to S. The problem is known to be NP-complete, even when all vertices have unit
weight and the maximum vertex degree of T is required to be no more than 4. We
show that the problem is strongly NP-complete when T is required to be a path,
the problem is NP-complete when all vertices have unit weight and the maximum
degree of T is required to be no more than 3, and it remains NP-complete when
all vertices have unit weight and T is required to be a caterpillar with
unbounded hair length and maximum degree at most 3. We also design polynomial
time algorithms for the variant where T is required to be a path and the number
of distinct vertex weights is constant, and the variant where all vertices have
unit weight and T has a constant number of leaves. The latter algorithm is not
only polynomial when the number of leaves, k, is a constant, but also
fixed-parameter tractable when parameterized by k. Finally, we briefly discuss
the problem when the vertex weights are not given but can be freely chosen by
an algorithm.
The considered problem is related to building libraries of chemical compounds
used for drug design and discovery. In these inverse problems, the goal is to
generate chemical compounds having desired structural properties, as there is a
strong correlation between structural properties, such as the Wiener index,
which is closely connected to the considered problem, and biological activity.
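
The split definition above can be made concrete with a small sketch. It covers only the forward direction (computing the splits of a given tree), not the reconstruction problem the paper studies. The function name `edge_splits` and the input layout (a weight dictionary plus an edge list) are assumptions made for illustration.

```python
from collections import defaultdict


def edge_splits(weights, edges):
    """Return {(x, y): split} for every edge of a vertex-weighted tree.

    weights: dict mapping vertex -> weight (hypothetical input layout).
    edges:   list of (x, y) pairs that together form a tree on those vertices.
    """
    adj = defaultdict(list)
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)

    total = sum(weights.values())
    root = next(iter(weights))

    # Iterative DFS: record each vertex's parent and a preorder visiting order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    # Accumulate subtree weights from the leaves upward.
    subtree = dict(weights)
    for u in reversed(order):
        if parent[u] is not None:
            subtree[parent[u]] += subtree[u]

    # Removing edge xy leaves the child's subtree on one side and the rest on
    # the other; the split is the lighter of the two sides.
    splits = {}
    for x, y in edges:
        child = y if parent.get(y) == x else x
        side = subtree[child]
        splits[(x, y)] = min(side, total - side)
    return splits
```

On a unit-weight path a-b-c-d, for example, the two outer edges each have split 1 and the middle edge has split 2.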
Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.
Circumstances in which parsimony but not compatibility will be provably misleading
Phylogenetic methods typically rely on an appropriate model of how data
evolved in order to infer an accurate phylogenetic tree. For molecular data,
standard statistical methods have provided an effective strategy for extracting
phylogenetic information from aligned sequence data when each site (character)
is subject to a common process. However, for other types of data (e.g.
morphological data), characters can be too ambiguous, homoplastic or saturated
to develop models that are effective at capturing the underlying process of
change. To address this, we examine the properties of a classic but neglected
method for inferring splits in an underlying tree, namely, maximum
compatibility. By adopting a simple and extreme model in which each character
either fits perfectly on some tree, or is entirely random (but it is not known
which class any character belongs to) we are able to derive exact and explicit
formulae regarding the performance of maximum compatibility. We show that this
method is able to identify a set of non-trivial homoplasy-free characters, when
the number of taxa is large, even when the number of random characters is
large. By contrast, we show that a method that makes more uniform use of all
the data, namely maximum parsimony, can provably estimate trees in which
none of the original homoplasy-free characters support splits.
Comment: 37 pages, 2 figures
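
As a small illustration of what compatibility means, the sketch below checks two characters against each other. It assumes binary characters (states 0 and 1, one state per taxon), a restriction the abstract does not state, and uses the standard four-state-pair criterion: two binary characters fit on a common tree without homoplasy exactly when at most three of the four state pairs 00, 01, 10, 11 occur. The function name `compatible` is hypothetical, and this is not the paper's analysis of maximum compatibility.

```python
def compatible(c1, c2):
    """Pairwise compatibility test for two binary characters.

    c1, c2: equal-length sequences of 0/1 states, one entry per taxon
    (binary states are an assumption of this sketch).
    """
    if len(c1) != len(c2):
        raise ValueError("characters must cover the same taxa")
    # Collect the distinct state pairs observed across taxa; all four
    # pairs occurring means the two characters conflict.
    return len(set(zip(c1, c2))) < 4
```

For instance, `compatible([0, 0, 1, 1], [0, 1, 1, 1])` holds because the pair (1, 0) never occurs, while `[0, 0, 1, 1]` and `[0, 1, 0, 1]` exhibit all four pairs and are incompatible.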
Dynamic Ordered Sets with Exponential Search Trees
We introduce exponential search trees as a novel technique for converting
static polynomial space search structures for ordered sets into fully-dynamic
linear space data structures.
This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and
updating a dynamic set of n integer keys in linear space. Here searching an
integer y means finding the maximum key in the set which is smaller than or
equal to y. This problem is equivalent to the standard textbook problem of
maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein:
Introduction to Algorithms, 2nd ed., MIT Press, 2001).
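
The search operation defined above (find the maximum key smaller than or equal to y) can be sketched with a plain sorted list. This is only an illustration of the interface, not an exponential search tree: lookups here take O(log n) by binary search and updates take O(n) in the worst case. The class name `OrderedIntSet` is hypothetical.

```python
import bisect


class OrderedIntSet:
    """Dynamic ordered set of integer keys backed by a sorted Python list."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Keep the list sorted and free of duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def delete(self, key):
        # Remove the key if present; do nothing otherwise.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def search(self, y):
        # Maximum key <= y, or None when every key is larger than y.
        i = bisect.bisect_right(self._keys, y)
        return self._keys[i - 1] if i > 0 else None
```

With keys 5, 10 and 20 stored, `search(15)` returns 10, `search(10)` returns 10, and `search(4)` returns `None`.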
The best previous deterministic linear space bound was O(log n/loglog n), due
to Fredman and Willard from STOC 1990. No better deterministic search bound was
known using polynomial space.
We also get the following worst-case linear space trade-offs between the
number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log
n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however,
not likely to be optimal.
Our results are generalized to finger searching and string searching,
providing optimal results for both in terms of n.
Comment: Revision corrects some typos and states things better for
applications in subsequent paper