A decorated tree approach to random permutations in substitution-closed classes
We establish a novel bijective encoding that represents permutations as
forests of decorated (or enriched) trees. This allows us to prove local
convergence of uniform random permutations from substitution-closed classes
satisfying a criticality constraint. It also enables us to reprove and
strengthen permuton limits for these classes in a new way that uses a
semi-local version of Aldous' skeleton decomposition for size-constrained
Galton–Watson trees.
Comment: New version including referee's corrections, accepted for publication
in Electronic Journal of Probability.
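The substitution (inflation) operation that defines substitution-closed classes can be sketched in Python. This is a minimal illustration of the classical substitution decomposition, in which a permutation is read off a tree whose internal nodes carry permutations. It is not the paper's bijective encoding, which this abstract does not spell out. The names `inflate` and `evaluate` and the tuple-based tree representation are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple, Union

Perm = List[int]  # one-line notation of a permutation of 1..n


def inflate(sigma: Sequence[int], blocks: Sequence[Sequence[int]]) -> Perm:
    """Return sigma[alpha_1, ..., alpha_k]: replace entry i of sigma by blocks[i]."""
    k = len(sigma)
    if len(blocks) != k:
        raise ValueError("need exactly one block per entry of sigma")
    sizes = [len(b) for b in blocks]
    result: Perm = []
    for i in range(k):
        # Block i is shifted above every block whose sigma-value is smaller.
        offset = sum(sizes[j] for j in range(k) if sigma[j] < sigma[i])
        result.extend(offset + x for x in blocks[i])
    return result


# Hypothetical tree representation: None is a leaf (the permutation 1);
# an internal node is (sigma, children) with one child per entry of sigma.
Tree = Union[None, Tuple[Tuple[int, ...], list]]


def evaluate(tree: Tree) -> Perm:
    """Bottom-up evaluation of a substitution tree to its permutation."""
    if tree is None:
        return [1]
    sigma, children = tree
    return inflate(sigma, [evaluate(c) for c in children])


if __name__ == "__main__":
    print(inflate([2, 1], [[1, 2], [2, 1]]))                    # [3, 4, 2, 1]
    print(evaluate(((2, 1), [((1, 2), [None, None]), None])))  # [2, 3, 1]
```

A class is substitution-closed when `inflate` applied to members of the class always yields another member. In that case every permutation in the class is the evaluation of such a tree.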
The Complexity of Rooted Phylogeny Problems
Several computational problems in phylogenetic reconstruction can be
formulated as restrictions of the following general problem: given a formula in
conjunctive normal form where the literals are rooted triples, is there a
rooted binary tree that satisfies the formula? If the formulas do not contain
disjunctions, the problem becomes the famous rooted triple consistency problem,
which can be solved in polynomial time by an algorithm of Aho, Sagiv,
Szymanski, and Ullman. If the clauses in the formulas are restricted to
disjunctions of negated triples, Ng, Steel, and Wormald showed that the problem
remains NP-complete. We systematically study the computational complexity of
the problem for all such restrictions of the clauses in the input formula. For
certain restricted disjunctions of triples we present an algorithm that has
sub-quadratic running time and is asymptotically as fast as the fastest known
algorithm for the rooted triple consistency problem. We also show that any
restriction of the general rooted phylogeny problem that does not fall into our
tractable class is NP-complete, using known results about the complexity of
Boolean constraint satisfaction problems. Finally, we present a pebble game
argument that shows that the rooted triple consistency problem (and also all
generalizations studied in this paper) cannot be solved by Datalog.
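The rooted triple consistency problem (the case with no disjunctions) is classically solved by the recursive BUILD algorithm of Aho, Sagiv, Szymanski, and Ullman. The following is a minimal, unoptimized Python sketch of that recursive idea. It is not the sub-quadratic algorithm of this paper.

In the sketch, a triple ab|c is encoded as the tuple (a, b, c), and the helper names `build` and `_components` are hypothetical. It returns a possibly non-binary tree as nested tuples, or None when the triples are inconsistent. Any binary refinement of the returned tree also satisfies every triple.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple, Union

Triple = Tuple[str, str, str]  # (a, b, c) encodes the rooted triple ab|c
Tree = Union[str, Tuple["Tree", ...]]


def _components(leaves: Sequence[str], edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Connected components of the graph on `leaves` (union-find)."""
    parent = {x: x for x in leaves}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups: Dict[str, List[str]] = {}
    for x in leaves:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


def build(leaves: Sequence[str], triples: Sequence[Triple]) -> Optional[Tree]:
    """Return a tree displaying all triples on `leaves`, or None if none exists."""
    leaves = list(leaves)
    if len(leaves) == 1:
        return leaves[0]
    leaf_set = set(leaves)
    # Only triples entirely inside the current leaf set still constrain us.
    relevant = [t for t in triples if set(t) <= leaf_set]
    # For each ab|c, a and b must end up in the same child subtree of the root.
    comps = _components(leaves, [(a, b) for a, b, _ in relevant])
    if len(comps) == 1:
        return None  # the root cannot be split: inconsistent
    children = []
    for comp in comps:
        sub = build(comp, relevant)
        if sub is None:
            return None
        children.append(sub)
    return tuple(children)
```

For example, `build(["a", "b", "c"], [("a", "b", "c")])` returns `(("a", "b"), "c")`. Adding the triples bc|a and ac|b makes all three leaves one component, so the call returns None.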
Solving the intractable problem: optimal performance for worst case scenarios in XML twig pattern matching
XML (eXtensible Markup Language) has long been regarded as the standard format for storing and exchanging semi-structured data. With the advent of IoT, XML technologies can play an important role in processing the massive amounts of data generated by heterogeneous devices. As the number and complexity of such datasets increase, there is a need for algorithms that can index and retrieve XML data efficiently, even for complex queries. In this context, twig pattern matching, that is, finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now, holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they cannot guarantee optimal evaluation without excessive storage costs, which limits their applicability to large datasets. In this article, we introduce a new approach that significantly outperforms earlier methods in terms of both intermediate storage size and query running time. The approach uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms, together with a novel algorithm that avoids the use of stacks, thus improving TPQ processing efficiency. Several experiments were conducted on common benchmarks such as the DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements.
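The sketch below makes "finding all occurrences of a twig pattern query" concrete. It enumerates every embedding of a small twig into an XML tree by naive recursion, using Python's standard `xml.etree.ElementTree`.

It is a brute-force baseline for illustration only. It is neither a holistic join nor the Child Prime Labels approach described above, and its running time can grow exponentially with the query. The `QNode` class and the `/` (child) versus `//` (descendant) edge encoding are hypothetical choices for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class QNode:
    tag: str
    axis: str = "/"  # edge from the parent query node: "/" child, "//" descendant
    children: List["QNode"] = field(default_factory=list)


def _candidates(elem: ET.Element, axis: str) -> List[ET.Element]:
    if axis == "/":
        return list(elem)          # direct children
    return list(elem.iter())[1:]   # iter() yields elem itself first, then descendants


def match(q: QNode, elem: ET.Element) -> List[List[ET.Element]]:
    """All embeddings of the twig rooted at q with q mapped to elem (preorder)."""
    if elem.tag != q.tag:
        return []
    per_child = []
    for child in q.children:
        options: List[List[ET.Element]] = []
        for cand in _candidates(elem, child.axis):
            options.extend(match(child, cand))
        if not options:
            return []  # some query branch cannot be matched below elem
        per_child.append(options)
    # Each query child independently picks one of its embeddings.
    return [[elem] + [e for part in combo for e in part] for combo in product(*per_child)]


def twig_matches(q: QNode, root: ET.Element) -> List[List[ET.Element]]:
    """Match the query root at any element of the document."""
    results: List[List[ET.Element]] = []
    for elem in root.iter():
        results.extend(match(q, elem))
    return results


if __name__ == "__main__":
    doc = ET.fromstring(
        "<dblp><article><author>A</author><author>B</author><title>T</title>"
        "</article><book><title>U</title></book></dblp>"
    )
    q = QNode("article", children=[QNode("author"), QNode("title")])
    for m in twig_matches(q, doc):
        print([e.text for e in m[1:]])  # ['A', 'T'] then ['B', 'T']
```

The Cartesian product over query branches is exactly where intermediate results can blow up. Controlling the size of these intermediate results is the storage cost that the abstract's comparison is about.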
Phylogenetic CSPs are Approximation Resistant
We study the approximability of a broad class of computational problems --
originally motivated in evolutionary biology and phylogenetic reconstruction --
concerning the aggregation of potentially inconsistent (local) information
about items of interest, and we present optimal hardness of approximation
results under the Unique Games Conjecture. The class of problems studied here
can be described as Constraint Satisfaction Problems (CSPs) over infinite
domains, where instead of values from a fixed-size domain, the
variables can be mapped to any of the leaves of a phylogenetic tree. The
topology of the tree then determines whether a given constraint on the
variables is satisfied or not, and the resulting CSPs are called Phylogenetic
CSPs. Prominent examples of Phylogenetic CSPs with a long history and
applications in various disciplines include: Triplet Reconstruction, Quartet
Reconstruction, Subtree Aggregation (Forbidden or Desired). For example, in
Triplet Reconstruction, we are given triplets of the form $ij|k$
(indicating that "items $i$ and $j$ are more similar to each other than to $k$")
and we want to construct a hierarchical clustering on the items that
respects the constraints as much as possible. Despite more than four decades of
research, the basic question of maximizing the number of satisfied constraints
is not well-understood. The current best approximation is achieved by
outputting a random tree (for triplets, this achieves a 1/3 approximation). Our
main result is that every Phylogenetic CSP is approximation resistant, i.e.,
there is no polynomial-time algorithm that does asymptotically better than a
(biased) random assignment. This is a generalization of the results in
Guruswami, Hastad, Manokaran, Raghavendra, and Charikar (2011), who showed that
ordering CSPs are approximation resistant (e.g., Max Acyclic Subgraph,
Betweenness).
Comment: 45 pages, 11 figures, abstract shortened for arXiv.
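The random-tree baseline for Triplet Reconstruction can be checked empirically. It should satisfy one third of the triplets in expectation.

In the Python sketch below, a triplet ij|k is encoded as the tuple (i, j, k). It counts as satisfied when the lowest common ancestor of i and j lies strictly below that of i and k. Random binary trees are grown by repeatedly merging two uniformly chosen subtrees. Because this process treats all labels symmetrically, each of the three possible resolutions of any three leaves is equally likely, so each triplet holds with probability 1/3.

All function names are hypothetical. The sketch illustrates the baseline only, not the hardness results.

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Triplet = Tuple[str, str, str]  # (i, j, k) encodes ij|k


def leaf_paths(tree: Tree, prefix: Tuple[int, ...] = ()) -> Dict[str, Tuple[int, ...]]:
    """Map each leaf to its root-to-leaf path of child indices."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths: Dict[str, Tuple[int, ...]] = {}
    for idx, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (idx,)))
    return paths


def lca_depth(p: Tuple[int, ...], q: Tuple[int, ...]) -> int:
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def satisfies(paths: Dict[str, Tuple[int, ...]], t: Triplet) -> bool:
    i, j, k = t
    return lca_depth(paths[i], paths[j]) > lca_depth(paths[i], paths[k])


def fraction_satisfied(tree: Tree, triplets: Sequence[Triplet]) -> float:
    paths = leaf_paths(tree)
    return sum(satisfies(paths, t) for t in triplets) / len(triplets)


def random_binary_tree(leaves: Sequence[str], rng: random.Random) -> Tree:
    """Merge two uniformly random subtrees until a single tree remains."""
    forest: List[Tree] = list(leaves)
    while len(forest) > 1:
        a, b = rng.sample(range(len(forest)), 2)
        merged = (forest[a], forest[b])
        forest = [t for idx, t in enumerate(forest) if idx not in (a, b)] + [merged]
    return forest[0]


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = [f"x{n}" for n in range(12)]
    triplets = [tuple(rng.sample(leaves, 3)) for _ in range(50)]
    trials = 2000
    avg = sum(fraction_satisfied(random_binary_tree(leaves, rng), triplets)
              for _ in range(trials)) / trials
    print(f"average fraction satisfied: {avg:.3f}")  # expected to be close to 1/3
```

Since no tree can satisfy more than all of the triplets, satisfying a third of them in expectation gives the 1/3 approximation mentioned above. The approximation-resistance result says that, assuming the Unique Games Conjecture, no polynomial-time algorithm does asymptotically better than such a (biased) random assignment.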