    The Complexity of Rooted Phylogeny Problems

    Several computational problems in phylogenetic reconstruction can be formulated as restrictions of the following general problem: given a formula in conjunctive normal form where the literals are rooted triples, is there a rooted binary tree that satisfies the formula? If the formulas do not contain disjunctions, the problem becomes the famous rooted triple consistency problem, which can be solved in polynomial time by an algorithm of Aho, Sagiv, Szymanski, and Ullman. If the clauses in the formulas are restricted to disjunctions of negated triples, Ng, Steel, and Wormald showed that the problem remains NP-complete. We systematically study the computational complexity of the problem for all such restrictions of the clauses in the input formula. For certain restricted disjunctions of triples we present an algorithm that has sub-quadratic running time and is asymptotically as fast as the fastest known algorithm for the rooted triple consistency problem. We also show that any restriction of the general rooted phylogeny problem that does not fall into our tractable class is NP-complete, using known results about the complexity of Boolean constraint satisfaction problems. Finally, we present a pebble game argument that shows that the rooted triple consistency problem (and also all generalizations studied in this paper) cannot be solved by Datalog.
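    The disjunction-free case mentioned above can be illustrated with a minimal sketch in the spirit of the Aho, Sagiv, Szymanski, and Ullman approach: connect a and b for every triple ab|c, split the leaves into connected components, and recurse. The names `build` and `leaves_of` are hypothetical, the sketch is a plain recursive version and not the sub-quadratic algorithm of the paper, and the tree it returns need not be binary (any binary refinement of it still satisfies the triples).

```python
from typing import Iterable, Optional, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees


def leaves_of(tree: Tree) -> set:
    """Collect the leaf labels of a tree built by `build`."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for sub in tree:
        result |= leaves_of(sub)
    return result


def build(leaves: Iterable[str], triples: Iterable[tuple]) -> Optional[Tree]:
    """Return a rooted tree satisfying every triple (a, b, c), read as ab|c,
    or None if the triples are inconsistent. Illustrative sketch only."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Only triples whose three leaves all survive in this subproblem matter.
    relevant = [t for t in triples if set(t) <= leaves]

    # Union-find over the leaves: a and b must lie in the same child subtree.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # no way to split the leaves at the root: inconsistent

    subtrees = []
    for block in components.values():
        sub = build(block, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

    For example, `build({"a", "b", "c"}, [("a", "b", "c")])` groups a and b below a common child of the root, while adding the conflicting triple ac|b makes the call return `None`.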

    Datalog and Constraint Satisfaction with Infinite Templates

    On finite structures, there is a well-known connection between the expressive power of Datalog, finite variable logics, the existential pebble game, and bounded hypertree duality. We study this connection for infinite structures. This has applications for constraint satisfaction with infinite templates. If the template Gamma is omega-categorical, we present various equivalent characterizations of those Gamma such that the constraint satisfaction problem (CSP) for Gamma can be solved by a Datalog program. We also show that CSP(Gamma) can be solved in polynomial time for arbitrary omega-categorical structures Gamma if the input is restricted to instances of bounded treewidth. Finally, we characterize those omega-categorical templates whose CSP has Datalog width 1, and those whose CSP has strict Datalog width k. Comment: 28 pages. This is an extended version of a conference paper that appeared at STACS'06. In the third version on the arXiv we have revised the presentation again and added a section that relates our results to formalizations of CSPs using relation algebra.

    Linear Datalog and Bounded Path Duality of Relational Structures

    In this paper we systematically investigate the connections between logics with a finite number of variables, structures of bounded pathwidth, and linear Datalog programs. We prove that, in the context of Constraint Satisfaction Problems, all these concepts correspond to different mathematical embodiments of a unique robust notion that we call bounded path duality. We also study the computational complexity implications of the notion of bounded path duality. We show that every constraint satisfaction problem CSP(B) with bounded path duality is solvable in NL and that this notion explains in a uniform way all families of CSPs known to be in NL. Finally, we use the results developed in the paper to identify new problems in NL.
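    As a small illustration of what a linear Datalog program looks like, consider directed reachability, a standard NL problem, written with rules that each contain at most one recursive atom in the body: reach(s); reach(Y) :- reach(X), edge(X, Y); goal :- reach(t). The sketch below evaluates these rules bottom-up. It is not taken from the paper, the function name `reachable` is hypothetical, and the evaluation is an ordinary deterministic fixpoint computation, not a logarithmic-space procedure.

```python
def reachable(edges, source, target):
    """Bottom-up evaluation of the linear Datalog program
         reach(source).
         reach(Y) :- reach(X), edge(X, Y).
         goal     :- reach(target).
    Returns True exactly when the goal predicate is derived."""
    # Index the edge relation by its first argument.
    successors = {}
    for x, y in edges:
        successors.setdefault(x, []).append(y)

    reach = {source}       # facts derived so far for the predicate reach
    frontier = [source]    # facts whose consequences are not yet explored
    while frontier:
        x = frontier.pop()
        for y in successors.get(x, []):
            if y not in reach:  # apply the recursive rule once per new fact
                reach.add(y)
                frontier.append(y)
    return target in reach
```

    Each recursive rule application consumes a single derived fact reach(X), which is what makes the program linear.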

    Inherent Complexity of Recursive Queries

    We give lower bounds on the complexity of certain Datalog queries. Our notion of complexity applies to compile-time optimization techniques for Datalog; thus, our results indicate limitations of these techniques. The main new tool is linear first-order formulas, whose depth (respectively, number of variables) matches the sequential (respectively, parallel) complexity of Datalog programs. We define a combinatorial game (a variant of Ehrenfeucht–Fraïssé games) that can be used to prove nonexpressibility by linear formulas. We thus obtain lower bounds for the sequential and parallel complexity of Datalog queries. We prove syntactically tight versions of our results, by exploiting uniformity and invariance properties of Datalog queries.

    Datalog-Expressibility for Monadic and Guarded Second-Order Logic

    We characterise the sentences in Monadic Second-order Logic (MSO) that are equivalent over finite structures to a Datalog program, in terms of an existential pebble game. We also show that for every class C of finite structures that can be expressed in MSO and is closed under homomorphisms, and for all ℓ, k ∈ ℕ, there exists a canonical Datalog program Π of width (ℓ,k), that is, a Datalog program of width (ℓ,k) which is sound for C (i.e., Π only derives the goal predicate on a finite structure A if A ∈ C) and with the property that Π derives the goal predicate whenever some Datalog program of width (ℓ,k) which is sound for C derives the goal predicate. The same characterisations also hold for Guarded Second-order Logic (GSO), which properly extends MSO. To prove our results, we show that every class C in GSO whose complement is closed under homomorphisms is a finite union of constraint satisfaction problems (CSPs) of ω-categorical structures.

    Elements of Finite Model Theory [book review]

    On the speed of constraint propagation and the time complexity of arc consistency testing

    Establishing arc consistency on two relational structures is one of the most popular heuristics for the constraint satisfaction problem. We aim at determining the time complexity of arc consistency testing. The input structures G and H can be supposed to be connected colored graphs, as the general problem reduces to this particular case. We first observe the upper bound O(e(G)v(H)+v(G)e(H)), which implies the bound O(e(G)e(H)) in terms of the number of edges and the bound O((v(G)+v(H))^3) in terms of the number of vertices. We then show that both bounds are tight up to a constant factor as long as an arc consistency algorithm is based on constraint propagation (like any algorithm currently known). Our argument for the lower bounds is based on examples of slow constraint propagation. We measure the speed of constraint propagation observed on a pair G, H by the size of a proof, in a natural combinatorial proof system, that Spoiler wins the existential 2-pebble game on G, H. The proof size is bounded from below by the game length D(G,H), and a crucial ingredient of our analysis is the existence of G, H with D(G,H) = Ω(v(G)v(H)). We find one such example among old benchmark instances for the arc consistency problem and also suggest a new, different construction. Comment: 19 pages, 5 figures
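    To make the notion of constraint propagation concrete, here is a minimal sketch of naive arc consistency for the homomorphism problem from G to H. It assumes uncolored directed graphs given as vertex lists and edge pairs, which is a simplification of the colored-graph setting of the abstract, and the function name `arc_consistency` is hypothetical. It simply repeats passes over the edges of G until nothing changes, so it does not attain the upper bound stated above.

```python
def arc_consistency(g_vertices, g_edges, h_vertices, h_edges):
    """Naive propagation: every vertex x of G starts with the domain V(H);
    a value is removed as soon as it has no support along some edge of G.
    Returns the final domains; an empty domain rules out a homomorphism."""
    h_edges = set(h_edges)
    domains = {x: set(h_vertices) for x in g_vertices}
    changed = True
    while changed:
        changed = False
        for x, y in g_edges:
            # Values for x that have an out-neighbour among the values for y.
            keep_x = {a for a in domains[x]
                      if any((a, b) in h_edges for b in domains[y])}
            # Values for y that have an in-neighbour among the kept values.
            keep_y = {b for b in domains[y]
                      if any((a, b) in h_edges for a in keep_x)}
            if keep_x != domains[x] or keep_y != domains[y]:
                domains[x], domains[y] = keep_x, keep_y
                changed = True
    return domains
```

    On a directed path with two edges as G and a single directed edge as H, propagation empties every domain, reflecting that no homomorphism exists. The number of passes the outer loop needs is the kind of quantity that slow-propagation examples drive up.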

    Complete Axiomatizations of Fragments of Monadic Second-Order Logic on Finite Trees

    We consider a specific class of tree structures that can represent basic structures in linguistics and computer science such as XML documents, parse trees, and treebanks, namely, finite node-labeled sibling-ordered trees. We present axiomatizations of the monadic second-order logic (MSO), monadic transitive closure logic (FO(TC1)) and monadic least fixed-point logic (FO(LFP1)) theories of this class of structures. These logics can express important properties such as reachability. Using model-theoretic techniques, we show by a uniform argument that these axiomatizations are complete, i.e., each formula that is valid on all finite trees is provable using our axioms. As a backdrop to our positive results, on arbitrary structures, the logics that we study are known to be non-recursively axiomatizable.

    Enhancing Fixed Point Logic with Cardinality Quantifiers

    Let Q_IFP be any quantifier such that FO(Q_IFP), first-order logic enhanced with Q_IFP and its vectorizations, equals inductive fixed point logic, IFP, in expressive power. It is known that for certain quantifiers Q, the equivalence FO(Q_IFP) ≡ IFP is no longer true if Q is added on both sides. Rather, we have FO(Q_IFP, Q) < IFP(Q) in such cases. We extend these results to a great variety of quantifiers, namely all unbounded simple cardinality quantifiers. Our argument also applies to partial fixed point logic, PFP. In order to establish an analogous result for least fixed point logic, LFP, we exhibit a general method to pass from arbitrary quantifiers to monotone quantifiers. Our proof shows that the tree isomorphism problem is not definable in infinitary logic extended with all monadic quantifiers and their vectorizations, where a finite bound is imposed on the number of variables as well as on the number of nested quantifiers in Q_1. This strengthens a result of Etessami and Immerman by which tree isomorphism is not definable in TC + COUNTING.