The Complexity of Rooted Phylogeny Problems
Several computational problems in phylogenetic reconstruction can be
formulated as restrictions of the following general problem: given a formula in
conjunctive normal form where the literals are rooted triples, is there a
rooted binary tree that satisfies the formula? If the formulas do not contain
disjunctions, the problem becomes the famous rooted triple consistency problem,
which can be solved in polynomial time by an algorithm of Aho, Sagiv,
Szymanski, and Ullman. If the clauses in the formulas are restricted to
disjunctions of negated triples, Ng, Steel, and Wormald showed that the problem
remains NP-complete. We systematically study the computational complexity of
the problem for all such restrictions of the clauses in the input formula. For
certain restricted disjunctions of triples we present an algorithm that has
sub-quadratic running time and is asymptotically as fast as the fastest known
algorithm for the rooted triple consistency problem. We also show that any
restriction of the general rooted phylogeny problem that does not fall into our
tractable class is NP-complete, using known results about the complexity of
Boolean constraint satisfaction problems. Finally, we present a pebble game
argument that shows that the rooted triple consistency problem (and also all
generalizations studied in this paper) cannot be solved by Datalog.
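As a point of reference for the polynomial-time case, here is a minimal sketch of the classic BUILD procedure of Aho, Sagiv, Szymanski, and Ullman for rooted triple consistency. It is a straightforward version, not the sub-quadratic algorithm or the disjunctive generalization described above. The encoding is an assumption made for illustration: a triple `(a, b, c)` stands for ab|c, and trees are nested tuples with string leaves. BUILD connects a and b for every triple ab|c, splits the leaves into the connected components of that graph, and fails if only one component remains.

```python
def components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def build(leaves, triples):
    """BUILD: return a rooted binary tree (nested tuples) displaying every
    triple (a, b, c), read as ab|c, or None if the triples are inconsistent."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    leaf_set = set(leaves)
    # Only triples whose three leaves all lie in the current set constrain it.
    relevant = [t for t in triples if set(t) <= leaf_set]
    # Each triple ab|c forces a and b into the same subtree below this root.
    comps = components(leaves, [(a, b) for a, b, _ in relevant])
    if len(comps) == 1:
        return None  # the leaves cannot be split at this root
    subtrees = []
    for comp in comps:
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    # Resolve the multifurcation into a binary caterpillar; refining a tree
    # keeps every cluster, so every displayed triple stays displayed.
    tree = subtrees[-1]
    for sub in reversed(subtrees[:-1]):
        tree = (sub, tree)
    return tree


def clusters(tree, out):
    """Collect the leaf set of every subtree of `tree` into `out`."""
    if isinstance(tree, str):
        leaf_set = frozenset([tree])
    else:
        leaf_set = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaf_set)
    return leaf_set


def displays(tree, triple):
    """ab|c is displayed iff some cluster contains a and b but not c."""
    a, b, c = triple
    found = []
    clusters(tree, found)
    return any(a in s and b in s and c not in s for s in found)
```

For example, `build({"a", "b", "c", "d"}, [("a", "b", "c"), ("c", "d", "a")])` returns `(("a", "b"), ("c", "d"))`, while the cyclic set ab|c, bc|a, ac|b yields a single component at the root and therefore `None`.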
Stratified Negation in Limit Datalog Programs
There has recently been an increasing interest in declarative data analysis,
where analytic tasks are specified using a logical language, and their
implementation and optimisation are delegated to a general-purpose query
engine. Existing declarative languages for data analysis can be formalised as
variants of logic programming equipped with arithmetic function symbols and/or
aggregation, and are typically undecidable. In prior work, the language of limit
programs was proposed; it is sufficiently powerful to
capture many analysis tasks and has a decidable entailment problem. Rules in this
language, however, do not allow for negation. In this paper, we study an
extension of limit programs with stratified negation-as-failure. We show that
the additional expressive power makes reasoning computationally more demanding,
and provide tight data complexity bounds. We also identify a fragment with
tractable data complexity and sufficient expressivity to capture many relevant
tasks.
Comment: 14 pages; full version of a paper accepted at IJCAI-1
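Stratified negation-as-failure requires that no predicate depend negatively on itself, so that the program can be evaluated stratum by stratum. The sketch below shows the standard stratification check at the level of predicate names only. It ignores the numeric and limit features of the paper, and the rule encoding `(head, [(body_pred, is_positive), ...])` is a hypothetical choice made for illustration.

```python
def stratify(rules):
    """Assign each predicate the least stratum such that positive body
    predicates sit at or below the head and negated ones strictly below.
    rules: list of (head, [(body_pred, is_positive), ...]).
    Returns {pred: stratum}, or None if the program is not stratifiable."""
    preds = {h for h, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, positive in body:
                need = stratum[pred] if positive else stratum[pred] + 1
                if stratum[head] < need:
                    stratum[head] = need
                    changed = True
                    # A stratifiable program never needs more strata than
                    # predicates; growing past that means a negative cycle.
                    if need > len(preds):
                        return None
    return stratum
```

With rules for reachability and `unreachable(x) :- node(x), not reach(x)`, `unreachable` lands one stratum above `reach`; a rule such as `win(x) :- move(x, y), not win(y)` makes `win` depend negatively on itself, and the check returns `None`.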
The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Over the past years, there has been a resurgence of Datalog-based systems in
the database community as well as in industry. In this context, it has been
recognized that to handle the complex knowledge-based scenarios encountered
today, such as reasoning over large knowledge graphs, Datalog has to be
extended with features such as existential quantification. Yet, Datalog-based
reasoning in the presence of existential quantification is in general
undecidable. Many efforts have been made to define decidable fragments. Warded
Datalog+/- is a very promising one, as it captures PTIME complexity while
allowing ontological reasoning. Yet so far, no implementation of Warded
Datalog+/- was available. In this paper we present the Vadalog system, a
Datalog-based system for performing complex logic reasoning tasks, such as
those required in advanced knowledge graphs. The Vadalog system is Oxford's
contribution to the VADA research programme, a joint effort of the universities
of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the
main contribution of this paper, we illustrate the first implementation of
Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive
termination control strategy. We also provide a comprehensive experimental
evaluation.
Comment: Extended version of VLDB paper
<https://doi.org/10.14778/3213880.3213888>
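Existential quantification means a rule can invent new values (labelled nulls), which is why reasoning may not terminate without some form of control. The sketch below shows the restricted chase, a standard textbook strategy in which an existential rule fires only when its head is not already satisfied. It is not the Vadalog system's termination strategy for Warded Datalog+/-, which the paper describes as more aggressive. The predicates `Employee`, `WorksFor`, and `Company` and the null naming scheme `_n0, _n1, ...` are hypothetical.

```python
from itertools import count


def restricted_chase(facts):
    """Chase `facts` with two hypothetical rules until nothing changes:
    R1 (existential): Employee(x) -> exists y. WorksFor(x, y)
    R2 (Datalog):     WorksFor(x, y) -> Company(y)
    Facts are tuples whose first element is the predicate name."""
    facts = set(facts)
    fresh = count()  # source of labelled nulls _n0, _n1, ...
    changed = True
    while changed:
        changed = False
        for pred, *args in list(facts):
            if pred == "Employee":
                x = args[0]
                # Restricted chase: fire R1 only if no WorksFor(x, _) witness exists.
                if not any(f[0] == "WorksFor" and f[1] == x for f in facts):
                    facts.add(("WorksFor", x, f"_n{next(fresh)}"))
                    changed = True
        for pred, *args in list(facts):
            if pred == "WorksFor" and ("Company", args[1]) not in facts:
                facts.add(("Company", args[1]))
                changed = True
    return facts
```

Starting from `Employee(alice)`, `Employee(bob)`, and `WorksFor(alice, acme)`, only `bob` receives an invented employer `_n0`; `alice` already has a witness, so no redundant null is created for her.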
Foundations of Declarative Data Analysis Using Limit Datalog Programs
Motivated by applications in declarative data analysis, we study
an extension of positive Datalog with
arithmetic functions over integers. This language is known to be undecidable,
so we propose two fragments. In limit Datalog programs,
predicates are axiomatised to keep minimal/maximal numeric values, allowing us
to show that fact entailment is coNExpTime-complete in combined, and
coNP-complete in data complexity. Moreover, an additional stability
requirement causes the complexity to drop to ExpTime and PTime, respectively.
Finally, we show that the stable fragment can express many
useful data analysis tasks, and so our results provide a sound foundation for
the development of advanced information systems.
Comment: 23 pages; full version of a paper accepted at IJCAI-17; v2 fixes some
typos and improves the acknowledgment
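To make "predicates axiomatised to keep minimal numeric values" concrete, here is a minimal sketch of the intended semantics on a shortest-path task. It assumes a min-limit predicate `dist`, written informally as the rules `dist(s, 0)` and `dist(y, n + w) <- dist(x, n), edge(x, y, w)`, and evaluates them by a naive fixpoint that stores only the least value per node. This is an illustration of the semantics, not the paper's entailment procedure. It also assumes non-negative edge weights, since a negative cycle would let the minimum decrease forever.

```python
def min_limit_fixpoint(source, edges):
    """Naive fixpoint for a min-limit predicate dist(node, n).
    edges: iterable of (x, y, w) with w >= 0 (assumed).
    Returns {node: least derivable distance}."""
    edges = list(edges)
    dist = {source: 0}  # fact dist(source, 0); only the minimum per node is kept
    changed = True
    while changed:
        changed = False
        for x, y, w in edges:
            if x in dist:
                candidate = dist[x] + w  # body dist(x, n), edge(x, y, w) gives n + w
                if y not in dist or candidate < dist[y]:
                    dist[y] = candidate  # a smaller value subsumes the old fact
                    changed = True
    return dist
```

For the edges a→b (4), a→c (1), c→b (2), b→d (5), `min_limit_fixpoint("a", ...)` returns `{"a": 0, "b": 3, "c": 1, "d": 8}`: the derived fact `dist(b, 4)` is replaced by the smaller `dist(b, 3)`.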
On Maltsev Digraphs
This is an Open Access article, first published by E-CJ on 25 February 2015.
We study digraphs preserved by a Maltsev operation: Maltsev digraphs. We show that these digraphs retract either onto a directed path or onto a disjoint union of directed cycles, which shows that the constraint satisfaction problem for Maltsev digraphs is in logspace, L. We then generalize results from Kazda (2011) to show that a Maltsev digraph is preserved not only by a majority operation but also by a class of other operations (e.g., minority, Pixley), and we obtain an $O(|V_G|^4)$-time algorithm to recognize Maltsev digraphs. We also prove analogous results for digraphs preserved by conservative Maltsev operations, which we use to establish that the list homomorphism problem for Maltsev digraphs is in L. We then give a polynomial-time characterisation of Maltsev digraphs admitting a conservative 2-semilattice operation. Finally, we give a simple inductive construction of directed acyclic digraphs preserved by a Maltsev operation, and relate them to series-parallel digraphs.
Peer reviewed. Final Published version
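The definitions behind "a digraph preserved by a Maltsev operation" can be checked by brute force for a given candidate operation: m must satisfy m(x, y, y) = x and m(x, x, y) = y, and applying m coordinatewise to any three edges must give an edge. The sketch below does exactly that; it only verifies a supplied operation and is not the recognition algorithm mentioned above. The choice of m(x, y, z) = x - y + z (mod n) on a directed cycle is our own illustrative example.

```python
from itertools import product


def is_maltsev(m, domain):
    """Check the Maltsev identities m(x, y, y) = x and m(x, x, y) = y."""
    return all(m(x, y, y) == x and m(x, x, y) == y
               for x, y in product(domain, repeat=2))


def preserves(m, edges):
    """Check that m is a polymorphism: m applied coordinatewise to any
    three edges (u1, v1), (u2, v2), (u3, v3) yields an edge."""
    edge_set = set(edges)
    return all((m(u1, u2, u3), m(v1, v2, v3)) in edge_set
               for (u1, v1), (u2, v2), (u3, v3) in product(edges, repeat=3))


def cyclic_maltsev(n):
    """m(x, y, z) = x - y + z (mod n), a Maltsev operation on Z_n."""
    return lambda x, y, z: (x - y + z) % n


def directed_cycle(n):
    """Edges i -> i + 1 (mod n)."""
    return [(i, (i + 1) % n) for i in range(n)]
```

Here `is_maltsev(cyclic_maltsev(5), range(5))` and `preserves(cyclic_maltsev(5), directed_cycle(5))` both return `True`. The same operation does not preserve the directed path 0→1→2: the edges (0, 1), (1, 2), (0, 1) are mapped to the pair (2, 0), which is not an edge of the path.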
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting duplicate representations of the same external entities,
and merging them into single representations. Relatively recently, declarative
rules called "matching dependencies" (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating four
components of ER: (a) building a classifier for duplicate/non-duplicate record
pairs using machine learning (ML) techniques; (b) using MDs to support the
blocking phase of ML; (c) merging records on the basis of the
classifier results; and (d) using the declarative language "LogiQL" (an
extended form of Datalog supported by the "LogicBlox" platform) for all
activities related to data processing, and for the specification and enforcement of
MDs.
Comment: Final journal version, with some minor technical corrections.
Extended version of arXiv:1508.0601
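The three data-processing stages can be pictured with the small Python sketch below: blocking restricts which record pairs are compared, a classifier labels candidate pairs, and duplicates are merged. It is not ERBlox and uses neither LogiQL nor LogicBlox. The blocking rule (same city), the classifier (a `difflib` name-similarity threshold of 0.8 standing in for a trained ML model), and the merge function (keep the longest value per attribute) are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def is_duplicate(r1, r2, threshold=0.8):
    """Hypothetical stand-in for the trained ML classifier."""
    ratio = SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()
    return ratio >= threshold


def merge(group):
    """Hypothetical merge function: keep the longest value per attribute."""
    return {key: max((r[key] for r in group), key=len) for key in group[0]}


def resolve(records):
    # (b) Blocking: an MD-style condition assumed for illustration; only
    # records with the same city (case-insensitive) are ever compared.
    blocks = {}
    for i, r in enumerate(records):
        blocks.setdefault(r["city"].lower(), []).append(i)

    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # (a) Classify candidate pairs within each block; link duplicates.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_duplicate(records[i], records[j]):
                parent[find(i)] = find(j)

    # (c) Merge each cluster of linked records into one representation.
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(records[i])
    return [merge(group) for group in clusters.values()]
```

With records "John Smith" and "Jon Smith" in Ottawa, "Jane Doe" in Ottawa, and "John Smith" in Toronto, the two Ottawa Smiths are merged into one "John Smith" record, while the Toronto record is never compared with them because it falls in a different block.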