Evaluating Datalog via Tree Automata and Cycluits
We investigate parameterizations of both database instances and queries that
make query evaluation fixed-parameter tractable in combined complexity. We show
that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog)
enjoys bilinear-time evaluation on structures of bounded treewidth for programs
of bounded rule size. Such programs capture in particular conjunctive queries
with simplicial decompositions of bounded width, guarded negation fragment
queries of bounded CQ-rank, or two-way regular path queries. Our result is
shown by translating to alternating two-way automata, whose semantics is
defined via cyclic provenance circuits (cycluits) that can be tractably
evaluated. Comment: 56 pages, 63 references. Journal version of "Combined Tractability of
Query Evaluation via Tree Automata and Cycluits (Extended Version)" at
arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and
possible minor publisher-induced changes, this is the exact content of the
journal paper that will appear in Theory of Computing Systems. Update wrt
version 1: latest reviewer feedback
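The abstract states that cycluits can be evaluated tractably. The sketch below is illustrative only. It assumes the Boolean least-fixpoint reading of a monotone cycluit (input, AND and OR gates, no negation): every gate starts false, and truth is propagated until nothing changes. The gate encoding and the name `eval_monotone_cycluit` are hypothetical. The paper's provenance semantics and its handling of stratified negation are not modelled. Each gate becomes true at most once, and each wire is followed at most once, so the propagation runs in time linear in the size of the circuit.

```python
from collections import deque

def eval_monotone_cycluit(gates, inputs):
    """Least-fixpoint evaluation of a monotone cyclic Boolean circuit.

    Hypothetical encoding (an assumption, not the paper's):
      gates:  dict gate -> (kind, list of input gates), kind in {"var", "and", "or"};
              every input gate must itself be a key of `gates`.
      inputs: dict var-gate -> bool.
    Returns the set of gates that are true in the least fixpoint.
    """
    # succ[g] lists the gates that read g as an input.
    succ = {g: [] for g in gates}
    for g, (_, args) in gates.items():
        for a in args:
            succ[a].append(g)
    # For AND gates, count how many inputs are not yet known to be true.
    remaining = {g: len(args) for g, (kind, args) in gates.items() if kind == "and"}
    true = set()
    queue = deque()

    def mark(g):
        # Each gate is marked (and enqueued) at most once.
        if g not in true:
            true.add(g)
            queue.append(g)

    for g, (kind, args) in gates.items():
        if kind == "var" and inputs.get(g, False):
            mark(g)
        elif kind == "and" and not args:
            mark(g)  # the empty conjunction is true
    # Propagate truth along wires until a fixpoint is reached.
    while queue:
        g = queue.popleft()
        for s in succ[g]:
            kind = gates[s][0]
            if kind == "or":
                mark(s)
            elif kind == "and":
                remaining[s] -= 1
                if remaining[s] == 0:
                    mark(s)
    return true
```

In the usage test, the cycle `g1`/`g2` has no true input, so it stays false in the least fixpoint. The cycle `g3`/`g4` is seeded by `x`, so both of its gates become true.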
Query Containment for Highly Expressive Datalog Fragments
The containment problem of Datalog queries is well known to be undecidable.
There are, however, several Datalog fragments for which containment is known to
be decidable, most notably monadic Datalog and several "regular" query
languages on graphs. Monadically Defined Queries (MQs) have been introduced
recently as a joint generalization of these query languages. In this paper, we
study a wide range of Datalog fragments with decidable query containment and
determine exact complexity results for this problem. We generalize MQs to
(Frontier-)Guarded Queries (GQs), and show that the containment problem is
3ExpTime-complete in either case, even if we allow arbitrary Datalog in the
sub-query. If we focus on graph query languages, i.e., fragments of linear
Datalog, then this complexity is reduced to 2ExpSpace. We also consider nested
queries, which gain further expressivity by using predicates that are defined
by inner queries. We show that nesting leads to an exponentially increasing
hierarchy for the complexity of query containment, both in the linear and in
the general case. Our results settle open problems for (nested) MQs, and they
paint a comprehensive picture of the state of the art in Datalog query
containment. Comment: 20 pages
Querying Schemas With Access Restrictions
We study verification of systems whose transitions consist of accesses to a
Web-based data-source. An access is a lookup on a relation within a relational
database, fixing values for a set of positions in the relation. For example, a
transition can represent access to a Web form, where the user is restricted to
filling in values for a particular set of fields. We look at verifying
properties of a schema describing the possible accesses of such a system. We
present a language where one can describe the properties of an access path, and
also specify additional restrictions on accesses that are enforced by the
schema. Our main property language, AccLTL, is based on a first-order extension
of linear-time temporal logic, interpreting access paths as sequences of
relational structures. We also present a lower-level automaton model,
A-automata, into which AccLTL specifications can be compiled. We show that AccLTL
and A-automata can express static analysis problems related to "querying with
limited access patterns" that have been studied in the database literature in
the past, such as whether an access is relevant to answering a query, and
whether two queries are equivalent in the accessible data they can return. We
prove decidability and complexity results for several restrictions and variants
of AccLTL, and explain which properties of paths can be expressed in each
restriction. Comment: VLDB201
Enumeration on Trees under Relabelings
We study how to evaluate MSO queries with free variables on trees, within the
framework of enumeration algorithms. Previous work has shown how to enumerate
answers with linear-time preprocessing and delay linear in the size of each
output, i.e., constant-delay for free first-order variables. We extend this
result to support relabelings, a restricted kind of update operations on
trees which allows us to change the node labels. Our main result shows that we
can enumerate the answers of MSO queries on trees with linear-time preprocessing
and delay linear in each answer, while supporting node relabelings in logarithmic time. To
prove this, we reuse the circuit-based enumeration structure from our earlier
work, and develop techniques to maintain its index under node relabelings. We
also show how enumeration under relabelings can be applied to evaluate practical
query languages, such as aggregate, group-by, and parameterized queries.
On Distances Between Words with Parameters
The edit distance between parameterized words is a generalization of the classical edit distance in which particular letters of the first word, called parameters, may be mapped to parameters of the second word before the distance is computed. This problem was introduced, in particular, for detecting code duplication, and the notion of words with parameters has also been used with different semantics in other fields. The complexity of several variants of edit distance between parameterized words has been studied; however, the complexity of the most natural one, the Levenshtein distance, remained open.
In this paper, we solve this open question and complete the exhaustive analysis of all cases of parameterized word matching and function matching, showing that these problems are NP-complete. To this end, we also provide a comparison of the different problems, exhibiting several equivalences between them. We also provide and implement a MaxSAT encoding of the problem, as well as a simple FPT algorithm in the alphabet size, and study their efficiency on real data in the context of theater play structure comparison.
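To make the problem statement concrete, here is a minimal brute-force sketch. It assumes one common semantics: the letters in a shared parameter alphabet may be renamed in the first word by a bijection, other letters are fixed, and the result is the minimum classical Levenshtein distance over all such renamings. The paper also covers other variants, such as function matching, that this sketch does not model. The names `levenshtein` and `param_levenshtein` are hypothetical. The loop over all |Π|! permutations is exponential only in the number of parameters, but it is not claimed to be the paper's FPT algorithm or its MaxSAT encoding.

```python
from itertools import permutations

def levenshtein(a, b):
    """Classical Levenshtein distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute, or match for free
        prev = cur
    return prev[-1]

def param_levenshtein(u, v, params):
    """Minimum Levenshtein distance between v and u, over all bijective
    renamings of the parameter letters of u (assumed semantics: letters
    outside `params` are static and are never renamed)."""
    params = sorted(params)
    best = None
    for image in permutations(params):
        rename = dict(zip(params, image))
        renamed = [rename.get(c, c) for c in u]
        d = levenshtein(renamed, v)
        if best is None or d < best:
            best = d
    return best
```

For example, `"xaxb"` and `"yayb"` are at classical distance 2, but at distance 0 once the parameter `x` is renamed to `y`.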
A Formal Study of Collaborative Access Control in Distributed Datalog
We formalize and study a declaratively specified collaborative access control mechanism for data dissemination in a distributed environment. Data dissemination is specified using distributed datalog. Access control is also defined by datalog-style rules, at the relation level for extensional relations, and at the tuple level for intensional ones, based on the derivation of tuples. The model also includes a mechanism for "declassifying" data, which allows circumventing overly restrictive access control. We consider the complexity of determining whether a peer is allowed to access a given fact, and address the problem of achieving the goal of disseminating certain information under some access control policy. We also investigate the problem of information leakage, which occurs when a peer is able to infer facts to which the peer is not allowed access by the policy. Finally, we consider access control extended to facts equipped with provenance information, motivated by the many applications where such information is required. We provide semantics for access control with provenance, and establish the complexity of determining whether a peer may access a given fact together with its provenance. This work is motivated by the access control of the Webdamlog system, whose core features it formalizes.
A Circuit-Based Approach to Efficient Enumeration
We study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. We focus on the class of structured d-DNNF circuits originally introduced in knowledge compilation, a sub-area of artificial intelligence. We propose an algorithm for these circuits that enumerates valuations with linear preprocessing and delay linear in the Hamming weight of each valuation. Moreover, valuations of constant Hamming weight can be enumerated with linear preprocessing and constant delay.
Our results yield a framework for efficient enumeration that applies to all problems whose solutions can be compiled to structured d-DNNFs. In particular, we use it to recapture classical results in database theory, for factorized database representations and for MSO evaluation. This gives an independent proof of constant-delay enumeration for MSO formulae with first-order free variables on bounded-treewidth structures.
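The two circuit properties that make enumeration possible can be illustrated with a deliberately naive sketch. Decomposability lets the partial valuations of an AND gate's children be merged, and determinism guarantees that the children of an OR gate never produce the same valuation twice. The sketch assumes a smooth d-DNNF in a hypothetical tuple encoding, and `models` is a hypothetical name. It does not achieve the paper's linear preprocessing or its Hamming-weight delay bound, because it materializes every child's full list of models.

```python
from itertools import product

def models(node):
    """Yield every satisfying valuation (dict var -> bool) of a smooth
    d-DNNF node exactly once.

    Hypothetical encoding: ("lit", var, value), ("and", [children]),
    ("or", [children]); an empty "and" is true, an empty "or" is false.
    """
    kind = node[0]
    if kind == "lit":
        _, var, value = node
        yield {var: value}
    elif kind == "and":
        # Decomposability: children mention disjoint variables, so their
        # partial valuations can be merged without conflict. This naive
        # version stores each child's full list of models.
        child_lists = [list(models(c)) for c in node[1]]
        for combo in product(*child_lists):
            merged = {}
            for part in combo:
                merged.update(part)
            yield merged
    elif kind == "or":
        # Determinism: children have disjoint model sets, so concatenating
        # their enumerations never repeats a valuation. Smoothness ensures
        # all children assign the same variables.
        for c in node[1]:
            yield from models(c)
    else:
        raise ValueError(f"unknown node kind: {kind}")
```

For instance, the circuit for (x ∧ (y ∨ ¬y)) ∨ (¬x ∧ y) yields its three models, each exactly once.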
Ranked Enumeration of MSO Logic on Words
In recent years, enumeration algorithms with bounded delay have attracted a lot of attention for several data management tasks. Given a query and the data, the task is to preprocess the data and then enumerate all the answers to the query one by one and without repetitions. This enumeration scheme is typically useful when the solutions are treated on the fly or when we want to stop the enumeration once the pertinent solutions have been found. However, with the current schemes, there is no restriction on the order in which the solutions are given, and this order usually depends on the techniques used and not on the relevance for the user.
In this paper we study the enumeration of monadic second-order logic (MSO) over words when the solutions are ranked. We present a framework based on MSO cost functions that allows one to express MSO formulae on words with a cost associated with each solution. We then demonstrate the generality of our framework, which subsumes, for instance, document spanners and adds ranking to them. The main technical result of the paper is an algorithm for enumerating all the solutions of formulae in increasing order of cost efficiently, namely, with a linear preprocessing phase and logarithmic delay between solutions. The novelty of this algorithm lies in its use of functional data structures, in particular, in extending functional Brodal queues to suit the ranked enumeration of MSO on words.
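To show what "enumerating in increasing order of cost" means operationally, here is a plain best-first sketch. It assumes that solutions are encoded as source-to-target paths of a weighted DAG with nonnegative weights, which is the kind of structure one might obtain by running an automaton over the word. That encoding and the name `ranked_paths` are illustrative assumptions. The technique is ordinary priority-queue enumeration of partial paths. It is not the paper's functional Brodal-queue algorithm, and it does not guarantee logarithmic delay. Ordering is still correct: because weights are nonnegative, every complete path that has not yet been output has a prefix in the heap that costs no more than the path itself.

```python
import heapq
from itertools import count

def ranked_paths(edges, source, target):
    """Yield (cost, path) for every source-to-target path of a DAG in
    nondecreasing order of cost.

    edges: dict node -> list of (successor, weight), all weights >= 0.
    """
    tie = count()  # tie-breaker so the heap never has to compare paths
    heap = [(0, next(tie), (source,))]
    while heap:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            yield cost, list(path)
            continue
        # Extend the cheapest partial path along every outgoing edge.
        for succ, w in edges.get(node, []):
            heapq.heappush(heap, (cost + w, next(tie), path + (succ,)))
```

Because it is a generator, a caller can stop after the first few solutions, which is the use case the abstract mentions.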
Ranked Enumeration for MSO on Trees via Knowledge Compilation
We study the problem of enumerating the satisfying assignments for circuit
classes from knowledge compilation, where assignments are ranked in a specific
order. In particular, we show how this problem can be used to efficiently
perform ranked enumeration of the answers to MSO queries over trees, with the
order being given by a ranking function satisfying a subset-monotonicity
property.
Assuming that the number of variables is constant, we show that we can
enumerate the satisfying assignments in ranked order for so-called multivalued
circuits that are smooth, decomposable, and in negation normal form (smooth
multivalued DNNF). There is no preprocessing and the enumeration delay is
linear in the size of the circuit times the number of values, plus a
logarithmic term in the number of assignments produced so far. If we further
assume that the circuit is deterministic (smooth multivalued d-DNNF), we can
achieve linear-time preprocessing in the circuit, and the delay only features
the logarithmic term. Comment: 26 pages; this is the authors' version of the corresponding ICDT'24
article
Reasoning on Feature Models: Compilation-Based vs. Direct Approaches
Analyzing a Feature Model (FM) and reasoning on the corresponding
configuration space is a central task in Software Product Line (SPL)
engineering. Problems such as deciding the satisfiability of the FM and
eliminating inconsistent parts of the FM have been well resolved by translating
the FM into a conjunctive normal form (CNF) formula, and then feeding the CNF
to a SAT solver. However, this approach has some limits for other important
reasoning issues about the FM, such as counting or enumerating configurations.
Two mainstream approaches have been investigated in this direction: (i) direct
approaches, using tools based on the CNF representation of the FM at hand, or
(ii) compilation-based approaches, where the CNF representation of the FM is
first translated into another representation for which the reasoning
queries are easier to address. Our contribution is twofold. First, we evaluate
how both approaches compare when dealing with common reasoning operations on
FM, namely counting configurations, pointing out one or several configurations,
sampling configurations, and finding optimal configurations regarding a utility
function. Our experimental results show that the compilation-based approach is efficient
enough to possibly compete with the direct approaches and that the cost of
translation (i.e., the compilation time) can be balanced when addressing
sufficiently many complex reasoning operations on large configuration spaces.
Second, we provide a Java-based automated reasoner that supports these
operations for both approaches, thus eliminating the burden of selecting the
appropriate tool and approach depending on the operation one wants to perform
- …
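The first step of both approaches is to translate the FM into CNF. The sketch below shows that step for a tiny hypothetical model, using the usual encoding: the root is selected, a child implies its parent, a mandatory child is implied by its parent, a group parent implies at least one member, an alternative group adds pairwise exclusion, and "requires" becomes an implication. It then counts configurations by exhaustive enumeration. The brute-force counter is an illustrative stand-in, not one of the SAT-based tools or knowledge compilers evaluated in the paper, and it is exponential in the number of features. The names `fm_to_cnf` and `count_models`, and the input format, are hypothetical.

```python
from itertools import product

def fm_to_cnf(root, children, groups, requires):
    """Translate a small feature model into CNF clauses.

    Literals are (feature, polarity) pairs; a clause is a list of literals.
    children: dict parent -> list of (child, mandatory)
    groups:   list of (parent, kind, members), kind in {"xor", "or"}
    requires: list of (a, b), meaning feature a requires feature b
    """
    clauses = [[(root, True)]]  # the root feature is always selected
    for parent, kids in children.items():
        for child, mandatory in kids:
            clauses.append([(child, False), (parent, True)])      # child -> parent
            if mandatory:
                clauses.append([(parent, False), (child, True)])  # parent -> child
    for parent, kind, members in groups:
        for m in members:
            clauses.append([(m, False), (parent, True)])          # member -> parent
        # parent -> at least one member
        clauses.append([(parent, False)] + [(m, True) for m in members])
        if kind == "xor":
            for i, a in enumerate(members):
                for b in members[i + 1:]:
                    clauses.append([(a, False), (b, False)])      # at most one member
    for a, b in requires:
        clauses.append([(a, False), (b, True)])                   # a -> b
    return clauses

def count_models(clauses):
    """Count satisfying assignments by exhaustive enumeration (exponential)."""
    variables = sorted({v for clause in clauses for v, _ in clause})
    total = 0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[v] == pol for v, pol in clause) for clause in clauses):
            total += 1
    return total
```

In the test model, a car has a mandatory engine that is either petrol or electric, and an optional GPS that requires the electric engine. This yields three valid configurations.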