Search CORE

446 research outputs found

Evaluating Datalog via Tree Automata and Cycluits

Author: Amarilli Antoine
Bourhis Pierre
Monet Mikaël
Senellart Pierre
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

We investigate parameterizations of both database instances and queries that make query evaluation fixed-parameter tractable in combined complexity. We show that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog) enjoys bilinear-time evaluation on structures of bounded treewidth for programs of bounded rule size. Such programs capture in particular conjunctive queries with simplicial decompositions of bounded width, guarded negation fragment queries of bounded CQ-rank, or two-way regular path queries. Our result is shown by translating to alternating two-way automata, whose semantics is defined via cyclic provenance circuits (cycluits) that can be tractably evaluated.Comment: 56 pages, 63 references. Journal version of "Combined Tractability of Query Evaluation via Tree Automata and Cycluits (Extended Version)" at arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theory of Computing Systems. Update wrt version 1: latest reviewer feedbac

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Query Containment for Highly Expressive Datalog Fragments

Author: Bourhis Pierre
Krötzsch Markus
Rudolph Sebastian
Publication venue
Publication date: 30/06/2014
Field of study

The containment problem of Datalog queries is well known to be undecidable. There are, however, several Datalog fragments for which containment is known to be decidable, most notably monadic Datalog and several "regular" query languages on graphs. Monadically Defined Queries (MQs) have been introduced recently as a joint generalization of these query languages. In this paper, we study a wide range of Datalog fragments with decidable query containment and determine exact complexity results for this problem. We generalize MQs to (Frontier-)Guarded Queries (GQs), and show that the containment problem is 3ExpTime-complete in either case, even if we allow arbitrary Datalog in the sub-query. If we focus on graph query languages, i.e., fragments of linear Datalog, then this complexity is reduced to 2ExpSpace. We also consider nested queries, which gain further expressivity by using predicates that are defined by inner queries. We show that nesting leads to an exponentially increasing hierarchy for the complexity of query containment, both in the linear and in the general case. Our results settle open problems for (nested) MQs, and they paint a comprehensive picture of the state of the art in Datalog query containment.Comment: 20 page

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Querying Schemas With Access Restrictions

Author: Benedikt Michael
Bourhis Pierre
Ley Clemens
Publication venue
Publication date: 01/01/2012
Field of study

We study verification of systems whose transitions consist of accesses to a Web-based data-source. An access is a lookup on a relation within a relational database, fixing values for a set of positions in the relation. For example, a transition can represent access to a Web form, where the user is restricted to filling in values for a particular set of fields. We look at verifying properties of a schema describing the possible accesses of such a system. We present a language where one can describe the properties of an access path, and also specify additional restrictions on accesses that are enforced by the schema. Our main property language, AccLTL, is based on a first-order extension of linear-time temporal logic, interpreting access paths as sequences of relational structures. We also present a lower-level automaton model, Aautomata, which AccLTL specifications can compile into. We show that AccLTL and A-automata can express static analysis problems related to "querying with limited access patterns" that have been studied in the database literature in the past, such as whether an access is relevant to answering a query, and whether two queries are equivalent in the accessible data they can return. We prove decidability and complexity results for several restrictions and variants of AccLTL, and explain which properties of paths can be expressed in each restriction.Comment: VLDB201

arXiv.org e-Print Archive

Oxford University Research Archive

Enumeration on Trees under Relabelings

Author: Amarilli Antoine
Bourhis Pierre
Mengel Stefan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Conference on Database Theory (ICDT 2018)
Publication date: 01/01/2018
Field of study

We study how to evaluate MSO queries with free variables on trees, within the framework of enumeration algorithms. Previous work has shown how to enumerate answers with linear-time preprocessing and delay linear in the size of each output, i.e., constant-delay for free first-order variables. We extend this result to support relabelings, a restricted kind of update operations on trees which allows us to change the node labels. Our main result shows that we can enumerate the answers of MSO queries on trees with linear-time preprocessing and delay linear in each answer, while supporting node relabelings in logarithmic time. To prove this, we reuse the circuit-based enumeration structure from our earlier work, and develop techniques to maintain its index under node relabelings. We also show how enumeration under relabelings can be applied to evaluate practical query languages, such as aggregate, group-by, and parameterized queries

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

HAL-Artois

On Distances Between Words with Parameters

Author: Bourhis Pierre
Boussidan Aaron
Gambette Philippe
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023
Field of study

The edit distance between parameterized words is a generalization of the classical edit distance where it is allowed to map particular letters of the first word, called parameters, to parameters of the second word before computing the distance. This problem has been introduced in particular for detection of code duplication, and the notion of words with parameters has also been used with different semantics in other fields. The complexity of several variants of edit distances between parameterized words has been studied, however, the complexity of the most natural one, the Levenshtein distance, remained open. In this paper, we solve this open question and close the exhaustive analysis of all cases of parameterized word matching and function matching, showing that these problems are np-complete. To this aim, we also provide a comparison of the different problems, exhibiting several equivalences between them. We also provide and implement a MaxSAT encoding of the problem, as well as a simple FPT algorithm in the alphabet size, and study their efficiency on real data in the context of theater play structure comparison

Dagstuhl Research Online Publication Server

A Formal Study of Collaborative Access Control in Distributed Datalog

Author: Abiteboul Serge
Bourhis Pierre
Vianu Victor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Conference on Database Theory (ICDT 2016)
Publication date: 01/01/2016
Field of study

We formalize and study a declaratively specified collaborative access control mechanism for data dissemination in a distributed environment. Data dissemination is specified using distributed datalog. Access control is also defined by datalog-style rules, at the relation level for extensional relations, and at the tuple level for intensional ones, based on the derivation of tuples. The model also includes a mechanism for "declassifying" data, that allows circumventing overly restrictive access control. We consider the complexity of determining whether a peer is allowed to access a given fact, and address the problem of achieving the goal of disseminating certain information under some access control policy. We also investigate the problem of information leakage, which occurs when a peer is able to infer facts to which the peer is not allowed access by the policy. Finally, we consider access control extended to facts equipped with provenance information, motivated by the many applications where such information is required. We provide semantics for access control with provenance, and establish the complexity of determining whether a peer may access a given fact together with its provenance. This work is motivated by the access control of the Webdamlog system, whose core features it formalizes

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

A Circuit-Based Approach to Efficient Enumeration

Author: Amarilli Antoine
Bourhis Pierre
Jachiet Louis
Mengel Stefan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Publication date: 01/01/2017
Field of study

We study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. We focus on the class of structured d-DNNF circuits originally introduced in knowledge compilation, a sub-area of artificial intelligence. We propose an algorithm for these circuits that enumerates valuations with linear preprocessing and delay linear in the Hamming weight of each valuation. Moreover, valuations of constant Hamming weight can be enumerated with linear preprocessing and constant delay. Our results yield a framework for efficient enumeration that applies to all problems whose solutions can be compiled to structured d-DNNFs. In particular, we use it to recapture classical results in database theory, for factorized database representations and for MSO evaluation. This gives an independent proof of constant-delay enumeration for MSO formulae with first-order free variables on bounded-treewidth structures

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

HAL-Artois

Hal-Diderot

Ranked Enumeration of MSO Logic on Words

Author: Bourhis Pierre
Grez Alejandro
Jachiet Louis
Riveros Cristian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Database Theory (ICDT 2021)
Publication date: 01/01/2021
Field of study

In the last years, enumeration algorithms with bounded delay have attracted a lot of attention for several data management tasks. Given a query and the data, the task is to preprocess the data and then enumerate all the answers to the query one by one and without repetitions. This enumeration scheme is typically useful when the solutions are treated on the fly or when we want to stop the enumeration once the pertinent solutions have been found. However, with the current schemes, there is no restriction on the order how the solutions are given and this order usually depends on the techniques used and not on the relevance for the user. In this paper we study the enumeration of monadic second order logic (MSO) over words when the solutions are ranked. We present a framework based on MSO cost functions that allows to express MSO formulae on words with a cost associated with each solution. We then demonstrate the generality of our framework which subsumes, for instance, document spanners and adds ranking to them. The main technical result of the paper is an algorithm for enumerating all the solutions of formulae in increasing order of cost efficiently, namely, with a linear preprocessing phase and logarithmic delay between solutions. The novelty of this algorithm is based on using functional data structures, in particular, by extending functional Brodal queues to suit with the ranked enumeration of MSO on words

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

Ranked Enumeration for MSO on Trees via Knowledge Compilation

Author: Amarilli Antoine
Bourhis Pierre
Capelli Florent
Monet Mikaël
Publication venue
Publication date: 22/01/2024
Field of study

We study the problem of enumerating the satisfying assignments for circuit classes from knowledge compilation, where assignments are ranked in a specific order. In particular, we show how this problem can be used to efficiently perform ranked enumeration of the answers to MSO queries over trees, with the order being given by a ranking function satisfying a subset-monotonicity property. Assuming that the number of variables is constant, we show that we can enumerate the satisfying assignments in ranked order for so-called multivalued circuits that are smooth, decomposable, and in negation normal form (smooth multivalued DNNF). There is no preprocessing and the enumeration delay is linear in the size of the circuit times the number of values, plus a logarithmic term in the number of assignments produced so far. If we further assume that the circuit is deterministic (smooth multivalued d-DNNF), we can achieve linear-time preprocessing in the circuit, and the delay only features the logarithmic term.Comment: 26 pages; this is the authors version of the corresponding ICDT'24 articl

arXiv.org e-Print Archive

Reasoning on Feature Models: Compilation-Based vs. Direct Approaches

Author: Bourhis Pierre
Duchien Laurence
Dusart Jérémie
Lonca Emmanuel
Marquis Pierre
Quinton Clément
Publication venue
Publication date: 14/02/2023
Field of study

Analyzing a Feature Model (FM) and reasoning on the corresponding configuration space is a central task in Software Product Line (SPL) engineering. Problems such as deciding the satisfiability of the FM and eliminating inconsistent parts of the FM have been well resolved by translating the FM into a conjunctive normal form (CNF) formula, and then feeding the CNF to a SAT solver. However, this approach has some limits for other important reasoning issues about the FM, such as counting or enumerating configurations. Two mainstream approaches have been investigated in this direction: (i) direct approaches, using tools based on the CNF representation of the FM at hand, or (ii) compilation-based approaches, where the CNF representation of the FM has first been translated into another representation for which the reasoning queries are easier to address. Our contribution is twofold. First, we evaluate how both approaches compare when dealing with common reasoning operations on FM, namely counting configurations, pointing out one or several configurations, sampling configurations, and finding optimal configurations regarding a utility function. Our experimental results show that the compilation-based is efficient enough to possibly compete with the direct approaches and that the cost of translation (i.e., the compilation time) can be balanced when addressing sufficiently many complex reasoning operations on large configuration spaces. Second, we provide a Java-based automated reasoner that supports these operations for both approaches, thus eliminating the burden of selecting the appropriate tool and approach depending on the operation one wants to perform

arXiv.org e-Print Archive