Schema Independent Relational Learning
Learning novel concepts and relations from relational databases is an
important problem with many applications in database systems and machine
learning. Relational learning algorithms learn the definition of a new relation
in terms of existing relations in the database. Nevertheless, the same data set
may be represented under different schemas for various reasons, such as
efficiency, data quality, and usability. Unfortunately, the output of current
relational learning algorithms tends to vary quite substantially over the
choice of schema, both in terms of learning accuracy and efficiency. This
variation complicates their off-the-shelf application. In this paper, we
introduce and formalize the property of schema independence of relational
learning algorithms, and study both the theoretical and empirical dependence of
existing algorithms on the common class of (de)composition schema
transformations. We study both sample-based learning algorithms, which learn
from sets of labeled examples, and query-based algorithms, which learn by
asking queries to an oracle. We prove that current relational learning
algorithms are generally not schema independent. For query-based learning
algorithms we show that the (de)composition transformations influence their
query complexity. We propose Castor, a sample-based relational learning
algorithm that achieves schema independence by leveraging data dependencies. We
support the theoretical results with an empirical study that demonstrates the
schema dependence/independence of several algorithms on existing benchmark and
real-world datasets under (de)compositions.
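As a concrete illustration of a vertical (de)composition, the same data set can be stored as one relation or split into projections joined on a key. The sketch below is a toy illustration in Python; the relation and attribute names are invented and are not Castor's API:

```python
# Toy illustration of a vertical (de)composition: the same data set
# under a composed schema (one relation) and a decomposed one (two
# projections sharing a key). All names are invented for this sketch.

def decompose(rows, key, attrs1, attrs2):
    """Split a relation into two projections that share `key`."""
    r1 = [{a: row[a] for a in [key] + attrs1} for row in rows]
    r2 = [{a: row[a] for a in [key] + attrs2} for row in rows]
    return r1, r2

def natural_join(r1, r2, key):
    """Recompose by joining the two projections on `key`."""
    index = {row[key]: row for row in r2}
    return [{**row, **index[row[key]]} for row in r1 if row[key] in index]

# student(id, name, advisor) stored composed ...
student = [
    {"id": 1, "name": "ann", "advisor": "bob"},
    {"id": 2, "name": "eve", "advisor": "carol"},
]
# ... and decomposed into student_name(id, name), student_advisor(id, advisor).
names, advisors = decompose(student, "id", ["name"], ["advisor"])
# The decomposition is lossless: joining recovers the original relation,
# yet a learner sees two relations instead of one.
assert natural_join(names, advisors, "id") == student
```

A schema-independent learner should produce equivalent definitions whether it is given `student` or the pair `names`/`advisors`.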
The tractability frontier of well-designed SPARQL queries
We study the complexity of query evaluation of SPARQL queries. We focus on
the fundamental fragment of well-designed SPARQL restricted to the AND,
OPTIONAL and UNION operators. Our main result is a structural characterisation
of the classes of well-designed queries that can be evaluated in polynomial
time. In particular, we introduce a new notion of width called domination
width, which relies on the well-known notion of treewidth. We show that, under
some complexity theoretic assumptions, the classes of well-designed queries
that can be evaluated in polynomial time are precisely those of bounded
domination width.
Distributed XML Design
A distributed XML document is an XML document that spans several machines. We
assume that a distribution design of the document tree is given, consisting of
an XML kernel-document T[f1,...,fn] where some leaves are "docking points" for
external resources providing XML subtrees (f1,...,fn, standing, e.g., for Web
services or peers at remote locations). The top-down design problem consists
in, given a type (a schema document that may vary from a DTD to a tree
automaton) for the distributed document, "propagating" this type locally into a
collection of types, which we call a typing, while preserving desirable
properties. We also consider the bottom-up design problem, which consists in, given a
type for each external resource, exhibiting a global type that is enforced by
the local types, again with natural desirable properties. In the article, we
lay out the fundamentals of a theory of distributed XML design, analyze
problems concerning typing issues in this setting, and study their complexity.
Ontology Alignment at the Instance and Schema Level
We present PARIS, an approach for the automatic alignment of ontologies.
PARIS aligns not only instances, but also relations and classes. Alignments at
the instance-level cross-fertilize with alignments at the schema-level.
Thereby, our system provides a truly holistic solution to the problem of
ontology alignment. The heart of the approach is probabilistic. This allows
PARIS to run without any parameter tuning. We demonstrate the efficiency of the
algorithm and its precision through extensive experiments. In particular, we
obtain a precision of around 90% in experiments with two of the world's largest
ontologies. (Technical report: INRIA RT-040.)
Context-Free Path Queries on RDF Graphs
Navigational graph queries are an important class of queries that can extract
implicit binary relations over the nodes of input graphs. Most of the
navigational query languages used in the RDF community, e.g. property paths in
W3C SPARQL 1.1 and nested regular expressions in nSPARQL, are based on
regular expressions. It is known that regular expressions have limited
expressivity; for instance, some natural queries, like same-generation queries,
are not expressible with regular expressions. To overcome this limitation, in
this paper, we present cfSPARQL, an extension of SPARQL query language equipped
with context-free grammars. The cfSPARQL language is strictly more expressive
than property paths and nested regular expressions. The additional expressivity can be
used for modelling graph similarities, graph summarization and ontology
alignment. Despite the increased expressivity, we show that cfSPARQL still
enjoys a low computational complexity and can be evaluated efficiently.
Beyond Worst-Case Analysis for Joins with Minesweeper
We describe a new algorithm, Minesweeper, that is able to satisfy stronger
runtime guarantees than previous join algorithms (colloquially, `beyond
worst-case guarantees') for data in indexed search trees. Our first
contribution is developing a framework to measure this stronger notion of
complexity, which we call certificate complexity and which extends notions of
Barbay et al. and Demaine et al.; a certificate is a set of propositional
formulae that certifies that the output is correct. This notion captures a
natural class of join algorithms. In addition, the certificate allows us to
define a strictly stronger notion of runtime complexity than traditional
worst-case guarantees. Our second contribution is to develop a dichotomy
theorem for the certificate-based notion of complexity. Roughly, we show that
Minesweeper evaluates β-acyclic queries in time linear in the certificate
plus the output size, while for any β-cyclic query there is some instance
that takes superlinear time in the certificate (and for which the output is no
larger than the certificate size). We also extend our certificate-complexity
analysis to queries with bounded treewidth and the triangle query. (This is the full version of our PODS 2014 paper.)
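Minesweeper itself builds a constraint-propagation data structure, but the underlying intuition — that sorted indexes let a join skip over regions a short certificate proves empty — can be illustrated by a leapfrog-style intersection of sorted lists. This is a hedged toy analogue, not the paper's algorithm:

```python
from bisect import bisect_left

def intersect_sorted(lists):
    """Leapfrog-style intersection of sorted lists: each list is
    repeatedly galloped forward to the current maximum, so whole runs
    of values are skipped once they are certified not to match. The
    work tracks the number of such 'gap' certificates rather than the
    total input size -- a toy analogue of certificate complexity."""
    pos = [0] * len(lists)
    out = []
    while all(pos[i] < len(l) for i, l in enumerate(lists)):
        vals = [l[pos[i]] for i, l in enumerate(lists)]
        hi = max(vals)
        if min(vals) == hi:      # all lists agree: emit an answer
            out.append(hi)
            pos = [p + 1 for p in pos]
        else:                    # gallop each list forward to hi
            pos = [bisect_left(l, hi, pos[i]) for i, l in enumerate(lists)]
    return out

# Long runs of non-matching values are jumped over in O(log n) each.
assert intersect_sorted([[1, 3, 5, 100], [2, 3, 100], [3, 50, 100]]) == [3, 100]
```

Here three comparisons certify that nothing between 5 and 100 can be an answer, regardless of how many values a larger instance would place in that gap.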
Answering Conjunctive Queries under Updates
We consider the task of enumerating and counting answers to k-ary
conjunctive queries against relational databases that may be updated by
inserting or deleting tuples. We exhibit a new notion of q-hierarchical
conjunctive queries and show that these can be maintained efficiently in the
following sense. During a linear time preprocessing phase, we can build a data
structure that enables constant delay enumeration of the query results; and
when the database is updated, we can update the data structure and restart the
enumeration phase within constant time. For the special case of self-join free
conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical,
then query enumeration with sublinear delay and sublinear update time
(and arbitrary preprocessing time) is impossible.
For answering Boolean conjunctive queries and for the more general problem of
counting the number of solutions of k-ary queries we obtain complete
dichotomies: if the query's homomorphic core is q-hierarchical, then the size
of the query result can be computed in linear time and maintained with
constant update time. Otherwise, the size of the query result cannot be
maintained with sublinear update time. All our lower bounds rely on the
OMv-conjecture, a conjecture on the hardness of online matrix-vector
multiplication that has recently emerged in the field of fine-grained
complexity to characterise the hardness of dynamic problems. The lower bound
for the counting problem additionally relies on the orthogonal vectors
conjecture, which in turn is implied by the strong exponential time hypothesis.
By sublinear we mean O(n^(1-ε)) for some ε > 0, where n is the size of the
active domain of the current database.
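A minimal instance of such constant-time maintenance, for the q-hierarchical self-join-free query Q(x) :- R(x), S(x), keeps per-value multiplicities and a running answer count. This is a toy sketch with invented names, not the paper's data structure:

```python
from collections import Counter

class JoinCounter:
    """Maintain the number of answers to Q(x) :- R(x), S(x) under tuple
    insertions and deletions, with O(1) work per update. Toy sketch;
    the class and relation names are invented for illustration."""

    def __init__(self):
        self.mult = {"R": Counter(), "S": Counter()}
        self.answers = 0  # |{x : x occurs in both R and S}|

    def insert(self, rel, x):
        other = self.mult["S" if rel == "R" else "R"]
        self.mult[rel][x] += 1
        if self.mult[rel][x] == 1 and other[x] > 0:
            self.answers += 1  # x just became an answer

    def delete(self, rel, x):
        other = self.mult["S" if rel == "R" else "R"]
        self.mult[rel][x] -= 1
        if self.mult[rel][x] == 0 and other[x] > 0:
            self.answers -= 1  # x just stopped being an answer

db = JoinCounter()
db.insert("R", 7); db.insert("S", 7)   # 7 is now an answer
db.insert("R", 8)                      # 8 occurs only in R
assert db.answers == 1
db.delete("R", 7)                      # 7 leaves R, hence leaves Q
assert db.answers == 0
```

Each update touches a constant number of counters, which is the flavour of constant update time the dichotomy promises for q-hierarchical queries.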
Reasoning about embedded dependencies using inclusion dependencies
The implication problem for the class of embedded dependencies is
undecidable. However, this does not imply the lack of a proof procedure, as
exemplified by the chase algorithm. In this paper we present a complete
axiomatization of embedded dependencies that is based on the chase and uses
inclusion dependencies and implicit existential quantification in the
intermediate steps of deductions.
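A single chase step for an inclusion dependency can be sketched as follows: any value the dependency requires but that is missing from the target relation is added in a fresh tuple padded with labelled nulls, which is where the implicit existential quantification enters. This is toy code with a positional-attribute convention assumed for the sketch:

```python
import itertools

_fresh = itertools.count()  # supplies labelled nulls _n0, _n1, ...

def chase_ind(r, s, r_attr, s_attr, s_arity):
    """One chase round for the inclusion dependency R[r_attr] ⊆ S[s_attr]:
    each value in column r_attr of R that is missing from column s_attr
    of S triggers a new S-tuple, padded with fresh labelled nulls.
    Toy sketch; tuples are plain Python tuples and attributes are
    positional indices."""
    present = {t[s_attr] for t in s}
    for t in r:
        v = t[r_attr]
        if v not in present:
            s.append(tuple(v if i == s_attr else f"_n{next(_fresh)}"
                           for i in range(s_arity)))
            present.add(v)
    return s

# R(emp, dept) with R[dept] ⊆ S[dept] for S(dept, manager): the chase
# invents a manager null for the department missing from S.
r = [("ann", "d1"), ("bob", "d2")]
s = [("d1", "carol")]
chase_ind(r, s, r_attr=1, s_attr=0, s_arity=2)
assert s[1][0] == "d2" and s[1][1].startswith("_n")
```

Repeating such steps until no dependency is violated is the chase; the axiomatization records these intermediate states as inclusion dependencies.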
XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme
Query evaluation in an XML database requires reconstructing XML subtrees
rooted at nodes found by an XML query. Since XML subtree reconstruction can be
expensive, one approach to improve query response time is to use reconstruction
views - materialized XML subtrees of an XML document, whose nodes are
frequently accessed by XML queries. For this approach to be efficient, the
principal requirement is a framework for view selection. In this work, we are
the first to formalize and study the problem of XML reconstruction view
selection. The input is a tree T, in which every node v has a size s(v)
and a profit p(v), and a size limit L. The goal is to find a subset of
subtrees rooted at nodes v_1, ..., v_k such that s(v_1) + ... + s(v_k) ≤ L
and p(v_1) + ... + p(v_k) is maximal.
Furthermore, there is no overlap between any two subtrees selected in the
solution. We prove that this problem is NP-hard and present a fully
polynomial-time approximation scheme (FPTAS) as a solution.
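For intuition, the selection problem can be solved exactly by a small recursive search on toy instances; the NP-hardness and the FPTAS concern the general case, and the names below are illustrative:

```python
def best_profit(children, size, profit, root, limit):
    """Exact search for the subtree-selection problem above: pick a set
    of pairwise non-overlapping subtree roots whose sizes sum to at most
    `limit`, maximising total profit. Exponential-time toy solver for
    small trees only; the paper's FPTAS handles the general case."""
    def solve(frontier, budget):
        if not frontier:
            return 0
        v, rest = frontier[0], frontier[1:]
        # Option 1: do not select v; its children become selectable.
        best = solve(rest + children[v], budget)
        # Option 2: select the subtree rooted at v; since subtrees must
        # not overlap, none of v's descendants may then be selected.
        if size[v] <= budget:
            best = max(best, profit[v] + solve(rest, budget - size[v]))
        return best
    return solve([root], limit)

# Tiny tree: node 0 is the root with children 1 and 2.
children = {0: [1, 2], 1: [], 2: []}
size = {0: 5, 1: 2, 2: 2}
profit = {0: 4, 1: 3, 2: 3}
assert best_profit(children, size, profit, 0, limit=4) == 6  # pick 1 and 2
assert best_profit(children, size, profit, 0, limit=3) == 3  # room for one
```

The non-overlap constraint is what makes this harder than plain knapsack and motivates the tree-structured approximation scheme.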