58 research outputs found

    Constant Delay Enumeration with FPT-Preprocessing for Conjunctive Queries of Bounded Submodular Width

    Get PDF
    Marx (STOC 2010, J. ACM 2013) introduced the notion of submodular width of a conjunctive query (CQ) and showed that for any class Phi of Boolean CQs of bounded submodular width, the model-checking problem for Phi on the class of all finite structures is fixed-parameter tractable (FPT). Note that for non-Boolean queries, the size of the query result may be far too large to be computed entirely within FPT time. We investigate the free-connex variant of submodular width and generalise Marx\u27s result to non-Boolean queries as follows: For every class Phi of CQs of bounded free-connex submodular width, within FPT-preprocessing time we can build a data structure that allows to enumerate, without repetition and with constant delay, all tuples of the query result. Our proof builds upon Marx\u27s splitting routine to decompose the query result into a union of results; but we have to tackle the additional technical difficulty to ensure that these can be enumerated efficiently

    An Analytical Study of Large SPARQL Query Logs

    Full text link
    With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end- users has become more and more common in SPARQL end- points. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, span- ning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, that exhibits already interesting results on this generalized corpus, we drill deeper in the structural char- acteristics related to the graph- and hypergraph represen- tation of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and char- acterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time, by introducing the novel con- cept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights on the already rich query features of real SPARQL queries formulated by real users, and brings us to draw a number of conclusions and pinpoint future di- rections for SPARQL query evaluation, query optimization, tuning, and benchmarking

    Answering Conjunctive Queries under Updates

    Full text link
    We consider the task of enumerating and counting answers to kk-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear∗^\ast delay and sublinear update time (and arbitrary preprocessing time) is impossible. For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the query's homomorphic core is q-hierarchical, then size of the the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis. ∗)^\ast) By sublinear we mean O(n1−ε)O(n^{1-\varepsilon}) for some ε>0\varepsilon>0, where nn is the size of the active domain of the current database

    Integration-oriented ontology

    Get PDF
    The purpose of an integration-oriented ontology is to provide a conceptualization of a domain of interest for automating the data integration of an evolving and heterogeneous set of sources using Semantic Web technologies. It links domain concepts to each of the underlying data sources via schema mappings. Data analysts, who are domain experts but not necessarily have technical data management skills, pose ontology-mediated queries over the conceptualization, which are automatically translated to the appropriate query language for the sources at hand. Following well-established rules when designing schema mappings allows to automate the process of query rewriting and execution.Postprint (author's final draft

    Approximating Distance Measures for the Skyline

    Get PDF
    In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach

    On the Parameterized Complexity of Learning Monadic Second-Order Formulas

    Full text link
    Within the model-theoretic framework for supervised learning introduced by Grohe and Tur\'an (TOCS 2004), we study the parameterized complexity of learning concepts definable in monadic second-order logic (MSO). We show that the problem of learning a consistent MSO-formula is fixed-parameter tractable on structures of bounded tree-width and on graphs of bounded clique-width in the 1-dimensional case, that is, if the instances are single vertices (and not tuples of vertices). This generalizes previous results on strings and on trees. Moreover, in the agnostic PAC-learning setting, we show that the result also holds in higher dimensions. Finally, via a reduction to the MSO-model-checking problem, we show that learning a consistent MSO-formula is para-NP-hard on general structures

    Reachability and Distances under Multiple Changes

    Get PDF
    Recently it was shown that the transitive closure of a directed graph can be updated using first-order formulas after insertions and deletions of single edges in the dynamic descriptive complexity framework by Dong, Su, and Topor, and Patnaik and Immerman. In other words, Reachability is in DynFO. In this article we extend the framework to changes of multiple edges at a time, and study the Reachability and Distance queries under these changes. We show that the former problem can be maintained in DynFO(+, x) under changes affecting O({log n}/{log log n}) nodes, for graphs with n nodes. If the update formulas may use a majority quantifier then both Reachability and Distance can be maintained under changes that affect O(log^c n) nodes, for fixed c in N. Some preliminary results towards showing that distances are in DynFO are discussed

    Conjunctive Queries: Unique Characterizations and Exact Learnability

    Get PDF
    We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples, and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts
    • …
    corecore