33 research outputs found

    A Backtracking-Based Algorithm for Computing Hypertree-Decompositions

    Full text link
    Hypertree decompositions of hypergraphs are a generalization of tree decompositions of graphs. The corresponding hypertree-width is a measure for the cyclicity and therefore tractability of the encoded computation problem. Many NP-hard decision and computation problems are known to be tractable on instances whose structure corresponds to hypergraphs of bounded hypertree-width. Intuitively, the smaller the hypertree-width, the faster the computation problem can be solved. In this paper, we present the new backtracking-based algorithm det-k-decomp for computing hypertree decompositions of small width. Our benchmark evaluations have shown that det-k-decomp significantly outperforms opt-k-decomp, the only exact hypertree decomposition algorithm so far. Even compared to the best heuristic algorithm, we obtained competitive results as long as the hypergraphs are not too large.Comment: 19 pages, 6 figures, 3 table

    Compressed Representations of Conjunctive Query Results

    Full text link
    Relational queries, and in particular join queries, often generate large output results when executed over a huge dataset. In such cases, it is often infeasible to store the whole materialized output if we plan to reuse it further down a data processing pipeline. Motivated by this problem, we study the construction of space-efficient compressed representations of the output of conjunctive queries, with the goal of supporting the efficient access of the intermediate compressed result for a given access pattern. In particular, we initiate the study of an important tradeoff: minimizing the space necessary to store the compressed result, versus minimizing the answer time and delay for an access request over the result. Our main contribution is a novel parameterized data structure, which can be tuned to trade off space for answer time. The tradeoff allows us to control the space requirement of the data structure precisely, and depends both on the structure of the query and the access pattern. We show how we can use the data structure in conjunction with query decomposition techniques, in order to efficiently represent the outputs for several classes of conjunctive queries.Comment: To appear in PODS'18; 35 pages; comments welcom

    Structural Decompositions for Problems with Global Constraints

    Full text link
    A wide range of problems can be modelled as constraint satisfaction problems (CSPs), that is, a set of constraints that must be satisfied simultaneously. Constraints can either be represented extensionally, by explicitly listing allowed combinations of values, or implicitly, by special-purpose algorithms provided by a solver. Such implicitly represented constraints, known as global constraints, are widely used; indeed, they are one of the key reasons for the success of constraint programming in solving real-world problems. In recent years, a variety of restrictions on the structure of CSP instances have been shown to yield tractable classes of CSPs. However, most such restrictions fail to guarantee tractability for CSPs with global constraints. We therefore study the applicability of structural restrictions to instances with such constraints. We show that when the number of solutions to a CSP instance is bounded in key parts of the problem, structural restrictions can be used to derive new tractable classes. Furthermore, we show that this result extends to combinations of instances drawn from known tractable classes, as well as to CSP instances where constraints assign costs to satisfying assignments.Comment: The final publication is available at Springer via http://dx.doi.org/10.1007/s10601-015-9181-

    Ranked Enumeration of Conjunctive Query Results

    Get PDF
    We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce the notions of decomposable and compatible (w.r.t. a query decomposition) ranking functions, which allow for partial aggregation of tuple scores in order to efficiently enumerate the output. We complement the algorithmic results with lower bounds that justify why restrictions on the structure of ranking functions are necessary. Our results extend and improve upon a long line of work that has studied ranked enumeration from both a theoretical and practical perspective

    Learning Models over Relational Data using Sparse Tensors and Functional Dependencies

    Full text link
    Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. These integrated solutions are also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This article introduces a unified framework for training and evaluating a class of statistical learning models over relational databases. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by synergizing key tools from database theory such as schema information, query structure, functional dependencies, recent advances in query evaluation algorithms, and from linear algebra such as tensor and matrix operations, one can formulate relational analytics problems and design efficient (query and data) structure-aware algorithms to solve them. This theoretical development informed the design and implementation of the AC/DC system for structure-aware learning. We benchmark the performance of AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting and advertisement planning applications, AC/DC can learn polynomial regression models and factorization machines with at least the same accuracy as its competitors and up to three orders of magnitude faster than its competitors whenever they do not run out of memory, exceed 24-hour timeout, or encounter internal design limitations.Comment: 61 pages, 9 figures, 2 table

    Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

    Get PDF
    We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. These optimality properties are derived for the widely used notion of data complexity, which treats query size as a constant. By performing a careful cost analysis, we are able to uncover a previously unknown tradeoff between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.Comment: 50 pages, 19 figure
    corecore