A Backtracking-Based Algorithm for Computing Hypertree-Decompositions
Hypertree decompositions of hypergraphs are a generalization of tree
decompositions of graphs. The corresponding hypertree-width is a measure for
the cyclicity and therefore tractability of the encoded computation problem.
Many NP-hard decision and computation problems are known to be tractable on
instances whose structure corresponds to hypergraphs of bounded
hypertree-width. Intuitively, the smaller the hypertree-width, the faster the
computation problem can be solved. In this paper, we present the new
backtracking-based algorithm det-k-decomp for computing hypertree
decompositions of small width. Our benchmark evaluations have shown that
det-k-decomp significantly outperforms opt-k-decomp, the only exact hypertree
decomposition algorithm so far. Even compared to the best heuristic algorithm,
we obtained competitive results as long as the hypergraphs are not too large.
Comment: 19 pages, 6 figures, 3 tables
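As a small illustration of the link between cyclicity and hypertree-width, the sketch below tests the simplest case only: a hypergraph has hypertree-width 1 exactly when it is alpha-acyclic, which the classical GYO reduction can decide. This is not det-k-decomp or opt-k-decomp; the function name `is_alpha_acyclic` and the input encoding (a list of vertex collections) are assumptions made for this example.

```python
def is_alpha_acyclic(hyperedges):
    """GYO reduction: True iff the hypergraph reduces to nothing."""
    edges = [set(e) for e in hyperedges if e]
    changed = True
    while changed:
        changed = False
        # Rule 1: remove vertices that occur in exactly one hyperedge.
        counts = {}
        for e in edges:
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        for e in edges:
            lonely = {v for v in e if counts[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: remove empty hyperedges and hyperedges covered by another
        # one (for exact duplicates, only the first copy is kept).
        kept = []
        for i, e in enumerate(edges):
            absorbed = not e or any(
                e < f or (e == f and j < i)
                for j, f in enumerate(edges)
                if j != i
            )
            if absorbed:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges


path = [("a", "b"), ("b", "c"), ("c", "d")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_alpha_acyclic(path))      # acyclic, so hypertree-width 1
print(is_alpha_acyclic(triangle))  # cyclic, so hypertree-width above 1
```

Hypergraphs that fail this test need a decomposition of width 2 or more, which is the regime that algorithms such as the one in the abstract address.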
Compressed Representations of Conjunctive Query Results
Relational queries, and in particular join queries, often generate large
output results when executed over a huge dataset. In such cases, it is often
infeasible to store the whole materialized output if we plan to reuse it
further down a data processing pipeline. Motivated by this problem, we study
the construction of space-efficient compressed representations of the output of
conjunctive queries, with the goal of supporting the efficient access of the
intermediate compressed result for a given access pattern. In particular, we
initiate the study of an important tradeoff: minimizing the space necessary to
store the compressed result, versus minimizing the answer time and delay for an
access request over the result. Our main contribution is a novel parameterized
data structure, which can be tuned to trade off space for answer time. The
tradeoff allows us to control the space requirement of the data structure
precisely, and depends both on the structure of the query and the access
pattern. We show how we can use the data structure in conjunction with query
decomposition techniques, in order to efficiently represent the outputs for
several classes of conjunctive queries.
Comment: To appear in PODS'18; 35 pages; comments welcome
Structural Decompositions for Problems with Global Constraints
A wide range of problems can be modelled as constraint satisfaction problems
(CSPs), that is, a set of constraints that must be satisfied simultaneously.
Constraints can either be represented extensionally, by explicitly listing
allowed combinations of values, or implicitly, by special-purpose algorithms
provided by a solver.
Such implicitly represented constraints, known as global constraints, are
widely used; indeed, they are one of the key reasons for the success of
constraint programming in solving real-world problems. In recent years, a
variety of restrictions on the structure of CSP instances have been shown to
yield tractable classes of CSPs. However, most such restrictions fail to
guarantee tractability for CSPs with global constraints. We therefore study the
applicability of structural restrictions to instances with such constraints.
We show that when the number of solutions to a CSP instance is bounded in key
parts of the problem, structural restrictions can be used to derive new
tractable classes. Furthermore, we show that this result extends to
combinations of instances drawn from known tractable classes, as well as to CSP
instances where constraints assign costs to satisfying assignments.
Comment: The final publication is available at Springer via
http://dx.doi.org/10.1007/s10601-015-9181-
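The difference between extensional and implicit (global) constraints described in this abstract can be made concrete with a toy example. The sketch below is a brute-force illustration only, not the decomposition-based method of the paper; the helper names `extensional`, `all_different`, and `solve` are hypothetical.

```python
from itertools import product


def extensional(scope, allowed):
    """Constraint given by explicitly listing the allowed value tuples."""
    allowed = set(allowed)
    return lambda a: tuple(a[v] for v in scope) in allowed


def all_different(scope):
    """Global constraint given implicitly by a checking procedure."""
    return lambda a: len({a[v] for v in scope}) == len(scope)


def solve(domains, constraints):
    """Enumerate every assignment that satisfies all constraints."""
    names = list(domains)
    solutions = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            solutions.append(assignment)
    return solutions


domains = {"x": [1, 2, 3], "y": [1, 2, 3], "z": [1, 2, 3]}
constraints = [
    extensional(("x", "y"), [(1, 2), (2, 3)]),
    all_different(("x", "y", "z")),
]
print(solve(domains, constraints))
```

The extensional constraint stores its relation directly, while `all_different` over n variables with d values each would need up to d^n tuples if listed explicitly. This size gap is one reason structural tractability results stated for extensional constraints do not transfer automatically.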
Ranked Enumeration of Conjunctive Query Results
We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce the notions of decomposable and compatible (w.r.t. a query decomposition) ranking functions, which allow for partial aggregation of tuple scores in order to efficiently enumerate the output. We complement the algorithmic results with lower bounds that justify why restrictions on the structure of ranking functions are necessary. Our results extend and improve upon a long line of work that has studied ranked enumeration from both a theoretical and practical perspective.
Learning Models over Relational Data using Sparse Tensors and Functional Dependencies
Integrated solutions for analytics over relational databases are of great
practical importance as they avoid the costly repeated loop data scientists
have to deal with on a daily basis: select features from data residing in
relational databases using feature extraction queries involving joins,
projections, and aggregations; export the training dataset defined by such
queries; convert this dataset into the format of an external learning tool; and
train the desired model using this tool. These integrated solutions are also a
fertile ground of theoretically fundamental and challenging problems at the
intersection of relational and statistical data models.
This article introduces a unified framework for training and evaluating a
class of statistical learning models over relational databases. This class
includes ridge linear regression, polynomial regression, factorization
machines, and principal component analysis. We show that, by synergizing key
tools from database theory such as schema information, query structure,
functional dependencies, recent advances in query evaluation algorithms, and
from linear algebra such as tensor and matrix operations, one can formulate
relational analytics problems and design efficient (query and data)
structure-aware algorithms to solve them.
This theoretical development informed the design and implementation of the
AC/DC system for structure-aware learning. We benchmark the performance of
AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting
and advertisement planning applications, AC/DC can learn polynomial regression
models and factorization machines with at least the same accuracy as its
competitors and up to three orders of magnitude faster, whenever the
competitors do not run out of memory, exceed a 24-hour timeout, or encounter
internal design limitations.
Comment: 61 pages, 9 figures, 2 tables
Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
We study ranked enumeration of join-query results according to very general
orders defined by selective dioids. Our main contribution is a framework for
ranked enumeration over a class of dynamic programming problems that
generalizes seemingly different problems that had been studied in isolation. To
this end, we extend classic algorithms that find the k-shortest paths in a
weighted graph. For full conjunctive queries, including cyclic ones, our
approach is optimal in terms of the time to return the top result and the delay
between results. These optimality properties are derived for the widely used
notion of data complexity, which treats query size as a constant. By performing
a careful cost analysis, we are able to uncover a previously unknown tradeoff
between two incomparable enumeration approaches: one has lower complexity when
the number of returned results is small, the other when the number is very
large. We theoretically and empirically demonstrate the superiority of our
techniques over batch algorithms, which produce the full result and then sort
it. Our technique is not only faster for returning the first few results, but
on some inputs beats the batch algorithm even when all results are produced.
Comment: 50 pages, 19 figures
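To give a feel for ranked enumeration as opposed to the batch approach (produce everything, then sort), here is a minimal sketch for a single two-relation join ranked by the sum of tuple weights. It is a simplified lazy merge with a priority queue, not the framework from the paper; the function name `ranked_join` and the tuple layouts are assumptions made for this example.

```python
import heapq
from collections import defaultdict


def ranked_join(R, S):
    """Yield (total, a, b, c) for R(a, b) joined with S(b, c), cheapest first.

    R holds (a, b, weight) tuples and S holds (b, c, weight) tuples.
    """
    # Preprocessing: group S by join value and sort each group by weight.
    by_b = defaultdict(list)
    for b, c, w in S:
        by_b[b].append((w, c))
    for partners in by_b.values():
        partners.sort()

    # Seed the queue with the best partner of every joining R tuple.
    heap = []
    for idx, (a, b, w) in enumerate(R):
        if b in by_b:
            heap.append((w + by_b[b][0][0], idx, 0))
    heapq.heapify(heap)

    # Each pop emits one result and pushes at most one successor.
    while heap:
        total, idx, pos = heapq.heappop(heap)
        a, b, w = R[idx]
        yield total, a, b, by_b[b][pos][1]
        if pos + 1 < len(by_b[b]):
            heapq.heappush(heap, (w + by_b[b][pos + 1][0], idx, pos + 1))


R = [("a1", "b1", 1), ("a2", "b1", 4), ("a3", "b2", 2)]
S = [("b1", "c1", 3), ("b1", "c2", 1), ("b2", "c3", 5)]
for row in ranked_join(R, S):
    print(row)
```

Because the generator is lazy, a caller that needs only the first few results can stop early without materializing the full join, which is the behaviour the abstract contrasts with batch algorithms.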