Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
We study the question of when we can provide logarithmic-time direct access
to the k-th answer to a Conjunctive Query (CQ) with a specified ordering over
the answers, following a preprocessing step that constructs a data structure in
time quasilinear in the size of the database. Specifically, we embark on the
challenge of identifying the tractable answer orderings that allow for ranked
direct access with such complexity guarantees. We begin with lexicographic
orderings and give a decidable characterization (under conventional complexity
assumptions) of the class of tractable lexicographic orderings for every CQ
without self-joins. We then continue to the more general orderings by the sum
of attribute weights and show for it that ranked direct access is tractable
only in trivial cases. Hence, to better understand the computational challenge
at hand, we consider the more modest task of providing access to only a single
answer (i.e., finding the answer at a given position) - a task that we refer to
as the selection problem. We indeed achieve a quasilinear-time algorithm for a
subset of the class of full CQs without self-joins, by adopting a solution of
Frederickson and Johnson to the classic problem of selection over sorted
matrices. We further prove that none of the other queries in this class admit
such an algorithm.
Comment: 17 page
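The selection problem in the abstract above can be illustrated on its simplest instance: the cross product of two unary relations, ordered by the sum of the two weights. The sketch below is not the Frederickson and Johnson algorithm the abstract refers to. It is a simpler stand-in that assumes integer weights and binary-searches over the value range, counting pairs with a two-pointer scan. The function names are hypothetical.

```python
def count_pairs_at_most(a, b, t):
    """Count pairs (i, j) with a[i] + b[j] <= t; a and b are sorted ascending."""
    count = 0
    j = len(b) - 1
    for x in a:
        # As x grows, the largest admissible index in b can only shrink.
        while j >= 0 and x + b[j] > t:
            j -= 1
        count += j + 1
    return count


def select_pair_sum(a, b, k):
    """Return the k-th smallest (0-based) value of a[i] + b[j].

    Assumption: weights are integers, so binary search over values terminates.
    The |a| * |b| sums are never materialized.
    """
    a, b = sorted(a), sorted(b)
    if not 0 <= k < len(a) * len(b):
        raise IndexError("k is out of range")
    lo, hi = a[0] + b[0], a[-1] + b[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if count_pairs_at_most(a, b, mid) >= k + 1:
            hi = mid          # the k-th sum is at most mid
        else:
            lo = mid + 1      # fewer than k + 1 sums are <= mid
    return lo


if __name__ == "__main__":
    print(select_pair_sum([1, 4, 7], [2, 3, 10], 4))
```

Each counting pass is linear in the input, so the total cost is sorting plus one pass per halving of the value range, rather than time proportional to the number of answers.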
Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
We study ranked enumeration of join-query results according to very general
orders defined by selective dioids. Our main contribution is a framework for
ranked enumeration over a class of dynamic programming problems that
generalizes seemingly different problems that had been studied in isolation. To
this end, we extend classic algorithms that find the k-shortest paths in a
weighted graph. For full conjunctive queries, including cyclic ones, our
approach is optimal in terms of the time to return the top result and the delay
between results. These optimality properties are derived for the widely used
notion of data complexity, which treats query size as a constant. By performing
a careful cost analysis, we are able to uncover a previously unknown tradeoff
between two incomparable enumeration approaches: one has lower complexity when
the number of returned results is small, the other when the number is very
large. We theoretically and empirically demonstrate the superiority of our
techniques over batch algorithms, which produce the full result and then sort
it. Our technique is not only faster for returning the first few results, but
on some inputs beats the batch algorithm even when all results are produced.
Comment: 50 pages, 19 figure
Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries
We study ranked enumeration for Conjunctive Queries (CQs) where the answers
are ordered by a given ranking function (e.g., an ORDER BY clause in SQL). We
develop "any-k" algorithms which, without knowing the number k of desired
answers, push the ranking into joins and avoid materializing the join output
earlier than necessary. For this to be possible, the ranking function needs to
obey a certain kind of monotonicity; the supported ranking functions include
the common sum-of-weights case where query answers are compared by sums of
input weights, as well as any commutative selective dioid. One core insight of
our work is that the problem is closely related to the fundamental task of path
enumeration in a weighted DAG. We generalize and improve upon classic research
on finding the k'th shortest path and unify into the same framework several
solutions from different areas that had been studied in isolation. For the time
to the k'th ranked CQ answer (for every value of k), our approach is optimal in
data complexity precisely for the same class of queries where unranked
enumeration is optimal -- and only slower by a logarithmic factor. In a more
careful analysis of combined complexity, we uncover a previously unknown
tradeoff between two different any-k algorithms: one has lower complexity when
the number of returned results is small, the other when the number is very
large. This tradeoff is eliminated under a stricter monotonicity property that
we exploit to design a novel algorithm that asymptotically dominates all
previously known alternatives, including the well-known algorithm of Eppstein
for sum-of-weights path enumeration. We empirically demonstrate the findings of
our theoretical analysis in an experimental study that highlights the
superiority of our approach over the join-then-rank approach that existing
database systems typically follow.
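The idea of pushing the ranking into the join, rather than joining first and sorting afterwards, can be sketched for the smallest interesting case: a join of two relations on a single key, ranked by the sum of the two tuple weights. This is only a minimal illustration under stated assumptions, not the any-k algorithms of the paper. The tuple layout (key, payload, weight) and the function name ranked_join are hypothetical.

```python
import heapq
from collections import defaultdict


def ranked_join(R, S):
    """Lazily yield (weight_sum, r_payload, s_payload) for R joined with S on
    the key, in ascending order of weight_sum.

    R and S are lists of (key, payload, weight) tuples (hypothetical layout).
    """
    r_groups, s_groups = defaultdict(list), defaultdict(list)
    for key, payload, w in R:
        r_groups[key].append((w, payload))
    for key, payload, w in S:
        s_groups[key].append((w, payload))

    keys = [k for k in r_groups if k in s_groups]
    heap = []
    for kid, key in enumerate(keys):
        r_groups[key].sort(key=lambda t: t[0])
        s_groups[key].sort(key=lambda t: t[0])
        # Seed with the cheapest pair of each join group.
        total = r_groups[key][0][0] + s_groups[key][0][0]
        heap.append((total, kid, 0, 0))
    heapq.heapify(heap)

    while heap:
        total, kid, i, j = heapq.heappop(heap)
        rs, ss = r_groups[keys[kid]], s_groups[keys[kid]]
        yield total, rs[i][1], ss[j][1]
        # Successors: (i, j + 1) always, (i + 1, 0) only from column 0,
        # so every pair (i, j) is pushed exactly once.
        if j + 1 < len(ss):
            heapq.heappush(heap, (rs[i][0] + ss[j + 1][0], kid, i, j + 1))
        if j == 0 and i + 1 < len(rs):
            heapq.heappush(heap, (rs[i + 1][0] + ss[0][0], kid, i + 1, 0))


if __name__ == "__main__":
    R = [(1, "a", 5), (1, "b", 1), (2, "c", 2)]
    S = [(1, "x", 3), (1, "y", 0), (2, "z", 1), (3, "q", 0)]
    for answer in ranked_join(R, S):
        print(answer)
```

Because the generator is lazy, a caller can stop after any number of answers without having fixed k in advance, and only the answers actually consumed are ever constructed.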
Efficient Computation of Quantiles over Joins
We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent dichotomy result rules out the existence of such an algorithm for a general family of queries and orders. Specifically, for acyclic JQs without self-joins, the problem becomes intractable for ordering by sum whenever we join more than two relations (and these joins are not trivial intersections). Moreover, even for basic ranking functions beyond sum, such as min or max over different attributes, so far it is not known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ and show how this approach allows us not only to recover known results, but also to generalize them and resolve open cases. Our solution uses two subroutines: The first one needs to select what we call a "pivot answer". The second subroutine partitions the space of query answers according to this pivot, and continues searching in one partition that is represented as a new %JQ over a new database. For pivot selection, we develop an algorithm that works for a large class of ranking functions that are appropriately monotone. The second subroutine requires a customized construction for the specific ranking function at hand. We show the benefit and generality of our approach by using it to establish several new complexity results. First, we prove the tractability of min and max for all acyclic JQs, thereby resolving the above question. Second, we extend the previous %JQ dichotomy for sum to all partial sums (over all subsets of the attributes). Third, we handle the intractable cases of sum by devising a deterministic approximation scheme that applies to every acyclic JQ.
Fair Procedures for Fair Stable Marriage Outcomes
Given a two-sided market where each agent ranks those on the other side by preference, the stable marriage problem calls for finding a perfect matching such that no pair of agents prefer each other to their matches. Recent studies show that the number of stable solutions can be large in practice. Yet the classical solution to the problem, the Gale-Shapley (GS) algorithm, assigns an optimal match to each agent on one side, and a pessimal one to each on the other side; such a solution may fare well in terms of equity only in highly asymmetric markets. Finding a stable matching that minimizes the sex equality cost, an equity measure expressing the discrepancy of mean happiness among the two sides, is strongly NP-hard. Extant heuristics either (a) oblige some agents to involuntarily abandon their matches, or (b) bias the outcome in favor of some agents, or (c) need high-polynomial or unbounded time. We provide the first procedurally fair algorithms that output equitable stable marriages and are guaranteed to terminate in at most cubic time; the key to this breakthrough is the monitoring of a monotonic state function and the use of a selective criterion for accepting proposals. Our experiments with diverse simulated markets show that: (a) extant heuristics fail to yield high equity; (b) the best solution found by the GS algorithm can be very far from optimal equity; and (c) our procedures stand out in both efficiency and equity, even when compared to a non-procedurally fair approximation scheme.
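For orientation, the baseline that the abstract above contrasts itself with can be sketched as follows. This is the classical proposer-optimal Gale-Shapley algorithm, not the procedurally fair procedures of the paper. The cost function assumes the commonly used definition of sex equality cost, the absolute difference between the two sides' summed partner ranks, which the abstract describes only informally. All names are hypothetical.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Classical Gale-Shapley; returns a dict mapping proposer -> receiver.

    Agents on each side are numbered 0..n-1, and prefs[x] lists the agents of
    the other side from most to least preferred.
    """
    n = len(proposer_prefs)
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = [{p: pos for pos, p in enumerate(prefs)} for prefs in receiver_prefs]
    next_choice = [0] * n            # next index each proposer will try
    partner_of_receiver = [None] * n
    free = list(range(n))
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = partner_of_receiver[r]
        if current is None:
            partner_of_receiver[r] = p
        elif rank[r][p] < rank[r][current]:
            partner_of_receiver[r] = p   # r trades up, current becomes free
            free.append(current)
        else:
            free.append(p)               # r rejects p
    return {p: r for r, p in enumerate(partner_of_receiver)}


def sex_equality_cost(matching, proposer_prefs, receiver_prefs):
    """Assumed definition: |sum of proposers' partner ranks
    - sum of receivers' partner ranks|, with rank 0 as the best."""
    cost_p = sum(proposer_prefs[p].index(r) for p, r in matching.items())
    cost_r = sum(receiver_prefs[r].index(p) for p, r in matching.items())
    return abs(cost_p - cost_r)


if __name__ == "__main__":
    men = [[0, 1], [1, 0]]
    women = [[1, 0], [0, 1]]
    m = gale_shapley(men, women)
    print(m, sex_equality_cost(m, men, women))
```

In this small market every proposer receives a first choice while every receiver gets a last choice, which is the one-sided bias the abstract attributes to GS.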