579,015 research outputs found
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information was studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested to capture
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance annotated databases
Learning Boolean Halfspaces with Small Weights from Membership Queries
We consider the problem of proper learning a Boolean Halfspace with integer
weights from membership queries only. The best known
algorithm for this problem is an adaptive algorithm that asks
membership queries where the best lower bound for the number of membership
queries is [Learning Threshold Functions with Small Weights Using
Membership Queries. COLT 1999]
In this paper we close this gap and give an adaptive proper learning
algorithm with two rounds that asks membership queries. We also give
a non-adaptive proper learning algorithm that asks membership
queries
Differential Privacy and the Fat-Shattering Dimension of Linear Queries
In this paper, we consider the task of answering linear queries under the
constraint of differential privacy. This is a general and well-studied class of
queries that captures other commonly studied classes, including predicate
queries and histogram queries. We show that the accuracy to which a set of
linear queries can be answered is closely related to its fat-shattering
dimension, a property that characterizes the learnability of real-valued
functions in the agnostic-learning setting.Comment: Appears in APPROX 201
Making Queries Tractable on Big Data with Preprocessing
A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME al-gorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to pro-vide a formal foundation for this approach in terms of com-putational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT0Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natu-ral query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be ef-fectively converted to Π-tractable queries by re-factorizing its data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠT0Q ⊂ P unless P = NC, i.e., the set ΠT0Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data. 1
Partial match queries in relaxed K-dt trees
The study of partial match queries on random hierarchical multidimensional data structures dates back to Ph. Flajolet and C. Puech’s 1986 seminal paper on partial match retrieval. It was not until recently that fixed (as opposed to random) partial match queries were studied for random relaxed K-d trees, random standard K-d trees, and random 2-dimensional quad trees. Based on those results it seemed
natural to classify the general form of the cost of fixed partial match queries into two families: that of either random hierarchical structures or perfectly balanced structures, as conjectured by Duch, Lau and Martínez (On the Cost of Fixed Partial Queries in K-d trees Algorithmica, 75(4):684–723, 2016). Here we show that the conjecture just mentioned does not hold by introducing relaxed K-dt trees and providing the average-case analysis for random partial match queries as well as some advances on the average-case analysis for fixed partial match queries on them. In fact this cost –for fixed partial match queries– does not follow the conjectured forms.Peer ReviewedPostprint (author's final draft
- …
