33 research outputs found

    Querying Incomplete Numerical Data: Between Certain and Possible Answers

    Get PDF
    International audienc

    Rewriting with Acyclic Queries: Mind Your Head

    Get PDF
    The paper studies the rewriting problem, that is, the decision problem whether, for a given conjunctive query Q and a set ? of views, there is a conjunctive query Q\u27 over ? that is equivalent to Q, for cases where the query, the views, and/or the desired rewriting are acyclic or even more restricted. It shows that, if Q itself is acyclic, an acyclic rewriting exists if there is any rewriting. An analogous statement also holds for free-connex acyclic, hierarchical, and q-hierarchical queries. Regarding the complexity of the rewriting problem, the paper identifies a border between tractable and (presumably) intractable variants of the rewriting problem: for schemas of bounded arity, the acyclic rewriting problem is NP-hard, even if both Q and the views in ? are acyclic or hierarchical. However, it becomes tractable, if the views are free-connex acyclic (i.e., in a nutshell, their body is (i) acyclic and (ii) remains acyclic if their head is added as an additional atom)

    Decision Problems in Information Theory

    Get PDF
    Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy. We establish the following results on checking the validity over all almost-entropic functions: first, validity of a Boolean information constraint arising from a monotone Boolean formula is co-recursively enumerable; second, validity of "tight" conditional information constraints is in ???. Furthermore, under some restrictions, validity of conditional information constraints "with slack" is in ???, and validity of information inequality constraints involving max is Turing equivalent to validity of information inequality constraints (with no max involved). We also prove that the classical implication problem for conditional independence statements is co-recursively enumerable

    Even the Easiest(?) Graph Coloring Problem Is Not Easy in Streaming!

    Get PDF
    We study a graph coloring problem that is otherwise easy in the RAM model but becomes quite non-trivial in the one-pass streaming model. In contrast to previous graph coloring problems in streaming that try to find an assignment of colors to vertices, our main work is on estimating the number of conflicting or monochromatic edges given a coloring function that is streaming along with the graph; we call the problem Conflict-Est. The coloring function on a vertex can be read or accessed only when the vertex is revealed in the stream. If we need the color on a vertex that has streamed past, then that color, along with its vertex, has to be stored explicitly. We provide algorithms for a graph that is streaming in different variants of the vertex arrival in one-pass streaming model, viz. the Vertex Arrival (VA), Vertex Arrival With Degree Oracle (VAdeg), Vertex Arrival in Random Order (VArand) models, with special focus on the random order model. We also provide matching lower bounds for most of the cases. The mainstay of our work is in showing that the properties of a random order stream can be exploited to design efficient streaming algorithms for estimating the number of monochromatic edges. We have also obtained a lower bound, though not matching the upper bound, for the random order model. Among all the three models vis-a-vis this problem, we can show a clear separation of power in favor of the VArand model

    Queries with Arithmetic on Incomplete Databases

    Get PDF
    The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

    Queries with Arithmetic on Incomplete Databases

    Get PDF
    International audienceThe standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases,relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies,we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behaviorof volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

    Coping with Incomplete Data: Recent Advances

    Get PDF
    International audienceHandling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on threevalued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We reexamine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers

    Small Circuits Imply Efficient Arthur-Merlin Protocols

    Get PDF
    The inner product function ? x,y ? = ?_i x_i y_i mod 2 can be easily computed by a (linear-size) AC?(?) circuit: that is, a constant depth circuit with AND, OR and parity (XOR) gates. But what if we impose the restriction that the parity gates can only be on the bottom most layer (closest to the input)? Namely, can the inner product function be computed by an AC? circuit composed with a single layer of parity gates? This seemingly simple question is an important open question at the frontier of circuit lower bound research. In this work, we focus on a minimalistic version of the above question. Namely, whether the inner product function cannot be approximated by a small DNF augmented with a single layer of parity gates. Our main result shows that the existence of such a circuit would have unexpected implications for interactive proofs, or more specifically, for interactive variants of the Data Streaming and Communication Complexity models. In particular, we show that the existence of such a small (i.e., polynomial-size) circuit yields: 1) An O(d)-message protocol in the Arthur-Merlin Data Streaming model for every n-variate, degree d polynomial (over GF(2)), using only O?(d) ?log(n) communication and space complexity. In particular, this gives an AM[2] Data Streaming protocol for a variant of the well-studied triangle counting problem, with poly-logarithmic communication and space complexities. 2) A 2-message communication complexity protocol for any sparse (or low degree) polynomial, and for any function computable by an AC?(?) circuit. Specifically, for the latter, we obtain a protocol with communication complexity that is poly-logarithmic in the size of the AC?(?) circuit

    Maximum Coverage in the Data Stream Model: Parameterized and Generalized

    Get PDF
    We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems are mm subsets of a universe of size nn and a value k[m]k\in [m]. In Max-Cover, the problem is to find a collection of at most kk sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most kk sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: If the sets have size at most dd, there exist single-pass algorithms using O~(dd+1kd)\tilde{O}(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant dd. If each element appears in at most rr sets, we present single pass algorithms using O~(k2r/ϵ3)\tilde{O}(k^2 r/\epsilon^3) space that return a 1+ϵ1+\epsilon approximation in the case of Max-Cover. We also present a single-pass algorithm using slightly more memory, i.e., O~(k3r/ϵ4)\tilde{O}(k^3 r/\epsilon^{4}) space, that 1+ϵ1+\epsilon approximates Max-Unique-Cover. In contrast to the above results, when dd and rr are arbitrary, any constant pass 1+ϵ1+\epsilon approximation algorithm for either problem requires Ω(ϵ2m)\Omega(\epsilon^{-2}m) space but a single pass O(ϵ2mk)O(\epsilon^{-2}mk) space algorithm exists. In fact any constant-pass algorithm with an approximation better than e/(e1)e/(e-1) and e11/ke^{1-1/k} for Max-Cover and Max-Unique-Cover respectively requires Ω(m/k2)\Omega(m/k^2) space when dd and rr are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem.Comment: Conference version to appear at ICDT 202
    corecore