
33 research outputs found

Querying Incomplete Numerical Data: Between Certain and Possible Answers

Author: Console Marco
Libkin Leonid
Peterfreund Liat
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/06/2023

Edinburgh Research Explorer

HAL-Ecole des Ponts ParisTech

Rewriting with Acyclic Queries: Mind Your Head

Author: Geck Gaetano
Keppeler Jens
Schwentick Thomas
Spinrath Christopher
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th International Conference on Database Theory (ICDT 2022)
Publication date: 01/01/2022

The paper studies the rewriting problem, that is, the decision problem whether, for a given conjunctive query Q and a set V of views, there is a conjunctive query Q' over V that is equivalent to Q, for cases where the query, the views, and/or the desired rewriting are acyclic or even more restricted. It shows that, if Q itself is acyclic, an acyclic rewriting exists if there is any rewriting. An analogous statement also holds for free-connex acyclic, hierarchical, and q-hierarchical queries. Regarding the complexity of the rewriting problem, the paper identifies a border between tractable and (presumably) intractable variants of the rewriting problem: for schemas of bounded arity, the acyclic rewriting problem is NP-hard, even if both Q and the views in V are acyclic or hierarchical. However, it becomes tractable if the views are free-connex acyclic (i.e., in a nutshell, their body is (i) acyclic and (ii) remains acyclic if their head is added as an additional atom).
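As an illustration of what "acyclic" means for a conjunctive query, the following is a minimal sketch of the classical GYO reduction, which tests whether the hypergraph formed by a query's atoms is (alpha-)acyclic. It is not the rewriting procedure from the paper; the function name `is_acyclic` and the encoding of each atom as a tuple of variable names are assumptions made for this example.

```python
def is_acyclic(atoms):
    """GYO reduction: True if the atoms' variable sets form an alpha-acyclic hypergraph.

    `atoms` is a list of tuples of variable names (hypothetical encoding),
    e.g. R(x, y), S(y, z) becomes [("x", "y"), ("y", "z")].
    """
    edges = [set(a) for a in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one atom.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop one atom that is empty or contained in another atom.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # The query is acyclic exactly when the reduction removes every atom.
    return not edges


path_query = [("x", "y"), ("y", "z")]
triangle_query = [("x", "y"), ("y", "z"), ("z", "x")]
print(is_acyclic(path_query), is_acyclic(triangle_query))
```

The path query reduces to nothing, while the triangle query gets stuck because every variable occurs in two atoms and no atom is contained in another.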

arXiv.org e-Print Archive

Episciences.org

Dagstuhl Research Online Publication Server

Decision Problems in Information Theory

Author: Abo Khamis Mahmoud
Kolaitis Phokion G.
Ngo Hung Q.
Suciu Dan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)
Publication date: 01/01/2020

Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy. We establish the following results on checking the validity over all almost-entropic functions: first, validity of a Boolean information constraint arising from a monotone Boolean formula is co-recursively enumerable; second, validity of "tight" conditional information constraints is in Π⁰₃. Furthermore, under some restrictions, validity of conditional information constraints "with slack" is in Σ⁰₂, and validity of information inequality constraints involving max is Turing equivalent to validity of information inequality constraints (with no max involved). We also prove that the classical implication problem for conditional independence statements is co-recursively enumerable.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Even the Easiest(?) Graph Coloring Problem Is Not Easy in Streaming!

Author: Bhattacharya Anup
Bishnu Arijit
Mishra Gopinath
Upasana Anannya
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)
Publication date: 01/01/2021

We study a graph coloring problem that is otherwise easy in the RAM model but becomes quite non-trivial in the one-pass streaming model. In contrast to previous graph coloring problems in streaming that try to find an assignment of colors to vertices, our main work is on estimating the number of conflicting or monochromatic edges given a coloring function that is streaming along with the graph; we call the problem Conflict-Est. The coloring function on a vertex can be read or accessed only when the vertex is revealed in the stream. If we need the color on a vertex that has streamed past, then that color, along with its vertex, has to be stored explicitly. We provide algorithms for a graph that is streaming in different variants of the vertex arrival in one-pass streaming model, viz. the Vertex Arrival (VA), Vertex Arrival With Degree Oracle (VAdeg), Vertex Arrival in Random Order (VArand) models, with special focus on the random order model. We also provide matching lower bounds for most of the cases. The mainstay of our work is in showing that the properties of a random order stream can be exploited to design efficient streaming algorithms for estimating the number of monochromatic edges. We have also obtained a lower bound, though not matching the upper bound, for the random order model. Among all the three models vis-a-vis this problem, we can show a clear separation of power in favor of the VArand model
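To make the Conflict-Est setting concrete, here is a minimal sketch of an exact baseline in a vertex-arrival stream. It stores the color of every vertex seen so far, so it uses linear space and is not one of the sublinear algorithms from the paper. The stream encoding (each item is a vertex, its color, and its already-arrived neighbors) and the name `count_conflicts` are assumptions made for this example.

```python
def count_conflicts(stream):
    """Count monochromatic edges exactly in a vertex-arrival stream.

    Each stream item is (vertex, color, earlier_neighbors), where
    earlier_neighbors lists the neighbors that arrived before `vertex`
    (hypothetical encoding of the vertex-arrival model).
    """
    color_of = {}  # keeps every color seen so far: linear space, not a sketch
    conflicts = 0
    for vertex, color, earlier_neighbors in stream:
        # An edge is revealed when its later endpoint arrives.
        conflicts += sum(1 for u in earlier_neighbors if color_of[u] == color)
        color_of[vertex] = color
    return conflicts


stream = [
    ("a", "red", []),
    ("b", "red", ["a"]),        # edge a-b is monochromatic
    ("c", "blue", ["a", "b"]),  # edges a-c and b-c are properly colored
    ("d", "blue", ["c"]),       # edge c-d is monochromatic
]
print(count_conflicts(stream))  # prints 2
```

The dictionary `color_of` is exactly the explicit storage the abstract refers to: a color that has streamed past is available later only if it was kept.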

Dagstuhl Research Online Publication Server

Queries with Arithmetic on Incomplete Databases

Author: Console M.
Hofer M.
Libkin L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

Archivio della ricerca- Università di Roma La Sapienza

Queries with Arithmetic on Incomplete Databases

Author: Console Marco
Hofer Matthias
Libkin Leonid
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2020

The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach.

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Coping with Incomplete Data: Recent Advances

Author: Console Marco
Guagliardo Paolo
Libkin Leonid
Toussaint Etienne
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2020

Handling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on three-valued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We reexamine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers.

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Small Circuits Imply Efficient Arthur-Merlin Protocols

Author: Ezra Michael
Rothblum Ron D.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)
Publication date: 01/01/2022

The inner product function ⟨x,y⟩ = Σ_i x_i y_i mod 2 can be easily computed by a (linear-size) AC⁰(⊕) circuit: that is, a constant depth circuit with AND, OR and parity (XOR) gates. But what if we impose the restriction that the parity gates can only be on the bottom most layer (closest to the input)? Namely, can the inner product function be computed by an AC⁰ circuit composed with a single layer of parity gates? This seemingly simple question is an important open question at the frontier of circuit lower bound research. In this work, we focus on a minimalistic version of the above question. Namely, whether the inner product function cannot be approximated by a small DNF augmented with a single layer of parity gates. Our main result shows that the existence of such a circuit would have unexpected implications for interactive proofs, or more specifically, for interactive variants of the Data Streaming and Communication Complexity models. In particular, we show that the existence of such a small (i.e., polynomial-size) circuit yields: 1) An O(d)-message protocol in the Arthur-Merlin Data Streaming model for every n-variate, degree d polynomial (over GF(2)), using only Õ(d)·log(n) communication and space complexity. In particular, this gives an AM[2] Data Streaming protocol for a variant of the well-studied triangle counting problem, with poly-logarithmic communication and space complexities. 2) A 2-message communication complexity protocol for any sparse (or low degree) polynomial, and for any function computable by an AC⁰(⊕) circuit. Specifically, for the latter, we obtain a protocol with communication complexity that is poly-logarithmic in the size of the AC⁰(⊕) circuit.
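For reference, the inner product function over GF(2) that this abstract is about can be evaluated directly as below. This is only a sketch of the function itself, not of any circuit or protocol from the paper; the name `inner_product_mod2` and the representation of inputs as lists of 0/1 integers are assumptions made for this example.

```python
def inner_product_mod2(x, y):
    """Return the sum over i of x[i] * y[i], reduced mod 2, for 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Each AND of two bits is one product term; the final mod 2 is a parity.
    return sum(a & b for a, b in zip(x, y)) % 2


print(inner_product_mod2([1, 0, 1], [1, 1, 1]))  # two overlapping ones: prints 0
print(inner_product_mod2([1, 1, 0], [1, 0, 0]))  # one overlapping one: prints 1
```

Written this way, the computation is a parity of ANDs, which is why placing the parity gates only at the bottom layer changes the question.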

Dagstuhl Research Online Publication Server

Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Author: McGregor Andrew
Tench David
Vu Hoa T.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Database Theory (ICDT 2021)
Publication date: 01/01/2021

We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems are $m$ subsets of a universe of size $n$ and a value $k \in [m]$. In Max-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: If the sets have size at most $d$, there exist single-pass algorithms using $\tilde{O}(d^{d+1} k^d)$ space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant $d$. If each element appears in at most $r$ sets, we present single pass algorithms using $\tilde{O}(k^2 r/\epsilon^3)$ space that return a $1+\epsilon$ approximation in the case of Max-Cover. We also present a single-pass algorithm using slightly more memory, i.e., $\tilde{O}(k^3 r/\epsilon^{4})$ space, that $1+\epsilon$ approximates Max-Unique-Cover. In contrast to the above results, when $d$ and $r$ are arbitrary, any constant pass $1+\epsilon$ approximation algorithm for either problem requires $\Omega(\epsilon^{-2}m)$ space but a single pass $O(\epsilon^{-2}mk)$ space algorithm exists. In fact any constant-pass algorithm with an approximation better than $e/(e-1)$ and $e^{1-1/k}$ for Max-Cover and Max-Unique-Cover respectively requires $\Omega(m/k^2)$ space when $d$ and $r$ are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem. Comment: Conference version to appear at ICDT 202
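The two objectives defined in this abstract can be illustrated with a small offline sketch. It computes the Max-Cover and Max-Unique-Cover objective values for a given collection and runs the classical offline greedy heuristic for Max-Cover. This is not one of the streaming algorithms from the paper: it keeps the whole input in memory. The function names and the encoding of the input as a list of Python sets are assumptions made for this example.

```python
from collections import Counter


def coverage(sets):
    """Max-Cover objective: number of elements covered by at least one set."""
    return len(set().union(*sets))


def unique_coverage(sets):
    """Max-Unique-Cover objective: number of elements covered by exactly one set."""
    counts = Counter(e for s in sets for e in s)
    return sum(1 for c in counts.values() if c == 1)


def greedy_max_cover(sets, k):
    """Offline greedy: repeatedly pick the set covering the most new elements."""
    chosen, covered = [], set()
    remaining = list(sets)
    for _ in range(k):
        best = max(remaining, key=lambda s: len(s - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left that adds new elements
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen


sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1}]
picked = greedy_max_cover(sets, k=2)
print(picked, coverage(picked))
print(unique_coverage([{1, 2, 3}, {3, 4}]))
```

Here element 3 is counted by `coverage` but not by `unique_coverage` for the pair `{1, 2, 3}`, `{3, 4}`, which is the difference between the two problems.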

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server