18 research outputs found

Coping with Incomplete Data: Recent Advances

Author: Console Marco
Guagliardo Paolo
Libkin Leonid
Toussaint Etienne
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2020

Handling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on three-valued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We reexamine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Coping with Incomplete Data: Recent Advances

Author: Console M.
Guagliardo P.
Libkin L.
Toussaint E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Handling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on three-valued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We re-examine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers
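
As a rough illustration of the gap the abstract describes, the Python sketch below contrasts SQL-style three-valued evaluation with certain answers on a toy instance. The encoding of nulls as None, the helper names, and the finite domain used to enumerate interpretations of nulls are all assumptions made for this example, not constructions taken from the paper.

```python
from itertools import product

NULL = None  # hypothetical stand-in for a SQL null


def eq3(a, b):
    """SQL-style equality: unknown (None) if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def sql_not_exists(r, s):
    """Mimic: SELECT A FROM R WHERE NOT EXISTS (SELECT * FROM S WHERE R.A = S.A).

    The inner WHERE keeps a row only when the comparison is true,
    so a comparison that evaluates to unknown behaves like false.
    """
    return [a for a in r if not any(eq3(a, b) is True for b in s)]


def certain_difference(r, s, domain):
    """Values in R - S under every replacement of nulls in S by domain values.

    Assumes R has no nulls and that the small finite domain is enough
    to exhibit the relevant interpretations (an assumption of this sketch).
    """
    null_positions = [i for i, b in enumerate(s) if b is NULL]
    answers = None
    for values in product(domain, repeat=len(null_positions)):
        world = list(s)
        for pos, val in zip(null_positions, values):
            world[pos] = val
        result = {a for a in r if a not in world}
        answers = result if answers is None else answers & result
    return sorted(answers)
```

With R = {1} and S = {NULL}, `sql_not_exists([1], [NULL])` keeps 1, while `certain_difference([1], [NULL], [1, 2])` is empty, because interpreting the null as 1 removes it from the difference.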

Archivio della ricerca- Università di Roma La Sapienza

Fragments of Bag Relational Algebra: Expressiveness and Certain Answers

Author: Console Marco
Guagliardo Paolo
Libkin Leonid
Publication venue: 'Elsevier BV'
Publication date: 01/03/2022

While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Fragments of bag relational algebra: Expressiveness and certain answers

Author: Console M.
Guagliardo P.
Libkin L.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra
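
The bag operations named in the abstract can be illustrated with a minimal Python sketch in which a bag is modelled as a `collections.Counter` mapping tuples to multiplicities. The sample relations `r` and `s` and the function names are assumptions made for illustration only; the paper's formal definitions may differ in detail.

```python
from collections import Counter

# Hypothetical bags: each key is a tuple, each value its number of occurrences.
r = Counter({("a",): 2, ("b",): 1})
s = Counter({("a",): 1, ("c",): 3})


def additive_union(x, y):
    """Multiplicities add up."""
    return x + y


def max_union(x, y):
    """Each tuple keeps the larger of its two multiplicities."""
    return x | y


def min_intersection(x, y):
    """Each tuple keeps the smaller of its two multiplicities."""
    return x & y


def duplicate_elimination(x):
    """Every tuple that occurs at least once is kept exactly once."""
    return Counter({t: 1 for t, n in x.items() if n > 0})
```

For the sample bags, ("a",) occurs three times in the additive union, twice in the max-union, and once in the min-intersection, which shows why these operations are not interchangeable as they are for sets.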

Archivio della ricerca- Università di Roma La Sapienza

Fragments of Bag Relational Algebra: Expressiveness and Certain Answers

Author: Console Marco
Guagliardo Paolo
Libkin Leonid
Publication venue
Publication date: 01/01/2019

Edinburgh Research Explorer

Dagstuhl Research Online Publication Server

Queries with Arithmetic on Incomplete Databases

Author: Console M.
Hofer M.
Libkin L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

The standard notion of query answering over incomplete databases is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In the majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach
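
The idea of a volume-based measure of certainty can be sketched with a simple Monte Carlo estimate. The code below is only an illustration under stated assumptions: it samples numerical nulls uniformly from a large box of side `2 * radius` and reports the fraction of samples that satisfy a constraint, using one large radius as a stand-in for the asymptotic behavior. The function name, the box shape, and the default parameters are hypothetical and are not the paper's algorithm.

```python
import random


def estimate_certainty(constraint, num_nulls, radius=1000.0, samples=20000, seed=0):
    """Estimate the fraction of a large box in which `constraint` holds.

    constraint: a function taking `num_nulls` floats and returning a bool.
    radius:     half the side of the sampling box (assumed large enough
                to approximate the limiting behavior).
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    hits = 0
    for _ in range(samples):
        point = [rng.uniform(-radius, radius) for _ in range(num_nulls)]
        if constraint(*point):
            hits += 1
    return hits / samples
```

For two nulls constrained by `lambda x, y: x > y`, the estimate should come out close to one half, matching the intuition that neither null is more likely to be the larger one when nothing is known about their values.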

Archivio della ricerca- Università di Roma La Sapienza

Queries with Arithmetic on Incomplete Databases

Author: Console Marco
Hofer Matthias
Libkin Leonid
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2020

The standard notion of query answering over incomplete databases is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In the majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Computing Possible and Certain Answers over Order-Incomplete Data

Author: Amarilli Antoine
Ba Mouhamadou Lamine
Deutch Daniel
Senellart Pierre
Publication venue
Publication date: 01/01/2019

This paper studies the complexity of query evaluation for databases whose relations are partially ordered; the problem commonly arises when combining or transforming ordered data from multiple sources. We focus on queries in a useful fragment of SQL, namely positive relational algebra with aggregates, whose bag semantics we extend to the partially ordered setting. Our semantics leads to the study of two main computational problems: the possibility and certainty of query answers. We show that these problems are respectively NP-complete and coNP-complete, but identify tractable cases depending on the query operators or input partial orders. We further introduce a duplicate elimination operator and study its effect on the complexity results. Comment: 55 pages, 56 references. Extended journal version of arXiv:1707.07222. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theoretical Computer Science

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Possible and Certain Answers for Queries over Order-Incomplete Data

Author: Amarilli Antoine
Ba Mouhamadou Lamine
Deutch Daniel
Senellart Pierre
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Symposium on Temporal Representation and Reasoning (TIME 2017)
Publication date: 01/01/2017

To combine and query ordered data from multiple sources, one needs to handle uncertainty about the possible orderings. Examples of such "order-incomplete" data include integrated event sequences such as log entries; lists of properties (e.g., hotels and restaurants) ranked by an unknown function reflecting relevance or customer ratings; and documents edited concurrently with an uncertain order on edits. This paper introduces a query language for order-incomplete data, based on the positive relational algebra with order-aware accumulation. We use partial orders to represent order-incomplete data, and study possible and certain answers for queries in this context. We show that these problems are respectively NP-complete and coNP-complete, but identify many tractable cases depending on the query operators or input partial orders
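
To make possible and certain answers over a partial order concrete, here is a brute-force Python sketch. It assumes that an order-incomplete list is given as a set of items plus "must come before" pairs, and that a candidate ordering is possible if it is one of the compatible total orders and certain if it is the only one. The function names are hypothetical, and the enumeration is exponential, so it is an illustration of the definitions rather than one of the tractable algorithms the paper identifies.

```python
from itertools import permutations


def linear_extensions(items, before):
    """Yield every total order of `items` consistent with the pairs in `before`.

    before: iterable of (a, b) pairs meaning a must appear earlier than b.
    """
    for perm in permutations(items):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in before):
            yield perm


def is_possible(items, before, candidate):
    """True if `candidate` is one of the compatible total orders."""
    return any(ext == tuple(candidate) for ext in linear_extensions(items, before))


def is_certain(items, before, candidate):
    """True if `candidate` is the only compatible total order."""
    exts = list(linear_extensions(items, before))
    return bool(exts) and all(ext == tuple(candidate) for ext in exts)
```

With items a, b, c and the single constraint that a precedes b, the ordering a, c, b is possible but not certain; adding the constraint that b precedes c makes a, b, c certain.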

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Hal-Diderot