Search CORE

626 research outputs found

Recommended from our members

Evaluating aggregate functions on possibilistic data

Author: Bic Lubomir
Rundensteiner Elke A
Publication venue: eScholarship, University of California
Publication date: 01/01/1989
Field of study

The need for extending information management systems to handle the imprecision of information found in the real world has been recognized. Fuzzy set theory together with possibility theory represent a uniform framework for extending the relational database model with these features. However, none of the existing proposals for handling imprecision in the literature has dealt with queries involving a functional evaluation of a set of items, traditionally referred to as aggregation. Two kinds of aggregate operators, namely, scalar aggregates and aggregate functions, exist. Both are important for most real-world applications, and are thus being supported by traditional languages like SQL or QUEL. This paper presents a framework for handling these two types of aggregates in the context of imprecise information. We consider three cases, specifically, aggregates within vague queries on precise data, aggregates within precisely specified queries on possibilistic data, and aggregates within vague queries on imprecise data. These extensions are based on fuzzy set-theoretical concepts such as the extension principle, the sigma-count operation, and the possibilistic expected value. The consistency and completeness of the proposed operations is shown

eScholarship - University of California

Infinite Probabilistic Databases

Author: Grohe Martin
Lindner Peter
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020
Field of study

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Computing Possible and Certain Answers over Order-Incomplete Data

Author: Amarilli Antoine
Ba Mouhamadou Lamine
Deutch Daniel
Senellart Pierre
Publication venue
Publication date: 01/01/2019
Field of study

This paper studies the complexity of query evaluation for databases whose relations are partially ordered; the problem commonly arises when combining or transforming ordered data from multiple sources. We focus on queries in a useful fragment of SQL, namely positive relational algebra with aggregates, whose bag semantics we extend to the partially ordered setting. Our semantics leads to the study of two main computational problems: the possibility and certainty of query answers. We show that these problems are respectively NP-complete and coNP-complete, but identify tractable cases depending on the query operators or input partial orders. We further introduce a duplicate elimination operator and study its effect on the complexity results.Comment: 55 pages, 56 references. Extended journal version of arXiv:1707.07222. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theoretical Computer Scienc

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Proceedings of the first international VLDB workshop on Management of Uncertain Data

Author: Dekhtyar A.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 24/09/2007
Field of study

University of Twente Research Information

Laws for rewriting queries containing division operators

Author: Mangold Christoph
Rantzau Ralf
Publication venue
Publication date: 08/07/2013
Field of study

Relational division, also known as small divide, is a derived operator of the relational algebra that realizes a many-to-one set containment test, where a set is represented as a group of tuples: Small divide discovers which sets in a dividend relation contain all elements of the set stored in a divisor relation. The great divide operator extends small divide by realizing many-to-many set containment tests. It is also similar to the set containment join operator for schemas that are not in first normal form. Neither small nor great divide has been implemented in commercial relational database systems although the operators solve important problems and many efficient algorithms for them exist. We present algebraic laws that allow rewriting expressions containing small or great divide, illustrate their importance for query optimization, and discuss the use of great divide for frequent itemset discovery, an important data mining primitive. A recent theoretic result shows that small divide must be implemented by special purpose algorithms and not be simulated by pure relational algebra expressions to achieve efficiency. Consequently, an efficient implementation requires that the optimizer treats small divide as a first-class operator and possesses powerful algebraic laws for query rewriting