44,006 research outputs found
Fast and Simple Relational Processing of Uncertain Data
This paper introduces U-relations, a succinct and purely relational
representation system for uncertain databases. U-relations support
attribute-level uncertainty using vertical partitioning. If we consider
positive relational algebra extended by an operation for computing possible
answers, a query on the logical level can be translated into, and evaluated as,
a single relational algebra query on the U-relation representation. The
translation scheme essentially preserves the size of the query in terms of
number of operations and, in particular, number of joins. Standard techniques
employed in off-the-shelf relational database management systems are effective
for optimizing and processing queries on U-relations. In our experiments we
show that query evaluation on U-relations scales to large amounts of data with
high degrees of uncertainty.Comment: 12 pages, 14 figure
Implementing imperfect information in fuzzy databases
Information in real-world applications is often
vague, imprecise and uncertain. Ignoring the inherent imperfect
nature of real-world will undoubtedly introduce some deformation of human perception of real-world and may eliminate several
substantial information, which may be very useful in several
data-intensive applications. In database context, several fuzzy
database models have been proposed. In these works, fuzziness
is introduced at different levels. Common to all these proposals is
the support of fuzziness at the attribute level. This paper proposes
first a rich set of data types devoted to model the different kinds
of imperfect information. The paper then proposes a formal
approach to implement these data types. The proposed approach
was implemented within a relational object database model but it
is generic enough to be incorporated into other database models.ou
Duplicate Detection in Probabilistic Data
Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain (esp. probabilistic) source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, for increasing the efficiency of the duplicate detection process we introduce search space reduction methods adapted to probabilistic data
Infinite Probabilistic Databases
Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009).
We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics.
It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries
Bipolar querying of valid-time intervals subject to uncertainty
Databases model parts of reality by containing data representing properties of real-world objects or concepts. Often, some of these properties are time-related. Thus, databases often contain data representing time-related information. However, as they may be produced by humans, such data or information may contain imperfections like uncertainties. An important purpose of databases is to allow their data to be queried, to allow access to the information these data represent. Users may do this using queries, in which they describe their preferences concerning the data they are (not) interested in. Because users may have both positive and negative such preferences, they may want to query databases in a bipolar way. Such preferences may also have a temporal nature, but, traditionally, temporal query conditions are handled specifically. In this paper, a novel technique is presented to query a valid-time relation containing uncertain valid-time data in a bipolar way, which allows the query to have a single bipolar temporal query condition
Combining quantifications for flexible query result ranking
Databases contain data and database systems governing such databases are often intended to allow a user to query these data. On one hand, these data may be subject to imperfections, on the other hand, users may employ imperfect query preference specifications to query such databases. All of these imperfections lead to each query answer being accompanied by a collection of quantifications indicating how well (part of) a group of data complies with (part of) the user's query. A fundamental question is how to present the user with the query answers complying best to his or her query preferences. The work presented in this paper first determines the difficulties to overcome in reaching such presentation. Mainly, a useful presentation needs the ranking of the query answers based on the aforementioned quantifications, but it seems advisable to not combine quantifications with different interpretations. Thus, the work presented in this paper continues to introduce and examine a novel technique to determine a query answer ranking. Finally, a few aspects of this technique, among which its computational efficiency, are discussed
- …