
Indeterministic Handling of Uncertain Decisions in Duplicate Detection

Authors: Maurice van Keulen, Fabian Panse, Norbert Ritter
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2010

In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared duplicates or not. Most often, however, it is not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all of these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.

University of Twente Research Information
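The cost argument in this abstract can be made concrete with a small sketch. The code below is hypothetical and is not the authors' method: it assumes each candidate pair already carries a match probability `p_match`, uses two made-up thresholds `lower` and `upper` as one possible semi-indeterministic heuristic, and treats pair decisions as independent (ignoring transitivity between pairs). Only pairs between the thresholds are kept as alternatives, and enumerating their outcomes shows why a fully indeterministic approach grows as 2^k in the number k of uncertain decisions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PairDecision:
    """Hypothetical record of one uncertain duplicate decision."""
    left: str
    right: str
    p_match: float  # assumed probability that left and right are duplicates


def classify(decisions, lower=0.2, upper=0.8):
    """Semi-indeterministic split using two assumed thresholds.

    Pairs at or above `upper` are merged, pairs at or below `lower` are kept
    apart, and only the pairs in between are handled indeterministically.
    """
    matches = [d for d in decisions if d.p_match >= upper]
    non_matches = [d for d in decisions if d.p_match <= lower]
    uncertain = [d for d in decisions if lower < d.p_match < upper]
    return matches, non_matches, uncertain


def possible_worlds(uncertain):
    """Enumerate all 2^k outcome combinations of the k uncertain pairs.

    Each world maps (left, right) to True (duplicate) or False. Its probability
    multiplies per-pair probabilities, i.e. it assumes independent decisions.
    """
    worlds = []
    for outcome in product([True, False], repeat=len(uncertain)):
        prob = 1.0
        for d, is_dup in zip(uncertain, outcome):
            prob *= d.p_match if is_dup else 1.0 - d.p_match
        world = {(d.left, d.right): is_dup for d, is_dup in zip(uncertain, outcome)}
        worlds.append((world, prob))
    return worlds


# Example with made-up match probabilities.
decisions = [
    PairDecision("t1", "t2", 0.95),  # merged deterministically
    PairDecision("t1", "t3", 0.50),  # uncertain
    PairDecision("t2", "t4", 0.60),  # uncertain
    PairDecision("t3", "t4", 0.05),  # kept apart deterministically
]
matches, non_matches, uncertain = classify(decisions)
worlds = possible_worlds(uncertain)
print(len(worlds), "possible worlds")  # 2 uncertain pairs -> 4 worlds
```

Moving either threshold toward the other shrinks the uncertain set and therefore the number of worlds that must be stored, which is the trade-off a semi-indeterministic heuristic has to balance.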

Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases

Authors: Michael Böhlen, Katerina Papaioannou, Martin Theobald
Publication date: 12/02/2019

The result of a temporal-probabilistic (TP) join with negation includes, at each time point, the probability with which a tuple of a positive relation $\mathbf{p}$ matches none of the tuples in a negative relation $\mathbf{n}$ for a given join condition $\theta$. TP outer and anti joins thus preserve the characteristics of relational outer and anti joins even when there are time points at which input tuples from $\mathbf{p}$ have non-zero probabilities of being true and input tuples from $\mathbf{n}$ have non-zero probabilities of being false. To compute TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an output interval to the lineages of all the matching valid tuples of each input relation. We group the windows of two TP relations into three disjoint sets based on the way attributes, lineage expressions, and intervals are produced. We compute all windows incrementally and show that pipelined computation allows our approach to be integrated directly into PostgreSQL. We thereby alleviate the redundancies prevalent in the interval computations of existing approaches, as demonstrated by an extensive experimental evaluation on real-world datasets.

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg
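The anti-join probability described in this abstract can be illustrated for the simplest case. This sketch is a simplification built on stated assumptions, not the paper's algorithm: base tuples are independent, $\theta$ is equality on a hypothetical `key` attribute, time is discrete with half-open intervals `[start, end)`, and windows are built per positive tuple rather than incrementally or in a pipeline. Within each window the lineage is the positive tuple conjoined with the negation of every matching negative tuple valid there, so under independence its probability is $P(p)\prod_j (1 - P(n_j))$.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class TPTuple:
    name: str    # identifier used in the lineage expression
    key: str     # hypothetical join attribute; theta is key equality here
    start: int   # valid interval [start, end) over discrete time points
    end: int
    p: float     # marginal probability that the tuple is true


def anti_join_windows(pos, negs):
    """Windows for one positive tuple against a negative relation.

    The positive interval is split at every endpoint of a matching negative
    tuple that falls inside it, so each window has a fixed set of valid
    matches. Returns (start, end, lineage, probability) per window, assuming
    all base tuples are independent.
    """
    matching = [n for n in negs if n.key == pos.key]  # join condition theta
    cuts = {pos.start, pos.end}
    for n in matching:
        cuts.update(t for t in (n.start, n.end) if pos.start < t < pos.end)
    bounds = sorted(cuts)
    windows = []
    for s, e in zip(bounds, bounds[1:]):
        # A matching tuple is valid in [s, e) iff it covers the whole window.
        valid = [n for n in matching if n.start <= s and e <= n.end]
        lineage = " AND ".join([pos.name] + [f"NOT {n.name}" for n in valid])
        prob = pos.p * prod(1.0 - n.p for n in valid)  # P(p) * prod(1 - P(n))
        windows.append((s, e, lineage, prob))
    return windows


p1 = TPTuple("p1", "a", 0, 10, 0.8)
negs = [
    TPTuple("n1", "a", 2, 6, 0.5),
    TPTuple("n2", "a", 4, 12, 0.25),
    TPTuple("n3", "b", 0, 10, 0.9),  # fails theta, so it never appears
]
for window in anti_join_windows(p1, negs):
    print(window)
```

Here the positive tuple is split into four windows, [0, 2), [2, 4), [4, 6) and [6, 10), and the result probability falls from 0.8 where no matching negative tuple is valid to about 0.3 where both `n1` and `n2` are valid.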

A Probabilistic Data Model and Its Semantics

Authors: C. Zhang, S. Zhang
Publication date: 17/11/2003

As database systems are increasingly used in advanced applications, it is becoming common for the data in these applications to contain some elements of uncertainty. This uncertainty arises from many factors, such as measurement errors and cognitive errors. Consequently, many researchers have focused on defining comprehensive data models for uncertain database systems. However, existing uncertain data models do not adequately support some applications, and very few works address a tuple calculus for uncertain data. In this paper, we advocate a probabilistic data model for representing uncertain information. In particular, we establish a probabilistic tuple calculus language and its semantics to match the corresponding probabilistic relational algebra.

OPUS - University of Technology Sydney
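This abstract pairs a probabilistic tuple calculus with a corresponding probabilistic relational algebra. As a rough illustration of what such a correspondence must preserve, the sketch below assumes the widely used tuple-independent possible-worlds semantics, which may differ from the model in the paper, and checks that an algebraic projection, $1 - \prod_i (1 - p_i)$, agrees with brute-force evaluation over every possible world. The relation layout and the function names `select`, `project` and `project_by_worlds` are hypothetical.

```python
from itertools import product
from math import prod

# A tuple-independent probabilistic relation, modeled here as a dict that maps
# each row (a tuple of attribute values) to the probability that it is present.
# Rows are assumed to be independent of one another.


def select(rel, pred):
    """Selection keeps qualifying rows with their probabilities unchanged."""
    return {row: p for row, p in rel.items() if pred(row)}


def project(rel, cols):
    """Projection: an output value is present if at least one source row
    producing it is present, so its probability is 1 - prod(1 - p_i)."""
    groups = {}
    for row, p in rel.items():
        groups.setdefault(tuple(row[c] for c in cols), []).append(p)
    return {key: 1.0 - prod(1.0 - p for p in ps) for key, ps in groups.items()}


def project_by_worlds(rel, cols):
    """The same projection, evaluated by brute force over all 2^n worlds."""
    rows = list(rel.items())
    result = {}
    for present in product([True, False], repeat=len(rows)):
        # Probability of this world: each row independently in or out.
        w_prob = prod(p if keep else 1.0 - p for (_, p), keep in zip(rows, present))
        answers = {tuple(row[c] for c in cols)
                   for (row, _), keep in zip(rows, present) if keep}
        for ans in answers:
            result[ans] = result.get(ans, 0.0) + w_prob
    return result


# Made-up sensor readings with measurement uncertainty.
readings = {("s1", "hot"): 0.7, ("s1", "warm"): 0.2, ("s2", "hot"): 0.5}
hot = select(readings, lambda row: row[1] == "hot")
print(project(readings, [0]))
print(project_by_worlds(readings, [0]))
```

With the sample readings, both projections give sensor `s1` a probability of about 0.76 (1 - 0.3 * 0.8) and `s2` a probability of 0.5. The brute-force version is exponential in the number of rows and serves only as a semantic check on the closed-form operator.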