
    Indeterministic Handling of Uncertain Decisions in Duplicate Detection

    In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared duplicates or not. Most often, however, it is not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
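
The idea of keeping both outcomes of an uncertain duplicate decision can be illustrated with a minimal sketch. It is not the paper's method: the `merge` helper, the `lower`/`upper` thresholds, and the list-of-worlds representation are all hypothetical choices made for illustration. A pair whose match probability falls between the thresholds yields two possible worlds; outside that band the decision is taken deterministically, which is one simple way a semi-indeterministic heuristic might reduce the number of uncertain decisions.

```python
def merge(t1, t2):
    """Hypothetical merge: prefer t1's value, fall back to t2's when t1 has none."""
    return {k: t1.get(k) if t1.get(k) is not None else t2.get(k)
            for k in {**t1, **t2}}


def resolve_pair(t1, t2, match_prob, lower=0.2, upper=0.8):
    """Return a list of (tuples, probability) possible worlds for one tuple pair.

    lower/upper are assumed thresholds, not values taken from the paper.
    """
    if not 0.0 <= match_prob <= 1.0:
        raise ValueError("match_prob must lie in [0, 1]")
    if match_prob >= upper:
        # Confident match: decide deterministically and keep a single world.
        return [([merge(t1, t2)], 1.0)]
    if match_prob <= lower:
        # Confident non-match: keep both tuples in a single world.
        return [([t1, t2], 1.0)]
    # Uncertain decision: model both worlds instead of choosing one.
    return [([merge(t1, t2)], match_prob), ([t1, t2], 1.0 - match_prob)]


a = {"name": "J. Smith", "city": None}
b = {"name": "John Smith", "city": "Hamburg"}
for tuples, prob in resolve_pair(a, b, 0.5):
    print(prob, tuples)
```

With a fully indeterministic setting (`lower=0.0`, `upper` above 1), every pair with a probability strictly between 0 and 1 doubles the number of worlds, which hints at why the abstract calls that variant too expensive.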

    Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases

    The result of a temporal-probabilistic (TP) join with negation includes, at each time point, the probability with which a tuple of a positive relation ${\bf p}$ matches none of the tuples in a negative relation ${\bf n}$, for a given join condition $\theta$. TP outer and anti joins thus resemble the characteristics of relational outer and anti joins also in the case when there exist time points at which input tuples from ${\bf p}$ have non-zero probabilities to be $true$ and input tuples from ${\bf n}$ have non-zero probabilities to be $false$, respectively. For the computation of TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an output interval to the lineages of all the matching valid tuples of each input relation. We group the windows of two TP relations into three disjoint sets based on the way attributes, lineage expressions and intervals are produced. We compute all windows in an incremental manner, and we show that pipelined computations allow for the direct integration of our approach into PostgreSQL. We thereby alleviate the prevalent redundancies in the interval computations of existing approaches, which is proven by an extensive experimental evaluation with real-world datasets.
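
To make the anti-join semantics concrete, the following is a naive sketch of the per-time-point probability, not the windowing mechanism the paper introduces. It assumes that tuples are independent, that intervals are half-open `[start, end)`, and that the negative tuples passed in already satisfy the join condition with the positive tuple; the function name and tuple layout are hypothetical. Under these assumptions, the probability that the positive tuple is true and no matching negative tuple is true over a sub-interval is the positive tuple's probability multiplied by `(1 - q)` for every negative tuple valid there.

```python
def anti_join_probabilities(pos, negs):
    """Split pos's interval at the boundaries of negs and compute, per piece,
    P(pos is true and every valid matching negative tuple is false).

    pos  = (start, end, prob)
    negs = list of (start, end, prob), assumed to already match pos on the join condition
    """
    p_start, p_end, p_prob = pos
    # Boundaries of negative tuples that fall strictly inside the positive interval.
    points = {p_start, p_end}
    for s, e, _ in negs:
        for t in (s, e):
            if p_start < t < p_end:
                points.add(t)
    cuts = sorted(points)

    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        prob = p_prob
        for s, e, q in negs:
            if s <= lo and hi <= e:  # negative tuple is valid throughout [lo, hi)
                prob *= 1.0 - q      # independence assumption
        result.append((lo, hi, prob))
    return result


pos = (1, 10, 0.8)
negs = [(3, 6, 0.5), (5, 12, 0.25)]
for interval in anti_join_probabilities(pos, negs):
    print(interval)
```

This version rescans every negative tuple for every sub-interval, which is a simple form of repeated interval work; it is meant only to show what is being computed, not how to compute it efficiently.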

    A Probabilistic Data Model and Its Semantics

    As database systems are increasingly used in advanced applications, it is becoming common for the data in these applications to contain some element of uncertainty. This uncertainty arises from many factors, such as measurement errors and cognitive errors. As a result, many researchers have focused on defining comprehensive data models for uncertain database systems. However, existing uncertainty data models do not adequately support some applications. Moreover, very few works address a tuple calculus for uncertain data. In this paper we advocate a probabilistic data model for representing uncertain information. In particular, we establish a probabilistic tuple calculus language and its semantics to match the corresponding probabilistic relational algebra.