1,581 research outputs found

    Duplicate Detection in Probabilistic Data

    Get PDF
    Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain (esp. probabilistic) source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, for increasing the efficiency of the duplicate detection process we introduce search space reduction methods adapted to probabilistic data

    An Answer Explanation Model for Probabilistic Database Queries

    Get PDF
    Following the availability of huge amounts of uncertain data, coming from diverse ranges of applications such as sensors, machine learning or mining approaches, information extraction and integration, etc. in recent years, we have seen a revival of interests in probabilistic databases. Queries over these databases result in probabilistic answers. As the process of arriving at these answers is based on the underlying stored uncertain data, we argue that from the standpoint of an end user, it is helpful for such a system to give an explanation on how it arrives at an answer and on which uncertainty assumptions the derived answer is based. In this way, the user with his/her own knowledge can decide how much confidence to place in this probabilistic answer. \ud The aim of this paper is to design such an answer explanation model for probabilistic database queries. We report our design principles and show the methods to compute the answer explanations. One of the main contributions of our model is that it fills the gap between giving only the answer probability, and giving the full derivation. Furthermore, we show how to balance verifiability and influence of explanation components through the concept of verifiable views. The behavior of the model and its computational efficiency are demonstrated through an extensive performance study
    corecore