
    Indeterministic Handling of Uncertain Decisions in Duplicate Detection

    In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared duplicates or not. Most often, however, it is not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.

    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Information extraction, data integration, and uncertain data management are different areas of research that have received considerable attention in the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is an important issue for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these areas of research, shows their common ground, and discusses how to integrate information extraction and data integration under the umbrella of uncertainty management.

    Application of probabilistic PCR5 Fusion Rule for Multisensor Target Tracking

    This paper defines and implements a non-Bayesian fusion rule for combining probability densities estimated by local (non-linear) filters for tracking a moving target with passive sensors. This rule is the restriction to a strictly probabilistic paradigm of the recent and efficient Proportional Conflict Redistribution rule no. 5 (PCR5), developed in the DSmT framework for fusing basic belief assignments. A sampling method for probabilistic PCR5 (p-PCR5) is defined. It is shown that p-PCR5 is more robust to erroneous modeling, keeps the modes of the local densities, and preserves as much as possible of the information inherent in each of the densities to be combined. In particular, p-PCR5 is able to maintain multiple hypotheses/modes after fusion when the hypotheses are too distant relative to their deviations. This new p-PCR5 rule has been tested on a simple example of a distributed non-linear filtering application to show the interest of such an approach for future developments. The non-linear distributed filter is implemented through a basic particle filtering technique. The results obtained in our simulations show the ability of this p-PCR5-based filter to track the target even when the models are not fully consistent with the initialization and the real kinematics.
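
    As a rough illustration of the rule this abstract refers to, the sketch below applies the standard two-source PCR5 formula to discrete probability vectors, that is, with every focal element a singleton. It is not the authors' sampling-based p-PCR5 for densities; the function name `pcr5_discrete` and the example inputs are assumptions made for illustration only.

```python
def pcr5_discrete(p1, p2):
    """Fuse two discrete probability vectors with the two-source PCR5 rule.

    Assumption: both inputs are probability vectors over the same finite
    set of states, so every focal element is a singleton and any two
    distinct states are in total conflict.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same length")
    n = len(p1)
    fused = []
    for x in range(n):
        # Conjunctive consensus: both sources agree on state x.
        mass = p1[x] * p2[x]
        for y in range(n):
            if y == x:
                continue
            # Conflicting product p1[x] * p2[y] is split between x and y
            # in proportion to the masses involved; x receives this share.
            d1 = p1[x] + p2[y]
            if d1 > 0:
                mass += p1[x] ** 2 * p2[y] / d1
            # Same redistribution for the conflicting product p2[x] * p1[y].
            d2 = p2[x] + p1[y]
            if d2 > 0:
                mass += p2[x] ** 2 * p1[y] / d2
        fused.append(mass)
    return fused


# Two sources that disagree strongly about a three-state variable.
fused = pcr5_discrete([0.8, 0.1, 0.1], [0.1, 0.1, 0.8])
print(fused)
```

    Because each conflicting product is returned in full to the two states that produced it, the fused vector still sums to one without a global normalization step, and both strongly supported states keep substantial mass.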

    A Probabilistic Data Fusion Modeling Approach for Extracting True Values from Uncertain and Conflicting Attributes

    Real-world data obtained from integrating heterogeneous data sources are often multi-valued, uncertain, imprecise, error-prone, and outdated, and have different degrees of accuracy and correctness. It is critical to resolve data uncertainty and conflicts in order to present quality data that reflect actual real-world values. This task is called data fusion. In this paper, we deal with the problem of data fusion based on probabilistic entity linkage and uncertainty management in conflicting data. Data fusion has been widely explored in the research community. However, concerns such as explicit uncertainty management and on-demand data fusion, which can cope with dynamic data sources, have not been studied well. This paper proposes a new probabilistic data fusion modeling approach that attempts to find true data values under conditions of uncertain or conflicting multi-valued attributes. These attributes are generated from the probabilistic linkage and merging alternatives of multi-corresponding entities. Consequently, the paper identifies and formulates several data fusion cases and sample spaces that require further conditional computation using our computational fusion method. The identification is designed to fit a real-world data fusion problem. In the real world, there is always the possibility of heterogeneous data sources, the integration of probabilistic entities, single or multiple truth values for certain attributes, and different combinations of attribute values as alternatives for each generated entity. We validate our probabilistic data fusion approach through a mathematical representation based on three data sources with different reliability scores. The validity of the approach was assessed by implementing it in our probabilistic integration system to show how it can manage and resolve different cases of data conflicts and inconsistencies. The outcome showed improved accuracy in identifying true values due to the association of constructive evidence.

    The Basic Principles of Uncertain Information Fusion. An organized review of merging rules in different representation frameworks

    We propose and advocate basic principles for the fusion of incomplete or uncertain information items that should apply regardless of the formalism adopted for representing pieces of information coming from several sources. This formalism can be based on sets, logic, partial orders, possibility theory, belief functions or imprecise probabilities. We propose a general notion of information item representing incomplete or uncertain information about the values of an entity of interest. It is supposed to rank such values in terms of relative plausibility, and explicitly point out impossible values. Basic issues affecting the results of the fusion process, such as relative information content and consistency of information items, as well as their mutual consistency, are discussed. For each representation setting, we present fusion rules that obey our principles, and compare them to postulates specific to that representation proposed in the past. In the crudest (Boolean) representation setting (using a set of possible values), we show that whether the set is understood in terms of most plausible values or in terms of non-impossible ones matters for choosing a relevant fusion rule. In particular, in the latter case our principles justify the method of maximal consistent subsets, while the former is related to the fusion of logical bases. Then we consider several formal settings for incomplete or uncertain information items, where our postulates are instantiated: plausibility orderings, qualitative and quantitative possibility distributions, belief functions and convex sets of probabilities. The aim of this paper is to provide a unified picture of fusion rules across various uncertainty representation settings.
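
    The method of maximal consistent subsets mentioned in this abstract can be sketched for the Boolean setting as follows, assuming each source reports a plain set of non-impossible values. This follows the usual textbook formulation (take the union of the intersections over every maximal group of mutually consistent sources) and may differ in detail from the paper's own definition; the name `mcs_fusion` is hypothetical.

```python
from itertools import combinations


def mcs_fusion(sources):
    """Fuse sets of possible values via maximal consistent subsets.

    Assumption: `sources` is a list of Python sets, one per source.
    A group of sources is consistent when its sets share at least one
    value. The result is the union of the intersections of all maximal
    consistent groups. The search is exhaustive, so this is only
    practical for a small number of sources.
    """
    n = len(sources)
    maximal = []  # list of (index tuple, common values)
    # Visit larger groups first so that maximality can be checked
    # against the groups already accepted.
    for size in range(n, 0, -1):
        for idx in combinations(range(n), size):
            common = set.intersection(*(sources[i] for i in idx))
            if not common:
                continue  # inconsistent group
            if any(set(idx) < set(big) for big, _ in maximal):
                continue  # contained in a larger consistent group
            maximal.append((idx, common))
    result = set()
    for _, common in maximal:
        result |= common
    return result


# Sources 0 and 1 agree on the value 2; source 2 conflicts with both.
print(mcs_fusion([{1, 2}, {2, 3}, {5}]))
```

    When all sources are mutually consistent the rule reduces to plain intersection, and when they all conflict pairwise it reduces to plain union, which is why it is often described as a compromise between conjunctive and disjunctive fusion.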