08421 Abstracts Collection -- Uncertainty Management in Information Systems
From October 12 to 17, 2008, the Dagstuhl Seminar 08421 "Uncertainty Management in Information Systems" was held in Schloss Dagstuhl - Leibniz Center for Informatics. The abstracts of the plenary and session talks given during the seminar, as well as those of the demos shown, are collected in this paper.
Cleaning Denial Constraint Violations through Relaxation
Data cleaning is a time-consuming process that depends on the data analysis
that users perform. Existing solutions treat data cleaning as a separate
offline process that takes place before analysis begins. Applying data cleaning
before analysis assumes a priori knowledge of the inconsistencies and the query
workload, thereby requiring effort on understanding and cleaning the data that
is unnecessary for the analysis. We propose an approach that performs
probabilistic repair of denial constraint violations on-demand, driven by the
exploratory analysis that users perform. We introduce Daisy, a system that
seamlessly integrates data cleaning into the analysis by relaxing query
results. Daisy executes analytical query-workloads over dirty data by weaving
cleaning operators into the query plan. Our evaluation shows that Daisy adapts
to the workload and outperforms traditional offline cleaning on both synthetic
and real-world workloads. Comment: To appear in SIGMOD 2020 proceedings.
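For readers unfamiliar with the term, a denial constraint forbids any combination of tuples that jointly satisfies a given set of predicates. The following is a minimal sketch of detecting violations of one such constraint, not Daisy's implementation: the constraint ("no two rows share a zip code but differ in city"), the sample rows, and the names `find_violations` and `same_zip_different_city` are all assumptions made for illustration. It only detects violations by brute force over row pairs and does not attempt the probabilistic repair or query relaxation the abstract describes.

```python
from itertools import combinations


def find_violations(rows, predicate):
    """Return index pairs (i, j), i < j, whose rows violate the constraint.

    `predicate(a, b)` is True when rows a and b together form a forbidden
    combination; both orderings are checked so it need not be symmetric.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if predicate(a, b) or predicate(b, a)
    ]


def same_zip_different_city(a, b):
    # Hypothetical denial constraint: forbid equal zip with unequal city.
    return a["zip"] == b["zip"] and a["city"] != b["city"]


# Hypothetical dirty data: rows 0 and 1 conflict on the city for zip 10001.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "94105", "city": "San Francisco"},
]

violations = find_violations(rows, same_zip_different_city)
```

An on-demand approach such as the one the abstract outlines would presumably run checks of this kind only on the tuples a query touches rather than on the whole table up front; the pairwise scan here is quadratic and is meant only to make the definition concrete.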
A Probabilistic Data Fusion Modeling Approach for Extracting True Values from Uncertain and Conflicting Attributes
Real-world data obtained from integrating heterogeneous data sources are often multi-valued, uncertain, imprecise, error-prone, outdated, and have different degrees of accuracy and correctness. It is critical to resolve data uncertainty and conflicts to present quality data that reflect actual world values. This task is called data fusion. In this paper, we deal with the problem of data fusion based on probabilistic entity linkage and uncertainty management in conflicting data. Data fusion has been widely explored in the research community. However, concerns such as explicit uncertainty management and on-demand data fusion, which can cope with dynamic data sources, have not been well studied. This paper proposes a new probabilistic data fusion modeling approach that attempts to find true data values under conditions of uncertain or conflicting multi-valued attributes. These attributes are generated from the probabilistic linkage and merging alternatives of multi-corresponding entities. Consequently, the paper identifies and formulates several data fusion cases and sample spaces that require further conditional computation using our computational fusion method. The identification is established to fit a real-world data fusion problem. In the real world, there is always the possibility of heterogeneous data sources, the integration of probabilistic entities, single or multiple truth values for certain attributes, and different combinations of attribute values as alternatives for each generated entity. We validate our probabilistic data fusion approach through mathematical representation based on three data sources with different reliability scores. The validity of the approach was assessed via implementation in our probabilistic integration system to show how it can manage and resolve different cases of data conflicts and inconsistencies. The outcome showed improved accuracy in identifying true values due to the association of constructive evidence.
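To make the idea of fusing conflicting values from sources with different reliability scores concrete, here is a minimal sketch of a reliability-weighted vote. It is not the paper's fusion method: the function `fuse`, the source names, the reliability scores, and the claimed values are all hypothetical, and the scoring rule (sum the reliabilities of the sources supporting each value, then normalize) is a simple stand-in chosen for illustration.

```python
from collections import defaultdict


def fuse(claims, reliability):
    """Turn per-source claims into a normalized score for each candidate value.

    claims:      mapping of source name -> value that source reports
    reliability: mapping of source name -> non-negative reliability score
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        # Each source adds its reliability to the value it supports.
        scores[value] += reliability[source]
    total = sum(scores.values())
    return {value: score / total for value, score in scores.items()}


# Hypothetical setup: three sources disagree on a single-valued attribute.
reliability = {"s1": 0.9, "s2": 0.6, "s3": 0.5}
claims = {"s1": "Paris", "s2": "Lyon", "s3": "Lyon"}

distribution = fuse(claims, reliability)
best = max(distribution, key=distribution.get)
```

In this toy setup the two less reliable sources that agree outweigh the single most reliable one, which illustrates how corroborating evidence can shift the result. Handling multiple true values per attribute or probabilistic entity linkage, as the abstract describes, would need a richer model than this.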