
    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.
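    The core idea above — use the queries themselves to align the provenance of two disagreeing answers — can be illustrated with a toy sketch. This is not the Explain3D algorithm; the datasets, schemas, and helper functions below are invented for illustration. Two disjoint datasets with different schemas answer the semantically similar question "revenue per region"; where the answers disagree, we surface each side's contributing rows (its provenance) as candidate explanations.

    ```python
    # Hypothetical example: two disjoint datasets, two schemas, one
    # semantically similar aggregate question ("revenue per region").
    ds_a = [  # schema: (region, product, revenue)
        ("north", "widget", 100),
        ("north", "gadget", 50),
        ("south", "widget", 70),
    ]
    ds_b = [  # schema: (area, amount)
        ("north", 100),
        ("south", 70),
    ]

    def answer_a(rows):
        """Query over dataset A: SUM(revenue) GROUP BY region."""
        out = {}
        for region, _product, revenue in rows:
            out[region] = out.get(region, 0) + revenue
        return out

    def answer_b(rows):
        """Query over dataset B: SUM(amount) GROUP BY area."""
        out = {}
        for area, amount in rows:
            out[area] = out.get(area, 0) + amount
        return out

    def explain_disagreement(rows_a, rows_b):
        """For each group key where the two answers differ, return the
        provenance (contributing rows) on each side.  The queries'
        shared group-by key is what lets us map across the schemas."""
        res_a, res_b = answer_a(rows_a), answer_b(rows_b)
        explanations = {}
        for key in set(res_a) | set(res_b):
            if res_a.get(key) != res_b.get(key):
                explanations[key] = {
                    "a_provenance": [r for r in rows_a if r[0] == key],
                    "b_provenance": [r for r in rows_b if r[0] == key],
                }
        return explanations

    # "north" disagrees (150 vs 100): the "gadget" row exists only in A.
    report = explain_disagreement(ds_a, ds_b)
    ```

    The sketch only covers the alignment intuition; the paper's actual framework also formalizes optimality of explanations and adds a smart-partitioning optimizer.
    
    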

    Infinite Probabilistic Databases

    Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.
    Comment: This is the full version of the paper "Infinite Probabilistic Databases" presented at ICDT 2020 (arXiv:1904.06766).
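    The finite framework the abstract starts from — a PDB as a finite probability space over database instances — can be made concrete with a small sketch. The facts and probabilities below are invented; we use a tuple-independent database (each fact present independently with its own marginal probability) and compute a Boolean query's probability by brute-force enumeration of all possible worlds. It is exactly this finite enumeration that has no analogue once the space of instances becomes uncountable, which is what motivates the measurability results in the paper.

    ```python
    # A tuple-independent probabilistic database (made-up facts):
    # each fact is present independently with the given probability.
    facts = {
        ("rain", "mon"): 0.6,
        ("rain", "tue"): 0.3,
        ("rain", "wed"): 0.5,
    }

    def query_probability(facts, query):
        """Sum the probabilities of all possible worlds where the
        Boolean query holds.  With n facts there are 2**n worlds."""
        items = list(facts.items())
        total = 0.0
        for mask in range(2 ** len(items)):
            world, p = set(), 1.0
            for i, (fact, prob) in enumerate(items):
                if mask >> i & 1:
                    world.add(fact)
                    p *= prob
                else:
                    p *= 1 - prob
            if query(world):
                total += p
        return total

    # Boolean query: "does it rain on at least two days?"
    p = query_probability(facts, lambda w: len(w) >= 2)
    ```

    The worlds form a finite probability space, so every query is trivially measurable; the paper's contribution is establishing measurability when instances instead come from a finite point process over an uncountable domain.
    
    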

    Knowledge Spaces and Learning Spaces

    How to design automated procedures which (i) accurately assess the knowledge of a student, and (ii) efficiently provide advice for further study? To produce well-founded answers, Knowledge Space Theory relies on a combinatorial viewpoint on the assessment of knowledge, and thus departs from common, numerical evaluation. Its assessment procedures fundamentally differ from other current ones (such as those of S.A.T. and A.C.T.). They are adaptive (taking into account the possible correctness of previous answers from the student) and they produce an outcome which is far more informative than a crude numerical mark. This chapter recapitulates the main concepts underlying Knowledge Space Theory and its special case, Learning Space Theory. We begin by describing the combinatorial core of the theory, in the form of two basic axioms and the main ensuing results (most of which we give without proofs). In practical applications, learning spaces are huge combinatorial structures which may be difficult to manage. We outline methods providing efficient and comprehensive summaries of such large structures. We then describe the probabilistic part of the theory, especially the Markovian type processes which are instrumental in uncovering the knowledge states of individuals. In the guise of the ALEKS system, which includes a teaching component, these methods have been used by millions of students in schools and colleges, and by home schooled students. We summarize some of the results of these applications.
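    The combinatorial core described above can be sketched in miniature. This toy example is invented and deliberately naive: a knowledge structure is a family of "knowledge states" (subsets of a domain of problems); one commonly cited property of such families is closure under union, which we check below. The assessment loop here simply asks every problem in turn and eliminates inconsistent states — real Knowledge Space Theory procedures are adaptive and probabilistic (allowing for lucky guesses and careless errors), which this sketch omits.

    ```python
    # Toy knowledge structure over a three-problem domain (invented).
    domain = frozenset("abc")
    states = [
        frozenset(),
        frozenset("a"),
        frozenset("b"),
        frozenset("ab"),
        frozenset("abc"),
    ]

    def closed_under_union(states):
        """Check the closure-under-union property of the family."""
        s = set(states)
        return all(x | y in s for x in s for y in s)

    def assess(states, oracle):
        """Naive (non-adaptive, error-free) assessment: ask each
        problem once and keep only states consistent with the answers.
        `oracle(q)` reports whether the learner solves problem q."""
        candidates = list(states)
        for q in sorted(domain):
            solved = oracle(q)
            candidates = [s for s in candidates if (q in s) == solved]
        return candidates

    # A learner whose true state is {a, b} is identified exactly.
    true_state = frozenset("ab")
    result = assess(states, lambda q: q in true_state)
    ```

    The combinatorial payoff is visible even here: the outcome is a knowledge state (which problems the learner has mastered), not a single numerical score.
    
    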