Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4) We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently.
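To make the idea of mapping provenance across two datasets concrete, the following is a minimal toy sketch, not the Explain3D algorithm itself. It assumes two hypothetical sources with different schemas, two semantically similar counting queries, and a naive lowercase name match standing in for the semantic mapping; the data, field names, and the functions provenance_a, provenance_b, and explain are all invented for illustration. The formal optimal-explanation problem, the 3-stage framework, and the smart-partitioning optimizer are not reproduced here.

```python
from collections import Counter

# Hypothetical toy data: two sources describing the same programs
# under different schemas (all names and values are illustrative).
source_a = [  # one row per degree offering
    {"program": "Computer Science", "degree": "BS"},
    {"program": "Computer Science", "degree": "BA"},
    {"program": "Mathematics", "degree": "BS"},
    {"program": "History", "degree": "BA"},
]
source_b = [  # one row per major, with a precomputed degree count
    {"major": "computer science", "bachelor_degrees": 1},
    {"major": "mathematics", "bachelor_degrees": 1},
    {"major": "history", "bachelor_degrees": 1},
]


def provenance_a(rows):
    """Per-program contribution to query A: COUNT(*) over offerings."""
    return Counter(r["program"].lower() for r in rows)


def provenance_b(rows):
    """Per-major contribution to query B: SUM(bachelor_degrees)."""
    counts = Counter()
    for r in rows:
        counts[r["major"].lower()] += r["bachelor_degrees"]
    return counts


def explain(prov_a, prov_b):
    """Return the mapped keys whose contributions differ.

    Assumption: a lowercase name match is a good enough mapping
    between the two provenance sets for this toy example.
    """
    keys = sorted(set(prov_a) | set(prov_b))
    return {
        k: (prov_a.get(k, 0), prov_b.get(k, 0))
        for k in keys
        if prov_a.get(k, 0) != prov_b.get(k, 0)
    }


prov_a = provenance_a(source_a)
prov_b = provenance_b(source_b)
answer_a = sum(prov_a.values())
answer_b = sum(prov_b.values())
discrepancies = explain(prov_a, prov_b)
print(answer_a, answer_b, discrepancies)
```

In this sketch the two queries disagree on the total, and the only mismatched entry in the mapping is the one program that the first source lists twice, which is the kind of discrepancy the abstract describes as pointing to a cause of the difference.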
Improving package recommendations through query relaxation
Recommendation systems aim to identify items that are likely to be of
interest to users. In many cases, users are interested in package
recommendations as collections of items. For example, a dietitian may wish to
derive a dietary plan as a collection of recipes that is nutritionally
balanced, and a travel agent may want to produce a vacation package as a
coordinated collection of travel and hotel reservations. Recent work has
explored extending recommendation systems to support packages of items. These
systems need to solve complex combinatorial problems, enforcing various
properties and constraints defined on sets of items. Introducing constraints on
packages makes recommendation queries harder to evaluate, but also harder to
express: Queries that are under-specified produce too many answers, whereas
queries that are over-specified frequently miss interesting solutions.
In this paper, we study query relaxation techniques that target package
recommendation systems. Our work offers three key insights: First, even when
the original query result is not empty, relaxing constraints can produce
preferable solutions. Second, a solution due to relaxation can only be
preferred if it improves some property specified by the query. Third,
relaxation should not treat all constraints as equals: some constraints are
more important to the users than others. Our contributions are threefold: (a)
we define the problem of deriving package recommendations through query
relaxation, (b) we design and experimentally evaluate heuristics that relax
query constraints to derive interesting packages, and (c) we present a crowd
study that evaluates the sensitivity of real users to different kinds of
constraints and demonstrates that query relaxation is a powerful tool in
diversifying package recommendations.
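The first two insights can be illustrated with a small brute-force sketch. It is not one of the heuristics evaluated in the paper: the recipe data, the 10% slack, and the functions best_package, total_protein, and relax_if_better are assumptions made up for this example. The package query asks for three recipes under a calorie cap that maximize total protein; the relaxed answer is kept only if it improves that objective. The third insight, weighting constraints by importance, is not modeled.

```python
from itertools import combinations

# Hypothetical recipes: (name, calories, protein in grams).
recipes = [
    ("oatmeal", 300, 10),
    ("salad", 250, 8),
    ("chicken bowl", 700, 45),
    ("pasta", 800, 25),
    ("steak plate", 950, 60),
    ("smoothie", 350, 15),
]


def total_protein(package):
    """Objective of the package query; 0 for a missing package."""
    return sum(r[2] for r in package) if package else 0


def best_package(items, size, max_calories):
    """Exhaustively pick the feasible package with the most protein."""
    feasible = [
        p for p in combinations(items, size)
        if sum(r[1] for r in p) <= max_calories
    ]
    return max(feasible, key=total_protein, default=None)


def relax_if_better(items, size, max_calories, slack=0.10):
    """Loosen the calorie cap by `slack` and keep the relaxed package
    only if it strictly improves the objective of the original query."""
    original = best_package(items, size, max_calories)
    relaxed = best_package(items, size, max_calories * (1 + slack))
    if relaxed is not None and total_protein(relaxed) > total_protein(original):
        return relaxed, True
    return original, False


package, was_relaxed = relax_if_better(recipes, 3, 1900)
print([r[0] for r in package], total_protein(package), was_relaxed)
```

With a cap of 1900 calories the original query already has an answer, yet loosening the cap admits a package with more protein, so the relaxed package is returned. With a cap of 2000 the relaxation yields no improvement and the original package is kept.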