Just in Time: Personal Temporal Insights for Altering Model Decisions
The interpretability of complex Machine Learning models is coming to be a
critical social concern, as they are increasingly used in human-related
decision-making processes such as resume filtering or loan applications.
Individuals receiving an undesired classification are likely to call for an
explanation -- preferably one that specifies what they should do in order to
alter that decision when they reapply in the future. Existing work focuses on a
single ML model and a single point in time, whereas in practice, both models
and data evolve over time: an explanation for an application rejection in 2018
may be irrelevant in 2019 since in the meantime both the model and the
applicant's data can change. To this end, we propose a novel framework that
provides users with insights and plans for changing their classification in
particular future time points. The solution is based on combining
state-of-the-art algorithms for (single) model explanations, ones for
predicting future models, and database-style querying of the obtained
explanations. We propose to demonstrate the usefulness of our solution in the
context of loan applications, and interactively engage the audience in
computing and viewing suggestions tailored for applicants based on their unique
characteristics.
Explaining Queries over Web Tables to Non-Experts
Designing a reliable natural language (NL) interface for querying tables has
been a longtime goal of researchers in both the data management and natural
language processing (NLP) communities. Such an interface receives as input an
NL question, translates it into a formal query, executes the query and returns
the results. Errors in the translation process are not uncommon, and users
typically struggle to understand whether their query has been mapped correctly.
We address this problem by explaining the obtained formal queries to non-expert
users. Two methods for query explanations are presented: the first translates
queries into NL, while the second method provides a graphic representation of
the query cell-based provenance (in its execution on a given table). Our
solution augments a state-of-the-art NL interface over web tables, enhancing it
in both its training and deployment phase. Experiments, including a user study
conducted on Amazon Mechanical Turk, show our solution to improve both the
correctness and reliability of an NL interface.
Comment: Short paper version to appear in ICDE 201
Putting Lipstick on Pig: Enabling Database-Style Workflow Provenance
Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing users to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
Optimal Probabilistic Generation of XML Documents
Given a corpus of XML documents and its schema, we study the problem of finding an optimal (generative) probabilistic model, where optimality means maximizing the likelihood that this particular corpus is generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model in the absence of constraints. We further study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints. In this case we study two different kinds of generators. First, we consider a continuation-test generator that performs tests of schema satisfiability while generating documents; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. We also study a restart generator that may generate an invalid document and, when this is the case, restarts and tries again. Finally, we consider the injection of data values into the structure, to obtain a full XML document. We study different approaches for generating these values.
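The restart strategy described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy model (a document is a flat list of item ids), the names `P_CONTINUE`, `ID_DOMAIN`, `generate_once`, `satisfies_key` and `restart_generator`, and the restart budget are all assumptions made for illustration. The sketch draws a document from the unconstrained model, checks a key constraint afterwards, and starts over if the check fails.

```python
import random

# Hypothetical toy model: a document is a list of <item> ids.
# After each item, generation continues with probability P_CONTINUE.
P_CONTINUE = 0.6
ID_DOMAIN = [1, 2, 3, 4, 5]  # domain constraint: ids are drawn from this set


def generate_once(rng):
    """Draw one document from the model, ignoring the key constraint."""
    doc = [rng.choice(ID_DOMAIN)]
    while rng.random() < P_CONTINUE:
        doc.append(rng.choice(ID_DOMAIN))
    return doc


def satisfies_key(doc):
    """Key constraint: no id appears twice in the document."""
    return len(set(doc)) == len(doc)


def restart_generator(rng, max_restarts=10_000):
    """Generate, validate, and restart from scratch on a violation."""
    for restarts in range(max_restarts):
        doc = generate_once(rng)
        if satisfies_key(doc):
            return doc, restarts  # valid document and number of restarts used
    raise RuntimeError("no valid document within the restart budget")


rng = random.Random(0)
samples = [restart_generator(rng) for _ in range(200)]
```

A continuation-test generator would instead check, before each choice, whether the partial document can still be completed into a valid one, so it would never need to restart but would pay for a satisfiability test at every step.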
TOP-K Projection Queries for Probabilistic Business Processes
A Business Process (BP) consists of some business activities undertaken by one or more organizations in pursuit of some business goal. Tools for querying and analyzing BP specifications are extremely valuable for companies. In particular, given a BP specification, identifying the top-k flows that are most likely to occur in practice, out of those satisfying the criteria of a given query, is crucial for various applications such as personalized advertisements and BP web-site design. This paper studies, for the first time, top-k query evaluation for queries with projection in this context. We analyze the complexity of the problem for different classes of distribution functions for the flows' likelihood, and provide efficient (PTIME) algorithms whenever possible. Furthermore, we show an interesting application of our algorithms to the analysis of BP execution traces (logs), for recovering missing information about the run-time process behavior that has not been recorded in the logs.
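To make the notion of "top-k most likely flows satisfying a query" concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's algorithm and it does not handle projection. The specification `BP`, the function `top_k_flows` and the `accept` predicate are hypothetical, the process is assumed to be acyclic, and each choice is assumed to be independent, so a flow's likelihood is the product of its choice probabilities.

```python
import heapq

# Hypothetical BP specification: activity -> list of (next activity, probability).
BP = {
    "start": [("search", 0.7), ("browse", 0.3)],
    "search": [("buy", 0.5), ("exit", 0.5)],
    "browse": [("buy", 0.2), ("exit", 0.8)],
    "buy": [("exit", 1.0)],
    "exit": [],
}


def top_k_flows(bp, k, accept=lambda flow: True, source="start", sink="exit"):
    """Return up to k accepted flows as (flow, probability), most likely first."""
    # Max-heap via negated probability. Extending a partial flow can only
    # lower its probability, so complete flows pop in non-increasing order.
    heap = [(-1.0, [source])]
    results = []
    while heap and len(results) < k:
        neg_p, flow = heapq.heappop(heap)
        if flow[-1] == sink:
            if accept(flow):  # keep only flows that satisfy the query
                results.append((flow, -neg_p))
            continue
        for nxt, p in bp[flow[-1]]:
            heapq.heappush(heap, (neg_p * p, flow + [nxt]))
    return results


# The two most likely flows that include a purchase.
top2_with_purchase = top_k_flows(BP, 2, accept=lambda f: "buy" in f)
```

Best-first expansion avoids enumerating every flow when k is small, although in the worst case (a very selective `accept` predicate) it still visits all of them.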