18 research outputs found
Verification of Query Completeness over Processes [Extended Version]
Data completeness is an essential aspect of data quality, and has in turn a
huge impact on the effective management of companies. For example, statistics
are computed and audits are conducted in companies by implicitly placing the
strong assumption that the analysed data are complete. In this work, we are
interested in studying the problem of completeness of data produced by business
processes, to the aim of automatically assessing whether a given database query
can be answered with complete information in a certain state of the process. We
formalize so-called quality-aware processes that create data in the real world
and store it in the company's information system possibly at a later point.Comment: Extended version of a paper that was submitted to BPM 201
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft
Coherent Integration of Databases by Abductive Logic Programming
We introduce an abductive method for a coherent integration of independent
data-sources. The idea is to compute a list of data-facts that should be
inserted to the amalgamated database or retracted from it in order to restore
its consistency. This method is implemented by an abductive solver, called
Asystem, that applies SLDNFA-resolution on a meta-theory that relates
different, possibly contradicting, input databases. We also give a pure
model-theoretic analysis of the possible ways to `recover' consistent data from
an inconsistent database in terms of those models of the database that exhibit
as minimal inconsistent information as reasonably possible. This allows us to
characterize the `recovered databases' in terms of the `preferred' (i.e., most
consistent) models of the theory. The outcome is an abductive-based application
that is sound and complete with respect to a corresponding model-based,
preferential semantics, and -- to the best of our knowledge -- is more
expressive (thus more general) than any other implementation of coherent
integration of databases
Magic Sets for Disjunctive Datalog Programs
In this paper, a new technique for the optimization of (partially) bound
queries over disjunctive Datalog programs with stratified negation is
presented. The technique exploits the propagation of query bindings and extends
the Magic Set (MS) optimization technique.
An important feature of disjunctive Datalog is nonmonotonicity, which calls
for nondeterministic implementations, such as backtracking search. A
distinguishing characteristic of the new method is that the optimization can be
exploited also during the nondeterministic phase. In particular, after some
assumptions have been made during the computation, parts of the program may
become irrelevant to a query under these assumptions. This allows for dynamic
pruning of the search space. In contrast, the effect of the previously defined
MS methods for disjunctive Datalog is limited to the deterministic portion of
the process. In this way, the potential performance gain by using the proposed
method can be exponential, as could be observed empirically.
The correctness of MS is established thanks to a strong relationship between
MS and unfounded sets that has not been studied in the literature before. This
knowledge allows for extending the method also to programs with stratified
negation in a natural way.
The proposed method has been implemented in DLV and various experiments have
been conducted. Experimental results on synthetic data confirm the utility of
MS for disjunctive Datalog, and they highlight the computational gain that may
be obtained by the new method w.r.t. the previously proposed MS methods for
disjunctive Datalog programs. Further experiments on real-world data show the
benefits of MS within an application scenario that has received considerable
attention in recent years, the problem of answering user queries over possibly
inconsistent databases originating from integration of autonomous sources of
information.Comment: 67 pages, 19 figures, preprint submitted to Artificial Intelligenc