On the Limitations of Provenance for Queries With Difference
The annotation of the results of database transformations has been shown to be
very effective for various applications. Until recently, most work in this
context focused on positive query languages. Provenance semirings are a
particular approach that has proven effective for these languages, and it was
shown that when provenance is propagated with semirings, the expected equivalence
axioms of the corresponding query languages are satisfied. There have been
several attempts to extend the framework to account for relational algebra
queries with difference. We show here that these suggestions fail to satisfy
some expected equivalence axioms (which, in particular, hold for queries on
"standard" set and bag databases). Interestingly, we show that this is not a
pitfall of these particular attempts; rather, every such attempt is bound to
fail to satisfy these axioms for some semirings. Finally, we show particular
semirings for which an extension supporting difference is possible, and others
for which it is impossible.
Comment: TAPP 201
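To make "propagating provenance with semirings" and "extending it to difference" concrete, the sketch below is a purely illustrative choice, not the paper's construction. It assumes the counting semiring (N, +, ×, 0, 1), which gives bag semantics, and models difference as truncated subtraction (monus). Relations are dicts from tuples to annotations, and the helper names (`union`, `join`, `difference`, `monus`) are hypothetical. It then checks two identities that hold for set and bag databases on a small example.

```python
# Sketch: K-relations over the counting semiring (N, +, *, 0, 1), i.e. bag
# semantics, with difference modelled as truncated subtraction (monus).
# A relation is a dict mapping a tuple to its natural-number annotation;
# tuples missing from the dict carry annotation 0.
from typing import Dict, Tuple

Rel = Dict[Tuple, int]


def monus(a: int, b: int) -> int:
    """Truncated subtraction on N: a - b if a >= b, else 0."""
    return max(a - b, 0)


def _drop_zeros(rel: Rel) -> Rel:
    # A 0-annotated tuple is "not in" the relation, so drop it.
    return {t: k for t, k in rel.items() if k != 0}


def union(r: Rel, s: Rel) -> Rel:
    """Union adds annotations (semiring +)."""
    return _drop_zeros({t: r.get(t, 0) + s.get(t, 0) for t in r.keys() | s.keys()})


def join(r: Rel, s: Rel) -> Rel:
    """Natural join of two relations with the same attributes: a tuple
    survives only if it is in both, and annotations multiply (semiring *)."""
    return _drop_zeros({t: r[t] * s[t] for t in r.keys() & s.keys()})


def difference(r: Rel, s: Rel) -> Rel:
    """Difference: subtract annotations with monus, tuple by tuple."""
    return _drop_zeros({t: monus(k, s.get(t, 0)) for t, k in r.items()})


R: Rel = {("a",): 3, ("b",): 1}
S: Rel = {("a",): 1}
T: Rel = {("a",): 1, ("b",): 2}

# Identity 1: R - (S u T) == (R - S) - T
lhs1, rhs1 = difference(R, union(S, T)), difference(difference(R, S), T)
# Identity 2: R join (S - T) == (R join S) - (R join T)
lhs2, rhs2 = join(R, difference(S, T)), difference(join(R, S), join(R, T))
print(lhs1 == rhs1, lhs2 == rhs2)  # True True
```

Under these assumptions both identities hold for N. The abstract's claim is the stronger one that no extension can preserve such axioms for every semiring, and this bag-only example does not attempt to demonstrate that.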
Relational Algebra for In-Database Process Mining
The execution logs that are used for process mining in practice are often
obtained by querying an operational database and storing the result in a flat
file. Consequently, the data processing power of the database system can no
longer be applied to this information, which constrains flexibility in the
definition of mining patterns and limits execution performance when mining large
logs. Enabling process mining directly on a database, instead of via
intermediate storage in a flat file, therefore provides additional flexibility
and efficiency. To help realize this ideal of in-database process mining,
this paper formally defines a database operator that extracts the 'directly
follows' relation from an operational database. This operator can be used both
for in-database process mining and for flexibly evaluating process-mining-related
queries, such as: "which employee most frequently changes the 'amount'
attribute of a case from one task to the next". We define the operator using
the well-known relational algebra that forms the formal underpinning of
relational databases. We formally prove equivalence properties of the operator
that are useful for query optimization and present its time-complexity
properties. In doing so, this paper formally defines the relational algebraic
elements of a 'directly follows' operator that are required for implementing
such an operator in a DBMS.
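As a rough illustration only (not the paper's definition of the operator), the sketch below assumes a hypothetical event table with columns `case_id`, `activity` and `timestamp`. It computes the 'directly follows' pairs in one possible relational shape: a self-join of events in the same case where the first precedes the second, followed by an anti-join that removes pairs with another event of that case strictly in between. The column names, the helper names, and the assumption of distinct timestamps within a case are all illustrative choices.

```python
# Hypothetical sketch of a 'directly follows' extraction over an event table.
from collections import Counter
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    case_id: str    # assumed column: process instance identifier
    activity: str   # assumed column: task name
    timestamp: int  # assumed column: any totally ordered time value


def directly_follows(log: List[Event]) -> List[Tuple[str, str, str]]:
    """Return (case_id, a, b) for every event b that directly follows event a.

    Step 1 (self-join): pair events of the same case where e1 happens before e2.
    Step 2 (anti-join): discard pairs with another event of that case strictly
    in between. Assumes timestamps are distinct within a case.
    """
    later_pairs = [
        (e1, e2)
        for e1 in log
        for e2 in log
        if e1.case_id == e2.case_id and e1.timestamp < e2.timestamp
    ]
    return [
        (e1.case_id, e1.activity, e2.activity)
        for e1, e2 in later_pairs
        if not any(
            e3.case_id == e1.case_id and e1.timestamp < e3.timestamp < e2.timestamp
            for e3 in log
        )
    ]


def follows_counts(log: List[Event]) -> Counter:
    """Aggregate the relation into (a, b) -> frequency across all cases."""
    return Counter((a, b) for _, a, b in directly_follows(log))


log = [
    Event("c1", "register", 1),
    Event("c1", "check", 2),
    Event("c1", "pay", 3),
    Event("c2", "register", 1),
    Event("c2", "pay", 2),
]
print(directly_follows(log))
print(follows_counts(log))
```

The nested loops are cubic in the number of events and are written this way only to mirror the join/anti-join structure. A real DBMS would more likely sort within each case or use a window function such as SQL's `LEAD` partitioned by case and ordered by timestamp.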