45 research outputs found
A Rule-Based Approach to Analyzing Database Schema Objects with Datalog
Database schema elements such as tables, views, triggers and functions are
typically defined with many interrelationships. In order to support database
users in understanding a given schema, a rule-based approach for analyzing the
respective dependencies is proposed using Datalog expressions. We show that
many interesting properties of schema elements can be systematically determined
this way. The expressiveness of the proposed analysis is exemplarily shown with
the problem of computing induced functional dependencies for derived relations.
The propagation of functional dependencies plays an important role in data
integration and query optimization but represents an undecidable problem in
general. And yet, our rule-based analysis covers all relational operators as
well as linear recursive expressions in a systematic way showing the depth of
analysis possible by our proposal. The analysis of functional dependencies is
well-integrated in a uniform approach to analyzing dependencies between schema
elements in general.Comment: Pre-proceedings paper presented at the 27th International Symposium
on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur,
Belgium, 10-12 October 2017 (arXiv:1708.07854
GROM: a general rewriter of semantic mappings
We present GROM, a tool conceived to handle high-level schema mappings between semantic descriptions of a source and a target database. GROM rewrites mappings between the virtual, view-based semantic schemas, in terms of mappings between the two physical databases, and then executes them. The system serves the purpose of teaching two main lessons. First, designing mappings among higher-level descriptions is often simpler than working with the original schemas. Second, as soon as the view-definition language becomes more expressive, to handle, for example, negation, the mapping problem becomes extremely challenging from the technical viewpoint, so that one needs to find a proper trade-off between expressiveness and scalability.Peer ReviewedPostprint (published version
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft
Coherent Integration of Databases by Abductive Logic Programming
We introduce an abductive method for a coherent integration of independent
data-sources. The idea is to compute a list of data-facts that should be
inserted to the amalgamated database or retracted from it in order to restore
its consistency. This method is implemented by an abductive solver, called
Asystem, that applies SLDNFA-resolution on a meta-theory that relates
different, possibly contradicting, input databases. We also give a pure
model-theoretic analysis of the possible ways to `recover' consistent data from
an inconsistent database in terms of those models of the database that exhibit
as minimal inconsistent information as reasonably possible. This allows us to
characterize the `recovered databases' in terms of the `preferred' (i.e., most
consistent) models of the theory. The outcome is an abductive-based application
that is sound and complete with respect to a corresponding model-based,
preferential semantics, and -- to the best of our knowledge -- is more
expressive (thus more general) than any other implementation of coherent
integration of databases