Some issues in data model mapping
Numerous data models have been reported in the literature since the early 1970s. They have been used both as database interfaces and as conceptual design tools. Mapping between schemas expressed in the same data model, or in different data models, is of both theoretical and practical interest. This paper addresses some of the issues involved in such a mapping, in particular the identification of the mapping parameters and some current approaches for handling the various situations that require a mapping.
Structure identification in relational data
This paper presents several investigations into the prospects for identifying meaningful structures in empirical data, namely, structures that permit effective organization of the data to meet the requirements of future queries. We propose a general framework in which the notion of identifiability is given a precise formal definition similar to that of learnability. Using this framework, we then explore whether a tractable procedure exists for deciding whether a given relation is decomposable into a constraint network or a CNF theory with desirable topology and, if so, for identifying the desired decomposition. Finally, we address the problem of expressing a given relation as a Horn theory and, if this is impossible, finding the best k-Horn approximation to the given relation. We show that both problems can be solved in time polynomial in the length of the data.
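The Horn part of this abstract rests on a classical fact: a Boolean relation (a set of 0/1 tuples of equal length) is exactly the set of models of some Horn CNF if and only if it is closed under componentwise AND. The Python sketch below checks that property with a pairwise test, quadratic in the number of tuples. The function name `is_closed_under_and` is hypothetical, and this is only the exact-expressibility test, not the paper's procedure for finding the best k-Horn approximation.

```python
from itertools import combinations

def is_closed_under_and(relation):
    """Return True if a Boolean relation is closed under componentwise AND.

    relation: a collection of equal-length 0/1 tuples (hypothetical encoding).
    A Boolean relation is the model set of some Horn CNF exactly when this holds.
    Closure under a binary operation can be checked pair by pair.
    """
    rows = set(relation)
    for a, b in combinations(rows, 2):
        meet = tuple(x & y for x, y in zip(a, b))  # componentwise AND
        if meet not in rows:
            return False
    return True

# Models of (x1 -> x2): closed under AND, hence Horn-expressible.
horn_rel = {(0, 0), (0, 1), (1, 1)}
# Models of (x1 or x2): (0, 1) AND (1, 0) = (0, 0) is missing.
non_horn_rel = {(0, 1), (1, 0), (1, 1)}

print(is_closed_under_and(horn_rel))      # True
print(is_closed_under_and(non_horn_rel))  # False
```

The pairwise check runs in time polynomial in the size of the data, which is in the same spirit as the tractability claim above, though the paper's own algorithms may differ.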
Relational Algebra for In-Database Process Mining
The execution logs used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system can no longer be used for this information, which constrains the flexibility with which mining patterns can be defined and limits execution performance when mining large logs. Enabling process mining directly on a database, instead of via intermediate storage in a flat file, therefore provides additional flexibility and efficiency. To help realize this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can be used both for in-database process mining and for flexibly evaluating process-mining-related queries, such as: "which employee most frequently changes the 'amount' attribute of a case from one task to the next?" We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization and present its time-complexity properties. In doing so, this paper formally defines the relational algebraic elements of a 'directly follows' operator that are required to implement such an operator in a DBMS.
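To make the 'directly follows' relation concrete, here is a minimal Python sketch that computes it procedurally from an event table. It assumes a hypothetical schema of `(case_id, activity, timestamp)` rows and counts pairs `(a, b)` where `b` is the next event after `a` within the same case. It is not the paper's relational-algebra definition of the operator, only an illustration of the relation that operator extracts.

```python
from collections import Counter, defaultdict

def directly_follows(events):
    """Count (a, b) pairs where activity b immediately follows a in the same case.

    events: iterable of (case_id, activity, timestamp) tuples (hypothetical schema).
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    pairs = Counter()
    for trace in by_case.values():
        # Order each case's events by timestamp (ties broken by activity name).
        trace.sort()
        # Pair every event with its immediate successor in the same case.
        for (_, a), (_, b) in zip(trace, trace[1:]):
            pairs[(a, b)] += 1
    return pairs

log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "pay", 3),
    ("c2", "register", 1), ("c2", "pay", 4), ("c2", "check", 2),
]
print(directly_follows(log))
```

For the sample log, both cases produce the pairs (register, check) and (check, pay), so each is counted twice even though case c2's rows appear out of order in the input. A DBMS-level operator would instead express this grouping and ordering inside the query plan, which is what makes the equivalence properties mentioned above useful for optimization.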
Reasoning about Independence in Probabilistic Models of Relational Data
We extend the theory of d-separation to cases in which data instances are not independent and identically distributed. We show that applying the rules of d-separation directly to the structure of probabilistic models of relational data yields inaccurate conditional independence inferences. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models. We provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models, and we present empirical results that demonstrate its effectiveness.
Comment: 61 pages; substantial revisions to formalisms, theory, and related work.
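For reference, the sketch below implements ordinary (propositional) d-separation on a single DAG in Python, using the standard moralized-ancestral-graph criterion: restrict to the ancestors of the queried nodes, marry co-parents, drop edge directions, delete the conditioning set, and test reachability. This is the classical test that the abstract says becomes inaccurate when applied directly to relational model structure; it does not implement relational d-separation or abstract ground graphs, and the `parents`-dict encoding is an assumption.

```python
from collections import deque
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """Test whether node sets xs and ys are d-separated given zs in a DAG.

    parents: dict mapping each node to the set of its parents (hypothetical encoding).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Collect the ancestral set of xs | ys | zs (the nodes themselves included).
    ancestral, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in ancestral:
            ancestral.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: connect each node to its parents, marry co-parents, drop directions.
    adj = {v: set() for v in ancestral}
    for child in ancestral:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. Delete zs and check whether any node of xs can still reach ys.
    seen = xs - zs
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node in ys:
            return False
        for nb in adj[node]:
            if nb not in zs and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return True

chain = {"B": {"A"}, "C": {"B"}}      # A -> B -> C
collider = {"C": {"A", "B"}}          # A -> C <- B
print(d_separated(chain, {"A"}, {"C"}, set()))     # False
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True
print(d_separated(collider, {"A"}, {"B"}, set()))  # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False
```

On the collider, A and B are d-separated given the empty set but not given {C}, because conditioning on a collider opens the path between its parents.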
Datalog and Constraint Satisfaction with Infinite Templates
On finite structures, there is a well-known connection between the expressive power of Datalog, finite-variable logics, the existential pebble game, and bounded hypertree duality. We study this connection for infinite structures, which has applications to constraint satisfaction with infinite templates. If the template Γ is ω-categorical, we present various equivalent characterizations of those Γ for which the constraint satisfaction problem (CSP) for Γ can be solved by a Datalog program. We also show that CSP(Γ) can be solved in polynomial time for arbitrary ω-categorical structures Γ if the input is restricted to instances of bounded treewidth. Finally, we characterize those ω-categorical templates whose CSP has Datalog width 1, and those whose CSP has strict Datalog width k.
Comment: 28 pages. This is an extended version of a conference paper that appeared at STACS'06. In the third arXiv version, we have revised the presentation again and added a section that relates our results to formalizations of CSPs using relation algebra.
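Datalog width 1 is easiest to picture on finite templates, where it is classically associated with arc consistency. The following Python sketch runs AC-3-style arc consistency for a binary CSP over a finite template. The encoding of constraints as `(u, v, allowed_pairs)` triples and the function name are hypothetical, and the sketch does not cover the infinite ω-categorical templates studied in the paper.

```python
from collections import deque

def arc_consistency(variables, constraints, domain):
    """AC-3-style arc consistency for a binary CSP over a finite template.

    variables:   iterable of instance variable names.
    constraints: list of (u, v, allowed) where allowed is a set of (a, b) pairs
                 taken from the template relation (hypothetical encoding).
    domain:      the template's finite domain.
    Returns the pruned domains, or None if some domain becomes empty.
    """
    doms = {x: set(domain) for x in variables}
    # Store every constraint as two directed arcs (x, y, allowed pairs for x, y).
    arcs = []
    for u, v, allowed in constraints:
        arcs.append((u, v, set(allowed)))
        arcs.append((v, u, {(b, a) for a, b in allowed}))
    queue = deque(range(len(arcs)))
    while queue:
        x, y, allowed = arcs[queue.popleft()]
        # Keep only values of x that have at least one supporting value in y.
        supported = {a for a in doms[x] if any((a, b) in allowed for b in doms[y])}
        if supported != doms[x]:
            if not supported:
                return None
            doms[x] = supported
            # x shrank, so every arc whose support depends on x must be rechecked.
            queue.extend(i for i, (_, t, _) in enumerate(arcs) if t == x)
    return doms

# Template: the strict order < on {0, 1, 2}.
lt = {(a, b) for a in range(3) for b in range(3) if a < b}
print(arc_consistency("xyz", [("x", "y", lt), ("y", "z", lt)], range(3)))
# {'x': {0}, 'y': {1}, 'z': {2}}
print(arc_consistency("wxyz", [("w", "x", lt), ("x", "y", lt), ("y", "z", lt)], range(3)))
# None
```

With the template ({0, 1, 2}, <), the chain x < y < z is pruned to its unique solution, while a four-variable chain empties a domain and the function returns None. Because both instances are tree-shaped, arc consistency decides them exactly here; for other templates and instances it may only prune.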