Reconciling Differences
In this paper we study a problem motivated by the management of changes in databases. It turns out that several such change scenarios, e.g., the separately studied problems of view maintenance (propagation of data changes) and view adaptation (propagation of view definition changes), can be unified as instances of query reformulation using views, provided that support for the relational difference operator exists in the context of query reformulation. Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing. Unfortunately, most questions about queries become undecidable in the presence of difference (or negation), whether we use the foundational set semantics or the more practical bag semantics. We present a new way of managing this difficulty by defining a novel semantics, Z-relations, where tuples are annotated with positive or negative integers. Z-relations conveniently represent data, insertions, and deletions in a uniform way, and can apply deletions with the union operator (deletions are tuples with negative counts). We show that under Z-semantics relational algebra (RA) queries have a normal form consisting of a single difference of positive queries, and this leads to the decidability of their equivalence. We provide a sound and complete algorithm for reformulating RA queries, including queries with difference, over Z-relations. Additionally, we show how to support standard view maintenanc
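The idea of annotating tuples with signed integers can be illustrated with a minimal Python sketch. It is not taken from the paper: the dictionary encoding, the function names `z_union` and `z_difference`, and the sample tuples are assumptions made for illustration only. A relation maps each tuple to its integer annotation, union adds annotations, and a deletion is simply a tuple with a negative count.

```python
def z_union(r, s):
    """Union of two Z-relations: add the integer annotations tuple by tuple."""
    out = dict(r)
    for tup, count in s.items():
        out[tup] = out.get(tup, 0) + count
    # Tuples whose annotations cancel to zero are dropped from the result.
    return {t: c for t, c in out.items() if c != 0}


def z_difference(r, s):
    """Difference expressed as union with the negated annotations of s."""
    return z_union(r, {t: -c for t, c in s.items()})


# Hypothetical data: two copies of one tuple, one copy of another.
data = {("alice", "db"): 2, ("bob", "ml"): 1}
# An update holding one deletion (negative count) and one insertion.
delta = {("alice", "db"): -1, ("carol", "pl"): 1}

updated = z_union(data, delta)
```

In this encoding, applying an update needs no operator beyond union, which matches the abstract's remark that deletions can be applied with the union operator.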
Taking I/O seriously: resolution reconsidered for disk
Modern compilation techniques can give Prolog programs, in the best cases, a speed comparable to C. However, Prolog has proven to be unacceptable for data-oriented queries for two major reasons: its poor termination and complexity properties for Datalog, and its tuple-at-a-time strategy. A number of tabling frameworks and systems have addressed the first problem, including the XSB system, which has achieved Prolog speeds for tabled programs. Yet tabling systems such as XSB continue to use the tuple-at-a-time paradigm. As a result, these systems are not amenable to a tight interconnection with disk-resident data. However, in a tabling framework the difference between tuple-at-a-time and set-at-a-time behavior can be viewed as one of scheduling. Accordingly, we define a breadth-first set-at-a-time tabling strategy and prove it iteration equivalent to a form of semi-naive magic evaluation. That is, we extend the well-known asymptotic results of Seki [10] by proving that each iteration of the tabling strategy produces the same information as semi-naive magic. Further, this set-at-a-time scheduling is amenable to implementation in an engine that uses Prolog compilation. We describe both the engine and its performance, which is comparable with the tuple-at-a-time strategy even for in-memory Datalog queries. Because of its performance and its fine level of integration of Prolog with a database-style search, the set-at-a-time engine appears to be an important key to linking logic programming and deductive databases.
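For readers unfamiliar with set-at-a-time evaluation, the following Python sketch shows plain semi-naive iteration on a transitive-closure program. It is an illustration under stated assumptions, not the engine described in the abstract: it omits both tabling and the magic-sets rewriting, and the `edge`/`path` program, the function name `semi_naive_closure`, and the sample graph are hypothetical.

```python
def semi_naive_closure(edges):
    """Compute path(X, Z) from edge(X, Y) by semi-naive iteration.

    Rules assumed for illustration:
        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    path = set(edges)
    delta = set(edges)  # facts that were new in the previous iteration
    while delta:
        # Join only the new facts with edge, one whole set per iteration.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path  # keep only facts not seen before
        path |= delta
    return path


# Hypothetical sample graph: a chain 1 -> 2 -> 3 -> 4.
edges = {(1, 2), (2, 3), (3, 4)}
closure = semi_naive_closure(edges)
```

Each pass through the loop handles a whole set of new facts rather than one tuple, which is the scheduling difference the abstract contrasts with the tuple-at-a-time paradigm.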
Structural Logistic Regression for Link Analysis
We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both Boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteria for inclusion in a logistic regression. Using statistics and relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly such characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy and the features useful for prediction are not readily available in a flat file format. We propose the application of Structural Logistic Regression to building link prediction models, and present experimental results for the task of predicting citations made in scientific literature using relational data taken from the CiteSeer search engine. These data include the citation graph, authorship and publication venues of papers, as well as their word content.
Towards an Efficient Evaluation of General Queries
Database applications often need to evaluate queries containing quantifiers or disjunctions, e.g., for handling general integrity constraints. Existing efficient methods for processing quantifiers depart from the relational model as they rely on non-algebraic procedures. Looking at quantified query evaluation from a new angle, we propose an approach to processing quantifiers that makes use of relational algebra operators only. Our approach proceeds in two phases. The first phase normalizes the queries, producing a canonical form. This form makes it possible to improve the translation into relational algebra performed during the second phase. The improved translation relies on a new operator, the complement-join, that generalizes the set difference, on algebraic expressions of universal quantifiers that avoid the expensive division operator in many cases, and on a special processing of disjunctions by means of constrained outer-joins. Our method achieves an efficiency at least comparable with that of previous proposals, and better in most cases. Furthermore, it is considerably simpler to implement as it relies completely on relational data structures and operators.
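The abstract does not define the complement-join, so the Python sketch below rests on an assumption: that the operator keeps the tuples of one relation that have no join partner in another, which is what is now usually called an anti-join. The function name, the column-index parameters, and the sample relations are hypothetical.

```python
def complement_join(r, s, r_cols, s_cols):
    """Keep the tuples of r with no partner in s on the given columns.

    Assumed semantics (anti-join); the paper's exact definition may differ.
    """
    s_keys = {tuple(t[i] for i in s_cols) for t in s}
    return [t for t in r if tuple(t[i] for i in r_cols) not in s_keys]


# Hypothetical relations: employees(name, dept) and closed(dept).
employees = [("ann", "sales"), ("bob", "it"), ("cy", "hr")]
closed = [("it",)]

# Employees whose department is not closed.
remaining = complement_join(employees, closed, (1,), (0,))

# Comparing on all columns of same-arity relations gives set difference.
diff = complement_join([(1,), (2,)], [(2,)], (0,), (0,))
```

Under this reading, set difference is the special case in which the comparison covers every column, which is one way to understand the claim that the operator generalizes the set difference.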