Query-Answer Causality in Databases: Abductive Diagnosis and View-Updates
Causality has been recently introduced in databases, to model, characterize
and possibly compute causes for query results (answers). Connections between
query causality and consistency-based diagnosis and database repairs (with
respect to integrity constraint violations) have been established in the literature. In
this work we establish connections between query causality and abductive
diagnosis and the view-update problem. The unveiled relationships allow us to
obtain new complexity results for query causality (the main focus of our work)
and also for the two other areas.
Comment: To appear in Proc. UAI Causal Inference Workshop, 2015. One example
was fixe
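The notion of a cause for a query answer can be illustrated with a small brute-force sketch. It assumes the contingency-set definition of actual causality commonly used in the database literature (a tuple is an actual cause if some set of other tuples can be removed so that the answer still holds but disappears once the tuple is removed as well); the abstract itself does not spell this out. The relation names `R` and `S`, the sample data, and the helper names are hypothetical.

```python
from itertools import combinations

def query(db):
    # Boolean conjunctive query (hypothetical): exists x, y such that R(x, y) and S(y).
    return any(("S", r[2]) in db for r in db if r[0] == "R")

def find_contingency(db, q, t):
    """Return a smallest contingency set making t a cause of q, or None.

    Exhaustive search, exponential in the database size: for illustration only.
    """
    rest = [u for u in db if u != t]
    for k in range(len(rest) + 1):
        for gamma in combinations(rest, k):
            remaining = set(db) - set(gamma)
            # Answer holds without gamma, but fails once t is removed too.
            if q(remaining) and not q(remaining - {t}):
                return set(gamma)
    return None

# Assumed toy instance: two independent derivations of the answer.
db = {("R", "a", "b"), ("R", "a", "c"), ("S", "b"), ("S", "c")}
t = ("R", "a", "b")
gamma = find_contingency(db, query, t)
is_counterfactual = query(db) and not query(db - {t})
```

Here `t` is not a counterfactual cause on its own, because the second derivation keeps the answer true, but it becomes one after a single other tuple is removed.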
Introducing Dynamic Behavior in Amalgamated Knowledge Bases
The problem of integrating knowledge from multiple and heterogeneous sources
is a fundamental issue in current information systems. In order to cope with
this problem, the concept of mediator has been introduced as a software
component providing intermediate services, linking data resources and
application programs, and making transparent the heterogeneity of the
underlying systems. In designing a mediator architecture, we believe that an
important aspect is the definition of a formal framework by which one is able
to model integration according to a declarative style. To this purpose, the use
of a logical approach seems very promising. Another important aspect is the
ability to model both static integration aspects, concerning query execution,
and dynamic ones, concerning data updates and their propagation among the
various data sources. Unfortunately, as far as we know, no formal proposals for
logically modeling mediator architectures both from a static and dynamic point
of view have yet been developed. In this paper, we extend the framework for
amalgamated knowledge bases, presented by Subrahmanian, to deal with dynamic
aspects. The language we propose is based on the Active U-Datalog language, and
extends it with annotated logic and amalgamation concepts. We model the sources
of information and the mediator (also called supervisor) as Active U-Datalog
deductive databases, thus modeling queries, transactions, and active rules,
interpreted according to the PARK semantics. By using active rules, the system
can efficiently perform update propagation among different databases. The
result is a logical environment, integrating active and deductive rules, to
perform queries and update propagation in a heterogeneous mediated framework.
Comment: Other Keywords: Deductive databases; Heterogeneous databases; Active
rules; Update
Provenance in Collaborative Data Sharing
This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange (which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them) while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange to answer a variety of user queries about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases.
To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS, along with indexing techniques to speed up provenance querying, and experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information has been studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested capturing
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values' computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance-annotated databases.
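The semiring annotation approach for aggregation-free queries that the abstract refers to can be sketched as follows: joins combine annotations with the semiring product, while projections merge annotations of collapsing tuples with the semiring sum. This is only a minimal illustration, not the construction of the paper; the relation contents, column positions, and function names are assumptions.

```python
from collections import defaultdict

def join(r, s, r_col, s_col, times):
    # Equi-join of two annotated relations (dict: tuple -> annotation).
    # The annotation of a joined tuple is the semiring product.
    out = {}
    for t1, a1 in r.items():
        for t2, a2 in s.items():
            if t1[r_col] == t2[s_col]:
                out[t1 + t2] = times(a1, a2)
    return out

def project(rel, cols, plus, zero):
    # Projection: tuples that collapse to the same key have their
    # annotations combined with the semiring sum.
    out = defaultdict(lambda: zero)
    for tup, ann in rel.items():
        key = tuple(tup[c] for c in cols)
        out[key] = plus(out[key], ann)
    return dict(out)

# Counting semiring (N, +, *, 0, 1): annotations are tuple multiplicities.
R = {("a", "b"): 2, ("a", "c"): 1}
S = {("b", "x"): 3, ("c", "x"): 1}
joined = join(R, S, 1, 0, lambda x, y: x * y)
counts = project(joined, (0, 3), lambda x, y: x + y, 0)

# Boolean semiring ({False, True}, or, and): annotations record mere presence.
Rb = {t: True for t in R}
Sb = {t: True for t in S}
exists = project(join(Rb, Sb, 1, 0, lambda x, y: x and y),
                 (0, 3), lambda x, y: x or y, False)
```

With the counting semiring, `counts` maps `("a", "x")` to 2·3 + 1·1 = 7. The same query code run over the Boolean semiring only records that the tuple is derivable.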
bdbms -- A Database Management System for Biological Data
Biologists are increasingly using databases for storing and managing their
data. Biological databases typically consist of a mixture of raw data,
metadata, sequences, annotations, and related data obtained from various
sources. Current database technology lacks several functionalities that are
needed by biological databases. In this paper, we introduce bdbms, an
extensible prototype database management system for supporting biological data.
bdbms extends the functionalities of current DBMSs to include: (1) Annotation
and provenance management including storage, indexing, manipulation, and
querying of annotation and provenance as first class objects in bdbms, (2)
Local dependency tracking to track the dependencies and derivations among data
items, (3) Update authorization to support data curation via content-based
authorization, in contrast to identity-based authorization, and (4) New access
methods and their supporting operators that support pattern matching on various
types of compressed biological data types. This paper presents the design of
bdbms along with the techniques proposed to support these functionalities
including an extension to SQL. We also outline some open issues in building
bdbms.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
7-10, 2007, Asilomar, California, US
Algebraic incremental maintenance of XML views
Materialized views can bring important performance benefits when querying XML documents. In the presence of XML document changes, materialized views need to be updated to faithfully reflect the changed document. In this work, we present an algebraic approach for propagating source updates to XML materialized views expressed in a powerful XML tree pattern formalism. Our approach differs from the state of the art in the area in two important ways. First, it relies on set-oriented, algebraic operations, in contrast with previous node-based approaches. Second, it exploits state-of-the-art features of XML stores and XML query evaluation engines, notably XML structural identifiers and associated structural join algorithms. We present algorithms for determining how updates should be propagated to views, and highlight the benefits of our approach over existing algorithms through a series of experiments.
Semi-Automatic Deduction of Feature Localization during Software Development: Master's Thesis
Despite extensive research on software product lines in the last decades, ad-hoc clone-and-own development is still the dominant way of introducing variability to software systems. Therefore, the same issues for which software product lines were developed in the first place are still present in clone-and-own development: fixing bugs consistently throughout clones and avoiding duplicate implementation effort is extremely difficult as similarities and differences between variants are unknown.
In order to remedy this, we enhance clone-and-own development with techniques from product-line engineering for targeted variant synchronisation such that domain knowledge can be integrated stepwise and without obligation. Contrary to retroactive feature mapping recovery (e.g., mining) techniques, we infer feature-to-code mappings directly during software development when concrete domain knowledge is present.
In this thesis, we focus on the first step towards targeted synchronisation between variants: the recording of feature mappings. By letting developers specify which feature they are working on, we derive feature mappings directly during software development. We ensure syntactic validity of feature mappings and variant synchronisation by implementing disciplined annotations through abstract syntax trees. To bridge the mismatch between change classification in the implementation and abstract layer, we synthesise semantic edits on abstract syntax trees. We show that our derivation can be used to reproduce variability-related real-world code changes and compare it to the feature mapping derivation of the projectional variation control system VTS by Stanciulescu et al.