
    Provenance in Collaborative Data Sharing

    This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange --- which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them --- while tolerating disagreement between participants and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases. To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing, as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS together with indexing techniques to speed up provenance querying, and we experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
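    As a rough illustration of the kind of provenance tracking the abstract refers to, the sketch below annotates each tuple with a set of "monomials" (sets of contributing base-tuple identifiers) and propagates them through join and union, in the spirit of provenance polynomials for annotated relations. All names and the representation here are assumptions made for illustration; this is not the dissertation's actual provenance-graph model or ProQL.

        # Minimal sketch of provenance-annotated relations: each tuple carries a set
        # of monomials, where a monomial is a frozenset of base-tuple ids that
        # jointly derive it. Illustrative only; not the dissertation's model.
        from itertools import product

        def base(rel):
            """Annotate every base tuple with the singleton monomial of its own id."""
            return {t: {frozenset([tid])} for tid, t in rel.items()}

        def join(r, s, on):
            """Join two annotated relations; the provenance of an output tuple is the
            pairwise union of the monomials of the inputs that produced it."""
            out = {}
            for (t1, p1), (t2, p2) in product(r.items(), s.items()):
                if all(t1[i] == t2[j] for i, j in on):
                    t = t1 + t2
                    out[t] = out.get(t, set()) | {m1 | m2 for m1 in p1 for m2 in p2}
            return out

        def union(r, s):
            """Union of annotated relations: provenance sets are merged."""
            out = dict(r)
            for t, p in s.items():
                out[t] = out.get(t, set()) | p
            return out

        # Which base tuples does a derived tuple depend on?
        R = base({"r1": ("a", "b"), "r2": ("a", "c")})
        S = base({"s1": ("b", "d")})
        Q = join(R, S, on=[(1, 0)])          # join on R's 2nd and S's 1st attribute
        print(Q[("a", "b", "b", "d")])       # -> {frozenset({'r1', 's1'})}

    Annotations of this shape can be reinterpreted under different semirings (for example as trust scores or probabilities), which is one way to read the abstract's remark about exploiting provenance for trust assessment or probabilistic query answering.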

    Asynchronous replication of eventually consistent updatable views

    Users of software applications expect fast response times and high availability, even as applications move from local devices into the cloud. A cloud-based application that could otherwise function locally becomes unavailable if a network partition occurs. A fundamental challenge in distributed systems is maintaining the right tradeoff between strong consistency, high availability, and tolerance to network partitions; the impossibility of achieving all three properties is described by the CAP theorem. To guarantee the highest degree of responsiveness and availability, applications can be run entirely locally on a device without directly relying on cloud services. Software that can run locally without a direct dependency on cloud services is called local-first software. Being local-first means that consistency guarantees may need to be relaxed: weaker consistency, such as eventual consistency, can be used instead of strong consistency. Implementing conflict-free replicated data types (CRDTs) is a provably correct way to achieve eventual consistency. These data types guarantee that the states of different replicas converge towards a common state when the system becomes connected and quiescent. The drawback of using CRDTs is that they are unbounded in their growth, which means they can quickly become too large to handle on less capable devices such as smartphones, tablets, or other edge devices. To mitigate this, partial replication can be used to replicate only the data each device needs. This comes with the added benefit of limiting the information users obtain, thus potentially improving security and privacy. The main contribution of this thesis is a new approach to partial replication. It builds on an existing asynchronously replicated relational database that supports local-first software and guarantees eventual consistency. The new approach uses database views to define partial replicas. The views are made updatable by drawing inspiration from the large body of research on updatable views; we differentiate ourselves from earlier work on non-distributed updatable views by guaranteeing that the views are eventually consistent. The approach is evaluated against realistic scenarios and has proved usable in them, and the replication of database views has been tested experimentally to confirm that our approach to partial replication is viable for less capable devices.
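    To make the CRDT idea above concrete, here is a minimal state-based grow-only counter whose merge is a pointwise maximum, so replica states converge once they have exchanged state. It is a generic, well-known example used only to illustrate eventual consistency; it is not the thesis's partial-replication or updatable-view mechanism.

        # Minimal state-based CRDT (grow-only counter). Merge is commutative,
        # associative, and idempotent, so replicas converge after exchanging state.
        class GCounter:
            def __init__(self, replica_id):
                self.replica_id = replica_id
                self.counts = {}                 # replica id -> local increment count

            def increment(self, n=1):
                self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

            def value(self):
                return sum(self.counts.values())

            def merge(self, other):
                """Pointwise maximum over per-replica counts."""
                for rid, c in other.counts.items():
                    self.counts[rid] = max(self.counts.get(rid, 0), c)

        # Two replicas update independently while partitioned, then merge and agree.
        a, b = GCounter("a"), GCounter("b")
        a.increment(3); b.increment(2)
        a.merge(b); b.merge(a)
        assert a.value() == b.value() == 5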

    The RDF-3X Engine for Scalable Management of RDF Data

    RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and completely eliminates the need for index tuning by maintaining exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins with excellent performance of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred and instead applied to compact differential indexes, which are later merged into the main indexes in a batched manner. Experimental studies with several large-scale datasets of more than 50 million RDF triples and benchmark queries that include pattern matching, many-way star joins, and long path joins demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude.
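    The exhaustive-indexing idea can be illustrated with a toy triple store that keeps one sorted index per permutation of (subject, predicate, object) and answers a triple pattern as a range scan over the permutation whose prefix matches the pattern's bound components, so results come out sorted and ready for merge joins. This is a plain-Python sketch under those assumptions, not RDF-3X's compressed indexes, optimizer, or storage layout.

        # Toy store with one sorted index per SPO permutation; a triple pattern with
        # bound components becomes a prefix range scan over the matching permutation.
        from bisect import bisect_left, bisect_right
        from itertools import permutations

        class TripleStore:
            def __init__(self, triples):
                self.indexes = {perm: sorted(tuple(t[i] for i in perm) for t in triples)
                                for perm in permutations((0, 1, 2))}

            def scan(self, pattern):
                """pattern is (s, p, o) with None marking variables."""
                bound = [i for i in range(3) if pattern[i] is not None]
                free = [i for i in range(3) if pattern[i] is None]
                perm = tuple(bound + free)
                idx = self.indexes[perm]
                prefix = tuple(pattern[i] for i in bound)
                lo = bisect_left(idx, prefix)
                hi = bisect_right(idx, prefix + (chr(0x10FFFF),) * len(free))
                for key in idx[lo:hi]:
                    triple = [None] * 3
                    for pos, i in enumerate(perm):
                        triple[i] = key[pos]
                    yield tuple(triple)

        store = TripleStore([("alice", "knows", "bob"),
                             ("bob", "knows", "carol"),
                             ("alice", "age", "42")])
        print(list(store.scan(("alice", "knows", None))))  # -> [('alice', 'knows', 'bob')]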

    Updating Recursive XML Views of Relations

    This paper investigates the view update problem for XML views published from relational data. We consider XML views defined in terms of mappings directed by possibly recursive DTDs, compressed into DAGs and stored in relations. We provide new techniques to efficiently support XML view updates specified in terms of XPath expressions with recursion and complex filters. The interaction between XPath recursion and DAG compression of XML views makes the analysis of XML view updates rather intriguing. In addition, many issues are still open even for relational view updates and need to be explored. In response, on the XML side, we revise the notion of side effects and update semantics based on the semantics of XML views, and present efficient algorithms to translate XML updates to relational view updates. On the relational side, we propose a mild condition on SPJ views, and show that under this condition the analysis of deletions on relational views becomes PTIME, while the insertion analysis is NP-complete. We develop an efficient algorithm to process relational view deletions, and a heuristic algorithm to handle view insertions. Finally, we present an experimental study to verify the effectiveness of our techniques.
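    The deletion side of the view update problem can be illustrated with a toy select-project-join (SPJ) view: a deletion requested on the view is translated into a base-table deletion only if applying it removes exactly the target view tuple and nothing else (no side effects). The brute-force check below is purely illustrative and is not the paper's PTIME analysis, its condition on SPJ views, or its XML-to-relational translation.

        # Toy SPJ view V(a, c) = project_{a,c}( R(a, b) join S(b, c) ), with a
        # brute-force search for a side-effect-free base-table deletion.
        def view(R, S):
            return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

        def translate_deletion(R, S, target):
            before = view(R, S)
            # Try each base tuple as the candidate deletion.
            for rel_name, tup in [("R", t) for t in R] + [("S", t) for t in S]:
                newR = R - {tup} if rel_name == "R" else R
                newS = S - {tup} if rel_name == "S" else S
                if view(newR, newS) == before - {target}:
                    return rel_name, tup     # deletes the target and nothing else
            return None                      # no side-effect-free translation exists

        R = {(1, "x"), (2, "x"), (3, "y")}
        S = {("x", 10), ("y", 20)}
        print(translate_deletion(R, S, (3, 20)))   # e.g. ('R', (3, 'y')) or ('S', ('y', 20))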