Search CORE

204 research outputs found

Incremental View Maintenance For Collection Programming

Author: Ceri Stefano
den Bussche Jan Van
Dimitrova Katica
Foster J. Nathan
Gupta Ashish
Johnson David S.
Kazem Lellahi S.
Liu Jixue
Suciu Dan
Zaharia Matei
Zeume Thomas
Publication venue
Publication date: 11/04/2016
Field of study

In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Derivation of incremental equations for PNF nested relations

Author: Liu Jixue
Vincent Millist
Publication venue
Publication date: 01/01/2003
Field of study

Incremental view maintenance techniques are required for many new types of data models that are being increasingly used in industry. One of these models is the nested relational model that is used in the modelling complex objects in databases. In this paper we derive a group of expressions for incrementally evaluating query expressions in the nested relational model. We also present an algorithm to propagate base relation updates to a materialized view when the view is defined as a complex query

University of Szeged

Efficient incremental view maintenance in data warehouses

Author: Jin Hyun Son
Ki Yong Lee
Myoung Ho Kim
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

Real-time extensions to a relational database

Author: DiPesa Anthony P. (Anthony Paul)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1987
Field of study

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1987.MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING.Bibliography: leaf 63.by Anthony P. DiPesa, Jr.M.S

DSpace@MIT

Acta Cybernetica : Volume 16. Number 1.

Author
Publication venue
Publication date: 01/01/2003
Field of study

University of Szeged

PrologPF: Parallel Logic and Functions on the Delphi Machine

Author: Lewis Ian
Publication venue: University of Cambridge
Publication date: 01/01/1998
Field of study

PrologPF is a parallelising compiler targeting a distributed system of general purpose workstations connected by a relatively low performance network. The source language extends standard Prolog with the integration of higher-order functions. The execution of a compiled PrologPF program proceeds in a similar manner to standard Prolog, but uses oracles in one of two modes. An oracle represents the sequence of clauses used to reach a given point in the problem search tree, and the same PrologPF executable can be used to build oracles, or follow oracles previously generated. The parallelisation strategy used by PrologPF proceeds in two phases, which this research shows can be interleaved. An initial phase searches the problem tree to a limited depth, recording the discovered incomplete paths. In the second phase these paths are allocated to the available processors in the network. Each processor follows its assigned paths and fully searches the referenced subtree, sending solutions back to a control processor. This research investigates the use of the technique with a one-time partitioning of the problem and no further scheduling communication, and with the recursive application of the partitioning technique to effect dynamic work reassignment. For a problem requiring all solutions to be found, execution completes when all the distributed processors have completed the search of their assigned subtrees. If one solution is required, the execution of all the path processors is terminated when the control processor receives the first solution. The presence of the extra-logical Prolog predicate cut in the user program conflicts with the use of oracles to represent valid open subtrees. PrologPF promotes the use of higher-order functional programming as an alternative to the use of cut. The combined language shows that functional support can be added as a consistent extension to standard Prolog

CiteSeerX

Apollo (Cambridge)

Integration of Heterogeneous Databases: Discovery of Meta-Information and Maintenance of Schema-Restructuring Views

Author: Koeller Andreas
Publication venue: Digital WPI
Publication date: 15/04/2002
Field of study

In today\u27s networked world, information is widely distributed across many independent databases in heterogeneous formats. Integrating such information is a difficult task and has been adressed by several projects. However, previous integration solutions, such as the EVE-Project, have several shortcomings. Database contents and structure change frequently, and users often have incomplete information about the data content and structure of the databases they use. When information from several such insufficiently described sources is to be extracted and integrated, two problems have to be solved: How can we discover the structure and contents of and interrelationships among unknown databases, and how can we provide durable integration views over several such databases? In this dissertation, we have developed solutions for those key problems in information integration. The first part of the dissertation addresses the fact that knowledge about the interrelationships between databases is essential for any attempt at solving the information integration problem. We are presenting an algorithm called FIND2 based on the clique-finding problem in graphs and k-uniform hypergraphs to discover redundancy relationships between two relations. Furthermore, the algorithm is enhanced by heuristics that significantly reduce the search space when necessary. Extensive experimental studies on the algorithm both with and without heuristics illustrate its effectiveness on a variety of real-world data sets. The second part of the dissertation addresses the durable view problem and presents the first algorithm for incremental view maintenance in schema-restructuring views. Such views are essential for the integration of heterogeneous databases. They are typically defined in schema-restructuring query languages like SchemaSQL, which can transform schema into data and vice versa, making traditional view maintenance based on differential queries impossible. Based on an existing algebra for SchemaSQL, we present an update propagation algorithm that propagates updates along the query algebra tree and prove its correctness. We also propose optimizations on our algorithm and present experimental results showing its benefits over view recomputation

DigitalCommons@WPI

Compilation Techniques for Incremental Collection Processing

Author: Lupei Daniel
Publication venue: Lausanne, EPFL
Publication date: 27/10/2017
Field of study

Many map-reduce frameworks as well as NoSQL systems rely on collection programming as their interface of choice due to its rich semantics along with an easily parallelizable set of primitives. Unfortunately, the potential of collection programming is not entirely fulfilled by current systems as they lack efficient incremental view maintenance (IVM) techniques for queries producing large nested results. This comes as a consequence of the fact that the nesting of collections does not enjoy the same algebraic properties underscoring the optimization potential of typical collection processing constructs. We propose the first solution for the efficient incrementalization of collection programming in terms of its core constructs as captured by the positive nested relational calculus (NRC+) on bags (with integer multiplicities). We take an approach based on delta query derivation, whose goal is to generate delta queries which, given a small change in the input, can update the materialized view more efficiently than via recomputation. More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+, a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incrementalmaintenance for NRC+ is within the complexity class NC0 and we showcase how Recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries, can also be applied to IncNRC+ . Existing systems are also limited wrt. the size of inner collections that they can effectively handle before running into severe performance bottlenecks. In particular, in the face of nested collections with skewed cardinalities developers typically have to undergo a painful process of manual query re-writes in order to ensure that the largest inner collections in their workloads are not impacted by these limitations. To address these issues we developed SLeNDer, a compilation framework that given a nested query generates a set of semantically equivalent (partially) shredded queries that can be efficiently evaluated and incrementalized using state of the art techniques for handling skew and applying delta changes, respectively. The derived queries expose nested collections to the same opportunities for distributing their processing and incrementally updating their contents as those enjoyed by top-level collections, leading on our benchmark to up to 16.8x and 21.9x speedups in terms of offline and online processing, respectively. In order to enable efficient IVM for the increasingly common case of collection programming with functional values as in Links, we also discuss the efficient incrementalization of simplytyped lambda calculi, under the constraint that their primitives are themselves efficiently incrementalizable

Infoscience - École polytechnique fédérale de Lausanne