Search CORE

8,875 research outputs found

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication venue
Publication date: 19/02/2013
Field of study

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

arXiv.org e-Print Archive

CiteSeerX

Incremental View Maintenance For Collection Programming

Author: Ceri Stefano
den Bussche Jan Van
Dimitrova Katica
Foster J. Nathan
Gupta Ashish
Johnson David S.
Kazem Lellahi S.
Liu Jixue
Suciu Dan
Zaharia Matei
Zeume Thomas
Publication venue
Publication date: 11/04/2016
Field of study

In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Solving equations in the relational algebra

Author: Biskup Joachim
Bussche Jan Van den
Paredaens Jan
Schwentick Thomas
Publication venue
Publication date: 10/12/2003
Field of study

Enumerating all solutions of a relational algebra equation is a natural and powerful operation which, when added as a query language primitive to the nested relational algebra, yields a query language for nested relational databases, equivalent to the well-known powerset algebra. We study \emph{sparse} equations, which are equations with at most polynomially many solutions. We look at their complexity, and compare their expressive power with that of similar notions in the powerset algebra.Comment: Minor revision, accepted for publication in SIAM Journal on Computin

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

Author: Cheng Yu
Rusu Florin
Publication venue
Publication date: 01/01/2013
Field of study

Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally-accepted benchmark. The dual structure of scientific data coupled with the complex nature of processing complicate the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It fails to draw sufficient attention though because of the ambiguous plain language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support both for array and relational data and extensible in executing any user code inside the system by the means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data.Comment: 32 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California