Incremental elasticity for array databases
Relational databases benefit significantly from elasticity, whereby they execute on a changing set of hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases, whose users often follow a no-overwrite storage model in which data is deleted only when available space is exhausted. The result is a database that grows steadily and expands its hardware proportionally. Scientific databases also frequently store their data as multidimensional arrays optimized for spatial querying. Together, these traits raise several novel challenges in clustered, skew-aware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address the problem on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches that affect a minimal set of data and nodes while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources.
Intel Corporation (Science and Technology Center for Big Data)
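The two mechanisms the abstract names, a closed-loop expansion trigger and repartitioning that touches only new nodes, lend themselves to a short sketch. The Python below is an illustrative toy, not the paper's system: the utilization threshold, the chunk-level load model, and the greedy rebalancing rule are all assumptions; it only shows the shape of a feedback loop that adds a node when utilization crosses a set-point and then migrates chunks exclusively onto the new node.

```python
class ElasticArrayCluster:
    """Toy sketch: closed-loop cluster expansion plus incremental
    repartitioning that moves chunks only onto newly added nodes.
    Thresholds and the rebalancing rule are assumptions, not the
    paper's actual algorithms."""

    def __init__(self, n_nodes, node_capacity, high_water=0.8):
        self.n_nodes = n_nodes
        self.node_capacity = node_capacity
        self.high_water = high_water      # utilization set-point
        self.chunk_node = {}              # chunk id -> node id
        self.chunk_size = {}              # chunk id -> bytes
        self.load = [0] * n_nodes         # bytes stored per node

    def ingest(self, chunk_id, size):
        """Place a new chunk; expand first if the cluster is too full."""
        total = sum(self.load) + size
        # Closed loop: observe utilization, actuate by adding nodes.
        while total > self.high_water * self.node_capacity * self.n_nodes:
            self._add_node()
        target = min(range(self.n_nodes), key=lambda n: self.load[n])
        self.chunk_node[chunk_id] = target
        self.chunk_size[chunk_id] = size
        self.load[target] += size

    def _add_node(self):
        """Add one node, then move just enough chunks from the fullest
        nodes onto it to even out load. No data ever moves between
        pre-existing nodes, so reorganization stays incremental."""
        new = self.n_nodes
        self.n_nodes += 1
        self.load.append(0)
        target_load = sum(self.load) / self.n_nodes
        # Visit chunks on the most loaded nodes first (snapshot order).
        for cid, node in sorted(self.chunk_node.items(),
                                key=lambda kv: -self.load[kv[1]]):
            if self.load[new] >= target_load:
                break
            if self.load[node] > target_load:
                self.chunk_node[cid] = new
                self.load[node] -= self.chunk_size[cid]
                self.load[new] += self.chunk_size[cid]
```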
Flexible and efficient IR using array databases
The Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization, and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
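For reference, here is what the Okapi BM25 computation over a sparse term-document array looks like: a minimal Python/SciPy sketch of the underlying formula, not SRAM's comprehension syntax or its relational translation. Variable names and the parameter defaults k1 = 1.2, b = 0.75 are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def bm25_scores(tf, query_terms, k1=1.2, b=0.75):
    """Score all documents for a bag-of-words query.

    tf: csr_matrix of shape (n_docs, n_terms) holding raw term counts,
        i.e. the sparse term-document array of the Matrix Framework.
    query_terms: list of term column indices.
    """
    n_docs = tf.shape[0]
    doc_len = np.asarray(tf.sum(axis=1)).ravel()
    avgdl = doc_len.mean()
    # Okapi idf with the usual +0.5 smoothing.
    df = np.asarray((tf > 0).sum(axis=0)).ravel()
    idf = np.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * (1 - b + b * doc_len / avgdl)   # per-document normalizer
    scores = np.zeros(n_docs)
    for t in query_terms:
        col = tf[:, t].toarray().ravel()        # tf(t, d) for every d
        scores += idf[t] * col * (k1 + 1) / (col + norm)
    return scores
```

Because the term-document matrix is stored sparsely, only nonzero term frequencies contribute to each term's column, which is exactly the observation that lets an optimizer rewrite the array query into inverted-list-style processing.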
Chunked extendible arrays and its integration with the global array toolkit for parallel image processing
A thesis submitted to the Faculty of Engineering and the Built Environment in fulfilment of the requirements for the degree of Doctor of Philosophy, 2016.

Several meetings of the Extremely Large Databases Community for large-scale scientific applications have advocated multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes, on the order of terabytes and petabytes, so storing them requires efficient dynamic storage schemes in which the array may arbitrarily extend the bounds of its dimensions. Conventional multidimensional array representations in today's programming environments cannot extend or shrink their bounds without relocating elements of the data set, and extendibility is generally limited to a single dimension. This thesis presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time of an element. This is achieved with a computed access mapping function that maps a k-dimensional index onto a linear index of the storage locations. The concept forms the basis for implementing an array file of any number of dimensions, whose dimension bounds can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5); however, extending the bound of a dimension in an HDF5 array file can be unusually expensive in time. In our storage scheme for dense array files, such extensions can be performed while element access remains orders of magnitude faster than in HDF5 or conventional array files. We also present the Parallel Chunked Extendible Dense Array (PEXTA), a new parallel I/O model for the Global Array Toolkit. PEXTA not only provides the Application Programming Interface (API) for explicit data transfer between the memory-resident global array and its secondary-storage counterpart, but also allows the persistent array to be extended along any dimension without compromising the access time of an element or sub-array. These APIs provide a platform for high-speed, parallel hyperspectral image processing without performance degradation, even when the imagery files undergo extensions.
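To make the chunked-storage idea concrete, here is a minimal Python sketch. A dict-based chunk directory stands in for the thesis's computed access mapping function, which this sketch does not reproduce; the property it demonstrates is that extending a bound along any dimension allocates new chunks lazily and never relocates existing elements.

```python
import numpy as np

class ChunkedExtendibleArray:
    """Dense k-d array stored by fixed-size chunks. Extending any
    dimension only updates the bounds; chunks (and the linear layout
    inside each chunk) never move. The dict directory is a stand-in
    for the thesis's computed access mapping function."""

    def __init__(self, chunk_shape, fill=0.0):
        self.chunk_shape = tuple(chunk_shape)
        self.fill = fill
        self.bounds = tuple(0 for _ in chunk_shape)  # current array bounds
        self.chunks = {}                             # chunk coords -> ndarray

    def _locate(self, index):
        """Map a k-d element index to (chunk coords, offset in chunk)."""
        chunk = tuple(i // c for i, c in zip(index, self.chunk_shape))
        offset = tuple(i % c for i, c in zip(index, self.chunk_shape))
        return chunk, offset

    def extend(self, dim, by):
        """Grow one dimension's bound: O(1), no data relocation."""
        b = list(self.bounds)
        b[dim] += by
        self.bounds = tuple(b)

    def __setitem__(self, index, value):
        if any(i >= b for i, b in zip(index, self.bounds)):
            raise IndexError(index)
        chunk, offset = self._locate(index)
        if chunk not in self.chunks:   # allocate chunk on first touch
            self.chunks[chunk] = np.full(self.chunk_shape, self.fill)
        self.chunks[chunk][offset] = value

    def __getitem__(self, index):
        chunk, offset = self._locate(index)
        block = self.chunks.get(chunk)
        return self.fill if block is None else block[offset]
```

For example, after `a = ChunkedExtendibleArray((4, 4)); a.extend(0, 8); a.extend(1, 8)`, a later `a.extend(0, 1000)` changes only the bounds tuple; every chunk already written stays where it is, which is the property the thesis contrasts with HDF5's more expensive extensions.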