A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 pages
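To make the idea of partitioning an array into chunks concrete, here is a minimal sketch of regular (fixed-shape) chunking. It is only one of the strategies a survey of this kind would cover, and the function names `chunk_of` and `chunks_for_range` are hypothetical, chosen for illustration rather than taken from the survey or from any array system.

```python
from itertools import product
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def chunk_of(cell: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[Coord, Coord]:
    """Map a cell to (chunk coordinates, offset inside that chunk).

    Assumes regular chunking: every chunk has the same shape and the
    array origin is at coordinate 0 in every dimension.
    """
    if len(cell) != len(chunk_shape):
        raise ValueError("cell and chunk_shape must have the same dimensionality")
    chunk = tuple(c // s for c, s in zip(cell, chunk_shape))
    offset = tuple(c % s for c, s in zip(cell, chunk_shape))
    return chunk, offset


def chunks_for_range(lo: Sequence[int], hi: Sequence[int],
                     chunk_shape: Sequence[int]) -> List[Coord]:
    """List the chunks that overlap the inclusive box [lo, hi]."""
    per_dim = [range(l // s, h // s + 1) for l, h, s in zip(lo, hi, chunk_shape)]
    return list(product(*per_dim))
```

With 4 x 4 chunks, `chunk_of((5, 7), (4, 4))` places the cell in chunk `(1, 1)` at offset `(1, 3)`, and a range query only needs to read the chunks returned by `chunks_for_range`.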
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports lookup, count, and range queries at average
rates of 75 M, 32 M, and 23 M queries/s, respectively. The trade-off for the
dynamic updates is that the
sorted array is almost twice as fast on retrievals. We believe that our GPU LSM
is the first dynamic general-purpose dictionary data structure for the GPU.
Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International
Parallel and Distributed Processing Symposium (IPDPS'18)
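The general Log Structured Merge idea behind such a dictionary can be sketched sequentially on the CPU. The sketch below is not the authors' GPU implementation: the class name `TinyLSM`, the `TOMBSTONE` marker, and the dict-based merge are assumptions made for brevity, and no fixed batch size is enforced. It only illustrates batched updates cascading through sorted levels, deletions recorded as tombstones, and lookups that consult the newest level first.

```python
import bisect
from typing import Any, List, Optional, Tuple

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"

Run = List[Tuple[Any, Any]]  # (key, value) pairs sorted by key, keys unique


class TinyLSM:
    def __init__(self) -> None:
        # levels[i] is a sorted run or None; a lower index holds newer data.
        self.levels: List[Optional[Run]] = []

    @staticmethod
    def _merge(newer: Run, older: Run) -> Run:
        # On duplicate keys the newer entry wins. A dict is used for brevity;
        # a real implementation would do a linear merge of the two sorted runs.
        merged = dict(older)
        merged.update(newer)
        return sorted(merged.items(), key=lambda kv: kv[0])

    def update(self, batch: Run) -> None:
        """Apply a batch of insertions, or deletions as (key, TOMBSTONE)."""
        run = sorted(dict(batch).items(), key=lambda kv: kv[0])
        i = 0
        # Cascade like a binary counter: merge into each full level in turn.
        while i < len(self.levels) and self.levels[i] is not None:
            run = self._merge(newer=run, older=self.levels[i])
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(run)
        else:
            self.levels[i] = run

    def lookup(self, key: Any) -> Optional[Any]:
        for run in self.levels:  # newest level first
            if run is None:
                continue
            # (key,) sorts just before any (key, value) pair with the same key.
            j = bisect.bisect_left(run, (key,))
            if j < len(run) and run[j][0] == key:
                value = run[j][1]
                return None if value is TOMBSTONE else value
        return None
```

Because a deletion is just another entry, updates never touch older levels in place. The cost is that a lookup may have to search several levels, which is the kind of update-versus-retrieval trade-off the abstract describes.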
Improved Bounds and Schemes for the Declustering Problem
The declustering problem is to allocate given data on parallel working
storage devices in such a manner that typical requests find their data evenly
distributed on the devices. Using deep results from discrepancy theory, we
improve previous work of several authors concerning range queries to
higher-dimensional data. We give a declustering scheme with an additive error
of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the
dimension, $M$ the number of storage devices and $d-1$ does not exceed the
smallest prime power in the canonical decomposition of $M$ into prime powers.
In particular, our schemes work for arbitrary $M$ in dimensions two and three.
For general $d$, they work for all $M \geq d-1$ that are powers of two.
Concerning lower bounds, we show that a recent proof of an
$\Omega_d(\log^{\frac{d-1}{2}} M)$ bound contains an error. We close the gap in
the proof and thus establish the bound.
Comment: 19 pages, 1 figure
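The notion of additive error for a range query can be illustrated with a small sketch. It uses the classical disk modulo allocation (cell coordinates summed modulo the number of devices) as a simple baseline, which is not the scheme from this paper. The helper names `disk_modulo` and `additive_error` are hypothetical, and the ideal load is taken here to be the query size divided by the number of devices.

```python
from collections import Counter
from itertools import product
from typing import Callable, Sequence


def disk_modulo(cell: Sequence[int], num_devices: int) -> int:
    """Baseline allocation: device = (sum of grid coordinates) mod num_devices."""
    return sum(cell) % num_devices


def additive_error(lo: Sequence[int], hi: Sequence[int], num_devices: int,
                   scheme: Callable[[Sequence[int], int], int] = disk_modulo) -> float:
    """Busiest device's load minus the ideal load for the inclusive box [lo, hi]."""
    cells = product(*(range(l, h + 1) for l, h in zip(lo, hi)))
    load = Counter(scheme(cell, num_devices) for cell in cells)
    size = sum(load.values())  # number of cells the query touches
    return max(load.values()) - size / num_devices
```

For four devices, a 1 x 4 row query is spread perfectly (error 0), while a 2 x 2 square query places two of its four cells on the same device (error 1). Better declustering schemes aim to keep this gap small for all range queries.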