3 research outputs found
Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Evaluating the performance of scientific data processing systems is a
difficult task considering the plethora of application-specific solutions
available in this landscape and the lack of a generally-accepted benchmark. The
dual structure of scientific data coupled with the complex nature of processing
complicates the evaluation procedure further. SS-DB is the first attempt to
define a general benchmark for complex scientific processing over raw and
derived data. It has failed to draw sufficient attention, though, because of the
ambiguous plain-language specification and the extraordinary SciDB results. In
this paper, we remedy the shortcomings of the original SS-DB specification by
providing a formal representation in terms of ArrayQL algebra operators and
ArrayQL/SciQL constructs. These are the first formal representations of the
SS-DB benchmark. Starting from the formal representation, we give a reference
implementation and present benchmark results in EXTASCID, a novel system for
scientific data processing. EXTASCID is complete in providing native support
both for array and relational data and extensible in executing any user code
inside the system by means of a configurable metaoperator. These features
result in an order of magnitude improvement over SciDB at data loading,
extracting derived data, and operations over derived data.
Comment: 32 pages, 3 figures
High Throughput Push Based Storage Manager
The storage manager, as a key component of the database system, is
responsible for organizing, reading, and delivering data to the execution
engine for processing. According to the data serving mechanism, existing
storage managers are either pull-based, incurring high latency, or push-based,
leading to a high number of I/O requests when the CPU is busy. To address these
shortcomings, this thesis proposes a push-based prefetching strategy in a
column-wise storage manager. The proposed strategy implements an efficient
cache layer to store shared data among queries to reduce the number of I/O
requests. The capacity of the cache is maintained by an access-time-aware
eviction mechanism. Our strategy enables the storage manager to coordinate
multiple queries by merging their requests and dynamically generate an optimal
read order that maximizes the overall I/O throughput. We evaluated our storage
manager both over a disk-based redundant array of independent disks (RAID) and
an NVM Express (NVMe) solid-state drive (SSD). With the high read performance
of the SSD, we successfully minimized the total read time and number of I/O
accesses.
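
The two mechanisms named in this abstract, a shared cache with access-time-based eviction and the merging of requests from several queries into a single read order, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `ChunkCache`, `merge_requests`, `serve`, and `read_chunk` are hypothetical, eviction is simplified to least-recently-accessed, and sorting by chunk id is an assumed stand-in for whatever read order the storage manager actually computes.

```python
from collections import OrderedDict


class ChunkCache:
    """Fixed-capacity cache that evicts the least recently accessed chunk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._chunks = OrderedDict()  # chunk_id -> data, oldest access first

    def get(self, chunk_id):
        if chunk_id not in self._chunks:
            return None
        self._chunks.move_to_end(chunk_id)  # mark as most recently accessed
        return self._chunks[chunk_id]

    def put(self, chunk_id, data):
        self._chunks[chunk_id] = data
        self._chunks.move_to_end(chunk_id)
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # drop the oldest access


def merge_requests(requests_by_query):
    """Merge per-query chunk requests into one deduplicated read order.

    Returns (read_order, waiters); waiters maps a chunk id to the queries
    that asked for it. Ascending chunk id is an assumed proxy for a
    sequential on-disk order.
    """
    waiters = {}
    for query, chunk_ids in requests_by_query.items():
        for chunk_id in chunk_ids:
            queries = waiters.setdefault(chunk_id, [])
            if query not in queries:
                queries.append(query)
    return sorted(waiters), waiters


def serve(requests_by_query, cache, read_chunk):
    """Push every chunk to all waiting queries, reading each at most once."""
    read_order, waiters = merge_requests(requests_by_query)
    io_count = 0
    delivered = {query: [] for query in requests_by_query}
    for chunk_id in read_order:
        data = cache.get(chunk_id)
        if data is None:  # cache miss: one physical read
            data = read_chunk(chunk_id)
            io_count += 1
            cache.put(chunk_id, data)
        for query in waiters[chunk_id]:
            delivered[query].append((chunk_id, data))
    return delivered, io_count


if __name__ == "__main__":
    requests = {"q1": [0, 1, 2], "q2": [1, 2, 3]}
    delivered, io_count = serve(requests, ChunkCache(2), lambda c: f"chunk-{c}")
    print(io_count, delivered["q2"])
```

In this toy run the two queries issue six chunk requests in total, but the chunks they share are read only once, so the number of physical reads equals the number of distinct chunks.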