14,568 research outputs found
Flashing up the storage hierarchy
The focus of this thesis is on systems that employ both flash and magnetic disks as
storage media. Considering the widely disparate I/O costs of flash disks currently on
the market, our approach is a cost-aware one: we explore techniques that exploit the
I/O costs of the underlying storage devices to improve I/O performance. We also study
the asymmetric I/O properties of magnetic and flash disks and propose algorithms that
take advantage of this asymmetry. Our work is geared towards database systems; however,
most of the ideas presented in this thesis can be generalised to any data-intensive
application.
For the case of low-end, inexpensive flash devices with large capacities, we propose
using them at the same level of the memory hierarchy as magnetic disks. In such
setups, we study the problem of data placement, that is, on which type of storage
medium each data page should be stored. We present a family of online algorithms that
can be used to dynamically decide the optimal placement of each page. Our algorithms
adapt to changing workloads for maximum I/O efficiency. We found that substantial
performance benefits can be gained with such a design, especially for queries touching
large sets of pages with read-intensive workloads.
Moving one level higher in the storage hierarchy, we study the problem of buffer
allocation in databases that store data across multiple storage devices. We present our
novel approach to per-device memory allocation, under which both the I/O costs of the
storage devices and the cache behaviour of the data stored on each medium determine
the size of the main memory buffers that will be allocated to each device. Towards
informed decisions, we found that the ability to predict the cache behaviour of devices
under various cache sizes is of paramount importance. In light of this, we study the
problem of efficiently tracking the hit ratio curve for each device and introduce a lowoverhead
technique that provides high accuracy.
The price and performance characteristics of high-end flash disks make them perfectly
suitable for use as caches between the main memory and the magnetic disk(s)
of a storage system. In this context, we primarily focus on the problem of deciding
which data should be placed in the flash cache of a system: how the data flows from
one level of the memory hierarchy to the others is crucial for the performance of such a
system. Considering such decisions, we found that the I/O costs of the flash cache play
a major role. We also study several implementation issues such as the optimal size of
flash pages and the properties of the page directory of a flash cache.
Finally, we explore sorting in external memory using external merge-sort, as the
latter employs access patterns that can take full advantage of the I/O characteristics of
flash memory. We study the problem of sorting hierarchical data, as such is necessary
for a wide variety of applications including archiving scientific data and dealing with
large XML datasets. The proposed algorithm efficiently exploits the hierarchical structure
in order to minimize the number of disk accesses and optimise the utilization of
available memory. Our proposals are not specific to sorting over flash memory: the
presented techniques are highly efficient over magnetic disks as well
On-Disk Data Processing: Issues and Future Directions
In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP,
which is a form of near-data processing, refers to the computing arrangement
where the secondary storage drives have the data processing capability.
Proposed ODDP schemes vary widely in terms of the data processing capability,
target applications, architecture and the kind of storage drive employed. Some
ODDP schemes provide only a specific but heavily used operation like sort
whereas some provide a full range of operations. Recently, with the advent of
Solid State Drives, powerful and extensive ODDP solutions have been proposed.
In this paper, we present a thorough review of architectures developed for
different on-disk processing approaches along with current and future
challenges and also identify the future directions which ODDP can take.Comment: 24 pages, 17 Figures, 3 Table
Write-limited sorts and joins for persistent memory
To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byteaddressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry render the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing. We focus on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms; or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing. 1
- …