Write-limited sorts and joins for persistent memory
To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte-addressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry renders the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing. We focus on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms; or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing.
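The trade-off behind a write-limited sort can be illustrated with a minimal Python sketch. It is not one of the algorithms from the paper above: it is a generic multi-pass selection sort, and the function name `multipass_selection_sort` and the buffer size `k` are assumptions made for illustration. The input is treated as read-only and every output element is written exactly once, at the price of rescanning the input roughly n/k times.

```python
import heapq


def multipass_selection_sort(data, k):
    """Sort read-only `data`, writing each output element exactly once.

    `k` is the number of entries that fit in the fast (symmetric) buffer.
    Returns (sorted_list, element_reads, element_writes).
    Illustrative sketch only; not the algorithm of any specific paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    out = []
    reads = 0
    last = None  # (value, index) of the last element emitted so far
    while len(out) < len(data):
        # One read-only scan: consider only entries not yet emitted.
        # Pairing each value with its index makes duplicates distinct.
        candidates = (
            (v, i) for i, v in enumerate(data)
            if last is None or (v, i) > last
        )
        # Keep the k smallest remaining entries, in ascending order.
        batch = heapq.nsmallest(k, candidates)
        reads += len(data)
        out.extend(v for v, _ in batch)  # each element is written once
        last = batch[-1]
    return out, reads, len(out)
```

With a larger `k` the number of passes, and hence reads, drops, while the number of writes stays at n. Tuning this kind of knob is what a tunable write profile refers to in the abstract.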
Efficient Compute Node-Local Replication Mechanisms for NVRAM-Centric Data Structures
Non-volatile random-access memory (NVRAM) is about to hit the market and will require significant changes to the architecture of in-memory database systems. Since such hybrid DRAM-NVRAM database systems will keep the primary data solely persistent in the NVRAM, efficient replication mechanisms need to be considered to prevent data losses and to guarantee high availability in case of NVDIMM failures. In this paper, we argue for a software-based replication approach and present compute node-local mechanisms to provide the building blocks for an efficient NVRAM replication with a low latency and throughput penalty. Within our evaluation, we measured up to 10x less overhead for our optimized replication mechanisms compared to the basic replication mechanism of the Intel Persistent Memory Development Kit (PMDK).
Implicit Decomposition for Write-Efficient Connectivity Algorithms
The future of main memory appears to lie in the direction of new technologies
that provide strong capacity-to-performance ratios, but have write operations
that are much more expensive than reads in terms of latency, bandwidth, and
energy. Motivated by this trend, we propose sequential and parallel algorithms
to solve graph connectivity problems using significantly fewer writes than
conventional algorithms. Our primary algorithmic tool is the construction of an
o(n)-sized "implicit decomposition" of a bounded-degree graph G on n
nodes, which combined with read-only access to G enables fast answers to
connectivity and biconnectivity queries on G. The construction breaks the
linear-write "barrier", resulting in costs that are asymptotically lower than
conventional algorithms while adding only a modest cost to querying time. For
general non-sparse graphs on m edges, we also provide the first o(m)-write,
O(m)-operation parallel algorithms for connectivity and biconnectivity.
These algorithms provide insight into how applications can efficiently process
computations on large graphs in systems with read-write asymmetry.
Integer Compression in NVRAM-centric Data Stores: Comparative Experimental Analysis to DRAM
Lightweight integer compression algorithms play an important role in in-memory database systems to tackle the growing gap between processor speed and main memory bandwidth. Thus, there is a large number of algorithms to choose from, while different algorithms are tailored to different data characteristics. As we show in this paper, with the availability of byte-addressable non-volatile random-access memory (NVRAM), a novel type of main memory with specific characteristics increases the overall complexity in this domain. In particular, we provide a detailed evaluation of state-of-the-art lightweight integer compression schemes and database operations on NVRAM and compare it with DRAM. Furthermore, we reason about possible deployments of middle- and heavyweight approaches for better adaptation to NVRAM characteristics. Finally, we investigate a combined approach where both volatile and non-volatile memories are used in a cooperative fashion that is likely to be the case for hybrid and NVRAM-centric database systems.
Adaptive Merging on Phase Change Memory
Indexing is a well-known database technique used to facilitate data access
and speed up query processing. Nevertheless, the construction and modification
of indexes are very expensive. In traditional approaches, all records in the
database table are equally covered by the index. This is ineffective, since
some records may be queried very often and others never. To avoid this problem,
adaptive merging has been introduced. The key idea is to create the index
adaptively and incrementally as a by-product of query processing. As a
result, the database table is indexed partially depending on the query
workload. This paper addresses the problem of adaptive merging for phase change
memory (PCM). The most important features of this memory type are limited
write endurance and high write latency. As a consequence, adaptive merging
should be investigated from scratch. We solve this problem in two steps.
First, we apply several PCM optimization techniques to the traditional adaptive
merging approach. We prove that the proposed method (eAM) outperforms a
traditional approach by 60%. After that, we introduce a framework for adaptive
merging (PAM) and a new PCM-optimized index. It further improves the system
performance by 20% for databases where search queries interleave with data
modifications.
Efficient Algorithms with Asymmetric Read and Write Costs
In several emerging technologies for computer memory (main memory), reading is significantly cheaper than writing. Such asymmetry in memory costs poses a fundamentally different model from the RAM for algorithm design. In this paper we study lower and upper bounds for various problems under such asymmetric read and write costs. We consider both the case in which all but O(1) memory has asymmetric cost, and the case of a small cache of symmetric memory. We model both cases using the (M,omega)-ARAM, in which there is a small (symmetric) memory of size M and a large unbounded (asymmetric) memory, both random access, and where reading from the large memory has unit cost, but writing has cost omega >> 1.
For FFT and sorting networks we show a lower bound cost of Omega(omega*n*log_{omega*M}(n)), which indicates that it is not possible to achieve asymptotic improvements with cheaper reads when omega is bounded by a polynomial in M. Moreover, there is an asymptotic gap of min(omega, log(n)/log(omega*M)) between the cost of sorting networks and comparison sorting in the model. This contrasts with the RAM, and most other models, in which the asymptotic costs are the same. We also show a lower bound for computations on an n*n diamond DAG of Omega(omega*n^2/M) cost, which indicates no asymptotic improvement is achievable with fast reads. However, we show that for the minimum edit distance problem (and related problems), which would seem to be a diamond DAG, we can beat this lower bound with an algorithm with only O(omega*n^2/(M*min(omega^{1/3},M^{1/2}))) cost. To achieve this we make use of a "path sketch" technique that is forbidden in a strict DAG computation. Finally, we show several interesting upper bounds for shortest path problems, minimum spanning trees, and other problems. A common theme in many of the upper bounds is that they require redundant computation and a tradeoff between reads and writes.
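The cost accounting described above (unit-cost reads, writes of cost omega to the large memory) can be made concrete with a small Python sketch. The class `AsymmetricMemory` and both sort routines are hypothetical illustrations, not algorithms from the paper, and the small symmetric memory is modeled only as a handful of local variables rather than a cache of size M.

```python
class AsymmetricMemory:
    """Large memory where a read costs 1 and a write costs `omega`."""

    def __init__(self, values, omega):
        self._cells = list(values)
        self.omega = omega
        self.reads = 0
        self.writes = 0

    def __len__(self):
        return len(self._cells)

    def read(self, i):
        self.reads += 1
        return self._cells[i]

    def write(self, i, value):
        self.writes += 1
        self._cells[i] = value

    def cost(self):
        # Total cost under the asymmetric model: reads + omega * writes.
        return self.reads + self.omega * self.writes

    def snapshot(self):
        return list(self._cells)


def insertion_sort(mem):
    """Write-heavy baseline: shifts elements one cell at a time."""
    for i in range(1, len(mem)):
        key = mem.read(i)
        j = i - 1
        while j >= 0:
            cur = mem.read(j)
            if cur <= key:
                break
            mem.write(j + 1, cur)
            j -= 1
        mem.write(j + 1, key)


def selection_sort(mem):
    """Read-heavy alternative: at most two writes per output position."""
    n = len(mem)
    for i in range(n - 1):
        first_val = mem.read(i)
        best, best_val = i, first_val
        for j in range(i + 1, n):
            v = mem.read(j)
            if v < best_val:
                best, best_val = j, v
        if best != i:
            mem.write(best, first_val)
            mem.write(i, best_val)
```

Running both routines on the same reverse-sorted input and comparing `cost()` shows the theme of the abstract in miniature: the selection-based routine performs a quadratic number of reads but only a linear number of writes, so its advantage grows with omega.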