Persistent Memory Programming Abstractions in Context of Concurrent Applications
The advent of non-volatile memory (NVM) technologies such as PCM, STT,
memristors and Fe-RAM is expected to improve system performance by
narrowing the gap between memory and storage, flattening the traditional
memory hierarchy. These technologies are expected to combine DRAM-like
performance with disk-like persistence. They would therefore also provide
significant performance benefits for big data applications by allowing
in-memory processing of large data with the lowest latency to persistence.
Exploiting the performance benefits of this memory-centric computing technology
through traditional memory programming is not trivial, and the challenges
are greater for parallel/concurrent applications. To this end, several
programming abstractions have been proposed, such as NVthreads, Mnemosyne and
Intel's NVML. However, choosing a programming abstraction that is easy
to program, ensures consistency and balances the various software and
architectural trade-offs remains an open question and an active area of
research in the NVM community.
We study the NVthreads, Mnemosyne and NVML libraries by building concurrent
and persistent set and open-addressed hash-table data structure applications. In
this process, we explore and report the trade-offs and hidden costs involved
in building concurrent applications for persistence, in terms of
efficiency, consistency and ease of programming with these NVM programming
abstractions. Finally, we evaluate the performance of the set and hash-table
data structure applications. We observe that NVML is the easiest to program with
but the least efficient, while Mnemosyne performs best but requires
significant programming effort to build concurrent and persistent
applications.
Comment: Accepted in HiPC SRS 201
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction IDs and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.
Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
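The Eager Commit idea described above (deciding during recovery which transactions committed, using only metadata stored with the data blocks) can be illustrated with a small sketch. The block layout here is an assumption made for illustration, not the encoding used by LOC: each block carries a hypothetical transaction ID, its index within the transaction, and the transaction's total block count, and a transaction is treated as committed only if all of its blocks are found.

```python
from collections import defaultdict


def committed_transactions(blocks):
    """Return the IDs of transactions whose blocks all reached persistent memory.

    Each block is a dict with hypothetical fields (not LOC's real layout):
      tx    - transaction ID
      seq   - index of this block within its transaction
      total - number of blocks the transaction writes
    """
    seen = defaultdict(set)  # tx -> block indices found during the recovery scan
    expected = {}            # tx -> declared number of blocks
    for block in blocks:
        seen[block["tx"]].add(block["seq"])
        expected[block["tx"]] = block["total"]
    # No separate commit record is consulted: completeness alone decides.
    return {tx for tx, seqs in seen.items() if len(seqs) == expected[tx]}


# Blocks that survived a simulated crash.
surviving = [
    {"tx": 1, "seq": 0, "total": 2},
    {"tx": 1, "seq": 1, "total": 2},
    {"tx": 2, "seq": 0, "total": 3},  # the other two blocks never persisted
]
print(committed_transactions(surviving))
```

In this sketch transaction 1 is recovered as committed and transaction 2 is discarded, without any commit record having been written inside either transaction.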
A Template for Implementing Fast Lock-free Trees Using HTM
Algorithms that use hardware transactional memory (HTM) must provide a
software-only fallback path to guarantee progress. The design of the fallback
path can have a profound impact on performance. If the fallback path is allowed
to run concurrently with hardware transactions, then hardware transactions must
be instrumented, adding significant overhead. Otherwise, hardware transactions
must wait for any processes on the fallback path, causing concurrency
bottlenecks, or move to the fallback path. We introduce an approach that
combines the best of both worlds. The key idea is to use three execution paths:
an HTM fast path, an HTM middle path, and a software fallback path, such that
the middle path can run concurrently with each of the other two. The fast path
and fallback path do not run concurrently, so the fast path incurs no
instrumentation overhead. Furthermore, fast path transactions can move to the
middle path instead of waiting or moving to the software path. We demonstrate
our approach by producing an accelerated version of the tree update template of
Brown et al., which can be used to implement fast lock-free data structures
based on down-trees. We used the accelerated template to implement two
lock-free trees: a binary search tree (BST), and an (a,b)-tree (a
generalization of a B-tree). Experiments show that, with 72 concurrent
processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many
operations per second as an implementation obtained using the original tree
update template.
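The three-path structure described above can be sketched as control flow. This is a single-threaded simulation under stated assumptions, not the authors' implementation: Python has no HTM, so `fast` and `middle` are hypothetical callables that return True when a simulated transaction commits and False when it aborts, and the retry limits and the `on_fallback` counter are illustrative choices.

```python
def run_operation(fast, middle, fallback, state,
                  fast_attempts=3, middle_attempts=3):
    """Run one operation and return the name of the path that completed it.

    state["on_fallback"] counts operations currently on the software
    fallback path (a hypothetical shared counter).
    """
    for _ in range(fast_attempts):
        if state["on_fallback"] > 0:
            # The fast path must not overlap with the fallback path, so
            # move to the middle path rather than waiting.
            break
        if fast():
            return "fast"
    for _ in range(middle_attempts):
        # The middle path may run alongside either of the other two paths.
        if middle():
            return "middle"
    # Both transactional paths kept aborting: take the software path,
    # which guarantees progress.
    state["on_fallback"] += 1
    try:
        fallback()
    finally:
        state["on_fallback"] -= 1
    return "fallback"


state = {"on_fallback": 0}
print(run_operation(lambda: True, lambda: True, lambda: None, state))
```

A real HTM version would need the counter to be shared atomically between threads and would start hardware transactions where this sketch calls `fast` and `middle`.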
LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
write overheads observed in write-heavy environments and hence adversely
affects the write throughput and recovery time in the system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
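The combination of log-only storage and an in-memory multiversion index described above can be sketched in a few lines. This is a toy, single-process illustration under stated assumptions, not LogBase's design: the class name `LogOnlyStore`, the logical clock used as a version number, and the list standing in for the on-disk log are all hypothetical.

```python
import bisect


class LogOnlyStore:
    """Toy store where the append-only log is the only copy of the data."""

    def __init__(self):
        self.log = []    # append-only records: (key, version, value)
        self.index = {}  # key -> list of (version, log offset), ascending
        self.clock = 0   # logical clock that supplies version numbers

    def put(self, key, value):
        """Append a new version to the log and index it; return its version."""
        self.clock += 1
        offset = len(self.log)
        self.log.append((key, self.clock, value))
        self.index.setdefault(key, []).append((self.clock, offset))
        return self.clock

    def get(self, key, as_of=None):
        """Read the latest version, or the latest one not newer than as_of."""
        versions = self.index.get(key)
        if not versions:
            return None
        if as_of is None:
            _, offset = versions[-1]
        else:
            pos = bisect.bisect_right(versions, (as_of, float("inf")))
            if pos == 0:
                return None
            _, offset = versions[pos - 1]
        return self.log[offset][2]

    @classmethod
    def recover(cls, log):
        """Rebuild the in-memory index by scanning the log after a restart."""
        store = cls()
        for offset, (key, version, value) in enumerate(log):
            store.log.append((key, version, value))
            store.index.setdefault(key, []).append((version, offset))
            store.clock = max(store.clock, version)
        return store


store = LogOnlyStore()
v1 = store.put("acct", 100)
store.put("acct", 250)
print(store.get("acct"), store.get("acct", as_of=v1))
```

Because each write is a single append, there is no separate data file to update, and recovery in this sketch only needs one scan of the log to rebuild the index.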