119,369 research outputs found
A Template for Implementing Fast Lock-free Trees Using HTM
Algorithms that use hardware transactional memory (HTM) must provide a
software-only fallback path to guarantee progress. The design of the fallback
path can have a profound impact on performance. If the fallback path is allowed
to run concurrently with hardware transactions, then hardware transactions must
be instrumented, adding significant overhead. Otherwise, hardware transactions
must wait for any processes on the fallback path, causing concurrency
bottlenecks, or move to the fallback path. We introduce an approach that
combines the best of both worlds. The key idea is to use three execution paths:
an HTM fast path, an HTM middle path, and a software fallback path, such that
the middle path can run concurrently with each of the other two. The fast path
and fallback path do not run concurrently, so the fast path incurs no
instrumentation overhead. Furthermore, fast path transactions can move to the
middle path instead of waiting or moving to the software path. We demonstrate
our approach by producing an accelerated version of the tree update template of
Brown et al., which can be used to implement fast lock-free data structures
based on down-trees. We used the accelerated template to implement two
lock-free trees: a binary search tree (BST), and an (a,b)-tree (a
generalization of a B-tree). Experiments show that, with 72 concurrent
processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many
operations per second as an implementation obtained using the original tree
update template
On Benchmarking Embedded Linux Flash File Systems
Due to its attractive characteristics in terms of performance, weight and
power consumption, NAND flash memory became the main non volatile memory (NVM)
in embedded systems. Those NVMs also present some specific
characteristics/constraints: good but asymmetric I/O performance, limited
lifetime, write/erase granularity asymmetry, etc. Those peculiarities are
either managed in hardware for flash disks (SSDs, SD cards, USB sticks, etc.)
or in software for raw embedded flash chips. When managed in software, flash
algorithms and structures are implemented in a specific flash file system
(FFS). In this paper, we present a performance study of the most widely used
FFSs in embedded Linux: JFFS2, UBIFS,and YAFFS. We show some very particular
behaviors and large performance disparities for tested FFS operations such as
mounting, copying, and searching file trees, compression, etc.Comment: Embed With Linux, Lorient : France (2012
Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays
The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism
Universal Wait-Free Memory Reclamation
In this paper, we present a universal memory reclamation scheme, Wait-Free
Eras (WFE), for deleted memory blocks in wait-free concurrent data structures.
WFE's key innovation is that it is completely wait-free. Although some prior
techniques provide similar guarantees for certain data structures, they lack
support for arbitrary wait-free data structures. Consequently, developers are
typically forced to marry their wait-free data structures with lock-free Hazard
Pointers or (potentially blocking) epoch-based memory reclamation. Since both
these schemes provide weaker progress guarantees, they essentially forfeit the
strong progress guarantee of wait-free data structures. Though making the
original Hazard Pointers scheme or epoch-based reclamation completely wait-free
seems infeasible, we achieved this goal with a more recent, (lock-free) Hazard
Eras scheme, which we extend to guarantee wait-freedom. As this extension is
non-trivial, we discuss all challenges pertaining to the construction of
universal wait-free memory reclamation.
WFE is implementable on ubiquitous x86_64 and AArch64 (ARM) architectures.
Its API is mostly compatible with Hazard Pointers, which allows easy
transitioning of existing data structures into WFE. Our experimental
evaluations show that WFE's performance is close to epoch-based reclamation and
almost matches the original Hazard Eras scheme, while providing the stronger
wait-free progress guarantee
- …