55,862 research outputs found
Preemptive Software Transactional Memory
In state-of-the-art Software Transactional Memory (STM) systems, threads carry out the execution of transactions as non-interruptible tasks. Hence, a thread can react to the injection of a higher priority transactional task and take care of its processing only at the end of the currently executed transaction. In this article we pursue a paradigm shift where the execution of an in-memory transaction is carried out as a preemptable task, so that a thread can start processing a higher priority transactional task before finalizing its current transaction. We achieve this goal in an application-transparent manner, by only relying on Operating System facilities we include in our preemptive STM architecture. With our approach we are able to re-evaluate CPU assignment across transactions along a same thread every few tens of microseconds. This is mandatory for an effective priority-aware architecture given the typically finer-grain nature of in-memory transactions compared to their counterpart in database systems. We integrated our preemptive STM architecture with the TinySTM package, and released it as open source. We also provide the results of an experimental assessment of our proposal based on running a port of the TPC-C benchmark to the STM environment
Progressive Transactional Memory in Time and Space
Transactional memory (TM) allows concurrent processes to organize sequences
of operations on shared \emph{data items} into atomic transactions. A
transaction may commit, in which case it appears to have executed sequentially
or it may \emph{abort}, in which case no data item is updated.
The TM programming paradigm emerged as an alternative to conventional
fine-grained locking techniques, offering ease of programming and
compositionality. Though typically themselves implemented using locks, TMs hide
the inherent issues of lock-based synchronization behind a nice transactional
programming interface.
In this paper, we explore inherent time and space complexity of lock-based
TMs, with a focus of the most popular class of \emph{progressive} lock-based
TMs. We derive that a progressive TM might enforce a read-only transaction to
perform a quadratic (in the number of the data items it reads) number of steps
and access a linear number of distinct memory locations, closing the question
of inherent cost of \emph{read validation} in TMs. We then show that the total
number of \emph{remote memory references} (RMRs) that take place in an
execution of a progressive TM in which concurrent processes perform
transactions on a single data item might reach , which
appears to be the first RMR complexity lower bound for transactional memory.Comment: Model of Transactional Memory identical with arXiv:1407.6876,
arXiv:1502.0272
Recommended from our members
Software lock elision for x86 machine code
More than a decade after becoming a topic of intense research there is no
transactional memory hardware nor any examples of software transactional memory
use outside the research community. Using software transactional memory in large
pieces of software needs copious source code annotations and often means
that standard compilers and debuggers can no longer be used. At the same time,
overheads associated with software transactional memory fail to motivate
programmers to expend the needed effort to use software transactional
memory. The only way around the overheads in the case of general unmanaged code
is the anticipated availability of hardware support. On the other hand, architects
are unwilling to devote power and area budgets in mainstream microprocessors to
hardware transactional memory, pointing to transactional memory being a
"niche" programming construct. A deadlock has thus ensued that is blocking
transactional memory use and experimentation in the mainstream.
This dissertation covers the design and construction of a software transactional
memory runtime system called SLE_x86 that can potentially break this
deadlock by decoupling transactional memory from programs using it. Unlike most
other STM designs, the core design principle is transparency rather than
performance. SLE_x86 operates at the level of x86 machine code, thereby
becoming immediately applicable to binaries for the popular x86
architecture. The only requirement is that the binary synchronise using known
locking constructs or calls such as those in Pthreads or OpenMP
libraries. SLE_x86 provides speculative lock elision (SLE) entirely in
software, executing critical sections in the binary using transactional
memory. Optionally, the critical sections can also be executed without using
transactions by acquiring the protecting lock.
The dissertation makes a careful analysis of the impact on performance due to
the demands of the x86 memory consistency model and the need to transparently
instrument x86 machine code. It shows that both of these problems can be
overcome to reach a reasonable level of performance, where transparent
software transactional memory can perform better than a lock. SLE_x86 can
ensure that programs are ready for transactional memory in any form, without
being explicitly written for it
On the Semantics of Snapshot Isolation
Snapshot isolation (SI) is a standard transactional consistency model used in
databases, distributed systems and software transactional memory (STM). Its
semantics is formally defined both declaratively as an acyclicity axiom, and
operationally as a concurrent algorithm with memory bearing timestamps.
We develop two simpler equivalent operational definitions of SI as lock-based
reference implementations that do not use timestamps. Our first locking
implementation is prescient in that requires a priori knowledge of the data
accessed by a transaction and carries out transactional writes eagerly
(in-place). Our second implementation is non-prescient and performs
transactional writes lazily by recording them in a local log and propagating
them to memory at commit time. Whilst our first implementation is simpler and
may be better suited for developing a program logic for SI transactions, our
second implementation is more practical due to its non-prescience. We show that
both implementations are sound and complete against the declarative SI
specification and thus yield equivalent operational definitions for SI.
We further consider, for the first time formally, the use of SI in a context
with racy non-transactional accesses, as can arise in STM implementations of
SI. We introduce robust snapshot isolation (RSI), an adaptation of SI with
similar semantics and guarantees in this mixed setting. We present a
declarative specification of RSI as an acyclicity axiom and analogously develop
two operational models as lock-based reference implementations (one eager, one
lazy). We show that these operational models are both sound and complete
against the declarative RSI model
Inherent Limitations of Hybrid Transactional Memory
Several Hybrid Transactional Memory (HyTM) schemes have recently been
proposed to complement the fast, but best-effort, nature of Hardware
Transactional Memory (HTM) with a slow, reliable software backup. However, the
fundamental limitations of building a HyTM with nontrivial concurrency between
hardware and software transactions are still not well understood.
In this paper, we propose a general model for HyTM implementations, which
captures the ability of hardware transactions to buffer memory accesses, and
allows us to formally quantify and analyze the amount of overhead
(instrumentation) of a HyTM scheme. We prove the following: (1) it is
impossible to build a strictly serializable HyTM implementation that has both
uninstrumented reads and writes, even for weak progress guarantees, and (2)
under reasonable assumptions, in any opaque progressive HyTM, a hardware
transaction must incur instrumentation costs linear in the size of its data
set. We further provide two upper bound implementations whose instrumentation
costs are optimal with respect to their progress guarantees. In sum, this paper
captures for the first time an inherent trade-off between the degree of
concurrency a HyTM provides between hardware and software transactions, and the
amount of instrumentation overhead the implementation must incur
HeTM: Transactional Memory for Heterogeneous Systems
Modern heterogeneous computing architectures, which couple multi-core CPUs
with discrete many-core GPUs (or other specialized hardware accelerators),
enable unprecedented peak performance and energy efficiency levels.
Unfortunately, though, developing applications that can take full advantage of
the potential of heterogeneous systems is a notoriously hard task. This work
takes a step towards reducing the complexity of programming heterogeneous
systems by introducing the abstraction of Heterogeneous Transactional Memory
(HeTM). HeTM provides programmers with the illusion of a single memory region,
shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with
support for atomic transactions. Besides introducing the abstract semantics and
programming model of HeTM, we present the design and evaluation of a concrete
implementation of the proposed abstraction, which we named Speculative HeTM
(SHeTM). SHeTM makes use of a novel design that leverages on speculative
techniques and aims at hiding the inherently large communication latency
between CPUs and discrete GPUs and at minimizing inter-device synchronization
overhead. SHeTM is based on a modular and extensible design that allows for
easily integrating alternative TM implementations on the CPU's and GPU's sides,
which allows the flexibility to adopt, on either side, the TM implementation
(e.g., in hardware or software) that best fits the applications' workload and
the architectural characteristics of the processing unit. We demonstrate the
efficiency of the SHeTM via an extensive quantitative study based both on
synthetic benchmarks and on a porting of a popular object caching system.Comment: The current work was accepted in the 28th International Conference on
Parallel Architectures and Compilation Techniques (PACT'19
- …