143 research outputs found
Maintaining the correctness of transactional memory programs
Dissertação para obtenção do Grau de Doutor em
Engenharia InformáticaThis dissertation addresses the challenge of maintaining the correctness of transactional memory programs, while improving its parallelism with small transactions and relaxed isolation levels.
The efficiency of the transactional memory systems depends directly on the level of parallelism, which in turn depends on the conflict rate. A high conflict rate between memory transactions can be addressed by reducing the scope of transactions, but this approach may turn the application prone to the occurrence of atomicity violations. Another way to address this issue is to ignore some of the conflicts by using a relaxed isolation level, such as snapshot isolation, at the cost of introducing write-skews serialization anomalies that break the consistency guarantees provided by a stronger consistency property, such as opacity.
In order to tackle the correctness issues raised by the atomicity violations and the write-skew anomalies, we propose two static analysis techniques: one based in a novel static analysis algorithm that works on a dependency graph of program variables and detects atomicity violations;
and a second one based in a shape analysis technique supported by separation logic augmented with heap path expressions, a novel representation based on sequences of heap dereferences that certifies if a transactional memory program executing under snapshot isolation is free from writeskew
anomalies.
The evaluation of the runtime execution of a transactional memory algorithm using snapshot
isolation requires a framework that allows an efficient implementation of a multi-version algorithm and, at the same time, enables its comparison with other existing transactional memory algorithms. In the Java programming language there was no framework satisfying both these requirements. Hence, we extended an existing software transactional memory framework that already supported efficient implementations of some transactional memory algorithms, to also
support the efficient implementation of multi-version algorithms. The key insight for this extension is the support for storing the transactional metadata adjacent to memory locations. We illustrate the benefits of our approach by analyzing its impact with both single- and multi-version transactional memory algorithms using several transactional workloads.Fundação para a Ciência e Tecnologia - PhD research grant SFRH/BD/41765/2007, and in
the research projects Synergy-VM (PTDC/EIA-EIA/113613/2009), and RepComp (PTDC/EIAEIA/
108963/2008
Recommended from our members
Software lock elision for x86 machine code
More than a decade after becoming a topic of intense research there is no
transactional memory hardware nor any examples of software transactional memory
use outside the research community. Using software transactional memory in large
pieces of software needs copious source code annotations and often means
that standard compilers and debuggers can no longer be used. At the same time,
overheads associated with software transactional memory fail to motivate
programmers to expend the needed effort to use software transactional
memory. The only way around the overheads in the case of general unmanaged code
is the anticipated availability of hardware support. On the other hand, architects
are unwilling to devote power and area budgets in mainstream microprocessors to
hardware transactional memory, pointing to transactional memory being a
"niche" programming construct. A deadlock has thus ensued that is blocking
transactional memory use and experimentation in the mainstream.
This dissertation covers the design and construction of a software transactional
memory runtime system called SLE_x86 that can potentially break this
deadlock by decoupling transactional memory from programs using it. Unlike most
other STM designs, the core design principle is transparency rather than
performance. SLE_x86 operates at the level of x86 machine code, thereby
becoming immediately applicable to binaries for the popular x86
architecture. The only requirement is that the binary synchronise using known
locking constructs or calls such as those in Pthreads or OpenMP
libraries. SLE_x86 provides speculative lock elision (SLE) entirely in
software, executing critical sections in the binary using transactional
memory. Optionally, the critical sections can also be executed without using
transactions by acquiring the protecting lock.
The dissertation makes a careful analysis of the impact on performance due to
the demands of the x86 memory consistency model and the need to transparently
instrument x86 machine code. It shows that both of these problems can be
overcome to reach a reasonable level of performance, where transparent
software transactional memory can perform better than a lock. SLE_x86 can
ensure that programs are ready for transactional memory in any form, without
being explicitly written for it
Composing Relaxed Transactions
As the classical transactional abstraction is sometimes considered too restrictive in leveraging parallelism, a lot of work has been devoted to devising relaxed transactional models with the goal of improving concurrency. Nevertheless, the quest for improving concurrency has somehow led to neglect one of the most appealing aspects of transactions: software composition, namely, the ability to develop pieces of software independently and compose them into applications that behave correctly in the face of concurrency. Indeed, a closer look at relaxed transactional models reveals that they do jeopardize composition, raising the fundamental question whether it is at all possible to devise such models while preserving composition. This paper shows that the answer is positive. We present outheritance, a necessary and sufficient condition for a (potentially relaxed) transactional memory to support composition. Basically, outheritance requires child transactions to pass their conflict information to their parent transaction, which in turn maintains this information until commit time. Concrete instantiations of this idea have been used before, classical transactions being the most prevalent example, but we believe to be the first to capture this as a general principle as well as to prove that it is, strictly speaking, equivalent to ensuring composition. We illustrate the benefits of outheritance using elastic transactions and show how they can satisfy outheritance and provide composition without hampering concurrency. We leverage this to present a new (transactional) Java package, a composable alternative to the concurrency package of the JDK, and evaluate efficiency through an implementation that speeds up state of the art software transactional memory implementations (TL2, LSA, SwissTM) by almost a factor of 3
Avoiding Publication and Privatization Problems on Software Transactional Memory
This paper presents a new approach to exclude problems arising from dynamically switching between protected concurrent and unprotected single-threaded use of shared data when using software transactional memory in OO languages such as Java. The approach is based on a simple but effective programming model separating transactions from non-transactional operation. It prevents the application programmer from errors but does not force the software transactional memory library to observe non-transactional access and thereby preserves modularity of the software. A prototypical toolchain for validation and source code instrumentation was implemented as a proof of concept
Static detection of anomalies in transactional memory programs
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para a obtenção do Grau de Mestre em Engenharia InformáticaTransactional Memory (TM) is an approach to concurrent programming based on the transactional semantics borrowed from database systems. In this paradigm, a transaction is a sequence of actions that may execute in a single logical instant, as though it was the only one being executed
at that moment. Unlike concurrent systems based in locks, TM does not enforce that a
single thread is performing the guarded operations. Instead, like in database systems, transactions execute concurrently, and the effects of a transaction are undone in case of a conflict, as though it never happened. The advantages of TM are an easier and less error-prone programming model, and a potential increase in scalability and performance.
In spite of these advantages, TM is still a young and immature technology, and has still
to become an established programming model. It still lacks the paraphernalia of tools and
standards which we have come to expect from a widely used programming paradigm. Testing
and analysis techniques and algorithms for TM programs are also just starting to be addressed by the scientific community, making this a leading research work is many of these aspects.
This work is aimed at statically identifying possible runtime anomalies in TMprograms. We
addressed both low-level dataraces in TM programs, as well as high-level anomalies resulting from incorrect splitting of transactions.
We have defined and implemented an approach to detect low-level dataraces in TM programs
by converting all the memory transactions into monitor protected critical regions, synchronized on a newly generated global lock. To validate the approach, we have applied our tool to a set of tests, adapted from the literature, that contain well documented errors.
We have also defined and implemented a new approach to static detection of high-level
concurrency anomalies in TM programs. This new approach works by conservatively tracing
transactions, and matching the interference between each consecutive pair of transactions
against a set of defined anomaly patterns. Once again, the approach was validated with well documented tests adapted from the literature
A modular distributed transactional memory framework
Dissertação para obtenção do Grau de Mestre em
Engenharia InformáticaThe traditional lock-based concurrency control is complex and error-prone due to its
low-level nature and composability challenges. Software transactional memory (STM), inherited from the database world, has risen as an exciting alternative, sparing the programmer from dealing explicitly with such low-level mechanisms.
In real world scenarios, software is often faced with requirements such as high availability and scalability, and the solution usually consists on building a distributed system.
Given the benefits of STM over traditional concurrency controls, Distributed Software
Transactional Memory (DSTM) is now being investigated as an attractive alternative for
distributed concurrency control.
Our long-term objective is to transparently enable multithreaded applications to execute
over a DSTM setting. In this work we intend to pave the way by defining a modular
DSTM framework for the Java programming language. We extend an existing, efficient,
STM framework with a new software layer to create a DSTM framework. This new layer
interacts with the local STM using well-defined interfaces, and allows the implementation of different distributed memory models while providing a non-intrusive, familiar,programming model to applications, unlike any other DSTM framework.
Using the proposed DSTM framework we have successfully, and easily, implemented
a replicated STM which uses a Certification protocol to commit transactions. An evaluation using common STM benchmarks showcases the efficiency of the replicated STM,and its modularity enables us to provide insight on the relevance of different implementations of the Group Communication System required by the Certification scheme, with respect to performance under different workloads.Fundação para a Ciência e Tecnologia - project (PTDC/EIA-EIA/113613/2009
Enhancing the efficiency and practicality of software transactional memory on massively multithreaded systems
Chip Multithreading (CMT) processors promise to deliver higher performance by running more than one stream of instructions in parallel. To exploit CMT's capabilities, programmers have to parallelize their applications, which is not a trivial task. Transactional Memory (TM) is one of parallel programming models that aims at simplifying synchronization by raising the level of abstraction between semantic atomicity and the means by which that atomicity is achieved. TM is a promising programming model but there are still important challenges that must be addressed to make it more practical and efficient in mainstream parallel programming.
The first challenge addressed in this dissertation is that of making the evaluation of TM proposals more solid with realistic TM benchmarks and being able to run the same benchmarks on different STM systems. We first introduce a benchmark suite, RMS-TM, a comprehensive benchmark suite to evaluate HTMs and STMs. RMS-TM consists of seven applications from the Recognition, Mining and Synthesis (RMS) domain that are representative of future workloads. RMS-TM features current TM research issues such as nesting and I/O inside transactions, while also providing various TM characteristics. Most STM systems are implemented as user-level libraries: the programmer is expected to manually instrument not only transaction boundaries, but also individual loads and stores within transactions. This library-based approach is increasingly tedious and error prone and also makes it difficult to make reliable performance comparisons. To enable an "apples-to-apples" performance comparison, we then develop a software layer that allows researchers to test the same applications with interchangeable STM back ends.
The second challenge addressed is that of enhancing performance and scalability of TM applications running on aggressive multi-core/multi-threaded processors. Performance and scalability of current TM designs, in particular STM desings, do not always meet the programmer's expectation, especially at scale. To overcome this limitation, we propose a new STM design, STM2, based on an assisted execution model in which time-consuming TM operations are offloaded to auxiliary threads while application threads optimistically perform computation. Surprisingly, our results show that STM2 provides, on average, speedups between 1.8x and 5.2x over state-of-the-art STM systems. On the other hand, we notice that assisted-execution systems may show low processor utilization. To alleviate this problem and to increase the efficiency of STM2, we enriched STM2 with a runtime mechanism that automatically and adaptively detects application and auxiliary threads' computing demands and dynamically partition hardware resources between the pair through the hardware thread prioritization mechanism implemented in POWER machines.
The third challenge is to define a notion of what it means for a TM program to be correctly synchronized. The current definition of transactional data race requires all transactions to be totally ordered "as if'' serialized by a global lock, which limits the scalability of TM designs. To remove this constraint, we first propose to relax the current definition of transactional data race to allow a higher level of concurrency. Based on this definition we propose the first practical race detection algorithm for C/C++ applications (TRADE) and implement the corresponding race detection tool. Then, we introduce a new definition of transactional data race that is more intuitive, transparent to the underlying TM implementation, can be used for a broad set of C/C++ TM programs. Based on this new definition, we proposed T-Rex, an efficient and scalable race detection tool for C/C++ TM applications. Using TRADE and T-Rex, we have discovered subtle transactional data races in widely-used STAMP applications which have not been reported in the past
Extensible Transactional Memory Testbed
Transactional Memory (TM) is a promising abstraction as it hides all synchronization complexities from the programmers of concurrent applications. More particularly the TM paradigm operated a complexity shift from the application programming to the TM programming. Therefore, expert programmers have now started to look for the ideal TM that will bring, once-for-all, performance to all concurrent applications. Researchers have recently identified numerous issues TMs may suffer from. Surprisingly, no TMs have ever been tested in these scenarios. In this paper, we present the first to date TM testbed. We propose a framework, TMunit, that provides a domain specific language to write rapidly TM workloads so that our test-suite is easily extensible. Our reproducible semantic tests indicate through reproducible counter-examples that existing TMs do not satisfy recent consistency criteria. Our performance tests identify workloads where well-known TMs perform differently. Finally, additional tests indicate some workloads preventing contention managers from progressing
- …