188 research outputs found
Automatic skeleton-driven performance optimizations for transactional memory
The recent shift toward multi -core chips has pushed the burden of extracting performance to the programmer. In fact, programmers now have to be able to uncover more
coarse -grain parallelism with every new generation of processors, or the performance
of their applications will remain roughly the same or even degrade. Unfortunately,
parallel programming is still hard and error prone. This has driven the development of
many new parallel programming models that aim to make this process efficient.This thesis first combines the skeleton -based and transactional memory programming models in a new framework, called OpenSkel, in order to improve performance
and programmability of parallel applications. This framework provides a single skeleton that allows the implementation of transactional worklist applications. Skeleton or
pattern-based programming allows parallel programs to be expressed as specialized instances of generic communication and computation patterns. This leaves the programmer with only the implementation of the particular operations required to solve the
problem at hand. Thus, this programming approach simplifies parallel programming
by eliminating some of the major challenges of parallel programming, namely thread
communication, scheduling and orchestration. However, the application programmer
has still to correctly synchronize threads on data races. This commonly requires the
use of locks to guarantee atomic access to shared data. In particular, lock programming
is vulnerable to deadlocks and also limits coarse grain parallelism by blocking threads
that could be potentially executed in parallel.Transactional Memory (TM) thus emerges as an attractive alternative model to simplify parallel programming by removing this burden of handling data races explicitly.
This model allows programmers to write parallel code as transactions, which are then
guaranteed by the runtime system to execute atomically and in isolation regardless of
eventual data races. TM programming thus frees the application from deadlocks and
enables the exploitation of coarse grain parallelism when transactions do not conflict
very often. Nevertheless, thread management and orchestration are left for the application programmer. Fortunately, this can be naturally handled by a skeleton framework.
This fact makes the combination of skeleton -based and transactional programming a
natural step to improve programmability since these models complement each other.
In fact, this combination releases the application programmer from dealing with thread
management and data races, and also inherits the performance improvements of both
models. In addition to it, a skeleton framework is also amenable to skeleton - driven
iii
performance optimizations that exploits the application pattern and system information.This thesis thus also presents a set of pattern- oriented optimizations that are automatically selected and applied in a significant subset of transactional memory applications that shares a common pattern called worklist. These optimizations exploit the
knowledge about the worklist pattern and the TM nature of the applications to avoid
transaction conflicts, to prefetch data, to reduce contention etc. Using a novel autotuning mechanism, OpenSkel dynamically selects the most suitable set of these patternoriented performance optimizations for each application and adjusts them accordingly.
Experimental results on a subset of five applications from the STAMP benchmark suite
show that the proposed autotuning mechanism can achieve performance improvements
within 2 %, on average, of a static oracle for a 16 -core UMA (Uniform Memory Access) platform and surpasses it by 7% on average for a 32 -core NUMA (Non -Uniform
Memory Access) platform.Finally, this thesis also investigates skeleton -driven system- oriented performance
optimizations such as thread mapping and memory page allocation. In order to do
it, the OpenSkel system and also the autotuning mechanism are extended to accommodate these optimizations. The conducted experimental results on a subset of five
applications from the STAMP benchmark show that the OpenSkel framework with the
extended autotuning mechanism driving both pattern and system- oriented optimizations can achieve performance improvements of up to 88 %, with an average of 46 %,
over a baseline version for a 16 -core UMA platform and up to 162 %, with an average
of 91 %, for a 32 -core NUMA platform
Static Application-Level Race Detection in STM Haskell using Contracts
Writing concurrent programs is a hard task, even when using high-level
synchronization primitives such as transactional memories together with a
functional language with well-controlled side-effects such as Haskell, because
the interferences generated by the processes to each other can occur at
different levels and in a very subtle way. The problem occurs when a thread
leaves or exposes the shared data in an inconsistent state with respect to the
application logic or the real meaning of the data. In this paper, we propose to
associate contracts to transactions and we define a program transformation that
makes it possible to extend static contract checking in the context of STM
Haskell. As a result, we are able to check statically that each transaction of
a STM Haskell program handles the shared data in a such way that a given
consistency property, expressed in the form of a user-defined boolean function,
is preserved. This ensures that bad interference will not occur during the
execution of the concurrent program.Comment: In Proceedings PLACES 2013, arXiv:1312.2218. [email protected];
[email protected]
Design and Implementation of Real-Time Transactional Memory
Abstract—Transactional memory is a promising, optimistic synchronization mechanism for chip-multiprocessor systems. The simplicity of atomic sections, instead of using explicit locks, is also appealing for real-time systems. In this paper an implementation of real-time transactional memory (RTTM) in the context of a real-time Java chip-multiprocessor (CMP) is presented. To provide a predictable and analyzable solution of transactional memory, the transaction buffer is organized fully associative. Evaluation in an FPGA shows that an associativity of up to 64-way is possible without degrading the overall system performance. The paper presents synthesis results for different RTTM configurations and different number of processor cores in the CMP system. A CMP system with up to 8 processor cores with RTTM support is feasible in an Altera Cyclone-II FPGA
Consistent state software transactional memory
Dissertação apresentada na Faculdade de
Ciências e Tecnologia da Universidade
Nova de Lisboa para a obtenção do Grau
de Mestre em Engenharia InformáticaAs the multicore CPUs start getting into everyone’s computers, concurrent programming
must start covering, not only the scientific and enterprise applications, but also
every computer application we all use in our daily lives.
Since the introduction of software transactional memory [ST95,HLMWNS03], this
topic has had a strong interest by the scientific community as it has the potential of
greatly facilitating concurrent programming by hiding the concurrency issues under
the transactional layer.
This thesis builds on the TL2 STM engine [DON06], which is one of the top performing
to date. We have explored different design alternatives focusing on performance
and safety. With our research we have achieved performance improvements and better
safety properties of the engine. We have also achieved a much better understanding of
the design alternatives and their impacts.
During the course of this thesis we have come across several tough concurrency
bugs and we have created a list of testing patterns, which proved to be useful in finding
and reproducing several problems.
This thesis describes the cutting edge of STM engine technology, elaborates on the
design of a new STM engine and reports on the experimental results obtained
- …