Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures
The hardware transactional memory (HTM) implementations in commercially
available processors are significantly hindered by their tight capacity
constraints. In practice, this renders current HTMs unsuitable for many
real-world workloads of in-memory databases. This paper proposes SI-HTM, which
stretches the capacity bounds of the underlying HTM, thus opening HTM to a much
broader class of applications. SI-HTM leverages the HTM implementation of the
IBM POWER architecture with a software layer to offer a single-version
implementation of Snapshot Isolation. When compared to HTM- and software-based
concurrency control alternatives, SI-HTM exhibits improved scalability,
achieving speedups of up to 300% relative to HTM on in-memory database
benchmarks.
HaTS: Hardware-Assisted Transaction Scheduler
In this paper we present HaTS, a Hardware-assisted Transaction Scheduler. HaTS improves the performance of concurrent applications by classifying the executions of their atomic blocks (or in-memory transactions) into scheduling queues, according to their so-called conflict indicators. The goal is to group transactions that conflict with one another while letting non-conflicting transactions proceed in parallel. Two core innovations characterize HaTS. First, HaTS does not assume that precise information about incoming transactions is available in order to classify them. It relaxes this assumption by exploiting the inherent conflict resolution provided by Hardware Transactional Memory (HTM). Second, HaTS dynamically adjusts the number of scheduling queues in order to capture the actual application contention level. Performance results using the STAMP benchmark suite show up to 2x improvement over state-of-the-art HTM-based scheduling techniques.
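The queue-based grouping described in this abstract can be illustrated with a minimal sketch. It is not the HaTS algorithm itself: the integer conflict indicators, the modulo mapping, the abort-rate thresholds, and the function names `assign_queues` and `adjust_queue_count` are all assumptions made for illustration. The sketch also assumes that a high abort rate calls for fewer queues (more serialization) and a low one for more queues.

```python
from collections import defaultdict


def assign_queues(transactions, num_queues):
    """Group (tx_id, conflict_indicator) pairs into scheduling queues.

    Transactions that land in the same queue would run one after another;
    transactions in different queues could run in parallel.
    The modulo mapping is a hypothetical stand-in for a real classifier.
    """
    queues = defaultdict(list)
    for tx_id, indicator in transactions:
        queues[indicator % num_queues].append(tx_id)
    return dict(queues)


def adjust_queue_count(num_queues, abort_rate, low=0.1, high=0.5, max_queues=64):
    """Adapt the number of queues to the observed contention level.

    The thresholds and the halve/double policy are assumptions:
    many aborts -> fewer queues, few aborts -> more queues.
    """
    if abort_rate > high:
        return max(1, num_queues // 2)
    if abort_rate < low:
        return min(max_queues, num_queues * 2)
    return num_queues
```

With four queues, for example, transactions whose indicators are 8 and 16 fall into the same queue and are serialized, while a transaction with indicator 3 is placed in a different queue.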
Accelerating Transactional Memory by Exploiting Platform Specificity
Transactional Memory (TM) is one of the most promising alternatives to lock-based concurrency, but obstacles remain that keep TM from being used in the real world. Performance, in terms of high scalability and low latency, is always one of the most important keys to general-purpose usage. While most of the research in this area focuses on improving a single TM implementation on some default platform (a certain operating system, compiler and/or processor), little work has been done on improving performance more generally, and across platforms. We found that by utilizing platform specificity, we could gain tremendous performance improvements and avoid unnecessary costs due to false assumptions about platform properties, on not only a single TM implementation, but many. In this dissertation, we present our findings in four sections: 1) we discover and quantify hidden costs from inappropriate compiler instrumentation, and provide suggestions and solutions; 2) we boost a set of mainstream timestamp-based TM implementations with the x86-specific hardware cycle counter; 3) we explore compiler opportunities to reduce the transaction abort rate by reordering read-modify-write operations; the technique can be applied to all TM implementations, and could be more effective with some help from compilers; and 4) we coordinate the state-of-the-art Intel Haswell TSX hardware TM with a software TM “Cohorts”, and develop a safe and flexible Hybrid TM, “HyCo”, as the final performance boost in this dissertation. The impact of our research extends beyond Transactional Memory, to broad areas of concurrent programming. Some of our solutions and discussions, such as the synchronization between accesses of the hardware cycle counter and memory loads and stores, can be utilized to boost concurrent data structures and many timestamp-based systems and applications.
Others, such as discussions of compiler instrumentation costs and reordering opportunities, provide additional insights to compiler designers. Our findings show that platform specificity must be taken into consideration to achieve peak performance.
An Analytical Model of Hardware Transactional Memory
This paper investigates the problem of deriving a white-box performance model of Hardware Transactional Memory (HTM) systems. The proposed model targets TSX, a popular implementation of HTM integrated in Intel processors starting with the Haswell family in 2013. An inherent difficulty with building white-box models of commercially available HTM systems is that their internals are either vaguely documented or undisclosed by their manufacturers. We tackle this challenge by designing a set of experiments that allow us to shed light on the internal mechanisms used in TSX to manage conflicts among transactions and to track their readsets and writesets. We exploit the information inferred from this experimental study to build an analytical model of TSX focused on capturing the performance impact of two key mechanisms: the concurrency control scheme and the management of transactional metadata in the processor's caches. We validate the proposed model by means of an extensive experimental study encompassing a broad range of workloads executed on a real system.
Lazy State Determination for SQL databases
Transactional systems have seen various efforts to increase their throughput, mainly
by making use of parallelism and efficient Concurrency Control techniques. Most approaches
optimize the systems’ behaviour when under high contention.
In this work, we strive towards reducing the system’s overall contention through Lazy
State Determination (LSD). LSD is a new transactional API that leverages futures
to delay the accesses to the Database as much as possible, reducing the amount of time
that transactions require to operate under isolation and, thus, reducing the contention
window.
LSD was shown to be a promising solution for Key-Value Stores. Now, our focus turns
to Relational Database Management Systems, as we attempt to implement and evaluate
LSD in this new setting. This implementation was done through a custom JDBC driver
to minimize required modifications to any external platform.
Results show that the reduction of the contention window effectively improves the
success rate of transactional applications. However, our current implementation exhibits
some performance issues that must be further investigated and addressed.
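The idea of delaying database accesses through futures can be sketched as follows. This is only an illustration under stated assumptions, not the LSD API or the JDBC driver from the work above: the class names `Future` and `LazyTransaction`, the dictionary standing in for the database, and the single lock standing in for isolation are all hypothetical.

```python
import threading


class Future:
    """Placeholder for a value that is read from the store only when resolved."""

    def __init__(self, store, key):
        self._store = store
        self._key = key

    def resolve(self):
        # The actual read happens here, not when the future was created.
        return self._store[self._key]


class LazyTransaction:
    """Hypothetical transaction that defers all store accesses to commit time."""

    def __init__(self, store):
        self._store = store
        self._writes = []  # list of (key, zero-argument function producing the value)

    def read(self, key):
        # Returns immediately without touching the store.
        return Future(self._store, key)

    def write(self, key, compute):
        # Record the write; 'compute' may resolve futures when it is called.
        self._writes.append((key, compute))

    def commit(self, lock):
        # Only this block runs under isolation, so the contention window
        # covers the final reads and writes rather than the whole transaction.
        with lock:
            for key, compute in self._writes:
                self._store[key] = compute()
```

A transaction that calls `balance = tx.read("a")` and then `tx.write("a", lambda: balance.resolve() - 3)` does not read `"a"` until `tx.commit(lock)` runs, so a value stored by another party in the meantime is the one that gets decremented.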
Galois : a system for parallel execution of irregular algorithms
A programming model which allows users to program with high productivity and which produces high-performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve the best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling are also described. An evaluation on a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often by orders of magnitude, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.
Software Transactional Memory Building Blocks
Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occasionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. This allows programmers to only declare which pieces of a program synchronize without requiring them to actually implement synchronization and tune its performance, which in turn makes TM typically easier to use than other abstractions such as locks.
I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. They host several novel algorithms and optimizations for TM implementations, both for current hardware and future hardware extensions for TM, and are being used in or have influenced commercial TM implementations such as the TM support in GCC.
Exploiting distributed software transactional memory
Over the past years, research and development on computer architecture has shifted from uni-processor systems to multi-core architectures. This transition has created new incentives in software development because, in order for software to scale, it has to be highly parallel. Traditional synchronization primitives based on mutual exclusion locking are challenging to use and therefore are only efficiently employed by a minority of expert programmers. Transactional Memory (TM) is a new alternative parallel programming model aiming to alleviate the problems that arise from the use of explicit synchronization mechanisms. In TM, lock-guarded code is replaced by memory transactions which comply with the ACI (atomicity, consistency, isolation) principles. The simplicity of the programming model that TM proposes has led to major research efforts by academia and industry to produce high-performance TM implementations. The majority of these TM systems, however, focus on shared-memory Chip MultiProcessors (CMPs), leaving the area of distributed systems unexplored. This thesis explores Transactional Memory in the distributed systems domain and more specifically on small-scale clusters. A variety of novel distributed transactional coherence protocols are proposed and evaluated against complex TM-oriented benchmarks, in the context of distributed Java Virtual Machines (JVMs), an area that has received much attention over the last decade due to its applicability to the enterprise domain. The implemented Distributed Software Transactional Memory (DiSTM) system, proposed in this thesis, is a JVM clustering solution that employs software transactional memory as its synchronization mechanism. Due to its modular design and ease of programming, it allows new protocols to be added fairly easily.
Finally, DiSTM is highly portable as it runs on top of off-the-shelf JVMs and requires no changes to existing Java source code.