
    Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures

    The hardware transactional memory (HTM) implementations in commercially available processors are significantly hindered by their tight capacity constraints. In practice, this renders current HTMs unsuitable for many real-world workloads of in-memory databases. This paper proposes SI-HTM, which stretches the capacity bounds of the underlying HTM, thus opening HTM to a much broader class of applications. SI-HTM leverages the HTM implementation of the IBM POWER architecture with a software layer to offer a single-version implementation of Snapshot Isolation. When compared to HTM- and software-based concurrency control alternatives, SI-HTM exhibits improved scalability, achieving speedups of up to 300% relative to HTM on in-memory database benchmarks.

    HaTS: Hardware-Assisted Transaction Scheduler

    In this paper we present HaTS, a Hardware-assisted Transaction Scheduler. HaTS improves the performance of concurrent applications by classifying the executions of their atomic blocks (or in-memory transactions) into scheduling queues, according to their so-called conflict indicators. The goal is to group those transactions that are conflicting while letting non-conflicting transactions proceed in parallel. Two core innovations characterize HaTS. First, HaTS does not assume the availability of precise information associated with incoming transactions in order to proceed with the classification. It relaxes this assumption by exploiting the inherent conflict resolution provided by Hardware Transactional Memory (HTM). Second, HaTS dynamically adjusts the number of scheduling queues in order to capture the actual application contention level. Performance results using the STAMP benchmark suite show up to 2x improvement over state-of-the-art HTM-based scheduling techniques.
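
The idea of grouping transactions into scheduling queues by conflict indicator can be illustrated with a minimal sketch. It is not HaTS itself: the modulo mapping, the `classify` function, and the integer indicators are assumptions made for illustration. The sketch also omits the HTM-based conflict resolution and the dynamic adjustment of the number of queues that the abstract describes.

```python
from collections import defaultdict


def classify(transactions, num_queues):
    """Group (name, conflict_indicator) pairs into scheduling queues.

    Hypothetical mapping: queue index = indicator modulo num_queues.
    Transactions sharing a queue would run one after another, while
    different queues could be drained in parallel.
    """
    queues = defaultdict(list)
    for name, indicator in transactions:
        queues[indicator % num_queues].append(name)
    return dict(queues)


# Illustrative transactions: t1, t2 and t4 map to the same queue.
txs = [("t1", 10), ("t2", 14), ("t3", 7), ("t4", 10)]
queues = classify(txs, num_queues=4)
```

With these inputs, `t1`, `t2` and `t4` land in one queue and `t3` in another, so only `t3` would be free to run alongside the rest.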

    Accelerating Transactional Memory by Exploiting Platform Specificity

    Transactional Memory (TM) is one of the most promising alternatives to lock-based concurrency, but there still remain obstacles that keep TM from being utilized in the real world. Performance, in terms of high scalability and low latency, is always one of the most important keys to general-purpose usage. While most of the research in this area focuses on improving a specific single TM implementation and some default platform (a certain operating system, compiler and/or processor), little work has been done on improving performance more generally, and across platforms. We found that by utilizing platform specificity, we could gain tremendous performance improvement and avoid unnecessary costs due to false assumptions of platform properties, on not only a single TM implementation, but many. In this dissertation, we will present our findings in four sections: 1) we discover and quantify hidden costs from inappropriate compiler instrumentations, and provide suggestions and solutions; 2) we boost a set of mainstream timestamp-based TM implementations with the x86-specific hardware cycle counter; 3) we explore compiler opportunities to reduce the transaction abort rate by reordering read-modify-write operations; the whole technique can be applied to all TM implementations, and could be more effective with some help from compilers; and 4) we coordinate the state-of-the-art Intel Haswell TSX hardware TM with a software TM “Cohorts”, and develop a safe and flexible Hybrid TM, “HyCo”, to be our final performance boost in this dissertation. The impact of our research extends beyond Transactional Memory, to broad areas of concurrent programming. Some of our solutions and discussions, such as the synchronization between accesses of the hardware cycle counter and memory loads and stores, can be utilized to boost concurrent data structures and many timestamp-based systems and applications. Others, such as discussions of compiler instrumentation costs and reordering opportunities, provide additional insights to compiler designers. Our findings show that platform specificity must be taken into consideration to achieve peak performance.

    An Analytical Model of Hardware Transactional Memory

    This paper investigates the problem of deriving a white-box performance model of Hardware Transactional Memory (HTM) systems. The proposed model targets TSX, a popular implementation of HTM integrated in Intel processors starting with the Haswell family in 2013. An inherent difficulty with building white-box models of commercially available HTM systems is that their internals are either vaguely documented or undisclosed by their manufacturers. We tackle this challenge by designing a set of experiments that allow us to shed light on the internal mechanisms used in TSX to manage conflicts among transactions and to track their readsets and writesets. We exploit the information inferred from this experimental study to build an analytical model of TSX focused on capturing the impact on performance of two key mechanisms: the concurrency control scheme and the management of transactional meta-data in the processor's caches. We validate the proposed model by means of an extensive experimental study encompassing a broad range of workloads executed on a real system.

    Lazy State Determination for SQL databases

    Transactional systems have seen various efforts to increase their throughput, mainly by making use of parallelism and efficient Concurrency Control techniques. Most approaches optimize the systems’ behaviour when under high contention. In this work, we strive towards reducing the system’s overall contention through Lazy State Determination (LSD). LSD is a new transactional API that leverages futures to delay accesses to the Database as much as possible, reducing the amount of time that transactions require to operate under isolation and, thus, reducing the contention window. LSD was shown to be a promising solution for Key-Value Stores. Now, our focus turns to Relational Database Management Systems, as we attempt to implement and evaluate LSD in this new setting. This implementation was done through a custom JDBC driver to minimize required modifications to any external platform. Results show that the reduction of the contention window effectively improves the success rate of transactional applications. However, our current implementation exhibits some performance issues that must be further investigated and addressed.
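
The use of futures to delay database accesses can be illustrated with a minimal sketch. It is not the LSD API or the JDBC driver from the work above: the `Future` and `LazyTransaction` classes, their methods, and the dictionary standing in for a database are all hypothetical, and isolation at commit time is not modelled.

```python
class Future:
    """Placeholder for a value that is read only when resolved (hypothetical)."""

    def __init__(self, store, key):
        self._store = store
        self._key = key

    def resolve(self):
        # The actual read of the store happens here, not at creation time.
        return self._store[self._key]


class LazyTransaction:
    """Records operations on futures and applies them at commit (hypothetical)."""

    def __init__(self, store):
        self.store = store
        self.pending = []

    def read(self, key):
        # Returns a future instead of touching the store.
        return Future(self.store, key)

    def write(self, key, fn, future):
        # Defers the write: fn is applied to the future's value at commit.
        self.pending.append((key, fn, future))

    def commit(self):
        # Only this step needs to run under isolation in a real system.
        for key, fn, future in self.pending:
            self.store[key] = fn(future.resolve())


store = {"stock": 5}
tx = LazyTransaction(store)
stock = tx.read("stock")                   # no store access yet
tx.write("stock", lambda v: v - 1, stock)  # decrement is recorded, not applied
store["stock"] = 8                         # another writer updates the value
tx.commit()                                # future resolves to 8, stores 7
```

Because the read is resolved only at commit, the transaction sees the concurrent update rather than a stale value, and the window in which it depends on the stored state is limited to the commit step.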

    Software Transactional Memory Building Blocks

    Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occasionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. This allows programmers to only declare which pieces of a program synchronize without requiring them to actually implement synchronization and tune its performance, which in turn makes TM typically easier to use than other abstractions such as locks. I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. They host several novel algorithms and optimizations for TM implementations, both for current hardware and future hardware extensions for TM, and are being used in or have influenced commercial TM implementations such as the TM support in GCC.

    Exploiting distributed software transactional memory

    Over the past years, research and development on computer architecture has shifted from uni-processor systems to multi-core architectures. This transition has created new incentives in software development because in order for the software to scale it has to be highly parallel. Traditional synchronization primitives based on mutual exclusion locking are challenging to use and therefore are only efficiently employed by a minority of expert programmers. Transactional Memory (TM) is a new alternative parallel programming model aiming to alleviate the problems that arise from the use of explicit synchronization mechanisms. In TM, lock-guarded code is replaced by memory transactions which comply with the ACI (atomicity, consistency, isolation) principles. The simplicity of the programming model that TM proposes has led to major research efforts by academia and industry to produce high-performance TM implementations. The majority of these TM systems, however, focus on shared-memory Chip MultiProcessors (CMPs), leaving the area of distributed systems unexplored. This thesis explores Transactional Memory in the distributed systems domain and more specifically on small-scale clusters. A variety of novel distributed transactional coherence protocols are proposed and evaluated, against complex TM-oriented benchmarks, in the context of distributed Java Virtual Machines (JVMs), an area that has received much attention over the last decade due to its perfect applicability to the enterprise domain. The implemented Distributed Software Transactional Memory (DiSTM) system, proposed in this thesis, is a JVM clustering solution that employs software transactional memory as its synchronization mechanism. Due to its modular design and ease in programming, it allows the addition of new protocols in a fairly easy manner. Finally, DiSTM is highly portable as it runs on top of off-the-shelf JVMs and requires no changes to existing Java source code. Source: EThOS - Electronic Theses Online Service (United Kingdom).