961 research outputs found
Towards lightweight and high-performance hardware transactional memory
Conventional lock-based synchronization serializes accesses to critical sections guarded by the same lock. Using multiple locks brings the possibility of a deadlock or a livelock in the program, making parallel programming a difficult task. Transactional Memory (TM) is a promising paradigm for parallel programming, offering an alternative to lock-based synchronization. TM eliminates the risk of deadlocks and livelocks, while it provides the desirable semantics of Atomicity, Consistency, and Isolation of critical sections. TM speculatively executes a series of memory accesses as a single, atomic, transaction. The speculative changes of a transaction are kept private until the transaction commits. If a transaction can break the atomicity or cause a deadlock or livelock, the TM system aborts the transaction and rolls back the speculative changes.
To be effective, a TM implementation should provide high performance and scalability. While implementations of TM in pure software (STM) do not provide desirable performance, Hardware TM (HTM) implementations introduce much smaller overhead and have relatively good scalability, due to their better control of hardware resources. However, many HTM systems support only the transactions that fit limited hardware resources (for example, private caches), and fall back to software mechanisms if hardware limits are reached. These HTM systems, called best-effort HTMs, are not desirable since they force a programmer to think in terms of hardware limits, to use both HTM and STM, and to manage concurrent transactions in HTM and STM. In contrast with best-effort HTMs, unbounded HTM systems support overflowed transactions, that do not fit into private caches. Unbounded HTM systems often require complex protocols or expensive hardware mechanisms for conflict detection between overflowed transactions. In addition, an execution with overflowed transactions is often much slower than an execution that has only regular transactions. This is typically due to restrictive or approximative conflict management mechanism used for overflowed transactions.
In this thesis, we study hardware implementations of transactional memory, and make three main contributions. First, we improve the general performance of HTM systems by proposing a scalable protocol for conflict management. The protocol has precise conflict detection, in contrast with often-employed inexact Bloom-filter-based conflict detection, which often falsely report conflicts between transactions. Second, we propose a best-effort HTM that utilizes the new scalable conflict detection protocol, termed EazyHTM. EazyHTM allows parallel commits for all non-conflicting transactions, and generally simplifies transaction commits.
Finally, we propose an unbounded HTM that extends and improves the initial protocol for conflict management, and we name it EcoTM. EcoTM features precise conflict detection, and it efficiently supports large as well as small and short transactions. The key idea of EcoTM is to leverage an observation that very few locations are actually conflicting, even if applications have high contention. In EcoTM, each core locally detects if a cache line is non-conflicting, and conflict detection mechanism is invoked only for the few potentially conflicting cache lines.La Sincronización tradicional basada en los cerrojos de exclusión mutua (locks) serializa los accesos a las secciones críticas protegidas este cerrojo. La utilización de varios cerrojos en forma concurrente y/o paralela aumenta la posibilidad de entrar en abrazo mortal (deadlock) o en un bloqueo activo (livelock) en el programa, está es una de las razones por lo cual programar en forma paralela resulta ser mucho mas dificultoso que programar en forma secuencial. La memoria transaccional (TM) es un paradigma prometedor para la programación paralela, que ofrece una alternativa a los cerrojos. La memoria transaccional tiene muchas ventajas desde el punto de vista tanto práctico como teórico. TM elimina el riesgo de bloqueo mutuo y de bloqueo activo, mientras que proporciona una semántica de atomicidad, coherencia, aislamiento con características similares a las secciones críticas. TM ejecuta especulativamente una serie de accesos a la memoria como una transacción atómica.
Los cambios especulativos de la transacción se mantienen privados hasta que se confirma la transacción. Si una transacción entra en conflicto con otra transacción o sea que alguna de ellas escribe en una dirección que la otra leyó o escribió, o se entra en un abrazo mortal o en un bloqueo activo, el sistema de TM aborta la transacción y revierte los cambios especulativos. Para ser eficaz, una implementación de TM debe proporcionar un alto rendimiento y escalabilidad. Las implementaciones de TM en el software (STM) no proporcionan este desempeño deseable, en cambio, las mplementaciones de TM en hardware (HTM) tienen mejor desempeño y una escalabilidad relativamente buena, debido a su mejor control de los recursos de hardware y que la resolución de los conflictos así el mantenimiento y gestión de los datos se hace en hardware. Sin embargo, muchos de los sistemas de HTM están limitados a los recursos de hardware disponibles, por ejemplo el tamaño de las caches privadas, y dependen de mecanismos de software para cuando esos límites son sobrepasados. Estos sistemas HTM, llamados best-effort HTM no son deseables, ya que obligan al programador a pensar en términos de los límites existentes en el hardware que se esta utilizando, así como en el sistema de STM que se llama cuando los recursos son sobrepasados. Además, tiene que resolver que transacciones hardware y software se ejecuten concurrentemente. En cambio, los sistemas
de HTM ilimitados soportan un numero de operaciones ilimitadas o sea no están restringidos a límites impuestos artificialmente por el hardware, como ser el tamaño de las caches o buffers internos. Los sistemas HTM ilimitados por lo general requieren protocolos complejos o mecanismos muy costosos para la detección de conflictos y el mantenimiento de versiones de los datos entre las transacciones. Por otra parte, la ejecución de transacciones es a menudo mucho más lenta que en una ejecución sobre un sistema de HTM que este limitado. Esto es debido al que los mecanismos utilizados en el HTM
limitado trabaja con conjuntos de datos relativamente pequeños que caben o están muy cerca del núcleo del procesador.
En esta tesis estudiamos implementaciones de TM en hardware. Presentaremos tres contribuciones principales: Primero, mejoramos el rendimiento general de los sistemas, al proponer un protocolo escalable para la gestión de conflictos. El protocolo detecta los conflictos de forma precisa, en contraste con otras técnicas basadas en filtros Bloom, que pueden reportar conflictos falsos entre las transacciones. Segundo, proponemos un best-effort HTM que utiliza el nuevo protocolo escalable detección de conflictos, denominado EazyHTM. EazyHTM permite la ejecución completamente paralela de todas las transacciones sin conflictos, y por lo general simplifica la ejecución. Por último, proponemos una extensión y mejora del protocolo inicial para la gestión de conflictos, que llamaremos EcoTM. EcoTM cuenta con detección de conflictos precisa, eficiente y es compatible tanto con transacciones grandes como con pequeñas. La idea clave de EcoTM es aprovechar la observación que en muy pocas ubicaciones de memoria aparecen los conflictos entre las transacciones, incluso en aplicaciones tienen muchos conflictos. En EcoTM, cada núcleo detecta localmente si la línea es conflictiva, además existe un mecanismo de detección de conflictos detallado que solo se activa para las pocas líneas de memoria que son potencialmente conflictivas
On the Performance of Software Transactional Memory
The recent proliferation of multi-core processors has moved concurrent programming into mainstream by forcing increasingly more programmers to write parallel code. Using traditional concurrency techniques, such as locking, is notoriously difficult and has been considered the domain of a few experts for a long time. This discrepancy between the established techniques and typical programmer's skills raises a pressing need for new programming paradigms. A particularly appealing concurrent programming paradigm is transactional memory: it enables programmers to write correct concurrent code in a simple manner, while promising scalable performance. Software implementations of transactional memory (STM) have attracted a lot of attention for their ability to support dynamic transactions of any size and execute on existing hardware. This is in contrast to hardware implementations that typically support only transactions of limited size and are not yet commercially available. Surprisingly, prior work has largely neglected software support for transactions of arbitrary size, despite them being an important target for STM. Consequently, existing STMs have not been optimized for large transactions, which results in poor performance of those STMs, and sometimes even program crashes, when dealing with large transactions. In this thesis, I contribute to changing the current state of affairs by improving performance and scalability of STM, in particular with dynamic transactions of arbitrary size. I propose SwissTM, a novel STM design that efficiently supports large transactions, while not compromising on performance with smaller ones. SwissTM features: (1) mixed conflict detection, that detects write-write conflicts eagerly and read-write conflicts lazily, and (2) a two-phase contention manager, that imposes little overhead on small transactions and effectively manages conflicts between larger ones. SwissTM indeed achieves good performance across a range of workloads: it outperforms several state-of-the-art STMs on a representative large-scale benchmark by at least 55% with eight threads, while matching their performance or outperforming them across a wide range of smaller-scale benchmarks. I also present a detailed empirical analysis of the SwissTM design, individually evaluating each of the chosen design points and their impact on performance. This "dissection" of SwissTM is particularly valuable for STM designers as it helps them understand which parts of the design are well-suited to their own STMs, enabling them to reuse just those parts. Furthermore, I address the question of whether STM can perform well enough to be practical by performing the most extensive comparison of performance of STM-based and sequential, non-thread-safe code to date. This comparison demonstrates the very fact that SwissTM indeed outperforms sequential code, often with just a handful of threads: with four threads it outperforms sequential code in 80% of cases, by up to 4x. Furthermore, the performance scales well when increasing thread counts: with 64 threads it outperforms sequential code by up to 29x. These results suggest that STM is indeed a viable alternative for writing concurrent code today
A semantics comparison workbench for a concurrent, asynchronous, distributed programming language
A number of high-level languages and libraries have been proposed that offer
novel and simple to use abstractions for concurrent, asynchronous, and
distributed programming. The execution models that realise them, however, often
change over time---whether to improve performance, or to extend them to new
language features---potentially affecting behavioural and safety properties of
existing programs. This is exemplified by SCOOP, a message-passing approach to
concurrent object-oriented programming that has seen multiple changes proposed
and implemented, with demonstrable consequences for an idiomatic usage of its
core abstraction. We propose a semantics comparison workbench for SCOOP with
fully and semi-automatic tools for analysing and comparing the state spaces of
programs with respect to different execution models or semantics. We
demonstrate its use in checking the consistency of properties across semantics
by applying it to a set of representative programs, and highlighting a
deadlock-related discrepancy between the principal execution models of SCOOP.
Furthermore, we demonstrate the extensibility of the workbench by generalising
the formalisation of an execution model to support recently proposed extensions
for distributed programming. Our workbench is based on a modular and
parameterisable graph transformation semantics implemented in the GROOVE tool.
We discuss how graph transformations are leveraged to atomically model
intricate language abstractions, how the visual yet algebraic nature of the
model can be used to ascertain soundness, and highlight how the approach could
be applied to similar languages.Comment: Accepted by Formal Aspects of Computin
Practical database replication
Tese de doutoramento em InformáticaSoftware-based replication is a cost-effective approach for fault-tolerance when combined with
commodity hardware. In particular, shared-nothing database clusters built upon commodity machines
and synchronized through eager software-based replication protocols have been driven by
the distributed systems community in the last decade.
The efforts on eager database replication, however, stem from the late 1970s with initial
proposals designed by the database community. From that time, we have the distributed locking
and atomic commitment protocols. Briefly speaking, before updating a data item, all copies
are locked through a distributed lock, and upon commit, an atomic commitment protocol is
responsible for guaranteeing that the transaction’s changes are written to a non-volatile storage
at all replicas before committing it. Both these processes contributed to a poor performance.
The distributed systems community improved these processes by reducing the number of interactions
among replicas through the use of group communication and by relaxing the durability
requirements imposed by the atomic commitment protocol. The approach requires at most two
interactions among replicas and disseminates updates without necessarily applying them before
committing a transaction. This relies on a high number of machines to reduce the likelihood of
failures and ensure data resilience. Clearly, the availability of commodity machines and their
increasing processing power makes this feasible.
Proving the feasibility of this approach requires us to build several prototypes and evaluate
them with different workloads and scenarios. Although simulation environments are a good starting
point, mainly those that allow us to combine real (e.g., replication protocols, group communication)
and simulated-code (e.g., database, network), full-fledged implementations should be
developed and tested. Unfortunately, database vendors usually do not provide native support for
the development of third-party replication protocols, thus forcing protocol developers to either
change the database engines, when the source code is available, or construct in the middleware
server wrappers that intercept client requests otherwise. The former solution is hard to maintain
as new database releases are constantly being produced, whereas the latter represents a strenuous
development effort as it requires us to rebuild several database features at the middleware.
Unfortunately, the group-based replication protocols, optimistic or conservative, that had
been proposed so far have drawbacks that present a major hurdle to their practicability. The
optimistic protocols make it difficult to commit transactions in the presence of hot-spots, whereas
the conservative protocols have a poor performance due to concurrency issues.
In this thesis, we propose using a generic architecture and programming interface, titled
GAPI, to facilitate the development of different replication strategies. The idea consists of providing key extensions to multiple DBMSs (Database Management Systems), thus enabling a
replication strategy to be developed once and tested on several databases that have such extensions,
i.e., those that are replication-friendly. To tackle the aforementioned problems in groupbased
replication protocols, we propose using a novel protocol, titled AKARA. AKARA guarantees
fairness, and thus all transactions have a chance to commit, and ensures great performance
while exploiting parallelism as provided by local database engines. Finally, we outline a simple
but comprehensive set of components to build group-based replication protocols and discuss key
points in its design and implementation.A replicação baseada em software é uma abordagem que fornece um bom custo benefício para
tolerância a falhas quando combinada com hardware commodity. Em particular, os clusters de
base de dados “shared-nothing” construídos com hardware commodity e sincronizados através de
protocolos “eager” têm sido impulsionados pela comunidade de sistemas distribuídos na última
década.
Os primeiros esforços na utilização dos protocolos “eager”, decorrem da década de 70 do
século XX com as propostas da comunidade de base de dados. Dessa época, temos os protocolos
de bloqueio distribuído e de terminação atómica (i.e. “two-phase commit”). De forma sucinta,
antes de actualizar um item de dados, todas as cópias são bloqueadas através de um protocolo
de bloqueio distribuído e, no momento de efetivar uma transacção, um protocolo de terminação
atómica é responsável por garantir que as alterações da transacção são gravadas em todas as
réplicas num sistema de armazenamento não-volátil. No entanto, ambos os processos contribuem
para um mau desempenho do sistema.
A comunidade de sistemas distribuídos melhorou esses processos, reduzindo o número de
interacções entre réplicas, através do uso da comunicação em grupo e minimizando a rigidez
os requisitos de durabilidade impostos pelo protocolo de terminação atómica. Essa abordagem
requer no máximo duas interacções entre as réplicas e dissemina actualizações sem necessariamente
aplicá-las antes de efectivar uma transacção. Para funcionar, a solução depende de um
elevado número de máquinas para reduzirem a probabilidade de falhas e garantir a resiliência de
dados. Claramente, a disponibilidade de hardware commodity e o seu poder de processamento
crescente tornam essa abordagem possível.
Comprovar a viabilidade desta abordagem obriga-nos a construir vários protótipos e a avaliálos
com diferentes cargas de trabalho e cenários. Embora os ambientes de simulação sejam um
bom ponto de partida, principalmente aqueles que nos permitem combinar o código real (por
exemplo, protocolos de replicação, a comunicação em grupo) e o simulado (por exemplo, base
de dados, rede), implementações reais devem ser desenvolvidas e testadas. Infelizmente, os
fornecedores de base de dados, geralmente, não possuem suporte nativo para o desenvolvimento
de protocolos de replicação de terceiros, forçando os desenvolvedores de protocolo a mudar o
motor de base de dados, quando o código fonte está disponível, ou a construir no middleware
abordagens que interceptam as solicitações do cliente. A primeira solução é difícil de manter já
que novas “releases” das bases de dados estão constantemente a serem produzidas, enquanto a
segunda representa um desenvolvimento árduo, pois obriga-nos a reconstruir vários recursos de
uma base de dados no middleware. Infelizmente, os protocolos de replicação baseados em comunicação em grupo, optimistas ou
conservadores, que foram propostos até agora apresentam inconvenientes que são um grande obstáculo
à sua utilização. Com os protocolos optimistas é difícil efectivar transacções na presença
de “hot-spots”, enquanto que os protocolos conservadores têm um fraco desempenho devido a
problemas de concorrência.
Nesta tese, propomos utilizar uma arquitetura genérica e uma interface de programação, intitulada
GAPI, para facilitar o desenvolvimento de diferentes estratégias de replicação. A ideia
consiste em fornecer extensões chaves para múltiplos SGBDs (Database Management Systems),
permitindo assim que uma estratégia de replicação possa ser desenvolvida uma única vez e testada
em várias bases de dados que possuam tais extensões, ou seja, aquelas que são “replicationfriendly”.
Para resolver os problemas acima referidos nos protocolos de replicação baseados
em comunicação em grupo, propomos utilizar um novo protocolo, intitulado AKARA. AKARA
garante a equidade, portanto, todas as operações têm uma oportunidade de serem efectivadas,
e garante um excelente desempenho ao tirar partido do paralelismo fornecido pelos motores
de base de dados. Finalmente, propomos um conjunto simples, mas abrangente de componentes
para construir protocolos de replicação baseados em comunicação em grupo e discutimos pontoschave
na sua concepção e implementação
Dynamic re-optimization techniques for stream processing engines and object stores
Large scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems on multi-tenant, cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems.^ In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple-routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses any pipeline-bottlenecks. We present novel heuristics to optimize overlays for group communication operations in the streaming model.^ In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores to increase transaction throughput. Lock localization refers to dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition acquisition of locks. The service leverages the observed object-access patterns to achieve lock-clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures
Opacity: A Correctness Condition for Transactional Memory
Transactional memory is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on transactional memory implementations, however, its actual specification has never been precisely defined. This paper presents \emph{opacity}, a new correctness criterion for transactional memory systems. Opacity extends the notion of strict serializability, itself a strong form of the classical serializability property, with the requirement that even \emph{non-committed} transactions are prevented from accessing inconsistent state. Yet opacity does not preclude versioning, invisible reads and lazy updates, often used by modern TM implementations. In fact, most transactional memory systems we know of ensure opacity. We prove a tight bound on the inherent cost of implementing opacity. The bound highlights a trade-off that explains some of the differences between current transactional memory systems, and also draws a sharp complexity line between opacity on one hand, and the combination of strict serializability and strict recoverability on the other hand
Doctor of Philosophy
dissertationAlmost all high performance computing applications are written in MPI, which will continue to be the case for at least the next several years. Given the huge and growing importance of MPI, and the size and sophistication of MPI codes, scalable and incisive MPI debugging tools are essential. Existing MPI debugging tools have, despite their strengths, many glaring de ficiencies, especially when it comes to debugging under the presence of nondeterminism related bugs, which are bugs that do not always show up during testing. These bugs usually become manifest when the systems are ported to di fferent platforms for production runs. This dissertation focuses on the problem of developing scalable dynamic verifi cation tools for MPI programs that can provide a coverage guarantee over the space of MPI nondeterminism. That is, the tools should be able to detect diff erent outcomes of nondeterministic events in an MPI program and enforce all those di fferent outcomes through repeated executions of the program with the same test harness. We propose to achieve the coverage guarantee by introducing efficient distributed causality tracking protocols that are based on the matches-before order. The matches-before order is introduced to address the shortcomings of the Lamport happens-before order [40], which is not sufficient to capture causality for MPI program executions due to the complexity of the MPI semantics. The two protocols we propose are the Lazy Lamport Clocks Protocol (LLCP) and the Lazy Vector Clocks Protocol (LVCP). LLCP provides good scalability with a small possibility of missing potential outcomes of nondeterministic events while LVCP provides full coverage guarantee with a scalability tradeoff . In practice, we show through our experiments that LLCP provides the same coverage as LVCP. This thesis makes the following contributions: •The MPI matches-before order that captures the causality between MPI events in an MPI execution. • Two distributed causality tracking protocols for MPI programs that rely on the matches-before order. • A Distributed Analyzer for MPI programs (DAMPI), which implements the two aforementioned protocols to provide scalable and modular dynamic verifi cation for MPI programs. • Scalability enhancement through algorithmic improvements for ISP, a dynamic verifi er for MPI programs
SUPPORTING MULTIPLE ISOLATION LEVELS IN REPLICATED ENVIRONMENTS
La replicación de bases de datos aporta fiabilidad y escalabilidad aunque hacerlo
de forma transparente no es una tarea sencilla. Una base de datos replicada es
transparente si puede reemplazar a una base de datos centralizada tradicional sin
que sea necesario adaptar el resto de componentes del sistema. La transparencia
en bases de datos replicadas puede obtenerse siempre que (a) la gestión de la
replicación quede totalmente oculta a dichos componentes y (b) se ofrezca la
misma funcionalidad que en una base de datos tradicional.
Para mejorar el rendimiento general del sistema, los gestores de bases de datos
centralizadas actuales permiten ejecutar de forma concurrente transacciones
bajo distintos niveles de aislamiento. Por ejemplo, la especificación del benchmark
TPC-C permite la ejecución de algunas transacciones con niveles de aislamiento
débiles. No obstante, este soporte todavía no está disponible en los
protocolos de replicación. En esta tesis mostramos cómo estos protocolos pueden
ser extendidos para permitir la ejecución de transacciones con distintos niveles
de aislamiento.Bernabe Gisbert, JM. (2014). SUPPORTING MULTIPLE ISOLATION LEVELS IN REPLICATED ENVIRONMENTS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/36535TESI
An Analytical Approach to Programs as Data Objects
This essay accompanies a selection of 32 articles (referred to in bold face in the text and marginally marked in the bibliographic references) submitted to Aarhus University towards a Doctor Scientiarum degree in Computer Science.The author's previous academic degree, beyond a doctoral degree in June 1986, is an "Habilitation à diriger les recherches" from the Université Pierre et Marie Curie (Paris VI) in France; the corresponding material was submitted in September 1992 and the degree was obtained in January 1993.The present 32 articles have all been written since 1993 and while at DAIMI.Except for one other PhD student, all co-authors are or have been the author's students here in Aarhus
- …