166 research outputs found

    ORLease: Optimistically Replicated Lease Using Lease Version Vector For Higher Replica Consistency in Optimistic Replication Systems

    Get PDF
    There is a tradeoff between the availability and consistency properties of any distributed replication system. Optimistic replication favors high availability over strong consistency so that the replication system can support disconnected replicas as well as high network latency between replicas. Optimistic replication improves the availability of these systems by allowing data updates to be committed at their originating replicas first before they are asynchronously replicated out and committed later at the rest of the replicas. This leads the whole system to suffer from a relaxed data consistency. This is due to the lack of any locking mechanism to synchronize access to the replicated data resources in order to mutually exclude one another. When consistency is relaxed, there is a potential of reading from stale data as well as introducing data conflicts due to the concurrent data updates that might have been introduced at different replicas. These issues could be ameliorated if the optimistic replication system is aggressively propagating the data updates at times of good network connectivity between replicas. However, aggressive propagation for data updates does not scale well in write intensive environments and leads to communication overhead in order to keep all replicas in sync. In pursuance of a solution to mitigate the relaxed consistency drawback, a new technique has been developed that improves the consistency of optimistic replication systems without sacrificing its availability and with minimal communication overhead. This new methodology is based on applying the concurrency control technique of leasing in an optimistic way. The optimistic lease technique is built on top of a replication framework that prioritizes metadata replication over data replication. The framework treats the lease requests as replication metadata updates and replicates them aggressively in order to optimistically acquire leases on replicated data resources. The technique is demonstrating a best effort semi-locking semantics that improves the overall system consistency while avoiding any locking issues that could arise in optimistic replication systems

    A Transactional Model and Platform for Designing and Implementing Reactive Systems

    Get PDF
    A reactive program is one that has ongoing interactions with its environment. Reactive programs include those for embedded systems, operating systems, network clients and servers, databases, and smart phone apps. Reactive programs are already a core part of our computational and physical infrastructure and will continue to proliferate within our society as new form factors, e.g. wireless sensors, and inexpensive (wireless) networking are applied to new problems. Asynchronous concurrency is a fundamental characteristic of reactive systems that makes them difficult to develop. Threads are commonly used for implementing reactive systems, but they may magnify problems associated with asynchronous concurrency, as there is a gap between the semantics of thread-based computation and the semantics of reactive systems: reactive software developed with threads often has subtle timing bugs and tends to be brittle and non-reusable as a holistic understanding of the software becomes necessary to avoid concurrency hazards such as data races, deadlock, and livelock. Based on these problems with the state of the art, we believe a new model for developing and implementing reactive systems is necessary. This dissertation makes four contributions to the state of the art in reactive systems. First, we propose a formal yet practical model for (asynchronous) reactive systems called reactive components. A reactive component is a set of state variables and atomic transitions that can be composed with other reactive components to yield another reactive component. The transitions in a system of reactive components are executed by a scheduler. The reactive component model is based on concepts from temporal logic and models like UNITY and I/O Automata. The major contribution of the reactive component model is a formal method for principled composition, which ensures that 1) the result of composition is always another reactive component, for consistency of reasoning; 2) systems may be decomposed to an arbitrary degree and depth, to foster divide-and-conquer approaches when designing and re-use when implementing; 3)~the behavior of a reactive component can be stated in terms of its interface, which is necessary for abstraction; and 4) properties of reactive components that are derived from transitions protected by encapsulation are preserved through composition and can never be violated, which permits assume-guarantee reasoning. Second, we develop a prototypical programming language for reactive components called rcgo that is based on the syntax and semantics of the Go programming language. The semantics of the rcgo language enforce various aspects of the reactive component model, e.g., the isolation of state between components and safety of concurrency properties, while permitting a number of useful programming techniques, e.g., reference and move semantics for efficient communication among reactive components. For tractability, we assume that each system contains a fixed set of components in a fixed configuration. Third, we provide an interpreter for the rcgo language to test the practicality of the assumptions upon which the reactive component model are founded. The interpreter contains an algorithm that checks for composition hazards like recursively defined transitions and non-deterministic transitions. Transitions are executed using a novel calling convention that can be implemented efficiently on existing architectures. The run-time system also contains two schedulers that use the results of composition analysis to execute non-interfering transitions concurrently. Fourth, we compare the performance of each scheduler in the interpreter to the performance of a custom compiled multi-threaded program, for two reactive systems. For one system, the combination of the implementation and hardware biases it toward an event-based solution, which was confirmed when the reactive component implementation outperformed the custom implementation due to reduced context switching. For the other system, the custom implementation is not prone to excessive context switches and outperformed the reactive component implementations. These results demonstrate that reactive components may be a viable alternative to threads in practice, but that additional work is necessary to generalize this claim

    A speculative execution approach to provide semantically aware contention management for concurrent systems

    Get PDF
    PhD ThesisMost modern platforms offer ample potention for parallel execution of concurrent programs yet concurrency control is required to exploit parallelism while maintaining program correctness. Pessimistic con- currency control featuring blocking synchronization and mutual ex- clusion, has given way to transactional memory, which allows the composition of concurrent code in a manner more intuitive for the application programmer. An important component in any transactional memory technique however is the policy for resolving conflicts on shared data, commonly referred to as the contention management policy. In this thesis, a Universal Construction is described which provides contention management for software transactional memory. The technique differs from existing approaches given that multiple execution paths are explored speculatively and in parallel. In the resolution of conflicts by state space exploration, we demonstrate that both concur- rent conflicts and semantic conflicts can be solved, promoting multi- threaded program progression. We de ne a model of computation called Many Systems, which defines the execution of concurrent threads as a state space management problem. An implementation is then presented based on concepts from the model, and we extend the implementation to incorporate nested transactions. Results are provided which compare the performance of our approach with an established contention management policy, under varying degrees of concurrent and semantic conflicts. Finally, we provide performance results from a number of search strategies, when nested transactions are introduced

    CoAP Infrastructure for IoT

    Get PDF
    The Internet of Things (IoT) can be seen as a large-scale network of billions of smart devices. Often IoT devices exchange data in small but numerous messages, which requires IoT services to be more scalable and reliable than ever. Traditional protocols that are known in the Web world does not fit well in the constrained environment that these devices operate in. Therefore many lightweight protocols specialized for the IoT have been studied, among which the Constrained Application Protocol (CoAP) stands out for its well-known REST paradigm and easy integration with existing Web. On the other hand, new paradigms such as Fog Computing emerges, attempting to avoid the centralized bottleneck in IoT services by moving computations to the edge of the network. Since a node of the Fog essentially belongs to relatively constrained environment, CoAP fits in well. Among the many attempts of building scalable and reliable systems, Erlang as a typical concurrency-oriented programming (COP) language has been battle tested in the telecom industry, which has similar requirements as the IoT. In order to explore the possibility of applying Erlang and COP in general to the IoT, this thesis presents an Erlang based CoAP server/client prototype ecoap with a flexible concurrency model that can scale up to an unconstrained environment like the Cloud and scale down to a constrained environment like an embedded platform. The flexibility of the presented server renders the same architecture applicable from Fog to Cloud. To evaluate its performance, the proposed server is compared with the mainstream CoAP implementation on an Amazon Web Service (AWS) Cloud instance and a Raspberry Pi 3, representing the unconstrained and constrained environment respectively. The ecoap server achieves comparable throughput, lower latency, and in general scales better than the other implementation in the Cloud and on the Raspberry Pi. The thesis yields positive results and demonstrates the value of the philosophy of Erlang in the IoT space

    Optimizing recovery protocols for replicated database systems

    Full text link
    En la actualidad, el uso de tecnologías de informacíon y sistemas de cómputo tienen una gran influencia en la vida diaria. Dentro de los sistemas informáticos actualmente en uso, son de gran relevancia los sistemas distribuidos por la capacidad que pueden tener para escalar, proporcionar soporte para la tolerancia a fallos y mejorar el desempeño de aplicaciones y proporcionar alta disponibilidad. Los sistemas replicados son un caso especial de los sistemas distribuidos. Esta tesis está centrada en el área de las bases de datos replicadas debido al uso extendido que en el presente se hace de ellas, requiriendo características como: bajos tiempos de respuesta, alto rendimiento en los procesos, balanceo de carga entre las replicas, consistencia e integridad de datos y tolerancia a fallos. En este contexto, el desarrollo de aplicaciones utilizando bases de datos replicadas presenta dificultades que pueden verse atenuadas mediante el uso de servicios de soporte a mas bajo nivel tales como servicios de comunicacion y pertenencia. El uso de los servicios proporcionados por los sistemas de comunicación de grupos permiten ocultar los detalles de las comunicaciones y facilitan el diseño de protocolos de replicación y recuperación. En esta tesis, se presenta un estudio de las alternativas y estrategias empleadas en los protocolos de replicación y recuperación en las bases de datos replicadas. También se revisan diferentes conceptos sobre los sistemas de comunicación de grupos y sincronia virtual. Se caracterizan y clasifican diferentes tipos de protocolos de replicación con respecto a la interacción o soporte que pudieran dar a la recuperación, sin embargo el enfoque se dirige a los protocolos basados en sistemas de comunicación de grupos. Debido a que los sistemas comerciales actuales permiten a los programadores y administradores de sistemas de bases de datos renunciar en alguna medida a la consistencia con la finalidad de aumentar el rendimiento, es importante determinar el nivel de consistencia necesario. En el caso de las bases de datos replicadas la consistencia está muy relacionada con el nivel de aislamiento establecido entre las transacciones. Una de las propuestas centrales de esta tesis es un protocolo de recuperación para un protocolo de replicación basado en certificación. Los protocolos de replicación de base de datos basados en certificación proveen buenas bases para el desarrollo de sus respectivos protocolos de recuperación cuando se utiliza el nivel de aislamiento snapshot. Para tal nivel de aislamiento no se requiere que los readsets sean transferidos entre las réplicas ni revisados en la fase de cetificación y ya que estos protocolos mantienen un histórico de la lista de writesets que es utilizada para certificar las transacciones, este histórico provee la información necesaria para transferir el estado perdido por la réplica en recuperación. Se hace un estudio del rendimiento del protocolo de recuperación básico y de la versión optimizada en la que se compacta la información a transferir. Se presentan los resultados obtenidos en las pruebas de la implementación del protocolo de recuperación en el middleware de soporte. La segunda propuesta esta basada en aplicar el principio de compactación de la informacion de recuperación en un protocolo de recuperación para los protocolos de replicación basados en votación débil. El objetivo es minimizar el tiempo necesario para transfeir y aplicar la información perdida por la réplica en recuperación obteniendo con esto un protocolo de recuperación mas eficiente. Se ha verificado el buen desempeño de este algoritmo a través de una simulación. Para efectuar la simulación se ha hecho uso del entorno de simulación Omnet++. En los resultados de los experimentos puede apreciarse que este protocolo de recuperación tiene buenos resultados en múltiples escenarios. Finalmente, se presenta la verificación de la corrección de ambos algoritmos de recuperación en el Capítulo 5.Nowadays, information technology and computing systems have a great relevance on our lives. Among current computer systems, distributed systems are one of the most important because of their scalability, fault tolerance, performance improvements and high availability. Replicated systems are a specific case of distributed system. This Ph.D. thesis is centered in the replicated database field due to their extended usage, requiring among other properties: low response times, high throughput, load balancing among replicas, data consistency, data integrity and fault tolerance. In this scope, the development of applications that use replicated databases raises some problems that can be reduced using other fault-tolerant building blocks, as group communication and membership services. Thus, the usage of the services provided by group communication systems (GCS) hides several communication details, simplifying the design of replication and recovery protocols. This Ph.D. thesis surveys the alternatives and strategies being used in the replication and recovery protocols for database replication systems. It also summarizes different concepts about group communication systems and virtual synchrony. As a result, the thesis provides a classification of database replication protocols according to their support to (and interaction with) recovery protocols, always assuming that both kinds of protocol rely on a GCS. Since current commercial DBMSs allow that programmers and database administrators sacrifice consistency with the aim of improving performance, it is important to select the appropriate level of consistency. Regarding (replicated) databases, consistency is strongly related to the isolation levels being assigned to transactions. One of the main proposals of this thesis is a recovery protocol for a replication protocol based on certification. Certification-based database replication protocols provide a good basis for the development of their recovery strategies when a snapshot isolation level is assumed. In that level readsets are not needed in the validation step. As a result, they do not need to be transmitted to other replicas. Additionally, these protocols hold a writeset list that is used in the certification/validation step. That list maintains the set of writesets needed by the recovery protocol. This thesis evaluates the performance of a recovery protocol based on the writeset list tranfer (basic protocol) and of an optimized version that compacts the information to be transferred. The second proposal applies the compaction principle to a recovery protocol designed for weak-voting replication protocols. Its aim is to minimize the time needed for transferring and applying the writesets lost by the recovering replica, obtaining in this way an efficient recovery. The performance of this recovery algorithm has been checked implementing a simulator. To this end, the Omnet++ simulating framework has been used. The simulation results confirm that this recovery protocol provides good results in multiple scenarios. Finally, the correction of both recovery protocols is also justified and presented in Chapter 5.García Muñoz, LH. (2013). Optimizing recovery protocols for replicated database systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31632TESI

    TM2C: A software transactional memory for many-cores

    Get PDF
    Transactional memory is an appealing paradigm for concurrent programming. Many software implementations of the paradigm were proposed in the last decades for both shared memory multi-core systems and clusters of distributed machines. However, chip manufacturers have started producing many-core architectures, with low network-on-chip communication latency and limited support for cache-coherence, rendering existing transactional memory implementations inapplicable. This paper presents TM2C, the first software Transactional Memory protocol for Many-Core systems. TM2C exploits network-on-chip communications to get granted accesses to shared data through efficient message passing. In particular, it allows visible read accesses and hence effective distributed contention management with eager conflict detection. We also propose FairCM, a companion contention manager that ensures starvation-freedom, which we believe is an important property in many-core systems, as well as an implementation of elastic transactions in these settings. Our evaluation on four benchmarks, i.e., a linked list and a hash table data structures as well as a bank and a MapReduce-like applications, indicates better scalability than locks and up to 20-fold speedup (relative to bare sequential code) when running 24 application cores. © 2012 ACM

    Towards lightweight and high-performance hardware transactional memory

    Get PDF
    Conventional lock-based synchronization serializes accesses to critical sections guarded by the same lock. Using multiple locks brings the possibility of a deadlock or a livelock in the program, making parallel programming a difficult task. Transactional Memory (TM) is a promising paradigm for parallel programming, offering an alternative to lock-based synchronization. TM eliminates the risk of deadlocks and livelocks, while it provides the desirable semantics of Atomicity, Consistency, and Isolation of critical sections. TM speculatively executes a series of memory accesses as a single, atomic, transaction. The speculative changes of a transaction are kept private until the transaction commits. If a transaction can break the atomicity or cause a deadlock or livelock, the TM system aborts the transaction and rolls back the speculative changes. To be effective, a TM implementation should provide high performance and scalability. While implementations of TM in pure software (STM) do not provide desirable performance, Hardware TM (HTM) implementations introduce much smaller overhead and have relatively good scalability, due to their better control of hardware resources. However, many HTM systems support only the transactions that fit limited hardware resources (for example, private caches), and fall back to software mechanisms if hardware limits are reached. These HTM systems, called best-effort HTMs, are not desirable since they force a programmer to think in terms of hardware limits, to use both HTM and STM, and to manage concurrent transactions in HTM and STM. In contrast with best-effort HTMs, unbounded HTM systems support overflowed transactions, that do not fit into private caches. Unbounded HTM systems often require complex protocols or expensive hardware mechanisms for conflict detection between overflowed transactions. In addition, an execution with overflowed transactions is often much slower than an execution that has only regular transactions. This is typically due to restrictive or approximative conflict management mechanism used for overflowed transactions. In this thesis, we study hardware implementations of transactional memory, and make three main contributions. First, we improve the general performance of HTM systems by proposing a scalable protocol for conflict management. The protocol has precise conflict detection, in contrast with often-employed inexact Bloom-filter-based conflict detection, which often falsely report conflicts between transactions. Second, we propose a best-effort HTM that utilizes the new scalable conflict detection protocol, termed EazyHTM. EazyHTM allows parallel commits for all non-conflicting transactions, and generally simplifies transaction commits. Finally, we propose an unbounded HTM that extends and improves the initial protocol for conflict management, and we name it EcoTM. EcoTM features precise conflict detection, and it efficiently supports large as well as small and short transactions. The key idea of EcoTM is to leverage an observation that very few locations are actually conflicting, even if applications have high contention. In EcoTM, each core locally detects if a cache line is non-conflicting, and conflict detection mechanism is invoked only for the few potentially conflicting cache lines.La Sincronización tradicional basada en los cerrojos de exclusión mutua (locks) serializa los accesos a las secciones críticas protegidas este cerrojo. La utilización de varios cerrojos en forma concurrente y/o paralela aumenta la posibilidad de entrar en abrazo mortal (deadlock) o en un bloqueo activo (livelock) en el programa, está es una de las razones por lo cual programar en forma paralela resulta ser mucho mas dificultoso que programar en forma secuencial. La memoria transaccional (TM) es un paradigma prometedor para la programación paralela, que ofrece una alternativa a los cerrojos. La memoria transaccional tiene muchas ventajas desde el punto de vista tanto práctico como teórico. TM elimina el riesgo de bloqueo mutuo y de bloqueo activo, mientras que proporciona una semántica de atomicidad, coherencia, aislamiento con características similares a las secciones críticas. TM ejecuta especulativamente una serie de accesos a la memoria como una transacción atómica. Los cambios especulativos de la transacción se mantienen privados hasta que se confirma la transacción. Si una transacción entra en conflicto con otra transacción o sea que alguna de ellas escribe en una dirección que la otra leyó o escribió, o se entra en un abrazo mortal o en un bloqueo activo, el sistema de TM aborta la transacción y revierte los cambios especulativos. Para ser eficaz, una implementación de TM debe proporcionar un alto rendimiento y escalabilidad. Las implementaciones de TM en el software (STM) no proporcionan este desempeño deseable, en cambio, las mplementaciones de TM en hardware (HTM) tienen mejor desempeño y una escalabilidad relativamente buena, debido a su mejor control de los recursos de hardware y que la resolución de los conflictos así el mantenimiento y gestión de los datos se hace en hardware. Sin embargo, muchos de los sistemas de HTM están limitados a los recursos de hardware disponibles, por ejemplo el tamaño de las caches privadas, y dependen de mecanismos de software para cuando esos límites son sobrepasados. Estos sistemas HTM, llamados best-effort HTM no son deseables, ya que obligan al programador a pensar en términos de los límites existentes en el hardware que se esta utilizando, así como en el sistema de STM que se llama cuando los recursos son sobrepasados. Además, tiene que resolver que transacciones hardware y software se ejecuten concurrentemente. En cambio, los sistemas de HTM ilimitados soportan un numero de operaciones ilimitadas o sea no están restringidos a límites impuestos artificialmente por el hardware, como ser el tamaño de las caches o buffers internos. Los sistemas HTM ilimitados por lo general requieren protocolos complejos o mecanismos muy costosos para la detección de conflictos y el mantenimiento de versiones de los datos entre las transacciones. Por otra parte, la ejecución de transacciones es a menudo mucho más lenta que en una ejecución sobre un sistema de HTM que este limitado. Esto es debido al que los mecanismos utilizados en el HTM limitado trabaja con conjuntos de datos relativamente pequeños que caben o están muy cerca del núcleo del procesador. En esta tesis estudiamos implementaciones de TM en hardware. Presentaremos tres contribuciones principales: Primero, mejoramos el rendimiento general de los sistemas, al proponer un protocolo escalable para la gestión de conflictos. El protocolo detecta los conflictos de forma precisa, en contraste con otras técnicas basadas en filtros Bloom, que pueden reportar conflictos falsos entre las transacciones. Segundo, proponemos un best-effort HTM que utiliza el nuevo protocolo escalable detección de conflictos, denominado EazyHTM. EazyHTM permite la ejecución completamente paralela de todas las transacciones sin conflictos, y por lo general simplifica la ejecución. Por último, proponemos una extensión y mejora del protocolo inicial para la gestión de conflictos, que llamaremos EcoTM. EcoTM cuenta con detección de conflictos precisa, eficiente y es compatible tanto con transacciones grandes como con pequeñas. La idea clave de EcoTM es aprovechar la observación que en muy pocas ubicaciones de memoria aparecen los conflictos entre las transacciones, incluso en aplicaciones tienen muchos conflictos. En EcoTM, cada núcleo detecta localmente si la línea es conflictiva, además existe un mecanismo de detección de conflictos detallado que solo se activa para las pocas líneas de memoria que son potencialmente conflictivas
    corecore