10 research outputs found

    Customer application protocol for data transfer between embedded processor and microcontroller systems

    This paper develops a new customer application protocol (CAP) to improve the efficiency of data transfer between embedded processor and microcontroller systems. The protocol is characterized by its fidelity and simplicity: it uses a small header to control and monitor the data flow between the two systems. This is achieved by building an embedded processor system with an Ethernet intellectual property (IP) core featuring lightweight IP (lwIP) to establish a connection with a microcontroller device. The embedded system is configured on Spartan6E FPGA slices. System performance is tested by transferring audio samples and displaying them in ChipScope. The performance test of the embedded system with the developed protocol showed fast, efficient, and high-precision data exchange between the processor and microcontroller systems.

    Towards lightweight and high-performance hardware transactional memory

    Conventional lock-based synchronization serializes accesses to critical sections guarded by the same lock. Using multiple locks brings the possibility of a deadlock or a livelock in the program, making parallel programming a difficult task. Transactional Memory (TM) is a promising paradigm for parallel programming, offering an alternative to lock-based synchronization. TM eliminates the risk of deadlocks and livelocks, while providing the desirable semantics of Atomicity, Consistency, and Isolation of critical sections. TM speculatively executes a series of memory accesses as a single atomic transaction. The speculative changes of a transaction are kept private until the transaction commits. If a transaction can break atomicity or cause a deadlock or livelock, the TM system aborts the transaction and rolls back the speculative changes. To be effective, a TM implementation should provide high performance and scalability. While implementations of TM in pure software (STM) do not provide desirable performance, Hardware TM (HTM) implementations introduce much smaller overhead and have relatively good scalability, due to their better control of hardware resources. However, many HTM systems support only transactions that fit limited hardware resources (for example, private caches), and fall back to software mechanisms if hardware limits are reached. These HTM systems, called best-effort HTMs, are not desirable since they force a programmer to think in terms of hardware limits, to use both HTM and STM, and to manage concurrent transactions in HTM and STM. In contrast with best-effort HTMs, unbounded HTM systems support overflowed transactions, that is, transactions that do not fit into private caches. Unbounded HTM systems often require complex protocols or expensive hardware mechanisms for conflict detection between overflowed transactions. In addition, an execution with overflowed transactions is often much slower than an execution that has only regular transactions.
This is typically due to the restrictive or approximate conflict management mechanisms used for overflowed transactions. In this thesis, we study hardware implementations of transactional memory, and make three main contributions. First, we improve the general performance of HTM systems by proposing a scalable protocol for conflict management. The protocol has precise conflict detection, in contrast with the often-employed inexact Bloom-filter-based conflict detection, which often falsely reports conflicts between transactions. Second, we propose a best-effort HTM that utilizes the new scalable conflict detection protocol, termed EazyHTM. EazyHTM allows parallel commits for all non-conflicting transactions, and generally simplifies transaction commits. Finally, we propose an unbounded HTM that extends and improves the initial protocol for conflict management, and we name it EcoTM. EcoTM features precise conflict detection, and it efficiently supports large as well as small and short transactions. The key idea of EcoTM is to leverage the observation that very few locations are actually conflicting, even if applications have high contention. In EcoTM, each core locally detects if a cache line is non-conflicting, and the conflict detection mechanism is invoked only for the few potentially conflicting cache lines.

Traditional synchronization based on mutual exclusion locks serializes accesses to the critical sections protected by the same lock. Using several locks concurrently and/or in parallel increases the possibility of entering a deadlock or a livelock in the program; this is one of the reasons why parallel programming is much more difficult than sequential programming. Transactional memory (TM) is a promising paradigm for parallel programming that offers an alternative to locks.
Transactional memory has many advantages from both a practical and a theoretical point of view. TM eliminates the risk of deadlock and livelock, while providing atomicity, consistency, and isolation semantics with characteristics similar to those of critical sections. TM speculatively executes a series of memory accesses as an atomic transaction. The speculative changes of the transaction are kept private until the transaction commits. If a transaction conflicts with another transaction (that is, one of them writes to an address that the other has read or written), or if a deadlock or livelock arises, the TM system aborts the transaction and rolls back the speculative changes. To be effective, a TM implementation must provide high performance and scalability. Software implementations of TM (STM) do not provide this desirable performance, whereas hardware implementations of TM (HTM) perform better and scale relatively well, owing to their better control of hardware resources and to the fact that conflict resolution, as well as data maintenance and management, is done in hardware. However, many HTM systems are limited by the available hardware resources, for example the size of the private caches, and depend on software mechanisms when those limits are exceeded. These HTM systems, called best-effort HTMs, are undesirable because they force the programmer to think in terms of the limits of the hardware in use, as well as of the STM system that is invoked when resources are exceeded. In addition, the programmer has to ensure that hardware and software transactions can execute concurrently.
In contrast, unbounded HTM systems support an unlimited number of operations; that is, they are not restricted by limits artificially imposed by the hardware, such as the size of caches or internal buffers. Unbounded HTM systems generally require complex protocols or very costly mechanisms for conflict detection and for maintaining data versions across transactions. Moreover, transaction execution is often much slower than on a bounded HTM system. This is because the mechanisms used in a bounded HTM work with relatively small data sets that fit in, or lie very close to, the processor core. In this thesis we study hardware implementations of TM. We present three main contributions. First, we improve the general performance of such systems by proposing a scalable protocol for conflict management. The protocol detects conflicts precisely, in contrast with other techniques based on Bloom filters, which can report false conflicts between transactions. Second, we propose a best-effort HTM that uses the new scalable conflict detection protocol, named EazyHTM. EazyHTM allows fully parallel execution of all non-conflicting transactions and generally simplifies execution. Finally, we propose an extension and improvement of the initial conflict management protocol, which we call EcoTM. EcoTM offers precise and efficient conflict detection and supports both large and small transactions. The key idea of EcoTM is to exploit the observation that conflicts between transactions arise at very few memory locations, even in applications with many conflicts.
In EcoTM, each core locally detects whether a line is conflicting; in addition, a detailed conflict detection mechanism is activated only for the few memory lines that are potentially conflicting.
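
The contrast this abstract draws between precise conflict detection and inexact Bloom-filter-based detection can be illustrated with a small sketch. It is not the EazyHTM or EcoTM protocol; the 16-bit signature, the single modulo hash, and all function names are assumptions chosen only to make aliasing easy to see.

```python
# Toy comparison of signature-based and exact conflict detection.
# Everything here is hypothetical: real HTM signatures are much larger
# and use several hash functions.

SIGNATURE_BITS = 16  # deliberately tiny so that aliasing is easy to trigger


def bit_index(address):
    """Map a cache-line address to one signature bit (single modulo hash)."""
    return address % SIGNATURE_BITS


def signature(addresses):
    """Fold a set of addresses into a Bloom-filter-style bit vector."""
    sig = 0
    for address in addresses:
        sig |= 1 << bit_index(address)
    return sig


def bloom_conflict(writes_a, accesses_b):
    """Inexact check: may report a conflict that does not exist."""
    return (signature(writes_a) & signature(accesses_b)) != 0


def precise_conflict(writes_a, accesses_b):
    """Exact check: a conflict exists only if an address is shared."""
    return bool(set(writes_a) & set(accesses_b))


# Addresses 3 and 19 are different lines but share a signature bit.
tx_a_writes = {3}
tx_b_reads = {19}
false_positive = (bloom_conflict(tx_a_writes, tx_b_reads)
                  and not precise_conflict(tx_a_writes, tx_b_reads))
```

With these inputs the signature check flags a conflict while the exact check does not, which is the kind of false conflict the abstract attributes to Bloom-filter-based schemes.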

    On the Performance of Software Transactional Memory

    The recent proliferation of multi-core processors has moved concurrent programming into the mainstream by forcing increasingly more programmers to write parallel code. Using traditional concurrency techniques, such as locking, is notoriously difficult and has long been considered the domain of a few experts. This discrepancy between the established techniques and typical programmers' skills raises a pressing need for new programming paradigms. A particularly appealing concurrent programming paradigm is transactional memory: it enables programmers to write correct concurrent code in a simple manner, while promising scalable performance. Software implementations of transactional memory (STM) have attracted a lot of attention for their ability to support dynamic transactions of any size and execute on existing hardware. This is in contrast to hardware implementations that typically support only transactions of limited size and are not yet commercially available. Surprisingly, prior work has largely neglected software support for transactions of arbitrary size, despite them being an important target for STM. Consequently, existing STMs have not been optimized for large transactions, which results in poor performance of those STMs, and sometimes even program crashes, when dealing with large transactions. In this thesis, I contribute to changing the current state of affairs by improving performance and scalability of STM, in particular with dynamic transactions of arbitrary size. I propose SwissTM, a novel STM design that efficiently supports large transactions, while not compromising on performance with smaller ones. SwissTM features: (1) mixed conflict detection, which detects write-write conflicts eagerly and read-write conflicts lazily, and (2) a two-phase contention manager, which imposes little overhead on small transactions and effectively manages conflicts between larger ones.
SwissTM achieves good performance across a range of workloads: it outperforms several state-of-the-art STMs on a representative large-scale benchmark by at least 55% with eight threads, while matching their performance or outperforming them across a wide range of smaller-scale benchmarks. I also present a detailed empirical analysis of the SwissTM design, individually evaluating each of the chosen design points and their impact on performance. This "dissection" of SwissTM is particularly valuable for STM designers as it helps them understand which parts of the design are well-suited to their own STMs, enabling them to reuse just those parts. Furthermore, I address the question of whether STM can perform well enough to be practical by performing the most extensive comparison of performance of STM-based and sequential, non-thread-safe code to date. This comparison demonstrates that SwissTM outperforms sequential code, often with just a handful of threads: with four threads it outperforms sequential code in 80% of cases, by up to 4x. Furthermore, the performance scales well when increasing thread counts: with 64 threads it outperforms sequential code by up to 29x. These results suggest that STM is indeed a viable alternative for writing concurrent code today.
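
The mixed conflict detection named in this abstract (write-write conflicts detected eagerly, read-write conflicts detected lazily) can be sketched as a single-threaded simulation. This is not SwissTM's code: the `Location`, `Transaction`, and `Conflict` names, the per-location version counter, and the abort-on-conflict policy are all assumptions made for illustration, and the two-phase contention manager is not modeled.

```python
# Toy, single-threaded model of mixed conflict detection.
# All names and policies are hypothetical, not taken from SwissTM.


class Conflict(Exception):
    """Raised when a transaction detects a conflict and must abort."""


class Location:
    def __init__(self, value):
        self.value = value
        self.version = 0    # bumped on every committed write
        self.writer = None  # transaction currently holding the write lock


class Transaction:
    def __init__(self, name):
        self.name = name
        self.read_versions = {}  # Location -> version seen at first read
        self.write_buffer = {}   # Location -> speculative (private) value

    def read(self, loc):
        if loc in self.write_buffer:
            return self.write_buffer[loc]
        self.read_versions.setdefault(loc, loc.version)
        return loc.value

    def write(self, loc, value):
        # Eager write-write detection: checked at the moment of the write.
        if loc.writer is not None and loc.writer is not self:
            raise Conflict(self.name + ": write-write conflict")
        loc.writer = self
        self.write_buffer[loc] = value

    def commit(self):
        # Lazy read-write detection: the read set is validated only here.
        try:
            for loc, seen in self.read_versions.items():
                if loc.version != seen:
                    raise Conflict(self.name + ": read-write conflict")
            for loc, value in self.write_buffer.items():
                loc.value = value
                loc.version += 1
        finally:
            self.abort()  # release write locks whether or not commit succeeded

    def abort(self):
        for loc in self.write_buffer:
            if loc.writer is self:
                loc.writer = None
        self.write_buffer = {}
        self.read_versions = {}
```

In this model a second writer to a locked location fails immediately, while a reader whose location is later overwritten by a committed transaction only learns of the conflict when it tries to commit.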

    Design and evaluation of a Thread-Level Speculation runtime library

    In the coming years, machines with hundreds or even thousands of processors will very likely become commonplace. To take advantage of these machines, and given the difficulty of parallel programming, it would be desirable to have compilation or runtime systems that extract all possible parallelism from existing applications. Accordingly, many parallelization techniques have been proposed in recent times. However, most of them focus on simple codes, that is, codes without dependences between their instructions. Speculative parallelization emerges as a solution for complex codes, making it possible to execute any kind of code, with or without dependences. This technique optimistically assumes that the parallel execution of any kind of code will not lead to errors, and it therefore needs a mechanism that detects any kind of collision. To that end, it includes a monitor responsible for constantly checking that the execution is not erroneous, ensuring that the results obtained in parallel are similar to those of any sequential execution. If the execution turns out to be erroneous, the threads stop and restart their execution to ensure that it follows sequential semantics.
Our contribution to this field includes (1) a new, easy-to-use speculative execution library; (2) new proposals that significantly reduce the number of accesses required by speculative operations, along with advice for reducing memory usage; (3) proposals to improve scheduling methods, centered on the dynamic management of the blocks of iterations used in speculative executions; (4) a hybrid solution that uses transactional memory to implement the critical sections of a speculative parallelization library; and (5) an analysis of speculative techniques on one of the most cutting-edge devices of the moment, the Intel Xeon Phi coprocessors. As we have been able to verify, speculative parallelization is an active field of research. Our results show that this technique yields performance improvements in a large number of applications. We therefore hope that this work will help facilitate the use of speculative solutions in commercial compilers and/or shared-memory parallel programming models.

Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
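
The speculate, monitor, and restart cycle described in this abstract can be sketched as a sequential simulation of a speculatively parallelized loop. This is not the thesis's runtime library: `execute`, `run_speculative`, the `load`/`store` callbacks, and the fixed `block_size` are hypothetical, and real threads are replaced by running every iteration of a block against the same snapshot.

```python
# Toy simulation of thread-level speculation over blocks of iterations.
# All names are hypothetical; no real threads are used.


def execute(body, i, visible):
    """Run iteration i against `visible`, recording its reads and writes."""
    reads, writes = set(), {}

    def load(key):
        if key in writes:
            return writes[key]
        reads.add(key)
        return visible[key]

    def store(key, value):
        writes[key] = value

    body(i, load, store)
    return reads, writes


def run_speculative(body, n_iterations, state, block_size):
    """Return (final_state, number_of_reexecuted_iterations)."""
    state = dict(state)
    squashed = 0
    for start in range(0, n_iterations, block_size):
        block = range(start, min(start + block_size, n_iterations))
        # Speculative phase: every iteration sees the state at block start,
        # as if all iterations of the block had started simultaneously.
        snapshot = dict(state)
        results = {i: execute(body, i, snapshot) for i in block}
        # Commit phase in sequential order, playing the monitor's role.
        written = set()
        for i in block:
            reads, writes = results[i]
            if reads & written:
                # The iteration read a value that an earlier iteration of
                # this block has since written: squash it and run it again.
                squashed += 1
                reads, writes = execute(body, i, state)
            state.update(writes)
            written.update(writes)
    return state, squashed


def chain(i, load, store):
    """Loop body with a dependence: each iteration reads its predecessor."""
    store(i + 1, load(i) + 1)


final_state, reexecutions = run_speculative(
    chain, 8, {k: 0 for k in range(9)}, block_size=4)
```

Because squashed iterations are re-run against the committed state in iteration order, the final state matches a sequential run; a loop body without cross-iteration dependences would complete with no re-executions.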

    Software Transactional Memory Building Blocks

    Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occasionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. This allows programmers to only declare which pieces of a program synchronize without requiring them to actually implement synchronization and tune its performance, which in turn makes TM typically easier to use than other abstractions such as locks. I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. They host several novel algorithms and optimizations for TM implementations, both for current hardware and future hardware extensions for TM, and are being used in or have influenced commercial TM implementations such as the TM support in GCC.

    The velox transactional memory stack

    The transactional memory programming paradigm could become the coordination methodology of choice for current and future multicore and many-core architectures. Transactional memory support spans a complete software and hardware stack, including programming language and hardware support, runtime and libraries, compilers, and application environments. The VELOX project has developed such a comprehensive transactional memory stack. Peer Reviewed