38 research outputs found

    Towards a Software Transactional Memory for heterogeneous CPU-GPU processors

    Heterogeneous Accelerated Processing Units (APUs) integrate a multi-core CPU and a GPU within the same chip. Modern APUs provide the programmer with platform atomics, which let the CPU cores and the GPU communicate through simple atomic data types. However, ensuring consistency for complex data types is a task delegated to programmers, who have to implement a mutual exclusion mechanism. Transactional Memory (TM) is an optimistic approach to implementing mutual exclusion. With TM, multiple threads can access shared data speculatively, but a transaction's changes become visible only if it ends without its memory accesses conflicting with those of other transactions. TM has been studied and implemented in software and hardware for both CPU and GPU platforms, but no integrated solution has been provided for APU processors. In this paper we present APUTM, a software TM designed to work on heterogeneous APU processors. The design of APUTM focuses on minimizing accesses to shared metadata in order to reduce the communication overhead of expensive platform atomics. The main objective of APUTM is to help us understand the tradeoffs of implementing a software TM on a heterogeneous CPU-GPU platform and to identify the key aspects to be considered in each device. In our experiments, we evaluate how well APUTM adapts to executing on one of the devices (CPU or GPU) or on both simultaneously. These experiments show that APUTM is able to outperform sequential execution of the applications. This work has been supported by projects TIN2013-42253-P and TIN2016-80920-R, from the Spanish Government, P11-TIC8144 and P12-TIC1470, from Junta de Andalucía, and Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
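
The optimistic scheme this abstract describes (speculative access, with changes visible only if the transaction ends without conflict) can be illustrated with a minimal CPU-only sketch. This is not APUTM: the names `TVar`, `Transaction`, `Conflict`, `atomically` and `run_demo` are hypothetical, and a single Python lock stands in for the platform atomics an APU implementation would use.

```python
import threading

class Conflict(Exception):
    """Raised when a value read by the transaction changed before commit."""

class TVar:
    """Hypothetical shared variable; its version is bumped on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

# Assumption: one global lock serializes only the validate-and-publish step.
_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.read_versions = {}  # TVar -> version seen at first read
        self.writes = {}         # TVar -> buffered (not yet visible) value

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        # Record the version before reading the value, so a concurrent
        # commit can only make validation stricter, never looser.
        self.read_versions.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value        # speculative: buffered until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.read_versions.items():
                if tvar.version != seen:
                    raise Conflict()
            for tvar, value in self.writes.items():
                tvar.value = value       # publish the value first...
                tvar.version += 1        # ...then advertise the new version

def atomically(body):
    """Run body(tx) repeatedly until it commits without conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue

def run_demo(threads=4, increments=500):
    """Increment one shared counter from several threads; return the total."""
    counter = TVar(0)

    def increment(tx):
        tx.write(counter, tx.read(counter) + 1)

    def worker():
        for _ in range(increments):
            atomically(increment)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

Buffering writes keeps aborted transactions invisible to other threads, at the cost of a validation pass over the read set at commit time. The sketch does not prevent a transaction from observing an inconsistent snapshot before it aborts.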

    Improvements in Hardware Transactional Memory for GPU Architectures

    In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local memory level. After a brief description of the main features of our HTM design for GPU local memory, in this work we gather a number of proposals aimed at improving the mechanisms with the highest impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is the task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion of the analysis needed to choose the best configuration. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Hardware support for Local Memory Transactions on GPU Architectures

    Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction - Multiple Thread (SIMT) fashion. However, the SIMT execution model is not efficient when code includes critical sections that protect access to data shared by the running threads. In addition, GPUs offer the threads two shared spaces, local memory and global memory. Typical solutions to thread synchronization include using atomics to implement locks, serializing the execution of the critical section, or delegating the execution of the critical section to the host CPU, all of which lead to suboptimal performance. In the multi-core CPU world, transactional memory (TM) was proposed as an alternative to locks for coordinating concurrent threads, and some TM solutions for GPUs have started to appear in the literature. In contrast to these earlier proposals, our approach is to design hardware support for TM at two levels. The first level is a fast and lightweight solution for coordinating threads that share the local memory, while the second level coordinates threads through the global memory. In this paper we present GPU-LocalTM as a hardware TM (HTM) support for the first level. GPU-LocalTM offers simple conflict detection and version management mechanisms that minimize the hardware resources required for its implementation. For the workloads studied, GPU-LocalTM provides speedups between 1.25X and 80X over serialized critical sections, while the overhead introduced by transaction management is lower than 20%. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Energy Efficiency of Software Transactional Memory in a Heterogeneous Architecture

    Hardware vendors invest considerable effort in creating low-power CPUs that keep battery life and durability at acceptable levels. To achieve this goal and provide a good performance-energy balance for a wide variety of applications, ARM designed the big.LITTLE architecture. This heterogeneous multi-core architecture features two different types of cores: big cores oriented to performance, and LITTLE cores, which are slower and aimed at saving energy. As all the cores have access to the same memory, multi-threaded applications must resort to some mutual exclusion mechanism to coordinate the access to shared data by the concurrent threads. Transactional Memory (TM) represents an optimistic approach to shared-memory synchronization. To take full advantage of the features offered by software TM, but also benefit from the characteristics of heterogeneous big.LITTLE architectures, our focus is to propose TM solutions that take into account both the power/performance requirements of the application and what the architecture offers. In order to understand the current state of the art and obtain useful information for future power-aware software TM solutions, we have performed an analysis of a popular TM library running on an ARM big.LITTLE processor. Experiments show, in general, better scalability on the LITTLE cores for all of the applications except one, which requires the computing performance that the big cores offer. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Memoria Transaccional Software en Procesadores CPU+GPU Heterogéneos [Software Transactional Memory on Heterogeneous CPU+GPU Processors]

    In multi-core processors, transactional memory (TM) has emerged as a promising alternative to lock-based techniques for guaranteeing mutual exclusion, and it is being included in commercial processors. Likewise, as GPUs are becoming the most popular accelerator today, manufacturers are integrating them into the same chip as the CPU, creating so-called APUs (Accelerated Processing Units). However, synchronization between CPU and GPU is still carried out with very simple mechanisms based on atomic operations and signals. It is therefore the programmers' responsibility to implement more advanced mutual exclusion techniques. TM-based techniques have not yet been exploited on this type of processor, so it is important to propose advanced synchronization mechanisms. In this article we propose a software TM library intended for use on APU processors. The goal is for transactions to be able to run on both the CPU and the GPU simultaneously, with synchronization in the form of mutual exclusion between the two devices. Our proposal, called APUTM, focuses on minimizing the communication between the CPU and the GPU of the metadata required to manage TM. The evaluation of this proposal shows that, using this synchronization mechanism, it is possible to improve on the execution time of sequential applications with little programming effort.

    Memoria Transaccional Hardware en Memoria Local de GPU [Hardware Transactional Memory in GPU Local Memory]

    Graphics accelerators (GPUs) have become very popular general-purpose processors for computing applications that exhibit a high degree of data parallelism. Their SIMT (Single Instruction - Multiple Thread) execution model and their memory hierarchy are key to the high efficiency of these architectures, which can handle hundreds or thousands of threads. The memory hierarchy is divided into two addressable spaces: a local memory, which is small, fast and visible to a subset of the running threads; and a global memory, which is larger, slower and visible to all threads. However, the SIMT programming model is not efficient when this overwhelming number of threads has to be synchronized to guarantee mutual exclusion in a critical section. Using atomics to implement locks is problematic and inefficient in this type of programming model. Transactional memory (TM) has been proposed as a more reliable and efficient alternative to locks for this synchronization. TM allows speculative access to the critical section: memory accesses are recorded, the changes made by threads that have encountered a conflict are undone, and those threads are restarted. In this work we present a hardware TM solution that synchronizes the threads that share the local memory. In the tests performed, using TM achieves higher speedups than solutions based on coarse-grained locks and matches those based on fine-grained locks, but with less programming effort. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
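
The idea of recording accesses and undoing the changes of a conflicting thread can be illustrated with a small undo-log sketch. It is a software analogy only, not the hardware mechanism of the paper: the class `UndoLogTransaction` and its method names are hypothetical, and a Python list stands in for GPU local memory.

```python
class UndoLogTransaction:
    """Hypothetical eager-update transaction: writes go to memory in place,
    and the previous values are logged so an abort can restore them."""

    def __init__(self, memory):
        self.memory = memory   # a list standing in for local memory
        self.undo_log = []     # (address, old value) pairs, oldest first

    def store(self, address, value):
        # Log the value being overwritten before updating memory in place.
        self.undo_log.append((address, self.memory[address]))
        self.memory[address] = value

    def abort(self):
        # Replay the log newest-first so that, when an address was written
        # several times, its original value is the last one restored.
        for address, old_value in reversed(self.undo_log):
            self.memory[address] = old_value
        self.undo_log.clear()

    def commit(self):
        # The updates are already in memory; only the log is discarded.
        self.undo_log.clear()
```

With an undo log, commits are cheap because memory already holds the new values, while aborts pay the cost of restoring the old ones. A write buffer makes the opposite tradeoff.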

    Hardware support for scratchpad memory transactions on GPU architectures

    Graphics Processing Units (GPUs) have become the accelerator of choice for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction - Multiple Thread (SIMT) fashion. Using OpenCL terminology, GPUs offer a global memory space shared by all the threads in the GPU, as well as a low-latency local memory space shared by a subset of the threads. The latter is used as a scratchpad to improve the performance of the applications. We propose GPU-LocalTM, a hardware transactional memory (TM), as an alternative to data locking mechanisms in local memory. GPU-LocalTM allocates transactional metadata in the existing memory resources, minimizing the storage requirements for TM support. In addition, it ensures forward progress through an automatic serialization mechanism. In our experiments, GPU-LocalTM provides up to 100X speedup over serialized execution. This work has been supported by projects TIN2013-42253-P and TIN2016-80920-R, from the Spanish Government, P11-TIC8144 and P12-TIC1470, from Junta de Andalucía, and Universidad de Málaga, Campus de Excelencia Internacional, Andalucía Tech

    Cytokine profile in peripheral blood mononuclear cells differs between embryo donor and potential recipient sows

    Introduction: Pregnancy success relies on the establishment of a delicate immune balance that requires the early activation of a series of local and systemic immune mechanisms. The changes in the immunological profile that normally occur in the pregnant uterus do not take place in the cyclic (non-pregnant) uterus, a fact that has been widely explored in pigs at the local tissue level. Such differences would be especially important in the context of embryo transfer (ET), where a growing body of literature indicates that immunological differences at the uterine level between donors and recipients may significantly impact embryonic mortality. However, whether components of peripheral immunity also play a role in this context remains unknown. Accordingly, our hypothesis is that the immune status of donor sows differs from that of potential recipients, not only at the local tissue level but also at the systemic level. These differences could contribute to the high embryonic mortality rates occurring in ET programs. Methods: This study explored differences in systemic immunity, based on the cytokine gene expression profile in peripheral blood mononuclear cells (PBMCs), between embryo-bearing donor sows (DO group; N = 10) and potential recipient sows (RE group; N = 10) at Day 6 after the onset of estrus. Gene expression analysis was conducted for 6 proinflammatory (IL-1α, IL-1β, IL-2, GM-CSF, IFN-γ, and TNF-α) and 6 anti-inflammatory (IL-4, IL-6, IL-10, IL-13, TGF-β1, and LIF) cytokines. Results and discussion: All cytokines were overexpressed in the DO group except for IL-4, suggesting that stimuli derived from the insemination and/or the resultant embryos modify the systemic immune profile in DO sows compared to RE sows (which lack these stimuli). Our results also suggest that certain cytokines (e.g., IL-1α and IL-1β) might have predictive value for pregnancy status.

    The Athena X-ray Integral Field Unit: a consolidated design for the system requirement review of the preliminary definition phase

    The Athena X-ray Integral Field Unit (X-IFU) is the high resolution X-ray spectrometer studied since 2015 for flight in the mid-2030s on the Athena space X-ray Observatory. Athena is a versatile observatory designed to address the Hot and Energetic Universe science theme, as selected in November 2013 by the Survey Science Committee. Based on a large format array of Transition Edge Sensors (TES), X-IFU aims to provide spatially resolved X-ray spectroscopy, with a spectral resolution of 2.5 eV (up to 7 keV) over a hexagonal field of view of 5 arc minutes (equivalent diameter). The X-IFU entered its System Requirement Review (SRR) in June 2022, at about the same time that ESA called for an overall X-IFU redesign (including the X-IFU cryostat and the cooling chain), due to an unanticipated cost overrun of Athena. In this paper, after illustrating the breakthrough capabilities of the X-IFU, we describe the instrument as presented at its SRR (i.e. in the course of its preliminary definition phase, so-called B1), browsing through all the subsystems and associated requirements. We then show the instrument budgets, with a particular emphasis on the anticipated budgets of some of its key performance parameters, such as the instrument efficiency, spectral resolution, energy scale knowledge, count rate capability, non-X-ray background and target of opportunity efficiency. Finally, we briefly discuss the ongoing key technology demonstration activities, the calibration and the activities foreseen in the X-IFU Instrument Science Center, and touch on communication and outreach activities, the consortium organisation and the life cycle assessment of X-IFU, which aims at minimising the environmental footprint associated with the development of the instrument. Thanks to the studies conducted so far on X-IFU, it is expected that throughout the design-to-cost exercise requested by ESA, the X-IFU will maintain flagship capabilities in spatially resolved high resolution X-ray spectroscopy, enabling most of the original X-IFU related scientific objectives of the Athena mission to be retained. The X-IFU will be provided by an international consortium led by France, The Netherlands and Italy, with ESA member state contributions from Belgium, Czech Republic, Finland, Germany, Poland, Spain and Switzerland, and additional contributions from the United States and Japan. The French contribution to X-IFU is funded by CNES, CNRS and CEA. This work has been also supported by ASI (Italian Space Agency) through the Contract 2019-27-HH.0, and by the ESA (European Space Agency) Core Technology Program (CTP) Contract No. 4000114932/15/NL/BW and the AREMBES - ESA CTP No.4000116655/16/NL/BW. This publication is part of grant RTI2018-096686-B-C21 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”. This publication is part of grant RTI2018-096686-B-C21 and PID2020-115325GB-C31 funded by MCIN/AEI/10.13039/501100011033

    The Athena X-ray Integral Field Unit: a consolidated design for the system requirement review of the preliminary definition phase

    The Athena X-ray Integral Field Unit (X-IFU) is the high resolution X-ray spectrometer studied since 2015 for flight in the mid-2030s on the Athena space X-ray Observatory, a versatile observatory designed to address the Hot and Energetic Universe science theme, selected in November 2013 by the Survey Science Committee. Based on a large format array of Transition Edge Sensors (TES), it aims to provide spatially resolved X-ray spectroscopy, with a spectral resolution of 2.5 eV (up to 7 keV) over a hexagonal field of view of 5 arc minutes (equivalent diameter). The X-IFU entered its System Requirement Review (SRR) in June 2022, at about the same time that ESA called for an overall X-IFU redesign (including the X-IFU cryostat and the cooling chain), due to an unanticipated cost overrun of Athena. In this paper, after illustrating the breakthrough capabilities of the X-IFU, we describe the instrument as presented at its SRR, browsing through all the subsystems and associated requirements. We then show the instrument budgets, with a particular emphasis on the anticipated budgets of some of its key performance parameters. Finally, we briefly discuss the ongoing key technology demonstration activities, the calibration and the activities foreseen in the X-IFU Instrument Science Center, and touch on communication and outreach activities, the consortium organisation, and the life cycle assessment of X-IFU, which aims at minimising the environmental footprint associated with the development of the instrument. Thanks to the studies conducted so far on X-IFU, it is expected that throughout the design-to-cost exercise requested by ESA, the X-IFU will maintain flagship capabilities in spatially resolved high resolution X-ray spectroscopy, enabling most of the original X-IFU related scientific objectives of the Athena mission to be retained. (abridged). Comment: 48 pages, 29 figures, Accepted for publication in Experimental Astronomy with minor editin