8 research outputs found

    Energy reduction in multiprocessor systems using transactional memory


    Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs

    Energy efficiency is becoming a pressing issue, especially in large data centers, where it entails at once a non-negligible management cost, an increased probability of hardware faults, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can benefit both power saving and overall application execution performance. Encapsulating shared-data accesses within transactions gives the STM middleware the freedom both to ensure consistency and to reduce actual data contention, the latter having been shown to affect the overall power needed to complete the application’s execution. We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to show how self-adapting computation can better capture the actual degree of parallelism and/or logical contention on shared data, further enhancing the intrinsic benefits provided by STM. This benefit comes at a cost: the execution time the proposed approaches need to tune the execution parameters precisely for reducing power consumption and enhancing execution performance. Nevertheless, the results provided here show that adaptivity is a strictly necessary requirement for reducing energy consumption in STM systems: without it, no acceptable level of energy efficiency can be reached.

    Exploiting software transactional memory in the context of asymmetric architectures

    Advisor: Paulo Cesar Centoducatte. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: The shift towards multicore processors taken by the semiconductor industry has initiated an era in which new languages, methodologies, and tools are of paramount importance to the development of efficient concurrent systems that can be built in a timely way by all kinds of programmers. One of the main obstacles programmers face in shared-memory programming is using synchronization mechanisms correctly, so as to avoid race conditions that could lead the system to an inconsistent state. Synchronization has traditionally been achieved by means of locks (or variations thereof), widely known for their anomalies and for being hard to get right. A new mechanism, known as transactional memory (TM), has recently been the focus of much research and shows potential to simplify code synchronization as well as to deliver more parallelism and, therefore, better performance. This thesis presents three works focused on different aspects of software transactional memory (STM) systems. First, we present an STM implementation for asymmetric processors, focusing on the Cell/B.E. architecture. As the main result, we find that memory transactions are indeed promising for asymmetric architectures, especially because of their scalability. Second, we take a different approach to STM implementation by devising a system specifically targeted at computer games. This decision was motivated by the poor performance of current STM implementations. A case study using a complex game shows the effectiveness of the proposed system. Finally, we present, for the first time, an energy consumption characterization of a state-of-the-art STM. Based on the observed characterization, we also propose a technique aimed at reducing energy consumption in highly contended scenarios. Our results show that the technique is effective in such cases, improving energy consumption by up to 87%. Doctorate, Computer Systems. Doctor of Computer Science.

    Performance Optimization Strategies for Transactional Memory Applications

    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To that end, it addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge in order to develop these new methods and strategies for optimizing TM applications.

    Software Approaches to Manage Resource Tradeoffs of Power and Energy Constrained Applications

    Power and energy efficiency have become increasingly important design metrics for a wide spectrum of computing devices. Battery efficiency, which requires a mixture of energy and power efficiency, is exceedingly important, especially since there have been no groundbreaking advances in battery capacity recently. The need for energy and power efficiency stretches from small embedded devices to portable computers to large-scale data centers. The projected future of computing demand, referred to as exascale computing, requires researchers to find ways to perform exaFLOPs of computation at a power bound much lower than would be required by simply scaling today's standards. There is a large body of work on power and energy efficiency for a wide range of applications and at different levels of abstraction. However, there is a lack of work studying the nuances of the different tradeoffs that arise when operating under a power/energy budget. Moreover, there is no work on constructing a generalized model of applications running under power/energy constraints that allows designers to optimize their resource consumption, be it power, energy, time, bandwidth, or space. There is a need for an efficient model that can provide bounds on the optimality of an application's resource consumption, becoming a basis against which online resource management heuristics can be measured. In this thesis, we tackle the problem of managing resource tradeoffs of power/energy-constrained applications. We begin by studying the nuances of power/energy tradeoffs with the response time and throughput of stream processing applications. We then study the power-performance tradeoff of batch processing applications to identify a power configuration that maximizes performance under a power bound. Next, we study the tradeoff of power/energy with network bandwidth and precision. Finally, we study how to combine tradeoffs into a generalized model of applications running under resource constraints.
The work in this thesis presents detailed studies of the power/energy tradeoff with response time, throughput, performance, network bandwidth, and precision of stream and batch processing applications. To that end, we present an adaptive algorithm that manages the stream processing tradeoffs of response time and throughput at the CPU level. At the task level, we present an online heuristic that adaptively distributes bounded power in a cluster to improve performance, as well as an offline approach to optimally bound performance. We demonstrate how power can be used to reduce bandwidth bottlenecks and extend our offline approach to model bandwidth tradeoffs. Moreover, we present a tool that identifies parts of a program that can be downgraded in precision with minimal impact on accuracy and maximal impact on energy consumption. Finally, we combine all the above tradeoffs into a flexible model that is efficient to solve and allows for bounding and/or optimizing the consumption of different resources.

    Designs for increasing reliability while reducing energy and increasing lifetime

    In recent decades, computing technology has seen tremendous development. For instance, transistor feature sizes have halved every two years, consistently since Moore first stated his law. Consequently, the number of transistors and the core count per chip double with each generation. Similarly, petascale systems capable of more than one quadrillion (10^15) calculations per second have been developed, and exascale systems are predicted to be available by 2020. However, these developments in computer systems face a reliability wall. For instance, transistor feature sizes are becoming so small that it is easier for high-energy particles to temporarily flip the state of a memory cell from 1 to 0 or from 0 to 1. Also, even if we assume that the fault rate per transistor stays constant with scaling, the increase in total transistor and core count per chip will significantly increase the number of faults in future desktop and exascale systems. Moreover, circuit ageing is exacerbated by increased manufacturing variability and thermal stress, so the lifetime of processor structures is becoming shorter. On the other hand, given the limited power budget of computer systems such as mobile devices, it is attractive to scale down the voltage. However, when the voltage is scaled beyond the safe margin, especially to ultra-low levels, the error rate increases drastically. In addition, new memory technologies such as NAND flash offer only a limited nominal lifetime; once they exceed it, they cannot guarantee that data is stored correctly, which leads to data retention problems. Because of these issues, reliability has become a first-class design constraint for contemporary computing, in addition to power and performance.
Moreover, reliability plays an increasingly important role when computer systems process sensitive and life-critical information such as health records, financial information, power regulation, and transportation. In this thesis, we present several reliability designs for detecting and correcting errors that occur, for various reasons, in processor pipelines, L1 caches, and non-volatile NAND flash memories. We design reliability solutions to serve three main purposes. Our first goal is to improve the reliability of computer systems by detecting and correcting random and unpredictable errors such as bit flips or ageing errors. Second, we aim to reduce the energy consumption of computer systems by allowing them to operate reliably at ultra-low voltage levels. Third, we aim to increase the lifetime of new memory technologies by implementing efficient and low-cost reliability schemes.

    Energy reduction in multiprocessor systems using transactional memory

    No full text
    The emphasis in microprocessor design has shifted from high performance to a combination of high performance and low power. Until recently, this trend was mostly true for uniprocessors. In this work we focus on a new energy consumption issue unique to multiprocessor systems: synchronization of accesses to shared memory. We investigate and compare different means of providing atomic access to shared memory, including locks and lock-free synchronization (i.e., transactional memory), with respect to energy as well as performance. We show that transactional memory has an advantage over locks in terms of energy consumption, but that this advantage depends largely on the system architecture, the contention level, and the conflict resolution policy.

    On The Energy-efficiency Of Software Transactional Memory

    No full text
    Traditional software transactional memory designs are targeted towards performance and therefore little is known about their impact on energy consumption. We provide, in this paper, a comprehensive energy analysis of a standard STM design and propose novel scratchpad-based energy-aware STM design strategies. Experimental results collected through a state-of-the-art MPSoC simulation infrastructure show that our approach can achieve an energy improvement of up to 36% with regard to the base STM for applications characterized by short-lived transactions and relatively high abort rate. Copyright 2009 ACM.