7 research outputs found

    Per-task energy accounting in computing systems

    We present for the first time the concept of per-task energy accounting (PTEA) and relate it to per-task energy metering (PTEM). We show the benefits of supporting both in future computing systems. Using the shared last-level cache (LLC) as an example: (1) we illustrate the complexities in providing PTEM and PTEA; (2) we present an idealized PTEM model and an accurate, low-cost implementation of it; and (3) we introduce a hardware mechanism to provide accurate PTEA in the cache.
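
    The distinction between metering and accounting can be illustrated with a toy model of a shared LLC; every formula, constant, and function name below is an assumption made for this sketch, not the models or hardware mechanisms proposed in the paper.

```python
# Toy sketch of the PTEM/PTEA distinction for a shared last-level cache.
# Every formula and constant here is an illustrative assumption.

def metered_llc_energy(task_accesses, total_accesses, dyn_energy_j,
                       leak_energy_j, occupancy_share):
    """PTEM: estimate of the energy the task actually consumed in the shared run.

    Dynamic energy is split by the task's share of LLC accesses; leakage is
    split by the fraction of cache lines the task occupied.
    """
    return (dyn_energy_j * task_accesses / total_accesses
            + leak_energy_j * occupancy_share)

def accounted_llc_energy(isolated_accesses, energy_per_access_j,
                         leak_power_w, cache_fraction, isolated_runtime_s):
    """PTEA: estimate of the energy the task would have consumed running alone
    with a fixed fraction of the cache (e.g. 1/N of the ways)."""
    return (isolated_accesses * energy_per_access_j
            + leak_power_w * cache_fraction * isolated_runtime_s)

# Metered energy changes with co-runners (they alter the task's access count
# and occupancy); accounted energy depends only on the task and its assigned
# cache fraction, so it stays consistent across workloads.
print(metered_llc_energy(2e6, 8e6, dyn_energy_j=0.4, leak_energy_j=0.1,
                         occupancy_share=0.3))
print(accounted_llc_energy(1.5e6, energy_per_access_j=1e-10,
                           leak_power_w=0.05, cache_fraction=0.25,
                           isolated_runtime_s=1.0))
```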

    Per-task energy metering and accounting in the multicore era

    Energy has become arguably the most expensive resource in a computing system. As multicore processors have become the preferred processing platform across computing domains, measuring energy usage has drawn considerable attention. In this thesis, for the first time, we formalize the need for per-task energy measurement in multicore systems by establishing a two-fold concept: per-task energy metering and sensible energy accounting. The former provides, for a task running in a multicore system, estimates of the actual energy consumption corresponding to its resource usage. The latter provides estimates of the energy the task would have consumed running in isolation with a given fraction of the shared resources. We show how these two concepts can be applied to the main components of a computing system: the processor and the memory system.

    DReAM: An approach to estimate per-task DRAM energy in multicore systems

    Accurate per-task energy estimation in multicore systems would allow performing per-task energy-aware task scheduling and energy-aware billing in data centers, among other applications. Per-task energy estimation is challenged by the interaction between tasks in shared resources, which impacts tasks' energy consumption in uncontrolled ways. Some accurate mechanisms have been devised recently to estimate per-task on-chip energy in multicores, but such mechanisms are lacking for DRAM memories. This article makes the case for accurate per-task DRAM energy metering in multicores, which opens new paths to energy/performance optimizations. In particular, the contributions of this article are (i) an ideal per-task energy metering model for DRAM memories; (ii) DReAM, an accurate yet low-cost implementation of the ideal model (less than 5% accuracy error when 16 tasks share memory); and (iii) a comparison with standard methods (even distribution and access-count based), proving that DReAM is much more accurate than these other methods.
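
    For reference, the two standard methods mentioned above (even distribution and access-count based) amount to the following; the function names and numbers are illustrative only, not DReAM's interface or model.

```python
# The two baseline attribution methods the article compares against,
# sketched in plain Python. All names and numbers are illustrative.

def even_split(total_dram_energy_j, num_tasks):
    """Attribute the same share of the measured DRAM energy to every task."""
    return [total_dram_energy_j / num_tasks] * num_tasks

def access_count_split(total_dram_energy_j, accesses_per_task):
    """Attribute DRAM energy in proportion to each task's memory accesses.

    This still ignores which task keeps banks active or causes row misses,
    which is why a finer-grained per-task model can be more accurate.
    """
    total = sum(accesses_per_task)
    return [total_dram_energy_j * a / total for a in accesses_per_task]

print(even_split(10.0, 4))                           # [2.5, 2.5, 2.5, 2.5]
print(access_count_split(10.0, [800, 150, 40, 10]))  # skewed toward heavy users
```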

    SEDEA: A sensible approach to account DRAM energy in multicore systems

    As the energy cost of today's computing systems keeps increasing, measuring energy becomes crucial in many scenarios. For instance, because the operational cost of datacenters largely depends on the energy consumed by the applications they execute, end users should be charged for the energy consumed, which requires a fair and consistent energy measuring approach. However, the use of multicore systems complicates per-task energy measurement, as the increased thread-level parallelism (TLP) allows several tasks to run simultaneously while sharing resources. The energy usage of each task is therefore hard to determine due to interleaved activities and mutual interference. To this end, Per-Task Energy Metering (PTEM) has been proposed to measure the actual energy of each task based on its resource utilization in a workload. However, the measured energy depends on the interference from co-running tasks sharing the resources, and thus fails to provide consistency across executions. Therefore, Sensible Energy Accounting (SEA) has been proposed to deliver an abstraction of the energy consumption based on a particular allocation of resources to a task.

    In this work we provide a realization of SEA for the DRAM memory system, SEDEA, where we account a task for the DRAM energy it would have consumed when running in isolation with a fraction of the on-chip shared cache. SEDEA is a mechanism to sensibly account for the DRAM energy of a task based on predicting its memory behavior. Our results show that SEDEA provides accurate estimates at low cost, beating existing per-task energy models, which do not target energy accounting in multicore systems. We also provide a use case showing that SEDEA can be used to guide shared-cache and memory-bank partitioning schemes to save energy.

    This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), the National Key R&D Program of China under No. 2016YFB1000204, the European HiPEAC Network of Excellence, the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the IBM-BSC Deep Learning Center initiative, the major scientific and technological project of Guangdong province (2014B010115003), and NSFC under grants 61702495 and 61672511. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. J. Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship RYC-2013-14717.
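
    As a rough illustration of the accounting idea described in the abstract above (not SEDEA's actual mechanism), one can predict the DRAM traffic a task would generate in isolation with a given cache fraction and convert it into energy; the miss-rate curve and every constant below are made-up placeholders.

```python
# Back-of-the-envelope sketch of DRAM energy accounting: estimate the traffic
# a task would generate in isolation with `cache_fraction` of the shared LLC,
# then convert it to energy. All constants and the miss-rate curve are
# made-up placeholders, not SEDEA's model.

def accounted_dram_energy(miss_rate_curve, cache_fraction, instructions,
                          energy_per_access_nj=20.0, background_power_w=0.5,
                          cpi=1.0, freq_hz=2.0e9):
    mpi = miss_rate_curve(cache_fraction)          # predicted LLC misses per instruction
    dram_accesses = mpi * instructions
    dynamic_j = dram_accesses * energy_per_access_nj * 1e-9
    runtime_s = instructions * cpi / freq_hz       # isolated-run time estimate
    background_j = background_power_w * runtime_s  # refresh + standby energy
    return dynamic_j + background_j

# Toy miss-rate curve: the more cache the task gets, the fewer misses per instruction.
curve = lambda frac: 0.02 * (1.0 - 0.8 * frac)
print(accounted_dram_energy(curve, cache_fraction=0.25, instructions=1e9))
```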

    Improving the efficiency of multicore systems through software and hardware cooperation

    Increasing processors' clock frequency has traditionally been one of the largest drivers of performance improvements in computing systems. In the first half of the 2000s, however, it became clear that continuing to increase frequency was no longer a viable solution: power consumption and power density became prohibitively costly, and processor manufacturers moved to multicore designs. This new paradigm introduced multiple challenges not present in single-threaded processors. Applications running on multicore systems share resources such as the cache hierarchy and the memory bus, and resource sharing occurs at a much finer degree when cores also support multithreading, since applications then share the processor's pipeline too. Running multiple applications on the same processor allows for better utilization of its resources, which otherwise may just lie idle if an application does not use them, but sharing resources may create interference between the applications running on the system. While the degree of this interference depends on the nature of the applications, it is typically desirable to reduce it in order to improve efficiency.

    Most currently available processors expose a set of sensors and actuators that software can use to monitor and control resource sharing among the applications running on a system, but it is typically up to end users to analyze their workloads of interest and to manually use the actuators provided by the processor. Because of this, in many cases the different mechanisms for controlling resource sharing are simply left unused. In this thesis we present different techniques that rely on software/hardware interaction to monitor and reduce application interference, and thus improve system efficiency. First we conduct a quantitative study showing the benefits of hardware/software cooperation for system efficiency. Then we narrow our focus to a given hardware knob: data prefetching. Specifically, we develop and evaluate several adaptive solutions for improving the efficiency of hardware data prefetching on multicore systems. The impact of the solutions presented in this thesis, however, goes beyond the particular case of data prefetching; they serve as illustrative examples for developing software/hardware cooperation schemes that enable the efficient sharing of resources in multicore systems.

    Resource sharing within a processor is a crucial factor that significantly affects its efficiency, but other levels of a computing system also share resources. In large computing installations such as datacenters, applications may also share resources such as the network or storage. As a case study, we consider the design of an energy accounting system for large computing installations based on software/hardware cooperation. In this context, we explore several alternatives for the required sensors and actuators, and we analyze the key aspects in the design of such a system.
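
    As a minimal illustration of the kind of software/hardware cooperation loop discussed above for data prefetching, the following sketch adjusts a prefetcher-aggressiveness knob from observed counters. The read_counters and set_prefetcher_level hooks, thresholds, and levels are hypothetical stand-ins for platform-specific sensors and actuators, not the thesis' algorithms.

```python
# Minimal sketch of a software/hardware cooperation loop for data prefetching.
# read_counters() and set_prefetcher_level() are hypothetical stand-ins for
# platform-specific performance counters and prefetch-control registers; the
# thresholds and levels are illustrative, not tuned policies from the thesis.

import time

def adaptive_prefetch_control(read_counters, set_prefetcher_level,
                              intervals=100, interval_s=0.1, max_level=3):
    level = max_level                          # start with the most aggressive setting
    for _ in range(intervals):
        c = read_counters(interval_s)          # e.g. {'useful_pf': ..., 'issued_pf': ..., 'bw_util': ...}
        accuracy = c['useful_pf'] / max(c['issued_pf'], 1)
        if accuracy < 0.4 or c['bw_util'] > 0.9:
            level = max(level - 1, 0)          # prefetches waste bandwidth: back off
        elif accuracy > 0.7 and c['bw_util'] < 0.6:
            level = min(level + 1, max_level)  # accurate and headroom left: ramp up
        set_prefetcher_level(level)
        time.sleep(interval_s)

# Dry run with fake sensors/actuators:
fake_read = lambda dt: {'useful_pf': 700, 'issued_pf': 1000, 'bw_util': 0.5}
adaptive_prefetch_control(fake_read, lambda lvl: print("prefetch level", lvl),
                          intervals=3, interval_s=0.0)
```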

    Per-task Energy Accounting in Computing Systems

    © 2002-2011 IEEE. We present for the first time the concept of per-task energy accounting (PTEA) and relate it to per-task energy metering (PTEM). We show the benefits of supporting both in future computing systems. Using the shared last-level cache (LLC) as an example: (1) we illustrate the complexities in providing PTEM and PTEA; (2) we present an idealized PTEM model and an accurate, low-cost implementation of it; and (3) we introduce a hardware mechanism to provide accurate PTEA in the cache.

    Per-task Energy Accounting in Computing Systems
