94 research outputs found

    Learning-based runtime management of energy-efficient and reliable many-core systems

    This paper highlights and demonstrates our research to date on the energy-efficiency and reliability challenges of many-core systems, addressed through intelligent runtime management algorithms. The algorithms are implemented through cross-layer interactions between three layers (application, runtime and hardware), forming our core theme of working together. The annotated application tasks communicate their performance, energy or reliability requirements to the runtime. Given these requirements, the runtime exercises the hardware through various control knobs and receives feedback on these controls through the performance monitors. The aim is to learn the best possible hardware controls at runtime to achieve energy efficiency and improved reliability while meeting the specified application requirements.

    Run-time Resource Management in CMPs Handling Multiple Aging Mechanisms

    Abstract—Run-time resource management is fundamental for efficient execution of workloads on Chip Multiprocessors (CMPs). Application- and system-level requirements (e.g. on performance vs. power vs. lifetime reliability) generally conflict with each other, and any decision on resource assignment, such as core allocation or frequency tuning, may benefit some of them while penalizing others. The effects of resource assignment decisions on performance and power consumption can be perceived within a few instants, but their effects on lifetime reliability cannot. In fact, the latter changes very slowly, based on the accumulated effects of many decisions over a long time horizon. Moreover, aging mechanisms are varied and have different causes: most of them, such as Electromigration (EM), depend on temperature levels, while Thermal Cycling (TC) is caused mainly by temperature variations (both amplitude and frequency). Mitigating only EM may negatively affect TC and vice versa. We propose a resource orchestration strategy that balances performance and power consumption constraints in the short term and EM and TC aging in the long term. Experimental results show that the proposed approach improves the average Mean Time To Failure by at least 17% and 20% w.r.t. EM and TC, respectively, while providing the same performance level as the nominal counterpart and guaranteeing the power budget.

    TheSPoT: Thermal Stress-Aware Power and Temperature Management for Multiprocessor Systems-on-Chip
