439 research outputs found

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

    Run-time Resource Management in CMPs Handling Multiple Aging Mechanisms

    Get PDF
    Abstract—Run-time resource management is fundamental for efficient execution of workloads on Chip Multiprocessors. Application- and system-level requirements (e.g. on performance vs. power vs. lifetime reliability) are generally conflicting each other, and any decision on resource assignment, such as core allocation or frequency tuning, may positively affect some of them while penalizing some others. Resource assignment decisions can be perceived in few instants of time on performance and power consumption, but not on lifetime reliability. In fact, this latter changes very slowly based on the accumulation of effects of various decisions over a long time horizon. Moreover, aging mechanisms are various and have different causes; most of them, such as Electromigration (EM), are subject to temperature levels, while Thermal Cycling (TC) is caused mainly by temperature variations (both amplitude and frequency). Mitigating only EM may negatively affect TC and vice versa. We propose a resource orchestration strategy to balance the performance and power consumption constraints in the short-term and EM and TC aging in the long-term. Experimental results show that the proposed approach improves the average Mean Time To Failure at least by 17% and 20% w.r.t. EM and TC, respectively, while providing same performance level of the nominal counterpart and guaranteeing the power budget

    Intelligence at the Extreme Edge: A Survey on Reformable TinyML

    Full text link
    The rapid miniaturization of Machine Learning (ML) for low powered processing has opened gateways to provide cognition at the extreme edge (E.g., sensors and actuators). Dubbed Tiny Machine Learning (TinyML), this upsurging research field proposes to democratize the use of Machine Learning (ML) and Deep Learning (DL) on frugal Microcontroller Units (MCUs). MCUs are highly energy-efficient pervasive devices capable of operating with less than a few Milliwatts of power. Nevertheless, many solutions assume that TinyML can only run inference. Despite this, growing interest in TinyML has led to work that makes them reformable, i.e., work that permits TinyML to improve once deployed. In line with this, roadblocks in MCU based solutions in general, such as reduced physical access and long deployment periods of MCUs, deem reformable TinyML to play a significant part in more effective solutions. In this work, we present a survey on reformable TinyML solutions with the proposal of a novel taxonomy for ease of separation. Here, we also discuss the suitability of each hierarchical layer in the taxonomy for allowing reformability. In addition to these, we explore the workflow of TinyML and analyze the identified deployment schemes and the scarcely available benchmarking tools. Furthermore, we discuss how reformable TinyML can impact a few selected industrial areas and discuss the challenges and future directions
    • …
    corecore