14 research outputs found

    PELSI: Power-Efficient Layer-Switched Inference

    Get PDF
    Convolutional Neural Networks (CNNs) are now quintessential kernels within embedded computer vision applications deployed in edge devices. Heterogeneous Multi-Processor System-on-Chips (HMPSoCs) with Dynamic Voltage and Frequency Scaling (DVFS) capable components (CPUs and GPUs) allow for low-latency, low-power CNN inference on resource-constrained edge devices when employed efficiently. CNNs comprise several heterogeneous layer types that execute with different degrees of power efficiency on different HMPSoC components at different frequencies. We propose the first framework, PELSI, that exploits this layer-wise power efficiency heterogeneity for power-efficient CPU-GPU layer-switched CNN interference on HMPSoCs. PELSI executes each layer of a CNN on an HMPSoC component (CPU or GPU) clocked at just the right frequency for every layer such that the CNN meets its inference latency target with minimal power consumption while still accounting for the power-performance overhead of multiple switching between CPU and GPU mid-inference. PELSI incorporates a Genetic Algorithm (GA) to identify the near-optimal CPU-GPU layer-switched CNN inference configuration from within the large exponential design space that meets the given latency requirement most power efficiently. We evaluate PELSI on Rock-Pi embedded platform. The platform contains an RK3399Pro HMPSoC with DVFS-capable CPU clusters and GPU. Empirical evaluations with five different CNNs show a 44.48% improvement in power efficiency for CNN inference under PELSI over the state-of-the-art

    Dynamic Energy and Thermal Management of Multi-Core Mobile Platforms: A Survey

    Get PDF
    Multi-core mobile platforms are on rise as they enable efficient parallel processing to meet ever-increasing performance requirements. However, since these platforms need to cater for increasingly dynamic workloads, efficient dynamic resource management is desired mainly to enhance the energy and thermal efficiency for better user experience with increased operational time and lifetime of mobile devices. This article provides a survey of dynamic energy and thermal management approaches for multi-core mobile platforms. These approaches do either proactive or reactive management. The upcoming trends and open challenges are also discussed

    Analysis of DVFS Techniques for Improving the GPU Energy Effi-ciency

    Get PDF
    Abstract Dynamic Voltage Frequency Scaling (DVFS) techniques are used to improve energy efficiency of GPUs. Literature survey and thorough analysis of various schemes on DVFS techniques during the last decade are presented in this paper. Detailed analysis of the schemes is included with respect to comparison of various DVFS techniques over the years. To endow with knowledge of various power management techniques that utilize DVFS during the last decade is the main objective of this paper. During the study, we find that DVFS not only work solely but also in coordination with other power optimization techniques like load balancing and task mapping where performance and energy efficiency are affected by varying the platform and benchmark. Thorough analysis of various schemes on DVFS techniques is presented in this paper such that further research in the field of DVFS can be enhanced

    CPU-GPU-Memory DVFS for Power-Efficient MPSoC in Mobile Cyber Physical Systems

    Get PDF
    Most modern mobile cyber-physical systems such as smartphones come equipped with multi-processor systems-on-chip (MPSoCs) with variant computing capacity both to cater to performance requirements and reduce power consumption when executing an application. In this paper, we propose a novel approach to dynamic voltage and frequency scaling (DVFS) on CPU, GPU and RAM in a mobile MPSoC, which caters to the performance requirements of the executing application while consuming low power. We evaluate our methodology on a real hardware platform, Odroid XU4, and the experimental results prove the approach to be 26% more power-efficient and 21% more thermal-efficient compared to the state-of-the-art system

    Predictive Thermal Management for Energy-Efficient Execution of Concurrent Applications on Heterogeneous Multicores

    Get PDF
    Current multicore platforms contain different types of cores, organized in clusters (e.g., ARM's big.LITTLE). These platforms deal with concurrently executing applications, having varying workload profiles and performance requirements. Runtime management is imperative for adapting to such performance requirements and workload variabilities and to increase energy and temperature efficiency. Temperature has also become a critical parameter since it affects reliability, power consumption, and performance and, hence, must be managed. This paper proposes an accurate temperature prediction scheme coupled with a runtime energy management approach to proactively avoid exceeding temperature thresholds while maintaining performance targets. Experiments show up to 20% energy savings while maintaining high-temperature averages and peaks below the threshold. Compared with state-of-the-art temperature predictors, this paper predicts 35% faster and reduces the mean absolute error from 3.25 to 1.15 °C for the evaluated applications' scenarios

    Adaptive Knobs for Resource Efficient Computing

    Get PDF
    Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

    Aggressive undervolting of FPGAs : power & reliability trade-offs

    Get PDF
    In this work, we evaluate aggressive undervolting, i.e., voltage underscaling below the nominal level to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Usually, voltage guardbands are added by chip vendors to ensure the worst-case process and environmental scenarios. Through experimenting on several FPGA architectures, we con¿rm a large voltage guardband for several FPGA components, which in turn, delivers signi¿cant power savings. However, further undervolting below the voltage guardband may cause reliability issues as the result of the circuit delay increase, and faults might start to appear. We extensively characterize the behavior of these faults in terms of the rate, location, type, as well as sensitivity to environmental temperature, primarily focusing on FPGA on-chip memories, or Block RAMs (BRAMs). Understanding this behavior can allow to deploy ef¿cient mitigation techniques, and in turn, FPGA-based designs can be improved for better energy, reliability, and performance trade-offs. Finally, as a case study, we evaluate a typical FPGA-based Neural Network (NN) accelerator when the FPGA voltage is underscaled. In consequence, the substantial NN energy savings come with the cost of NN accuracy loss. To attain power savings without NN accuracy loss below the voltage guardband gap, we proposed an application-aware technique and we also, evaluated the built-in Error-Correcting Code (ECC) mechanism. Hence, First, we developed an application-dependent BRAMs placement technique that relies on the deterministic behavior of undervolting faults, and mitigates these faults by mapping the most reliability sensitive NN parameters to BRAM blocks that are relatively more resistant to undervolting faults. Second, as a more general technique, we applied the built-in ECC of BRAMs and observed a signi¿cant fault coverage capability thanks to the behavior of undervolting faults, with a negligible power consumption overhead.En este trabajo, evaluamos el reducir el voltaje en forma agresiva, es decir, bajar la tensión por debajo del nivel nominal para reducir el consumo de energía en Field Programmable Gate Arrays (FPGA). Por lo general, los vendedores de chips establecen margen de seguridad al voltaje para garantizar el funcionamiento de los mismos en el peor de los casos y en los peores escenarios ambientales. Mediante la experimentación en varias arquitecturas FPGA, confirmamos que hay un margen de seguridad de voltaje grande en varios de los componentes de la FPGA, que a su vez, nos ofrece ahorros de energía significativos. Sin embargo, un trabajar a un voltaje por debajo del margen de seguridad del voltaje puede causar problemas de confiabilidad a medida ya que aumenta el retardo del circuito y pueden comenzar a aparecer fallos. Caracterizamos ampliamente el comportamiento de estos fallos en términos de velocidad, ubicación, tipo, así como la sensibilidad a la temperatura ambiental, centrándonos principalmente en memorias internas de la FPGA, o Block RAM (BRAM). Comprender este comportamiento puede permitir el desarrollo de técnicas eficientes de mitigación y, a su vez, mejorar los diseños basados en FPGA para obtener ahorros en energía, una mayor confiabilidad y un mayor rendimiento. Finalmente, como caso de estudio, evaluamos un acelerador típico de Redes Neuronales basado en FPGA cuando el voltaje de la FPGA esta por debajo del nivel mínimo de seguridad. En consecuencia, los considerables ahorros de energía de la red neuronal vienen asociados con la pérdida de precisión de la red neuronal. Para obtener ahorros de energía sin una pérdida de precisión en la red neuronal por debajo del margen de seguridad del voltaje, proponemos una técnica que tiene en cuenta la aplicación, asi mismo, evaluamos el mecanismo integrado en las BRAMs de Error Correction Code (ECC). Por lo tanto, en primer lugar, desarrollamos una técnica de colocación de BRAM dependiente de la aplicación que se basa en el comportamiento determinista de las fallos cuando la FPGA funciona por debajo del margen de seguridad, y se mitigan estos fallos asignando los parámetros de la red neuronal más sensibles a producir fallos a los bloques BRAM que son relativamente más resistentes a los fallos. En segundo lugar, como técnica más general, aplicamos el ECC incorporado de los BRAM y observamos una capacidad de cobertura de fallos significativo gracias a las características de comportamiento de fallos, con una sobrecoste de consumo de energía insignificantePostprint (published version

    Integrated CPU-GPU Power Management for 3D Mobile Games

    No full text
    corecore