19 research outputs found

    Novel DVFS Methodologies For Power-Efficient Mobile MPSoC

    Get PDF
    Low power mobile computing systems such as smartphones and wearables have become an integral part of our daily lives and are used in various ways to enhance our daily lives. Majority of modern mobile computing systems are powered by multi-processor System-on-a-Chip (MPSoC), where multiple processing elements are utilized on a single chip. Given the fact that these devices are battery operated most of the times, thus, have limited power supply and the key challenges include catering for performance while reducing the power consumption. Moreover, the reliability in terms of lifespan of these devices are also affected by the peak thermal behaviour on the device, which retrospectively also make such devices vulnerable to temperature side-channel attack. This thesis is concerned with performing Dynamic Voltage and Frequency Scaling (DVFS) on different processing elements such as CPU & GPU, and memory unit such as RAM to address the aforementioned challenges. Firstly, we design a Computer Vision based machine learning technique to classify applications automatically into different categories of workload such that DVFS could be performed on the CPU to reduce the power consumption of the device while executing the application. Secondly, we develop a reinforcement learning based agent to perform DVFS on CPU and GPU while considering the user's interaction with such devices to optimize power consumption and thermal behaviour. Next, we develop a heuristic based automated agent to perform DVFS on CPU, GPU and RAM to optimize the same while executing an application. Finally, we explored the affect of DVFS on CPUs leading to vulnerabilities against temperature side-channel attack and hence, we also designed a methodology to secure against such attack while improving the reliability in terms of lifespan of such devices

    Adaptive Knobs for Resource Efficient Computing

    Get PDF
    Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

    Exploiting Heterogeneous Multicore Processors Through Fine-Grained Scheduling and Low-Overhead Thread Migration

    Get PDF
    Heterogeneous (also known as asymmetric) multicore processors (HMPs) offer significant advantages over homogeneous multicores in terms of both power and performance. Power-efficient cores can be paired with higher-performance cores to achieve advantageous power/performance tradeoffs. Particular cores could also be tailored to efficiently meet the demands of particular application domains. Unfortunately, HMPs also create unique challenges in effective mapping of running processes to cores. The greater the diversity of cores, the more complex this problem becomes.Existing dynamic scheduling approaches for HMPs fall into two categories: sampling and prediction. Sampling approaches permute running applications to across core types to find the best-performing assignment. This sampling step hurts performance and power due to time spent migrating threads through non-optimal assignments. Alternatively, prediction-based approaches estimate the performance for each application on different types of cores and choose the schedule with the best estimated performance. Prediction eliminates the cost of sampling but may result in sub-optimal scheduling decisions.This dissertation introduces new and novel phase-identification-based online schedulers for HMPs that combine aspects of both sampling and prediction approaches by identifying phases of execution (instruction sequences with similar behavior), sampling new phases, recognizing repeating phases and reusing recorded phase information to predict the best performing schedule and optimize the schedule for either performance or power consumption. While previous approaches utilized only phase-change detection to begin evaluating new schedules, the proposed approaches recognize the current phase of each executing thread and reuse phase information recorded in a Signature History Table when the same or similar behavior of programs reoccurs. This dissertation further proposes machine-learning based schedulers that learns effective scheduling policies using the same characteristics of these program phases.Exploiting differences between relatively short duration phases using the presented scheduling techniques results in frequent thread migrations that can harm performance. Operating system (OS) context switching can be time consuming. To reduce this context switching overhead, a context switching circuit that both accelerates thread switches among cores in HMPs and reduces switching cost within each core (multitasking) is further introduced in this work. This novel context switch circuit enables low-overhead hardware-level thread migration between cores on a chip and results in up to1380X speedup as compared to an OS context switch. Together with the presented scheduling approaches, this mechanism enables efficient and fine-grained scheduling for HMPs

    GPU-based Architecture Modeling and Instruction Set Extension for Signal Processing Applications

    Get PDF
    The modeling of embedded systems attempts to estimate the performance and costs prior to the implementation. The early stage predictions for performance and power dissipation reduces the more costly late stage design modifications. Workload modeling is an approach where an abstract application is evaluated against an abstract architecture. The challenge in modeling is the balance between fidelity and simplicity, where fidelity refers to the correctness of the predictions and the simplicity relates to the simulation time of the model and its ease of comprehension for the developer. A model named GSLA for performance and power modeling is presented, which extends existing architecture modeling by including GPUs as parallel processing elements. The performance model showed an average fidelity of 93% and the power model demonstrated an average fidelity of 84% between the models and several application measurements. The GSLA model is very simple: only 2 parameters that can be obtained by automated scripts. Besides the modeling, this thesis addresses lower level signal processing system improvements by proposing Instruction Set Architecture (ISA) extensions for RISC-V processors. A vehicle classifier neural network model was used as a case study, in which the benefit of Bit Manipulation Instructions (BMI) is shown. The result is a new PopCount instruction extension that is verified in ETISS simulator. The PopCount extension of RISC-V ISA showed a performance improvement of more than double for the vehicle classifier application. In addition, the design flow for adding a new instruction extension for a re-configurable platform is presented. The GPU modeling and the RISC-V ISA extension added new features to the state of the art. They improve the modeling features as well as reduce the execution costs in signal processing platforms

    Temperature Variation Aware Energy Optimization in Heterogeneous MPSoCs

    Get PDF
    Thermal effects are rapidly gaining importance in nanometer heterogeneous integrated systems. Increased power density, coupled with spatio-temporal variability of chip workload, cause lateral and vertical temperature non-uniformities (variations) in the chip structure. The assumption of an uniform temperature for a large circuit leads to inaccurate determination of key design parameters. To improve design quality, we need precise estimation of temperature at detailed spatial resolution which is very computationally intensive. Consequently, thermal analysis of the designs needs to be done at multiple levels of granularity. To further investigate the flow of chip/package thermal analysis we exploit the Intel Single Chip Cloud Computer (SCC) and propose a methodology for calibration of SCC on-die temperature sensors. We also develop an infrastructure for online monitoring of SCC temperature sensor readings and SCC power consumption. Having the thermal simulation tool in hand, we propose MiMAPT, an approach for analyzing delay, power and temperature in digital integrated circuits. MiMAPT integrates seamlessly into industrial Front-end and Back-end chip design flows. It accounts for temperature non-uniformities and self-heating while performing analysis. Furthermore, we extend the temperature variation aware analysis of designs to 3D MPSoCs with Wide-I/O DRAM. We improve the DRAM refresh power by considering the lateral and vertical temperature variations in the 3D structure and adapting the per-DRAM-bank refresh period accordingly. We develop an advanced virtual platform which models the performance, power, and thermal behavior of a 3D-integrated MPSoC with Wide-I/O DRAMs in detail. Moving towards real-world multi-core heterogeneous SoC designs, a reconfigurable heterogeneous platform (ZYNQ) is exploited to further study the performance and energy efficiency of various CPU-accelerator data sharing methods in heterogeneous hardware architectures. A complete hardware accelerator featuring clusters of OpenRISC CPUs, with dynamic address remapping capability is built and verified on a real hardware

    Memory-aware platform description and framework for source-level embedded MPSoC software optimization

    Get PDF
    Developing optimizing source-level transformations, consists of numerous non-trivial subtasks. Besides identifying actual optimization goals within a particular target-platform and compiler setup, the actual implementation is a tedious, error-prone and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description which can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenging task. Another dimension, which has also been tackled in this work, deals with provision of an infrastructure for optimization-step representation, interaction and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other, they have been bundled into the MACCv2 framework, a fullfledged optimization-technique implementation and integration approach. Together, they significantly alleviate the effort required for implementation of source-level memory-aware optimization techniques for Multi Processor Systems on a Chip (MPSoCs). The system-modeling approach presented in this dissertation has been located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantical description. It combines a locally-scoped, structural modeling approach, as preferred by system designers, and a fast, database-like interface, best suited for optimization technique developers. It supports model refinement and requires only limited effort for an initial abstract system model. The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only definition of locally-scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature, which makes this approach superior to state-of-the-art table-lookup-based or full-system-simulation-based approaches. Combining such component-local properties to system-global target-platform data is performed via aspect handlers. These handlers define computational rules which are applied to correlated locally-scoped data along access paths in the memory-subsystem hierarchy. This approach is capable of calculating these system-global values at a rate similar to plain table lookups, while maintaining a precision close to full-system-simulation-based estimations. This has been shown for both, energy-consumption values as well as access-latency values of the MPARM platform. The MACCv2 framework provides a set of fundamental services to the optimization technique developer. On top of these services, a system model and source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces. This framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows for encapsulation of tasks at various granularity levels. For simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties as well as inter-processing-step communication, is achieved via framework-provided services. The system-modeling approach and the framework show the right set of properties needed to support development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities and PhD theses have been successfully founded on approaches and the framework proposed in this dissertation
    corecore