Search CORE

12 research outputs found

Statistical Learning in Chip (SLIC) (Invited Paper)

Author: Diana Marculescu
Donald E Thomas
Jeff Schneider
Jeyanandh Paramesh
Ken Mai
Radu Marculescu
Ronald D Blanton
Xin Li
Publication venue
Publication date: 11/04/2020
Field of study

Abstract-Despite best efforts, integrated systems are "born" (manufactured) with a unique 'personality' that stems from our inability to precisely fabricate their underlying circuits, and create software a priori for controlling the resulting uncertainty. It is possible to use sophisticated test methods to identify the bestperforming systems but this would result in unacceptable yields and correspondingly high costs. The system personality is further shaped by its environment (e.g., temperature, noise and supply voltage) and usage (i.e., the frequency and type of applications executed), and since both can fluctuate over time, so can the system's personality. Systems also "grow old" and degrade due to various wear-out mechanisms (e.g., negative-bias temperature instability), and unexpectedly due to various early-life failure sources. These "nature and nurture" influences make it extremely difficult to design a system that will operate optimally for all possible personalities. To address this challenge, we propose to develop statistical learning in-chip (SLIC). SLIC is a holistic approach to integrated system design based on continuously learning key personality traits on-line, for selfevolving a system to a state that optimizes performance hierarchically across the circuit, platform, and application levels. SLIC will not only optimize integrated-system performance but also reduce costs through yield enhancement since systems that would have before been deemed to have weak personalities (unreliable, faulty, etc.) can now be recovered through the use of SLIC

CiteSeerX

Intelligent Management of Mobile Systems through Computational Self-Awareness

Author: Donyanavard Bryan
Dutt Nikil
Jantsch Axel
Mutlu Onur
Rahmani Amir M.
Publication venue
Publication date: 31/07/2020
Field of study

Runtime resource management for many-core systems is increasingly complex. The complexity can be due to diverse workload characteristics with conflicting demands, or limited shared resources such as memory bandwidth and power. Resource management strategies for many-core systems must distribute shared resource(s) appropriately across workloads, while coordinating the high-level system goals at runtime in a scalable and robust manner. To address the complexity of dynamic resource management in many-core systems, state-of-the-art techniques that use heuristics have been proposed. These methods lack the formalism in providing robustness against unexpected runtime behavior. One of the common solutions for this problem is to deploy classical control approaches with bounds and formal guarantees. Traditional control theoretic methods lack the ability to adapt to (1) changing goals at runtime (i.e., self-adaptivity), and (2) changing dynamics of the modeled system (i.e., self-optimization). In this chapter, we explore adaptive resource management techniques that provide self-optimization and self-adaptivity by employing principles of computational self-awareness, specifically reflection. By supporting these self-awareness properties, the system can reason about the actions it takes by considering the significance of competing objectives, user requirements, and operating conditions while executing unpredictable workloads

arXiv.org e-Print Archive

eScholarship - University of California

Recommended from our members

Adaptive Resource Management for Mobile Multiprocessors through Computational Self-Awareness

Author: Donyanavard Bryan
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Runtime resource management for many-core systems is increasingly complex.The complexity can be due to diverse workload characteristics with conflicting demands or limited shared resources such as memory bandwidth and power. Resource management strategies for many-core systems must distribute shared resource(s) appropriately across workloads, while coordinating the high-level system goals at runtime in a scalable and robust manner.To address the complexity of dynamic resource management in many-core systems, state-of-the-art techniques that use heuristics have been proposed. These methods lack the formalism in providing robustness against unexpected runtime behavior. One of the common solutions for this problem is to deploy classical control approaches with bounds and formal guarantees. Traditional control theoretic methods lack the ability to adapt to (1) changing goals at runtime (i.e., self-adaptivity), and (2) changing dynamics of the modeled system (i.e., self-optimization).In this thesis, I explore adaptive resource management techniques that provide self-optimization and self-adaptivity by employing principles of computational self-awareness, specifically reflection. By supporting these self-awareness properties, the system will reason about the actions it takes by considering the significance of competing objectives, user requirements, and operating conditions while executing unpredictable workloads

eScholarship - University of California

Recommended from our members

Achieving Accurate Predictions of Future Events Under Hardware Heterogeneity

Author: Prodromou Andreas
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Heterogeneous hardware is becoming increasingly available in modern hardware, while research breakthroughs enforce the expectation that heterogeneity will keep increasing in the future. Significant gains can be achieved via appropriate utilization of heterogeneity, in terms of performance and power consumption, however, poor utilization can have a detrimental effect. Intelligent scheduling and resource management is a crucial challenge we need to overcome in order to harvest the full potential of heterogeneous hardware. As systems become larger and include greater levels of hardware diversity, the importance of intelligent scheduling and resource management is further accentuated.This dissertation presents techniques that aid the process of scheduling and resource management in the presence of heterogeneous hardware, via accurately predicting upcoming runtime events. With a proactive and accurate view of the near future, schedulers can utilize the underlying hardware more efficiently, and fully take advantage of the available benefits.By adapting a majority element heuristic, this dissertation significantly improves the accuracy of predicting memory addresses about to be accessed, while reducing prediction-related costs by a factor of ten thousand compared to previously proposed predictive approaches. Coupled with novel microarchitectural modifications, accurate address predictions are shown to improve the performance of heterogeneous memory architectures.Machine learning-based performance predictors are further presented, capable of predicting a program's performance when executed on a given general-purpose core. Trained to model the subtleties of the interaction between hardware and software, these predictors are capable of generating highly accurate predictions even for cores with varied Instruction Set Architectures. Utilizing these performance predictions for job scheduling, is shown to improve overall system performance. The trained predictors are further examined and interpreted in order to visualize the correlations between features picked up and amplified during training.Finally, this dissertation demonstrates that scheduling algorithms cannot guarantee deriving an optimal schedule during realistic execution scenarios due to the underlying hardware heterogeneity, the wide range of runtime requirements of software, as well as prediction error from performance predictors. In response, deep neural networks are trained to select one scheduling approach from a list of options with varied overheads and correctness guarantees. The scheduling approach chosen, is the one which will most likely return the highest-performance schedule with the lowest overhead, given a particular instance of the job-to-core assignment problem

eScholarship - University of California

TOWARDS PROCESS CONTEXT DRIVEN AND PMU UPDATED PREEMPTIVE SCHEDULING FOR SINGLE-ISA HETEROGENEOUS SYSTEMS

Author: Alifieraki Ioanna - Maria
Publication venue
Publication date: 01/08/2022
Field of study

The University of Manchester - Institutional Repository

Runtime Management of Multiprocessor Systems for Fault Tolerance, Energy Efficiency and Load Balancing

Author: Tzilis Stavros
Publication venue
Publication date: 01/01/2019
Field of study

Efficiency of modern multiprocessor systems is hurt by unpredictable events: aging causes permanent faults that disable components; application spawnings and terminations taking place at arbitrary times, affect energy proportionality, causing energy waste; load imbalances reduce resource utilization, penalizing performance. This thesis demonstrates how runtime management can mitigate the negative effects of unpredictable events, making decisions guided by a combination of static information known in advance and parameters that only become known at runtime. We propose techniques for three different objectives: graceful degradation of aging-prone systems; energy efficiency of heterogeneous adaptive systems; and load balancing by means of work stealing. Managing aging-prone systems for graceful efficiency degradation, is based on a high-level system description that encapsulates hardware reconfigurability and workload flexibility and allows to quantify system efficiency and use it as an objective function. Different custom heuristics, as well as simulated annealing and a genetic algorithm are proposed to optimize this objective function as a response to component failures. Custom heuristics are one to two orders of magnitude faster, provide better efficiency for the first 20% of system lifetime and are less than 13% worse than a genetic algorithm at the end of this lifetime. Custom heuristics occasionally fail to satisfy reconfiguration cost constraints. As all algorithms\u27 execution time scales well with respect to system size, a genetic algorithm can be used as backup in these cases. Managing heterogeneous multiprocessors capable of Dynamic Voltage and Frequency Scaling is based on a model that accurately predicts performance and power: performance is predicted by combining static, application-specific profiling information and dynamic, runtime performance monitoring data; power is predicted using the aforementioned performance estimations and a set of platform-specific, static parameters, determined only once and used for every application mix. Three runtime heuristics are proposed, that make use of this model to perform partial search of the configuration space, evaluating a small set of configurations and selecting the best one. When best-effort performance is adequate, the proposed approach achieves 3% higher energy efficiency compared to the powersave governor and 2x better compared to the interactive and ondemand governors. When individual applications\u27 performance requirements are considered, the proposed approach is able to satisfy them, giving away 18% of system\u27s energy efficiency compared to the powersave, which however misses the performance targets by 23%; at the same time, the proposed approach maintains an efficiency advantage of about 55% compared to the other governors, which also satisfy the requirements. Lastly, to improve load balancing of multiprocessors, a partial and approximate view of the current load distribution among system cores is proposed, which consists of lightweight data structures and is maintained by each core through cheap operations. A runtime algorithm is developed, using this view whenever a core becomes idle, to perform victim core selection for work stealing, also considering system topology and memory hierarchy. Among 12 diverse imbalanced workloads, the proposed approach achieves better performance than random, hierarchical and local stealing for six workloads. Furthermore, it is at most 8% slower among the other six workloads, while competing strategies incur a penalty of at least 89% on some workload

Chalmers Research